The 25 Most Influential Psychological Experiments in History
Each year, thousands of studies are completed across the many specialty areas of psychology, but only a handful have had a lasting impact on the psychological community as a whole. Some of these were dutifully conducted within the confines of ethical and practical guidelines. Others pushed the boundaries of acceptable research practice and created controversies that still linger to this day. And still others were never designed to be true psychological experiments, but ended up serving as beacons to the psychological community by proving or disproving theories.
This is a list of the 25 most influential psychological experiments still being taught to psychology students of today.
1. A Class Divided
Study conducted by: Jane Elliott.
Study Conducted in 1968 in an Iowa classroom
Experiment Details: Jane Elliott’s famous experiment was inspired by the assassination of Dr. Martin Luther King Jr. and the inspirational life that he led. The third-grade teacher developed an exercise, really a psychological experiment, to help her Caucasian students understand the effects of racism and prejudice.
Elliott divided her class into two separate groups: blue-eyed students and brown-eyed students. On the first day, she labeled the blue-eyed group as the superior group and from that point forward they had extra privileges, leaving the brown-eyed children to represent the minority group. She discouraged the groups from interacting and singled out individual students to stress the negative characteristics of the children in the minority group. What this exercise showed was that the children’s behavior changed almost instantaneously. The group of blue-eyed students performed better academically and even began bullying their brown-eyed classmates. The brown-eyed group experienced lower self-confidence and worse academic performance. The next day, she reversed the roles of the two groups and the blue-eyed students became the minority group.
At the end of the experiment, the children were so relieved that they were reported to have embraced one another and agreed that people should not be judged based on outward appearances. This exercise has since been repeated many times with similar outcomes.
2. Asch Conformity Study
Study conducted by: Dr. Solomon Asch.
Study Conducted in 1951 at Swarthmore College
Experiment Details: Dr. Solomon Asch conducted a groundbreaking study that was designed to evaluate a person’s likelihood to conform to a standard when there is pressure to do so.
A group of participants was shown pictures of lines of various lengths and asked a simple question: which of the lines matched the length of a reference line? The tricky part of this study was that in each group only one person was a true participant. The others were actors working from a script, and most of the actors were instructed to give the wrong answer. Strikingly, the one true participant frequently agreed with the majority, even when they knew they were giving the wrong answer.
The results of this study are important when we study social interactions among individuals in groups. This study is a famous example of the temptation many of us experience to conform to a standard during group situations and it showed that people often care more about being the same as others than they do about being right. It is still recognized as one of the most influential psychological experiments for understanding human behavior.
3. Bobo Doll Experiment
Study conducted by: Dr. Albert Bandura.
Study Conducted between 1961 and 1963 at Stanford University
In his groundbreaking study, Bandura separated participants into three groups:
- one was exposed to a video of an adult showing aggressive behavior towards a Bobo doll
- another was exposed to video of a passive adult playing with the Bobo doll
- the third formed a control group
Children watched their assigned video and were then sent to a room with the same doll they had seen in the video (with the exception of the control group). The researcher found that children exposed to the aggressive model were more likely to exhibit aggressive behavior toward the doll themselves, while the other groups showed little imitative aggression. Among the children exposed to the aggressive model, boys averaged 38.2 imitative physical aggressions, compared with 12.7 for the girls.
The study also showed that boys exhibited more aggression when exposed to aggressive male models than boys exposed to aggressive female models. When exposed to aggressive male models, the number of aggressive instances exhibited by boys averaged 104. This is compared to 48.4 aggressive instances exhibited by boys who were exposed to aggressive female models.
The results for the girls showed a similar pattern, though less drastic. When exposed to aggressive female models, girls averaged 57.7 aggressive instances, compared with 36.3 for girls exposed to aggressive male models. The results concerning gender differences strongly supported Bandura’s secondary prediction that children are more strongly influenced by same-sex models. The Bobo Doll Experiment demonstrated a groundbreaking way to study human behavior and its influences.
4. Car Crash Experiment
Study conducted by: Elizabeth Loftus and John Palmer.
Study Conducted in 1974 at the University of Washington
The participants watched film clips of a car accident and were asked to describe what had happened as if they were eyewitnesses to the scene. The participants were split into groups, and each group was questioned with different wording, such as “How fast was the car going when it hit the other car?” versus “How fast was the car going when it smashed into the other car?” The experimenters found that the choice of verb affected the participants’ memories of the accident, showing how easily memory can be distorted.
This research suggests that memory can be easily manipulated by questioning technique. Information gathered after the event can merge with the original memory, causing incorrect recall or reconstructive memory. The addition of false details to a memory of an event is now referred to as confabulation. This concept has very important implications for the questions used in police interviews of eyewitnesses.
5. Cognitive Dissonance Experiment
Study conducted by: Leon Festinger and James Carlsmith.
Study Conducted in 1957 at Stanford University
Experiment Details: The concept of cognitive dissonance refers to a situation involving conflicting attitudes, beliefs, or behaviors.
This conflict produces an inherent feeling of discomfort, leading to a change in one of the attitudes, beliefs, or behaviors to minimize or eliminate the discomfort and restore balance.
Cognitive dissonance was first investigated by Leon Festinger after an observational study of a cult that believed the earth was going to be destroyed by a flood. Out of this study was born an intriguing experiment conducted by Festinger and Carlsmith in which participants were asked to perform a series of dull tasks (such as turning pegs in a peg board for an hour). Participants’ initial attitudes toward this task were highly negative.
They were then paid either $1 or $20 to tell a participant waiting in the lobby that the tasks were really interesting. Almost all of the participants agreed to walk into the waiting room and persuade the next participant that the boring experiment would be fun. When the participants were later asked to evaluate the experiment, the participants who were paid only $1 rated the tedious task as more fun and enjoyable than the participants who were paid $20 to lie.
Being paid only $1 was not sufficient incentive for lying, so those who were paid $1 experienced dissonance. They could overcome that dissonance only by coming to believe that the tasks really were interesting and enjoyable. Being paid $20 provided an obvious external reason for turning pegs, so there was no dissonance.
6. Fantz’s Looking Chamber
Study conducted by: Robert L. Fantz.
Study Conducted in 1961 at the University of Illinois
Experiment Details: The study conducted by Robert L. Fantz is among the simplest, yet most important, in the field of infant development and vision. In 1961, when this experiment was conducted, there were very few ways to study what was going on in the mind of an infant. Fantz realized that the best way was simply to watch the actions and reactions of infants. He understood a fundamental fact: if there is something of interest near humans, they generally look at it.
To test this concept, Fantz set up a display board with two pictures attached: on one was a bulls-eye, and on the other was a sketch of a human face. The board was hung in a chamber where a baby could lie safely underneath and see both images. Then, from behind the board, invisible to the baby, he peeked through a hole to watch what the baby looked at. The study showed that a two-month-old baby looked twice as long at the human face as at the bulls-eye. This suggests that human babies have some powers of pattern and form selection. Before this experiment, it was thought that babies looked out onto a chaotic world of which they could make little sense.
7. Hawthorne Effect
Study conducted by: Henry A. Landsberger.
Study Conducted in 1955 at Hawthorne Works in Chicago, Illinois
Landsberger performed the study by analyzing data from experiments conducted between 1924 and 1932 by Elton Mayo at the Hawthorne Works near Chicago. The company had commissioned studies to evaluate whether the level of light in a building changed the productivity of the workers. What Mayo found was that the level of light itself made no difference: the workers increased their output whenever the lighting was changed, whether from a low level to a high level or vice versa.
The researchers noticed that the workers’ efficiency increased when any variable was manipulated. The study showed that output changed simply because the workers were aware that they were under observation. The conclusion was that the workers felt important because they were pleased to be singled out, and they increased productivity as a result. Being singled out, not the changing lighting levels or any of the other manipulated factors, was what dictated the increased productivity.
The Hawthorne Effect has become one of the hardest inbuilt biases to eliminate or factor into the design of any experiment in psychology and beyond.
8. Kitty Genovese Case
Study conducted by: New York Police Force.
Study Conducted in 1964 in New York City
Experiment Details: The murder case of Kitty Genovese was never intended to be a psychological experiment; however, it ended up having serious implications for the field.
According to a New York Times article, almost 40 neighbors witnessed Kitty Genovese being savagely attacked and murdered in Queens, New York in 1964. Not one neighbor called the police for help. Some reports state that the attacker briefly left the scene and later returned to “finish off” his victim. It was later uncovered that many of these facts were exaggerated. (There were more likely only a dozen witnesses and records show that some calls to police were made).
What this case later became famous for is the “Bystander Effect,” which states that the more bystanders present in a social situation, the less likely it is that anyone will step in and help. This effect has led to changes in medicine, psychology, and many other areas. One famous example is the way CPR is taught: students learn to assign one specific bystander the job of alerting authorities, which minimizes the chance that no one calls for assistance.
9. Learned Helplessness Experiment
Study conducted by: Martin Seligman.
Study Conducted in 1967 at the University of Pennsylvania
Seligman’s experiment involved the ringing of a bell and then the administration of a light shock to a dog. After a number of pairings, the dog reacted to the shock even before it happened. As soon as the dog heard the bell, he reacted as though he’d already been shocked.
During the course of this study, something unexpected happened. Each dog was placed in a large crate divided down the middle by a low fence that the dog could see and easily jump over. The floor on one side of the fence was electrified; the other side was not. Seligman placed each dog on the electrified side and administered a light shock, expecting the dog to jump to the non-shocking side of the fence. In an unexpected turn, the dogs simply lay down.
The hypothesis was that as the dogs learned from the first part of the experiment that there was nothing they could do to avoid the shocks, they gave up in the second part of the experiment. To prove this hypothesis the experimenters brought in a new set of animals and found that dogs with no history in the experiment would jump over the fence.
This condition was described as learned helplessness. A human or animal does not attempt to get out of a negative situation because the past has taught them that they are helpless.
10. Little Albert Experiment
Study conducted by: John B. Watson and Rosalie Rayner.
Study Conducted in 1920 at Johns Hopkins University
The experiment began by placing a white rat in front of the infant, who initially had no fear of the animal. Watson then produced a loud sound by striking a steel bar with a hammer every time little Albert was presented with the rat. After several pairings of the noise and the white rat, the boy began to cry and exhibit signs of fear whenever the rat appeared in the room. Watson also created similar conditioned reflexes with other common animals and objects (rabbits, a Santa Claus beard, etc.) until Albert feared them all.
This study demonstrated that classical conditioning works on humans. One of its most important implications is that adult fears are often connected to early childhood experiences.
11. Magical Number Seven
Study conducted by: George A. Miller.
Study Conducted in 1956 at Princeton University
Experiment Details: Frequently referred to as “Miller’s Law,” the Magical Number Seven experiment proposes that the number of objects an average human can hold in working memory is 7 ± 2. This means that human memory capacity typically includes strings of words or concepts ranging from five to nine items. The paper on these limits to the capacity for processing information became one of the most highly cited in psychology.
The Magical Number Seven experiment was published in 1956 by cognitive psychologist George A. Miller of Princeton University’s Department of Psychology in Psychological Review. In the article, Miller discussed a concurrence between the limits of one-dimensional absolute judgment and the limits of short-term memory.
In a one-dimensional absolute-judgment task, a person is presented with a number of stimuli that vary along one dimension (such as 10 different tones varying only in pitch) and responds to each stimulus with a previously learned response.
Performance is almost perfect up to five or six different stimuli but declines as the number of different stimuli increases. This means that a human’s maximum performance on one-dimensional absolute judgment can be described as an information store with a maximum capacity of approximately 2 to 3 bits of information, which corresponds to the ability to distinguish among four to eight alternatives.
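As a rough illustration of the arithmetic behind those figures (this is the standard information-theoretic reading of “bits,” not a quotation from Miller’s paper), the number of alternatives N that can be distinguished with b bits of information is:

$$
N = 2^{b}, \qquad 2^{2} = 4 \quad \text{and} \quad 2^{3} = 8,
$$

so a capacity of roughly 2 to 3 bits corresponds to reliably telling apart about four to eight different stimuli.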
12. Pavlov’s Dog Experiment
Study conducted by: Ivan Pavlov.
Study Conducted in the 1890s at the Military Medical Academy in St. Petersburg, Russia
Pavlov began with the simple idea that there are some things that a dog does not need to learn. He observed that dogs do not learn to salivate when they see food. This reflex is “hard wired” into the dog. This is an unconditioned response (a stimulus-response connection that required no learning).
Pavlov demonstrated this unconditioned response by presenting a dog with a bowl of food and then measuring its salivary secretions. In the experiment, Pavlov used a bell as his neutral stimulus. Whenever he gave food to his dogs, he also rang the bell. After a number of repetitions of this procedure, he tried the bell on its own. What he found was that the bell on its own now caused an increase in salivation. The dog had learned to associate the bell with the food, and this learning created a new behavior: the dog salivated when it heard the bell. Because this response was learned (or conditioned), it is called a conditioned response, and the neutral stimulus has become a conditioned stimulus.
This theory came to be known as classical conditioning.
13. Robbers Cave Experiment
Study conducted by: Muzafer and Carolyn Sherif.
Study Conducted in 1954 at the University of Oklahoma
Experiment Details: This experiment, which studied group conflict, is considered by most to fall outside the bounds of ethically sound research.
In 1954, researchers at the University of Oklahoma assigned 22 eleven- and twelve-year-old boys from similar backgrounds into two groups. The two groups were taken to separate areas of a summer camp facility where they were able to bond as social units. The groups were housed in separate cabins, and neither group knew of the other’s existence for an entire week, during which the boys bonded with their cabin mates. Once the two groups were allowed to have contact, they showed definite signs of prejudice and hostility toward each other, even though they had only been given a very short time to develop their social identities. To increase the conflict between the groups, the experimenters had them compete against each other in a series of activities. This created even more hostility, and eventually the groups refused to eat in the same room. The final phase of the experiment involved turning the rival groups into friends. The fun activities the experimenters had planned, such as shooting firecrackers and watching movies, did not initially work, so they created teamwork exercises in which the two groups were forced to collaborate. At the end of the experiment, the boys decided to ride the same bus home, demonstrating that conflict can be resolved and prejudice overcome through cooperation.
Many critics have compared this study to Golding’s Lord of the Flies novel as a classic example of prejudice and conflict resolution.
14. Ross’ False Consensus Effect Study
Study conducted by: Lee Ross.
Study Conducted in 1977 at Stanford University
Experiment Details: In 1977, a social psychology professor at Stanford University named Lee Ross conducted an experiment that, in lay terms, focuses on how people can incorrectly conclude that others think the same way they do, or form a “false consensus” about the beliefs and preferences of others. Ross conducted the study in order to outline how the “false consensus effect” functions in humans.
In the first part of the study, participants were asked to read about situations in which a conflict occurred and then were told two alternative ways of responding to the situation. They were asked to do three things:
- Guess which option other people would choose
- Say which option they themselves would choose
- Describe the attributes of the person who would likely choose each of the two options
What the study showed was that most of the subjects believed that other people would make the same choice they did, regardless of which of the two responses they actually chose themselves. This phenomenon is known as the false consensus effect: an individual assumes that other people think the same way they do, when they may not. The second observation from this study is that when participants were asked to describe the attributes of the people who would likely make the choice opposite their own, they made bold, and sometimes negative, predictions about the personalities of those who did not share their choice.
15. The Schachter and Singer Experiment on Emotion
Study conducted by: Stanley Schachter and Jerome E. Singer.
Study Conducted in 1962 at Columbia University
Experiment Details: In 1962, Schachter and Singer conducted a groundbreaking experiment to prove their theory of emotion.
In the study, a group of 184 male participants were injected with epinephrine, a hormone that induces arousal, including increased heartbeat, trembling, and rapid breathing. The research participants were told that they were being injected with a new medication to test their eyesight. The first group of participants was informed of the possible side effects that the injection might cause, while the second group was not. The participants were then placed in a room with someone they thought was another participant but who was actually a confederate in the experiment. The confederate acted in one of two ways: euphoric or angry. Participants who had not been informed about the effects of the injection were more likely to feel either happier or angrier than those who had been informed.
What Schachter and Singer were trying to understand was the way in which cognition, or thought, influences human emotion. Their study illustrates the importance of how people interpret their physiological states, which form an important component of their emotions. Though their cognitive theory of emotional arousal dominated the field for two decades, it has been criticized for two main reasons: the size of the effect seen in the experiment was not especially large, and other researchers had difficulty replicating the experiment.
16. Selective Attention / Invisible Gorilla Experiment
Study conducted by: Daniel Simons and Christopher Chabris.
Study Conducted in 1999 at Harvard University
Experiment Details: In 1999 Simons and Chabris conducted their famous awareness test at Harvard University.
Participants in the study were asked to watch a video and count how many passes occurred between the basketball players on the white team. The video moves at a moderate pace, and keeping track of the passes is a relatively easy task. What most people fail to notice amid their counting is that, in the middle of the test, a man in a gorilla suit walks onto the court and stands in the center before walking off-screen.
The study found that the majority of subjects did not notice the gorilla at all, demonstrating that humans often overestimate their ability to multitask effectively. What the study showed is that when people are asked to attend to one task, they focus so strongly on that element that they may miss other important details.
17. Stanford Prison Study
Study conducted by: Philip Zimbardo.
Study Conducted in 1971 at Stanford University
The Stanford Prison Experiment was designed to study the behavior of “normal” individuals when assigned the role of prisoner or guard. College students were recruited to participate and were assigned the role of “guard” or “inmate,” while Zimbardo played the role of the warden. The basement of the psychology building served as the prison, and great care was taken to make it look and feel as realistic as possible.
The prison guards were told to run the prison for two weeks and were instructed not to physically harm any of the inmates during the study. After a few days, the guards became verbally abusive toward the inmates, and many of the prisoners became submissive to those in authority roles. The Stanford Prison Experiment had to be ended early because some of the participants displayed troubling signs of mental breakdown.
Although the experiment was conducted in a way now widely regarded as unethical, many psychologists believe that the findings showed how much human behavior is situational: people will conform to certain roles if the conditions are right. The Stanford Prison Experiment remains one of the most famous psychology experiments of all time.
18. Stanley Milgram Experiment
Study conducted by: Stanley Milgram.
Study Conducted in 1961 at Yale University
Experiment Details: This 1961 study was conducted by Yale University psychologist Stanley Milgram. It was designed to measure people’s willingness to obey authority figures when instructed to perform acts that conflicted with their morals. The study was based on the premise that humans will inherently take direction from authority figures from very early in life.
Participants were told they were participating in a study on memory. They were asked to watch another person (an actor) do a memory test. They were instructed to press a button that gave an electric shock each time the person got a wrong answer. (The actor did not actually receive the shocks, but pretended they did).
Participants were told to play the role of “teacher” and administer electric shocks to “the learner” every time he answered a question incorrectly. The experimenters asked the participants to keep increasing the shocks, and most of them obeyed even though the individual completing the memory test appeared to be in great pain. Despite the learner’s apparent suffering, many participants continued the experiment when the authority figure urged them to, increasing the voltage after each wrong answer until some eventually administered what would have been lethal electric shocks.
This experiment showed that humans are conditioned to obey authority and will usually do so even if it goes against their natural morals or common sense.
19. Surrogate Mother Experiment
Study conducted by: Harry Harlow.
Study Conducted from 1957 to 1963 at the University of Wisconsin
Experiment Details: In a series of controversial experiments during the late 1950s and early 1960s, Harry Harlow studied the importance of a mother’s love for healthy childhood development.
In order to do this, he separated infant rhesus monkeys from their mothers a few hours after birth and left them to be raised by two “surrogate mothers.” One of the surrogates was made of wire with an attached bottle for food; the other was made of soft terrycloth but lacked food. The researcher found that the baby monkeys spent much more time with the cloth mother than the wire mother, suggesting that affection plays a greater role than sustenance in childhood development. They also found that the monkeys that spent more time cuddling the soft mother grew up to be healthier.
This experiment showed that love, as demonstrated by physical body contact, is a more important aspect of the parent-child bond than the provision of basic needs. These findings also had implications in the attachment between fathers and their infants when the mother is the source of nourishment.
20. The Good Samaritan Experiment
Study conducted by: John Darley and Daniel Batson.
Study Conducted in 1973 at Princeton Theological Seminary (researchers were from Princeton University)
Experiment Details: In 1973, an experiment was created by John Darley and Daniel Batson to investigate the potential causes that underlie altruistic behavior. The researchers set out three hypotheses they wanted to test:
- People thinking about religion and higher principles would be no more inclined to show helping behavior than laymen.
- People in a rush would be much less likely to show helping behavior.
- People who are religious for personal gain would be less likely to help than people who are religious because they want to gain some spiritual and personal insights into the meaning of life.
Student participants were given some religious teaching and instruction and were then told to travel from one building to the next. Between the two buildings was a man lying injured and appearing to be in dire need of assistance. The first variable tested was the degree of urgency impressed upon the subjects, with some being told not to rush and others being informed that speed was of the essence.
The results of the experiment were intriguing, with the haste of the subject proving to be the overriding factor. When the subject was in no hurry, nearly two-thirds of people stopped to lend assistance. When the subject was in a rush, this dropped to one in ten.
People who were on their way to deliver a speech about helping others were nearly twice as likely to help as those delivering other sermons. This showed that the thoughts of the individual were a factor in determining helping behavior. Religious beliefs did not appear to make much difference in the results; being religious for personal gain, or as part of a spiritual quest, did not appear to have much impact on the amount of helping behavior shown.
21. The Halo Effect Experiment
Study conducted by: Richard E. Nisbett and Timothy DeCamp Wilson.
Study Conducted in 1977 at the University of Michigan
Experiment Details: The Halo Effect states that people generally assume that people who are physically attractive are more likely to:
- be intelligent
- be friendly
- display good judgment
To test their theory, Nisbett and Wilson designed a study to demonstrate that people have little awareness of the nature of the Halo Effect. They’re not aware that it influences:
- their personal judgments
- the production of a more complex social behavior
In the experiment, college students served as the research participants. They were asked to evaluate a psychology instructor after viewing him in a videotaped interview. The students were randomly assigned to one of two groups, and each group was shown a different interview with the same instructor, a native French-speaking Belgian who spoke English with a noticeable accent. In the first video, the instructor presented himself as someone:
- respectful of his students’ intelligence and motives
- flexible in his approach to teaching
- enthusiastic about his subject matter
In the second interview, he presented himself as much more unlikable. He was cold and distrustful toward the students and was quite rigid in his teaching style.
After watching the videos, the subjects were asked to rate the lecturer on:
- physical appearance
- mannerisms
- accent
His mannerisms and accent were kept the same in both versions of the video. The subjects were asked to rate the professor on an 8-point scale ranging from “like extremely” to “dislike extremely.” Subjects were also told that the researchers were interested in knowing “how much their liking for the teacher influenced the ratings they just made.” Other subjects were asked to identify how much the characteristics they had just rated influenced their liking of the teacher.
After responding to the questionnaire, the respondents were puzzled about their reactions to the videotapes and to the questionnaire items. The students had no idea why they gave one lecturer higher ratings. Most said that how much they liked the lecturer had not affected their evaluation of his individual characteristics at all.
The interesting thing about this study is that people can understand the phenomenon, yet they are unaware when it is occurring in their own judgments. Without realizing it, humans make global judgments of others. Even when this is pointed out, they may still deny that their ratings are a product of the halo effect.
22. The Marshmallow Test
Study conducted by: Walter Mischel.
Study Conducted in 1972 at Stanford University
In his 1972 Marshmallow Experiment, children ages four to six were taken into a room where a marshmallow was placed on a table in front of them. Before leaving each child alone in the room, the experimenter explained that the child would receive a second marshmallow if the first one was still on the table when the experimenter returned in 15 minutes. The examiner recorded how long each child resisted eating the marshmallow; later follow-ups examined whether this correlated with the child’s success in adulthood. A small number of the 600 children ate the marshmallow immediately, and about one-third delayed gratification long enough to receive the second marshmallow.
In follow-up studies, Mischel found that those who deferred gratification were significantly more competent and received higher SAT scores than their peers. This characteristic likely remains with a person for life. While this study seems simplistic, the findings outline some of the foundational differences in individual traits that can predict success.
23. The Monster Study
Study conducted by: Wendell Johnson.
Study Conducted in 1939 at the University of Iowa
Experiment Details: The Monster Study received this negative title due to the unethical methods that were used to determine the effects of positive and negative speech therapy on children.
Wendell Johnson of the University of Iowa selected 22 orphaned children, some with stutters and some without, and divided them into two groups. The group of children with stutters was placed in positive speech therapy, where they were praised for their fluency. The non-stutterers were placed in negative speech therapy, where they were disparaged for every mistake they made when speaking.
As a result of the experiment, some of the children who received negative speech therapy suffered lasting psychological effects and retained speech problems for the rest of their lives. The study became a lasting example of the significance of positive reinforcement in education.
The initial goal of the study was to investigate positive and negative speech therapy. However, the implication spanned much further into methods of teaching for young children.
24. Violinist at the Metro Experiment
Study conducted by: Staff at The Washington Post.
Study Conducted in 2007 at a Washington D.C. Metro Train Station
During the study, pedestrians rushed by without realizing that the musician playing at the entrance to the metro stop was Grammy-winning violinist Joshua Bell. Two days before playing in the subway, he had sold out a theater in Boston where seats averaged $100. In the metro, he played one of the most intricate pieces ever written, on a violin worth $3.5 million. In the 45 minutes he played, only 6 people stopped and stayed for a while; around 20 gave him money but continued to walk at their normal pace. He collected $32.
The study, and the subsequent article organized by The Washington Post, was part of a social experiment looking at people’s priorities.
Gene Weingarten wrote of the social experiment: “In a banal setting at an inconvenient time, would beauty transcend?” He later won a Pulitzer Prize for the story. Some of the questions the article addresses are:
- Do we perceive beauty?
- Do we stop to appreciate it?
- Do we recognize the talent in an unexpected context?
As it turns out, many of us are not nearly as perceptive to our environment as we might like to think.
25. Visual Cliff Experiment
Study conducted by: Eleanor Gibson and Richard Walk.
Study Conducted in 1959 at Cornell University
Experiment Details: In 1959, psychologists Eleanor Gibson and Richard Walk set out to study depth perception in infants. They wanted to know if depth perception is a learned behavior or if it is something that we are born with. To study this, Gibson and Walk conducted the visual cliff experiment.
They studied 36 infants between the ages of six and 14 months, all of whom could crawl. The infants were placed one at a time on a visual cliff. A visual cliff was created using a large glass table that was raised about a foot off the floor. Half of the glass table had a checker pattern underneath in order to create the appearance of a ‘shallow side.’
In order to create a ‘deep side,’ a checker pattern was created on the floor; this side is the visual cliff. The placement of the checker pattern on the floor creates the illusion of a sudden drop-off. Researchers placed a foot-wide centerboard between the shallow side and the deep side. Gibson and Walk found the following:
- Nine of the infants did not move off the centerboard.
- All of the 27 infants who did move crossed into the shallow side when their mothers called them from the shallow side.
- Three of the infants crawled off the visual cliff toward their mother when called from the deep side.
- When called from the deep side, the remaining 24 children either crawled to the shallow side or cried because they could not cross the visual cliff and make it to their mother.
What this study helped demonstrate is that depth perception is likely an inborn trait in humans.
Among these experiments and psychological tests, we see boundaries pushed and theories taking on a life of their own. It is through this endless stream of psychological experimentation that simple hypotheses can become guiding theories for those in the field. Psychology became a formal field of experimental study in 1879, when Wilhelm Wundt established the first laboratory dedicated solely to psychological research in Leipzig, Germany. Wundt was the first person to refer to himself as a psychologist. Since 1879, psychology has grown into a massive collection of methods of practice and areas of study, and it is also a specialty area in the field of healthcare. None of this would have been possible without these and many other important psychological experiments that have stood the test of time.
About the Author
After earning a Bachelor of Arts in Psychology from Rutgers University and then a Master of Science in Clinical and Forensic Psychology from Drexel University, Kristen began a career as a therapist at two prisons in Philadelphia. At the same time she volunteered as a rape crisis counselor, also in Philadelphia. After a few years in the field she accepted a teaching position at a local college where she currently teaches online psychology courses. Kristen began writing in college and still enjoys her work as a writer, editor, professor and mother.
Experimental Psychology: 10 Examples & Definition
Dave Cornell (PhD)
Dr. Cornell has worked in education for more than 20 years. His work has involved designing teacher certification for Trinity College in London and in-service training for state governments in the United States. He has trained kindergarten teachers in 8 countries and helped businessmen and women open baby centers and kindergartens in 3 countries.
Chris Drew (PhD)
This article was peer-reviewed and edited by Chris Drew (PhD). The review process on Helpful Professor involves having a PhD level expert fact check, edit, and contribute to articles. Reviewers ensure all content reflects expert academic consensus and is backed up with reference to academic studies. Dr. Drew has published over 20 academic articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education and holds a PhD in Education from ACU.
Experimental psychology refers to studying psychological phenomena using scientific methods. Originally, the primary scientific method involved manipulating one variable and observing systematic changes in another variable.
Today, psychologists utilize several types of scientific methodologies.
Experimental psychology examines a wide range of psychological phenomena, including memory, sensation and perception, cognitive processes, motivation, emotion, and developmental processes, as well as the neurophysiological concomitants of each of these subjects.
Studies are conducted on both animal and human participants, and must comply with stringent requirements and controls regarding the ethical treatment of both.
Definition of Experimental Psychology
Experimental psychology is a branch of psychology that utilizes scientific methods to investigate the mind and behavior.
It involves the systematic and controlled study of human and animal behavior through observation and experimentation.
Experimental psychologists design and conduct experiments to understand cognitive processes, perception, learning, memory, emotion, and many other aspects of psychology. They often manipulate variables (independent variables) to see how this affects behavior or mental processes (dependent variables).
The findings from experimental psychology research are often used to better understand human behavior and can be applied in a range of contexts, such as education, health, business, and more.
Experimental Psychology Examples
1. The Puzzle Box Studies (Thorndike, 1898) Placing cats in a box that could be escaped only by pulling a cord, and taking detailed notes on how long it took them to escape, allowed Edward Thorndike to derive the Law of Effect: actions followed by positive consequences are more likely to occur again, and actions followed by negative consequences are less likely to occur again (Thorndike, 1898).
2. Reinforcement Schedules (Skinner, 1956) By placing rats in a Skinner Box and changing when and how often the rats are rewarded for pressing a lever, it is possible to identify how each schedule results in different behavior patterns (Skinner, 1956). This led to a wide range of theoretical ideas around how rewards and consequences can shape the behaviors of both animals and humans.
3. Observational Learning (Bandura, 1980) Some children watch a video of an adult punching and kicking a Bobo doll. Other children watch a video in which the adult plays nicely with the doll. By carefully observing the children’s behavior later when in a room with a Bobo doll, researchers can determine if television violence affects children’s behavior (Bandura, 1980).
4. The Fallibility of Memory (Loftus & Palmer, 1974) A group of participants watch the same video of two cars having an accident. Two weeks later, some are asked to estimate the rate of speed the cars were going when they “smashed” into each other. Some participants are asked to estimate the rate of speed the cars were going when they “bumped” into each other. Changing the phrasing of the question changes the memory of the eyewitness.
5. Intrinsic Motivation in the Classroom (Dweck, 1990) To investigate the role of autonomy on intrinsic motivation, half of the students are told they are “free to choose” which tasks to complete. The other half of the students are told they “must choose” some of the tasks. Researchers then carefully observe how long the students engage in the tasks and later ask them some questions about if they enjoyed doing the tasks or not.
6. Systematic Desensitization (Wolpe, 1958) A clinical psychologist carefully documents his treatment of a patient’s social phobia with progressive relaxation. At first, the patient is trained to monitor, tense, and relax various muscle groups while viewing photos of parties. Weeks later, they approach a stranger to ask for directions, initiate a conversation on a crowded bus, and attend a small social gathering. The therapist’s notes are transcribed into a scientific report and published in a peer-reviewed journal.
7. Study of Remembering (Bartlett, 1932) Bartlett’s work is a seminal study in the field of memory, where he used the concept of “schema” to describe an organized pattern of thought or behavior. He conducted a series of experiments using folk tales to show that memory recall is influenced by cultural schemas and personal experiences.
8. Study of Obedience (Milgram, 1963) This famous study explored the conflict between obedience to authority and personal conscience. Milgram found that a majority of participants were willing to administer what they believed were harmful electric shocks to a stranger when instructed by an authority figure, highlighting the power of authority and situational factors in driving behavior.
9. Pavlov’s Dog Study (Pavlov, 1927) Ivan Pavlov, a Russian physiologist, conducted a series of experiments that became a cornerstone in the field of experimental psychology. Pavlov noticed that dogs would salivate when they saw food. He then began to ring a bell each time he presented the food to the dogs. After a while, the dogs began to salivate merely at the sound of the bell. This experiment demonstrated the principle of “classical conditioning.”
10. Piaget’s Stages of Development (Piaget, 1958) Jean Piaget proposed a theory of cognitive development in children that consists of four distinct stages: from the sensorimotor stage (birth to 2 years), where children learn about the world through their senses and motor activities, through to the formal operational stage (12 years and beyond), where abstract reasoning and hypothetical thinking develop. Piaget’s theory is an example of experimental psychology as it was developed through systematic observation and experimentation on children’s problem-solving behaviors.
Types of Research Methodologies in Experimental Psychology
Researchers have utilized several different types of research methodologies since the early days of Wundt (1832-1920).
1. The Experiment
The experiment involves the researcher manipulating the level of one variable, called the Independent Variable (IV), and then observing changes in another variable, called the Dependent Variable (DV).
The researcher is interested in determining if the IV causes changes in the DV. For example, does television violence make children more aggressive?
So, some children in the study (the research participants) will watch a show containing TV violence; they make up the treatment group. Others will watch a show with no TV violence; they make up the control group.
So, there are two levels of the IV: violence and no violence. Next, children will be observed to see if they act more aggressively. This is the DV.
If TV violence makes children more aggressive, then the children that watched the violent show will be more aggressive than the children that watched the non-violent show.
A key requirement of the experiment is random assignment. Each research participant is assigned to one of the two groups in a way that makes it a completely random process. This means that each group will have a similar mix of children: different personality types, diverse family backgrounds, and a range of intelligence levels.
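Below is a minimal sketch of what random assignment can look like in practice, assuming a simple Python script; the group names and participant labels are illustrative and not taken from any particular study.

```python
import random

def randomly_assign(participants, groups=("treatment", "control")):
    """Shuffle the participants, then deal them into groups in round-robin
    order, so each person has an equal chance of landing in any group."""
    shuffled = list(participants)   # copy so the caller's list is untouched
    random.shuffle(shuffled)        # randomize the order
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Example with eight hypothetical child participants
children = [f"child_{n}" for n in range(1, 9)]
print(randomly_assign(children))
```

Because the order is shuffled before the participants are dealt out, any given child is equally likely to end up in the treatment group or the control group, which is exactly what random assignment requires.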
2. The Longitudinal Study
A longitudinal study involves selecting a sample of participants and then following them for years, or decades, periodically collecting data on the variables of interest.
For example, a researcher might be interested in determining if parenting style affects the academic performance of children. Parenting style is called the predictor variable, and academic performance is called the outcome variable.
Researchers will begin by randomly selecting a group of children to be in the study. Then, they will identify the type of parenting practices used when the children are 4 and 5 years old.
A few years later, perhaps when the children are 8 and 9, the researchers will collect data on their grades. This process can be repeated over the next 10 years, including through college.
If parenting style has an effect on academic performance, then the researchers will see a connection between the predictor variable and outcome variable.
Children raised with parenting style X will have higher grades than children raised with parenting style Y.
3. The Case Study
The case study is an in-depth study of one individual. This is a research methodology often used early in the examination of a psychological phenomenon or therapeutic treatment.
For example, in the early days of treating phobias, a clinical psychologist may try teaching one of their patients how to relax every time they see the object that creates so much fear and anxiety, such as a large spider.
The therapist would take very detailed notes on how the teaching process was implemented and the reactions of the patient. When the treatment had been completed, those notes would be written in a scientific form and submitted for publication in a scientific journal for other therapists to learn from.
There are several other types of methodologies available, which vary different aspects of the three described above. The researcher will select the methodology that is most appropriate to the phenomenon they want to examine.
They also must take into account various practical considerations such as how much time and resources are needed to complete the study. Conducting research always costs money.
People and equipment are needed to carry out every study, so researchers often try to obtain funding from their university or a government agency.
Origins and Key Developments in Experimental Psychology
Wilhelm Maximilian Wundt (1832-1920) is considered one of the fathers of modern psychology. He was a physiologist and philosopher and helped establish psychology as a distinct discipline (Khaleefa, 1999).
In 1879 he established the world’s first psychology research lab at the University of Leipzig. This is considered a key milestone in establishing psychology as a scientific discipline. In addition to being the first person to use the term “psychologist” to describe himself, he also founded the discipline’s first scientific journal, Philosophische Studien, in 1883.
Another notable figure in the development of experimental psychology is Ernst Weber. Trained as a physician, Weber studied sensation and perception and created the first quantitative law in psychology.
The law describes how judgments of sensory differences are relative to previous levels of sensation; the smallest detectable change is referred to as the just-noticeable difference (jnd). This is known today as Weber’s Law (Hergenhahn, 2009).
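Weber’s Law is commonly expressed as the statement that the just-noticeable difference is a constant proportion of the original stimulus intensity:

$$
\frac{\Delta I}{I} = k,
$$

where I is the intensity of the original stimulus, ΔI is the smallest change a person can detect (the jnd), and k is a constant that depends on the sense being tested. As a purely illustrative example, if k were 0.05 for lifted weights, a 100-gram weight would have to change by about 5 grams before the difference became noticeable; the numbers here are for illustration only, not Weber’s own measurements.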
Gustav Fechner, one of Weber’s students, published the first book on experimental psychology in 1860, titled Elemente der Psychophysik. His work centered on the measurement of psychophysical facets of sensation and perception, and many of his methods are still in use today.
The first American textbook on experimental psychology was Elements of Physiological Psychology, published in 1887 by George Trumbull Ladd.
Ladd also established a psychology lab at Yale University, while G. Stanley Hall and Charles Sanders Peirce continued Wundt’s work at a lab at Johns Hopkins University.
In the late 1800s, Charles Sanders Peirce made an especially noteworthy contribution to experimental psychology: he invented the concept of random assignment (Stigler, 1992; Dehue, 1997).
This procedure ensures that each participant has an equal chance of being placed in any of the experimental groups (e.g., treatment or control group). This eliminates the influence of confounding factors related to inherent characteristics of the participants.
Random assignment is a fundamental criterion for a study to be considered a valid experiment.
From there, experimental psychology flourished in the 20th century as a science and transformed into an approach utilized in cognitive psychology, developmental psychology, and social psychology.
Today, the term experimental psychology refers to the study of a wide range of phenomena and involves methodologies not limited to the manipulation of variables.
The Scientific Process and Experimental Psychology
The one thing that makes psychology a science and distinguishes it from its roots in philosophy is its reliance upon the scientific process to answer questions. Making psychology a science was the main goal of its earliest founders, such as Wilhelm Wundt.
There are numerous steps in the scientific process, outlined below.
1. Observation
First, the scientist observes an interesting phenomenon that sparks a question. For example, are the memories of eyewitnesses really reliable, or are they subject to bias or unintentional manipulation?
2. Hypothesize
Next, this question is converted into a testable hypothesis. For instance: the words used to question a witness can influence what they think they remember.
3. Devise a Study
Then the researcher(s) select a methodology that will allow them to test that hypothesis. In this case, the researchers choose the experiment, which will involve randomly assigning some participants to different conditions.
In one condition, participants are asked a question that implies a certain memory (treatment group), while other participants are asked a question which is phrased neutrally and does not imply a certain memory (control group).
The researchers then write a proposal that describes in detail the procedures they want to use, how participants will be selected, and the safeguards they will employ to ensure the rights of the participants.
That proposal is submitted to an Institutional Review Board (IRB). The IRB is a panel of researchers, community representatives, and other professionals who are responsible for reviewing all studies involving human participants.
4. Conduct the Study
If the IRB accepts the proposal, then the researchers may begin collecting data. After the data has been collected, it is analyzed using a software program such as SPSS.
Those analyses will either support or reject the hypothesis. That is, either the participants’ memories were affected by the wording of the question, or not.
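As an illustrative sketch of what such an analysis involves (the text mentions SPSS; an equivalent test is shown here in Python with SciPy, using made-up accuracy scores for a hypothetical two-group design):

```python
from scipy import stats

# Hypothetical memory-accuracy scores (0-100) for the two conditions
treatment = [62, 55, 58, 49, 60, 57, 52, 64]   # asked the leading question
control = [71, 68, 75, 66, 70, 73, 69, 72]     # asked the neutral question

# Independent-samples t-test: does question wording change recall accuracy?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A p-value below the chosen significance level (commonly .05) would support the hypothesis that the wording of the question affects what participants remember.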
5. Publish the study
Finally, the researchers write a paper detailing their procedures and results of the statistical analyses. That paper is then submitted to a scientific journal.
The lead editor of that journal will then send copies of the paper to 3-5 experts on the subject. Each of those experts will read the paper and try to find as many things wrong with it as possible. Because they are experts, they are very good at this task.
After reading those critiques, the editor will most likely send the paper back to the researchers and require that they respond to the criticisms or collect more data, or the editor may reject the paper outright.
In some cases, the study is so well done that the criticisms are minimal and the editor accepts the paper. It then gets published in the scientific journal several months later.
That entire process can easily take two years, and usually more. But the findings of that study have been through a very rigorous process, which means we can have substantial confidence that its conclusions are valid.
Experimental psychology refers to utilizing a scientific process to investigate psychological phenomena.
There are a variety of methods employed today. They are used to study a wide range of subjects, including memory, cognitive processes, emotions and the neurophysiological basis of each.
The history of psychology as a science began in the 1800s primarily in Germany. As interest grew, the field expanded to the United States where several influential research labs were established.
As more methodologies were developed, the field of psychology as a science evolved into a prolific scientific discipline that has provided invaluable insights into human behavior.
Bartlett, F. C., & Bartlett, F. C. (1995). Remembering: A study in experimental and social psychology . Cambridge university press.
Dehue, T. (1997). Deception, efficiency, and random groups: Psychology and the gradual origination of the random group design. Isis , 88 (4), 653-673.
Ebbinghaus, H. (2013). Memory: A contribution to experimental psychology. Annals of neurosciences , 20 (4), 155.
Hergenhahn, B. R. (2009). An introduction to the history of psychology. Belmont. CA: Wadsworth Cengage Learning .
Khaleefa, O. (1999). Who is the founder of psychophysics and experimental psychology? American Journal of Islam and Society , 16 (2), 1-26.
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of auto-mobile destruction : An example of the interaction between language and memory. Journal of Verbal Learning and Verbal behavior , 13, 585-589.
Pavlov, I.P. (1927). Conditioned reflexes . Dover, New York.
Piaget, J. (1959). The language and thought of the child (Vol. 5). Psychology Press.
Piaget, J., Fraisse, P., & Reuchlin, M. (2014). Experimental psychology its scope and method: Volume I (Psychology Revivals): History and method . Psychology Press.
Skinner, B. F. (1956). A case history in scientlfic method. American Psychologist, 11 , 221-233
Stigler, S. M. (1992). A historical view of statistical concepts in psychology and educational research. American Journal of Education , 101 (1), 60-70.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplement 2 .
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford, CA: Stanford University Press.
Appendix: Images reproduced as Text
Definition: Experimental psychology is a branch of psychology that focuses on conducting systematic and controlled experiments to study human behavior and cognition.
Overview: Experimental psychology aims to gather empirical evidence and explore cause-and-effect relationships between variables. Experimental psychologists utilize various research methods, including laboratory experiments, surveys, and observations, to investigate topics such as perception, memory, learning, motivation, and social behavior .
Example: Pavlov’s dog experiment used scientific methods to develop a theory about how learning and association occur in animals. The same concepts were subsequently applied to the study of humans, where psychology-based ideas about learning were developed. Pavlov’s use of empirical evidence was foundational to the study’s success.
Experimental Psychology Milestones:
1890: William James publishes “The Principles of Psychology”, a foundational text in the field of psychology.
1896: Lightner Witmer opens the first psychological clinic at the University of Pennsylvania, marking the beginning of clinical psychology.
1913: John B. Watson publishes “Psychology as the Behaviorist Views It”, marking the beginning of Behaviorism.
1920: Hermann Rorschach introduces the Rorschach inkblot test.
1938: B.F. Skinner introduces the concept of operant conditioning.
1967: Ulric Neisser publishes “Cognitive Psychology”, marking the beginning of the cognitive revolution.
1980: The third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III) is published, introducing a new classification system for mental disorders.
The Scientific Process
- Observe an interesting phenomenon
- Formulate testable hypothesis
- Select methodology and design study
- Submit research proposal to IRB
- Collect and analyze data; write paper
- Submit paper for critical reviews
experimental psychology
experimental psychology, a method of studying psychological phenomena and processes. The experimental method in psychology attempts to account for the activities of animals (including humans) and the functional organization of mental processes by manipulating variables that may give rise to behaviour; it is primarily concerned with discovering laws that describe manipulable relationships. The term generally connotes all areas of psychology that use the experimental method.
These areas include the study of sensation and perception, learning and memory, motivation, and biological psychology. There are experimental branches in many other areas, however, including child psychology, clinical psychology, educational psychology, and social psychology. Usually the experimental psychologist deals with normal, intact organisms; in biological psychology, however, studies are often conducted with organisms modified by surgery, radiation, drug treatment, or long-standing deprivations of various kinds or with organisms that naturally present organic abnormalities or emotional disorders. See also psychophysics.
6.2 Experimental Design
Learning Objectives
- Explain the difference between between-subjects and within-subjects experiments, list some of the pros and cons of each approach, and decide which approach to use to answer a particular research question.
- Define random assignment, distinguish it from random sampling, explain its purpose in experimental research, and use some simple strategies to implement it.
- Define what a control condition is, explain its purpose in research on treatment effectiveness, and describe some alternative types of control conditions.
- Define several types of carryover effect, give examples of each, and explain how counterbalancing helps to deal with them.
In this section, we look at some different ways to design an experiment. The primary distinction we will make is between approaches in which each participant experiences one level of the independent variable and approaches in which each participant experiences all levels of the independent variable. The former are called between-subjects experiments and the latter are called within-subjects experiments.
Between-Subjects Experiments
In a between-subjects experiment, each participant is tested in only one condition. For example, a researcher with a sample of 100 college students might assign half of them to write about a traumatic event and the other half to write about a neutral event. Or a researcher with a sample of 60 people with severe agoraphobia (fear of open spaces) might assign 20 of them to receive each of three different treatments for that disorder. It is essential in a between-subjects experiment that the researcher assign participants to conditions so that the different groups are, on average, highly similar to each other. Those in a trauma condition and a neutral condition, for example, should include a similar proportion of men and women, and they should have similar average intelligence quotients (IQs), similar average levels of motivation, similar average numbers of health problems, and so on. This is a matter of controlling these extraneous participant variables across conditions so that they do not become confounding variables.
Random Assignment
The primary way that researchers accomplish this kind of control of extraneous variables across conditions is called random assignment , which means using a random process to decide which participants are tested in which conditions. Do not confuse random assignment with random sampling. Random sampling is a method for selecting a sample from a population, and it is rarely used in psychological research. Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields too.
In its strictest sense, random assignment should meet two criteria. One is that each participant has an equal chance of being assigned to each condition (e.g., a 50% chance of being assigned to each of two conditions). The second is that each participant is assigned to a condition independently of other participants. Thus one way to assign participants to two conditions would be to flip a coin for each one. If the coin lands heads, the participant is assigned to Condition A, and if it lands tails, the participant is assigned to Condition B. For three conditions, one could use a computer to generate a random integer from 1 to 3 for each participant. If the integer is 1, the participant is assigned to Condition A; if it is 2, the participant is assigned to Condition B; and if it is 3, the participant is assigned to Condition C. In practice, a full sequence of conditions—one for each participant expected to be in the experiment—is usually created ahead of time, and each new participant is assigned to the next condition in the sequence as he or she is tested. When the procedure is computerized, the computer program often handles the random assignment.
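A minimal sketch of this strict procedure (my own illustration, not code from the chapter): each participant is assigned independently, with an equal chance of each condition.

```python
import random

def assign_strict(n_participants, conditions):
    # Each participant gets a condition chosen independently and with
    # equal probability, like flipping a coin or rolling a die per person.
    return [random.choice(conditions) for _ in range(n_participants)]

print(assign_strict(9, ["A", "B", "C"]))
# e.g. ['B', 'A', 'A', 'C', 'B', 'C', 'A', 'A', 'B'] -- group sizes may be unequal
```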
One problem with coin flipping and other strict procedures for random assignment is that they are likely to result in unequal sample sizes in the different conditions. Unequal sample sizes are generally not a serious problem, and you should never throw away data you have already collected to achieve equal sample sizes. However, for a fixed number of participants, it is statistically most efficient to divide them into equal-sized groups. It is standard practice, therefore, to use a kind of modified random assignment that keeps the number of participants in each group as similar as possible. One approach is block randomization . In block randomization, all the conditions occur once in the sequence before any of them is repeated. Then they all occur again before any of them is repeated again. Within each of these “blocks,” the conditions occur in a random order. Again, the sequence of conditions is usually generated before any participants are tested, and each new participant is assigned to the next condition in the sequence. Table 6.2 “Block Randomization Sequence for Assigning Nine Participants to Three Conditions” shows such a sequence for assigning nine participants to three conditions. The Research Randomizer website ( http://www.randomizer.org ) will generate block randomization sequences for any number of participants and conditions. Again, when the procedure is computerized, the computer program often handles the block randomization.
Table 6.2 Block Randomization Sequence for Assigning Nine Participants to Three Conditions
Participant | Condition |
---|---|
4 | B |
5 | C |
6 | A |
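A short sketch of block randomization in Python (hypothetical code, not part of the chapter) that generates sequences like the one in Table 6.2, with every condition occurring once per block in a random order:

```python
import random

def block_randomize(n_participants, conditions):
    """Each block contains every condition exactly once, in a random order."""
    sequence = []
    while len(sequence) < n_participants:
        block = list(conditions)
        random.shuffle(block)      # random order within this block
        sequence.extend(block)
    return sequence[:n_participants]

for participant, condition in enumerate(block_randomize(9, ["A", "B", "C"]), start=1):
    print(participant, condition)
```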
Random assignment is not guaranteed to control all extraneous variables across conditions. It is always possible that just by chance, the participants in one condition might turn out to be substantially older, less tired, more motivated, or less depressed on average than the participants in another condition. However, there are some reasons that this is not a major concern. One is that random assignment works better than one might expect, especially for large samples. Another is that the inferential statistics that researchers use to decide whether a difference between groups reflects a difference in the population take the “fallibility” of random assignment into account. Yet another reason is that even if random assignment does result in a confounding variable and therefore produces misleading results, this is likely to be detected when the experiment is replicated. The upshot is that random assignment to conditions—although not infallible in terms of controlling extraneous variables—is always considered a strength of a research design.
Treatment and Control Conditions
Between-subjects experiments are often used to determine whether a treatment works. In psychological research, a treatment is any intervention meant to change people’s behavior for the better. This includes psychotherapies and medical treatments for psychological disorders but also interventions designed to improve learning, promote conservation, reduce prejudice, and so on. To determine whether a treatment works, participants are randomly assigned to either a treatment condition , in which they receive the treatment, or a control condition , in which they do not receive the treatment. If participants in the treatment condition end up better off than participants in the control condition—for example, they are less depressed, learn faster, conserve more, express less prejudice—then the researcher can conclude that the treatment works. In research on the effectiveness of psychotherapies and medical treatments, this type of experiment is often called a randomized clinical trial .
There are different types of control conditions. In a no-treatment control condition , participants receive no treatment whatsoever. One problem with this approach, however, is the existence of placebo effects. A placebo is a simulated treatment that lacks any active ingredient or element that should make it effective, and a placebo effect is a positive effect of such a treatment. Many folk remedies that seem to work—such as eating chicken soup for a cold or placing soap under the bedsheets to stop nighttime leg cramps—are probably nothing more than placebos. Although placebo effects are not well understood, they are probably driven primarily by people’s expectations that they will improve. Having the expectation to improve can result in reduced stress, anxiety, and depression, which can alter perceptions and even improve immune system functioning (Price, Finniss, & Benedetti, 2008).
Placebo effects are interesting in their own right (see Note 6.28 “The Powerful Placebo” ), but they also pose a serious problem for researchers who want to determine whether a treatment works. Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions” shows some hypothetical results in which participants in a treatment condition improved more on average than participants in a no-treatment control condition. If these conditions (the two leftmost bars in Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions” ) were the only conditions in this experiment, however, one could not conclude that the treatment worked. It could be instead that participants in the treatment group improved more because they expected to improve, while those in the no-treatment control condition did not.
Figure 6.2 Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions
Fortunately, there are several solutions to this problem. One is to include a placebo control condition , in which participants receive a placebo that looks much like the treatment but lacks the active ingredient or element thought to be responsible for the treatment’s effectiveness. When participants in a treatment condition take a pill, for example, then those in a placebo control condition would take an identical-looking pill that lacks the active ingredient in the treatment (a “sugar pill”). In research on psychotherapy effectiveness, the placebo might involve going to a psychotherapist and talking in an unstructured way about one’s problems. The idea is that if participants in both the treatment and the placebo control groups expect to improve, then any improvement in the treatment group over and above that in the placebo control group must have been caused by the treatment and not by participants’ expectations. This is what is shown by a comparison of the two outer bars in Figure 6.2 “Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions” .
Of course, the principle of informed consent requires that participants be told that they will be assigned to either a treatment or a placebo control condition—even though they cannot be told which until the experiment ends. In many cases the participants who had been in the control condition are then offered an opportunity to have the real treatment. An alternative approach is to use a waitlist control condition , in which participants are told that they will receive the treatment but must wait until the participants in the treatment condition have already received it. This allows researchers to compare participants who have received the treatment with participants who are not currently receiving it but who still expect to improve (eventually). A final solution to the problem of placebo effects is to leave out the control condition completely and compare any new treatment with the best available alternative treatment. For example, a new treatment for simple phobia could be compared with standard exposure therapy. Because participants in both conditions receive a treatment, their expectations about improvement should be similar. This approach also makes sense because once there is an effective treatment, the interesting question about a new treatment is not simply “Does it work?” but “Does it work better than what is already available?”
The Powerful Placebo
Many people are not surprised that placebos can have a positive effect on disorders that seem fundamentally psychological, including depression, anxiety, and insomnia. However, placebos can also have a positive effect on disorders that most people think of as fundamentally physiological. These include asthma, ulcers, and warts (Shapiro & Shapiro, 1999). There is even evidence that placebo surgery—also called “sham surgery”—can be as effective as actual surgery.
Medical researcher J. Bruce Moseley and his colleagues conducted a study on the effectiveness of two arthroscopic surgery procedures for osteoarthritis of the knee (Moseley et al., 2002). The control participants in this study were prepped for surgery, received a tranquilizer, and even received three small incisions in their knees. But they did not receive the actual arthroscopic surgical procedure. The surprising result was that all participants improved in terms of both knee pain and function, and the sham surgery group improved just as much as the treatment groups. According to the researchers, “This study provides strong evidence that arthroscopic lavage with or without débridement [the surgical procedures used] is not better than and appears to be equivalent to a placebo procedure in improving knee pain and self-reported function” (p. 85).
Research has shown that patients with osteoarthritis of the knee who receive a “sham surgery” experience reductions in pain and improvement in knee function similar to those of patients who receive a real surgery.
Army Medicine – Surgery – CC BY 2.0.
Within-Subjects Experiments
In a within-subjects experiment , each participant is tested under all conditions. Consider an experiment on the effect of a defendant’s physical attractiveness on judgments of his guilt. Again, in a between-subjects experiment, one group of participants would be shown an attractive defendant and asked to judge his guilt, and another group of participants would be shown an unattractive defendant and asked to judge his guilt. In a within-subjects experiment, however, the same group of participants would judge the guilt of both an attractive and an unattractive defendant.
The primary advantage of this approach is that it provides maximum control of extraneous participant variables. Participants in all conditions have the same mean IQ, same socioeconomic status, same number of siblings, and so on—because they are the very same people. Within-subjects experiments also make it possible to use statistical procedures that remove the effect of these extraneous participant variables on the dependent variable and therefore make the data less “noisy” and the effect of the independent variable easier to detect. We will look more closely at this idea later in the book.
Carryover Effects and Counterbalancing
The primary disadvantage of within-subjects designs is that they can result in carryover effects. A carryover effect is an effect of being tested in one condition on participants’ behavior in later conditions. One type of carryover effect is a practice effect , where participants perform a task better in later conditions because they have had a chance to practice it. Another type is a fatigue effect , where participants perform a task worse in later conditions because they become tired or bored. Being tested in one condition can also change how participants perceive stimuli or interpret their task in later conditions. This is called a context effect . For example, an average-looking defendant might be judged more harshly when participants have just judged an attractive defendant than when they have just judged an unattractive defendant. Within-subjects experiments also make it easier for participants to guess the hypothesis. For example, a participant who is asked to judge the guilt of an attractive defendant and then is asked to judge the guilt of an unattractive defendant is likely to guess that the hypothesis is that defendant attractiveness affects judgments of guilt. This could lead the participant to judge the unattractive defendant more harshly because he thinks this is what he is expected to do. Or it could make participants judge the two defendants similarly in an effort to be “fair.”
Carryover effects can be interesting in their own right. (Does the attractiveness of one person depend on the attractiveness of other people that we have seen recently?) But when they are not the focus of the research, carryover effects can be problematic. Imagine, for example, that participants judge the guilt of an attractive defendant and then judge the guilt of an unattractive defendant. If they judge the unattractive defendant more harshly, this might be because of his unattractiveness. But it could be instead that they judge him more harshly because they are becoming bored or tired. In other words, the order of the conditions is a confounding variable. The attractive condition is always the first condition and the unattractive condition the second. Thus any difference between the conditions in terms of the dependent variable could be caused by the order of the conditions and not the independent variable itself.
There is a solution to the problem of order effects, however, that can be used in many situations. It is counterbalancing , which means testing different participants in different orders. For example, some participants would be tested in the attractive defendant condition followed by the unattractive defendant condition, and others would be tested in the unattractive condition followed by the attractive condition. With three conditions, there would be six different orders (ABC, ACB, BAC, BCA, CAB, and CBA), so some participants would be tested in each of the six orders. With counterbalancing, participants are assigned to orders randomly, using the techniques we have already discussed. Thus random assignment plays an important role in within-subjects designs just as in between-subjects designs. Here, instead of randomly assigning to conditions, they are randomly assigned to different orders of conditions. In fact, it can safely be said that if a study does not involve random assignment in one form or another, it is not an experiment.
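As a rough sketch (my own code, assuming three conditions A, B, and C), counterbalancing can be implemented by enumerating every order of the conditions and assigning participants to orders at random:

```python
import itertools
import random

conditions = ["A", "B", "C"]

# The six possible orders: ABC, ACB, BAC, BCA, CAB, CBA.
orders = list(itertools.permutations(conditions))

def counterbalance(n_participants):
    # Randomly assign each participant to one of the possible orders.
    return [random.choice(orders) for _ in range(n_participants)]

for participant, order in enumerate(counterbalance(6), start=1):
    print(f"Participant {participant}: {''.join(order)}")
```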
There are two ways to think about what counterbalancing accomplishes. One is that it controls the order of conditions so that it is no longer a confounding variable. Instead of the attractive condition always being first and the unattractive condition always being second, the attractive condition comes first for some participants and second for others. Likewise, the unattractive condition comes first for some participants and second for others. Thus any overall difference in the dependent variable between the two conditions cannot have been caused by the order of conditions. A second way to think about what counterbalancing accomplishes is that if there are carryover effects, it makes it possible to detect them. One can analyze the data separately for each order to see whether it had an effect.
When 9 Is “Larger” Than 221
Researcher Michael Birnbaum has argued that the lack of context provided by between-subjects designs is often a bigger problem than the context effects created by within-subjects designs. To demonstrate this, he asked one group of participants to rate how large the number 9 was on a 1-to-10 rating scale and another group to rate how large the number 221 was on the same 1-to-10 rating scale (Birnbaum, 1999). Participants in this between-subjects design gave the number 9 a mean rating of 5.13 and the number 221 a mean rating of 3.10. In other words, they rated 9 as larger than 221! According to Birnbaum, this is because participants spontaneously compared 9 with other one-digit numbers (in which case it is relatively large) and compared 221 with other three-digit numbers (in which case it is relatively small).
Simultaneous Within-Subjects Designs
So far, we have discussed an approach to within-subjects designs in which participants are tested in one condition at a time. There is another approach, however, that is often used when participants make multiple responses in each condition. Imagine, for example, that participants judge the guilt of 10 attractive defendants and 10 unattractive defendants. Instead of having people make judgments about all 10 defendants of one type followed by all 10 defendants of the other type, the researcher could present all 20 defendants in a sequence that mixed the two types. The researcher could then compute each participant’s mean rating for each type of defendant. Or imagine an experiment designed to see whether people with social anxiety disorder remember negative adjectives (e.g., “stupid,” “incompetent”) better than positive ones (e.g., “happy,” “productive”). The researcher could have participants study a single list that includes both kinds of words and then have them try to recall as many words as possible. The researcher could then count the number of each type of word that was recalled. There are many ways to determine the order in which the stimuli are presented, but one common way is to generate a different random order for each participant.
Between-Subjects or Within-Subjects?
Almost every experiment can be conducted using either a between-subjects design or a within-subjects design. This means that researchers must choose between the two approaches based on their relative merits for the particular situation.
Between-subjects experiments have the advantage of being conceptually simpler and requiring less testing time per participant. They also avoid carryover effects without the need for counterbalancing. Within-subjects experiments have the advantage of controlling extraneous participant variables, which generally reduces noise in the data and makes it easier to detect a relationship between the independent and dependent variables.
A good rule of thumb, then, is that if it is possible to conduct a within-subjects experiment (with proper counterbalancing) in the time that is available per participant—and you have no serious concerns about carryover effects—this is probably the best option. If a within-subjects design would be difficult or impossible to carry out, then you should consider a between-subjects design instead. For example, if you were testing participants in a doctor’s waiting room or shoppers in line at a grocery store, you might not have enough time to test each participant in all conditions and therefore would opt for a between-subjects design. Or imagine you were trying to reduce people’s level of prejudice by having them interact with someone of another race. A within-subjects design with counterbalancing would require testing some participants in the treatment condition first and then in a control condition. But if the treatment works and reduces people’s level of prejudice, then they would no longer be suitable for testing in the control condition. This is true for many designs that involve a treatment meant to produce long-term change in participants’ behavior (e.g., studies testing the effectiveness of psychotherapy). Clearly, a between-subjects design would be necessary here.
Remember also that using one type of design does not preclude using the other type in a different study. There is no reason that a researcher could not use both a between-subjects design and a within-subjects design to answer the same research question. In fact, professional researchers often do exactly this.
Key Takeaways
- Experiments can be conducted using either between-subjects or within-subjects designs. Deciding which to use in a particular situation requires careful consideration of the pros and cons of each approach.
- Random assignment to conditions in between-subjects experiments or to orders of conditions in within-subjects experiments is a fundamental element of experimental research. Its purpose is to control extraneous variables so that they do not become confounding variables.
- Experimental research on the effectiveness of a treatment requires both a treatment condition and a control condition, which can be a no-treatment control condition, a placebo control condition, or a waitlist control condition. Experimental treatments can also be compared with the best available alternative.
Discussion: For each of the following topics, list the pros and cons of a between-subjects and within-subjects design and decide which would be better.
- You want to test the relative effectiveness of two training programs for running a marathon.
- Using photographs of people as stimuli, you want to see if smiling people are perceived as more intelligent than people who are not smiling.
- In a field experiment, you want to see if the way a panhandler is dressed (neatly vs. sloppily) affects whether or not passersby give him any money.
- You want to see if concrete nouns (e.g., dog ) are recalled better than abstract nouns (e.g., truth ).
- Discussion: Imagine that an experiment shows that participants who receive psychodynamic therapy for a dog phobia improve more than participants in a no-treatment control group. Explain a fundamental problem with this research design and at least two ways that it might be corrected.
Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a between-subjects design. Psychological Methods, 4 , 243–249.
Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. The New England Journal of Medicine, 347 , 81–88.
Price, D. D., Finniss, D. G., & Benedetti, F. (2008). A comprehensive review of the placebo effect: Recent advances and current thought. Annual Review of Psychology, 59 , 565–590.
Shapiro, A. K., & Shapiro, E. (1999). The powerful placebo: From ancient priest to modern physician . Baltimore, MD: Johns Hopkins University Press.
Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Research Methods In Psychology
Research methods in psychology are systematic procedures used to observe, describe, predict, and explain behavior and mental processes. They include experiments, surveys, case studies, and naturalistic observations, ensuring data collection is objective and reliable to understand and explain psychological phenomena.
Hypotheses are statements predicting the outcome of a study that can be verified or disproved by investigation.
There are four types of hypotheses :
- Null Hypotheses (H0 ) – these predict that no difference will be found in the results between the conditions. Typically these are written ‘There will be no difference…’
- Alternative Hypotheses (Ha or H1) – these predict that there will be a significant difference in the results between the two conditions. This is also known as the experimental hypothesis.
- One-tailed (directional) hypotheses – these state the specific direction the researcher expects the results to move in, e.g. higher, lower, more, less. In a correlation study, the predicted direction of the correlation can be either positive or negative.
- Two-tailed (non-directional) hypotheses – these state that a difference will be found between the conditions of the independent variable but do not state the direction of the difference or relationship. Typically these are written ‘There will be a difference ….’
All research has an alternative hypothesis (either a one-tailed or two-tailed) and a corresponding null hypothesis.
Once the research is conducted and results are found, psychologists must accept one hypothesis and reject the other.
So, if a difference is found, the Psychologist would accept the alternative hypothesis and reject the null. The opposite applies if no difference is found.
Sampling techniques
Sampling is the process of selecting a representative group from the population under study.
A sample is the participants you select from a target population (the group you are interested in) to make generalizations about.
Representative means the extent to which a sample mirrors a researcher’s target population and reflects its characteristics.
Generalisability means the extent to which their findings can be applied to the larger population of which their sample was a part.
- Volunteer sample : where participants pick themselves through newspaper adverts, noticeboards or online.
- Opportunity sampling : also known as convenience sampling , uses people who are available at the time the study is carried out and willing to take part. It is based on convenience.
- Random sampling : when every person in the target population has an equal chance of being selected. An example of random sampling would be picking names out of a hat.
- Systematic sampling : when a system is used to select participants, such as picking every Nth person from all possible participants, where N = the number of people in the research population divided by the number of people needed for the sample (see the worked sketch after this list).
- Stratified sampling : when you identify the subgroups and select participants in proportion to their occurrences.
- Snowball sampling : when researchers find a few participants, and then ask them to find participants themselves and so on.
- Quota sampling : when researchers will be told to ensure the sample fits certain quotas, for example they might be told to find 90 participants, with 30 of them being unemployed.
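A worked sketch of systematic sampling (hypothetical numbers, my own illustration): with a research population of 100 and a required sample of 20, N = 100 / 20 = 5, so every 5th person is selected.

```python
import random

population = list(range(1, 101))          # hypothetical person IDs 1..100
sample_size = 20
step = len(population) // sample_size     # N = 100 / 20 = 5

start = random.randrange(step)            # random starting point within the first N people
sample = population[start::step]          # every Nth person from the start point
print(len(sample), sample[:5])            # 20 people, e.g. [3, 8, 13, 18, 23]
```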
Experiments always have an independent and dependent variable .
- The independent variable is the one the experimenter manipulates (the thing that changes between the conditions the participants are placed into). It is assumed to have a direct effect on the dependent variable.
- The dependent variable is the thing being measured, or the results of the experiment.
Operationalization of variables means making them measurable/quantifiable. We must use operationalization to ensure that variables are in a form that can be easily tested.
For instance, we can’t really measure ‘happiness’, but we can measure how many times a person smiles within a two-hour period.
By operationalizing variables, we make it easy for someone else to replicate our research. Remember, this is important because we can check if our findings are reliable.
Extraneous variables are all variables that are not the independent variable but could affect the results of the experiment.
It can be a natural characteristic of the participant, such as intelligence levels, gender, or age for example, or it could be a situational feature of the environment such as lighting or noise.
Demand characteristics are a type of extraneous variable that occurs when participants work out the aims of the research study and begin to behave in the way they think is expected of them.
For example, in Milgram’s research , critics argued that participants worked out that the shocks were not real and they administered them as they thought this was what was required of them.
Extraneous variables must be controlled so that they do not affect (confound) the results.
Randomly allocating participants to their conditions or using a matched pairs experimental design can help to reduce participant variables.
Situational variables are controlled by using standardized procedures, ensuring every participant in a given condition is treated in the same way.
Experimental Design
Experimental design refers to how participants are allocated to each condition of the independent variable, such as a control or experimental group.
- Independent design ( between-groups design ): each participant is selected for only one group. With the independent design, the most common way of deciding which participants go into which group is by means of randomization.
- Matched participants design : each participant is selected for only one group, but the participants in the two groups are matched for some relevant factor or factors (e.g. ability; sex; age).
- Repeated measures design ( within groups) : each participant appears in both groups, so that there are exactly the same participants in each group.
- The main problem with the repeated measures design is that there may well be order effects. Their experiences during the experiment may change the participants in various ways.
- They may perform better when they appear in the second group because they have gained useful information about the experiment or about the task. On the other hand, they may perform less well on the second occasion because of tiredness or boredom.
- Counterbalancing is the best way of preventing order effects from disrupting the findings of an experiment, and involves ensuring that each condition is equally likely to be used first and second by the participants.
If we wish to compare two groups with respect to a given independent variable, it is essential to make sure that the two groups do not differ in any other important way.
Experimental Methods
All experimental methods involve an IV (independent variable) and a DV (dependent variable).
- Laboratory experiments are conducted in a controlled setting: the researcher decides where the experiment will take place, at what time, with which participants, in what circumstances, using a standardized procedure.
- Field experiments are conducted in the everyday (natural) environment of the participants. The experimenter still manipulates the IV, but in a real-life setting. It may be possible to control extraneous variables, though such control is more difficult than in a lab experiment.
- Natural experiments are when a naturally occurring IV is investigated that isn’t deliberately manipulated, it exists anyway. Participants are not randomly allocated, and the natural event may only occur rarely.
Case studies are in-depth investigations of a person, group, event, or community. They use information from a range of sources, such as the person concerned and also their family and friends.
Many techniques may be used such as interviews, psychological tests, observations and experiments. Case studies are generally longitudinal: in other words, they follow the individual or group over an extended period of time.
Case studies are widely used in psychology and among the best-known ones carried out were by Sigmund Freud . He conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.
Case studies provide rich qualitative data and have high levels of ecological validity. However, it is difficult to generalize from individual cases as each one has unique characteristics.
Correlational Studies
Correlation means association; it is a measure of the extent to which two variables are related. One of the variables can be regarded as the predictor variable with the other one as the outcome variable.
Correlational studies typically involve obtaining two different measures from a group of participants, and then assessing the degree of association between the measures.
The predictor variable can be seen as occurring before the outcome variable in some sense. It is called the predictor variable, because it forms the basis for predicting the value of the outcome variable.
Relationships between variables can be displayed on a graph or as a numerical score called a correlation coefficient.
- If an increase in one variable tends to be associated with an increase in the other, then this is known as a positive correlation .
- If an increase in one variable tends to be associated with a decrease in the other, then this is known as a negative correlation .
- A zero correlation occurs when there is no relationship between variables.
After looking at the scattergraph, if we want to be sure that a significant relationship does exist between the two variables, a statistical test of correlation can be conducted, such as Spearman’s rho.
The test will give us a score, called a correlation coefficient. This is a value between -1 and +1, and the closer the score is to -1 or +1, the stronger the relationship between the variables. A positive value (e.g. 0.63) indicates a positive correlation, and a negative value (e.g. -0.63) indicates a negative correlation.
A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.
Correlation does not always prove causation, as a third variable may be involved.
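As a hedged sketch (invented scores, not data from the text), Spearman’s rho can be computed for two measures to obtain the correlation coefficient described above:

```python
from scipy import stats

# Hypothetical scores from ten participants on two measures.
hours_revised = [2, 5, 1, 8, 4, 7, 3, 6, 9, 10]
exam_score = [45, 60, 40, 80, 58, 72, 50, 65, 85, 90]

rho, p_value = stats.spearmanr(hours_revised, exam_score)
print(f"Spearman's rho = {rho:.2f}, p = {p_value:.3f}")

# rho lies between -1 and +1: positive values indicate a positive correlation,
# negative values a negative correlation, and values near 0 little relationship.
```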
Interview Methods
Interviews are commonly divided into two types: structured and unstructured.
In a structured interview, a fixed, predetermined set of questions is put to every participant in the same order and in the same way.
Responses are recorded on a questionnaire, and the researcher presets the order and wording of questions, and sometimes the range of alternative answers.
The interviewer stays within their role and maintains social distance from the interviewee.
In an unstructured interview, there are no set questions; the participant can raise whatever topics they feel are relevant and discuss them in their own way, with follow-up questions posed in response to their answers.
Unstructured interviews are most useful in qualitative research to analyze attitudes and values.
Though they rarely provide a valid basis for generalization, their main advantage is that they enable the researcher to probe social actors’ subjective point of view.
Questionnaire Method
Questionnaires can be thought of as a kind of written interview. They can be carried out face to face, by telephone, or post.
The choice of questions is important because of the need to avoid bias or ambiguity in the questions, ‘leading’ the respondent or causing offense.
- Open questions are designed to encourage a full, meaningful answer using the subject’s own knowledge and feelings. They provide insights into feelings, opinions, and understanding. Example: “How do you feel about that situation?”
- Closed questions can be answered with a simple “yes” or “no” or specific information, limiting the depth of response. They are useful for gathering specific facts or confirming details. Example: “Do you feel anxious in crowds?”
Other practical advantages of questionnaires are that they are cheaper than face-to-face interviews and can be used to contact many respondents scattered over a wide area relatively quickly.
Observations
There are different types of observation methods :
- Covert observation is where the researcher doesn’t tell the participants they are being observed until after the study is complete. There could be ethical problems with deception and consent with this particular observation method.
- Overt observation is where a researcher tells the participants they are being observed and what they are being observed for.
- Controlled : behavior is observed under controlled laboratory conditions (e.g., Bandura’s Bobo doll study).
- Natural : Here, spontaneous behavior is recorded in a natural setting.
- Participant : Here, the observer has direct contact with the group of people they are observing. The researcher becomes a member of the group they are researching.
- Non-participant (aka “fly on the wall”): The researcher does not have direct contact with the people being observed; participants’ behavior is observed from a distance.
Pilot Study
A pilot study is a small-scale preliminary study conducted in order to evaluate the feasibility of the key steps in a future, full-scale project.
A pilot study is an initial run-through of the procedures to be used in an investigation; it involves selecting a few people and trying out the study on them. It is possible to save time, and in some cases, money, by identifying any flaws in the procedures designed by the researcher.
A pilot study can help the researcher spot any ambiguities (i.e. unusual things) or confusion in the information given to participants or problems with the task devised.
Sometimes the task is too hard, and the researcher may get a floor effect, because none of the participants can score at all or can complete the task – all performances are low.
The opposite effect is a ceiling effect, when the task is so easy that all achieve virtually full marks or top performances and are “hitting the ceiling”.
Research Design
In cross-sectional research, a researcher compares multiple segments of the population at the same time.
Sometimes, we want to see how people change over time, as in studies of human development and lifespan. Longitudinal research is a research design in which data-gathering is administered repeatedly over an extended period of time.
In cohort studies , the participants must share a common factor or characteristic such as age, demographic, or occupation. A cohort study is a type of longitudinal study in which researchers monitor and observe a chosen population over an extended period.
Triangulation means using more than one research method to improve the study’s validity.
Reliability
Reliability is a measure of consistency, if a particular measurement is repeated and the same result is obtained then it is described as being reliable.
- Test-retest reliability : assessing the same person on two different occasions which shows the extent to which the test produces the same answers.
- Inter-observer reliability : the extent to which there is an agreement between two or more observers.
Meta-Analysis
Meta-analysis is a statistical procedure used to combine and synthesize findings from multiple independent studies to estimate the average effect size for a particular research question.
Meta-analysis goes beyond traditional narrative reviews by using statistical methods to integrate the results of several studies, leading to a more objective appraisal of the evidence.
This is done by looking through various databases, and then decisions are made about what studies are to be included/excluded.
- Strengths : Increases the validity of the conclusions, as they are based on a wider range of studies.
- Weaknesses : Research designs in studies can vary, so they are not truly comparable.
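A minimal sketch of the statistical idea behind meta-analysis (invented effect sizes and variances, assuming a simple fixed-effect model): each study’s effect size is weighted by its precision (the inverse of its variance) and the weighted results are averaged.

```python
# Hypothetical effect sizes (Cohen's d) and variances from five studies.
effects = [0.40, 0.25, 0.55, 0.10, 0.35]
variances = [0.02, 0.05, 0.04, 0.01, 0.03]

# Fixed-effect (inverse-variance weighted) average effect size.
weights = [1 / v for v in variances]
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
print(f"Pooled effect size: {pooled:.2f}")
```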
Peer Review
A researcher submits an article to a journal. The choice of the journal may be determined by the journal’s audience or prestige.
The journal selects two or more appropriate experts (psychologists working in a similar field) to peer review the article without payment. The peer reviewers assess: the methods and designs used, originality of the findings, the validity of the original research findings and its content, structure and language.
Feedback from the reviewers determines whether the article is accepted. The article may be: accepted as it is, accepted with revisions, sent back to the author to revise and resubmit, or rejected without the possibility of resubmission.
The editor makes the final decision on whether to accept or reject the research report based on the reviewers’ comments and recommendations.
Peer review is important because it prevents faulty data from entering the public domain, provides a way of checking the validity of findings and the quality of the methodology, and is used to assess the research rating of university departments.
Peer reviews may be an ideal, whereas in practice there are lots of problems. For example, it slows publication down and may prevent unusual, new work being published. Some reviewers might use it as an opportunity to prevent competing researchers from publishing work.
Some people doubt whether peer review can really prevent the publication of fraudulent research.
The advent of the internet means that more research and academic comment is being published without official peer review than before, though systems are evolving online where everyone has a chance to offer their opinions and police the quality of research.
Types of Data
- Quantitative data is numerical data e.g. reaction time or number of mistakes. It represents how much or how long, how many there are of something. A tally of behavioral categories and closed questions in a questionnaire collect quantitative data.
- Qualitative data is virtually any type of information that can be observed and recorded that is not numerical in nature and can be in the form of written or verbal communication. Open questions in questionnaires and accounts from observational studies collect qualitative data.
- Primary data is first-hand data collected for the purpose of the investigation.
- Secondary data is information that has been collected by someone other than the person who is conducting the research e.g. taken from journals, books or articles.
Validity means how well a piece of research actually measures what it sets out to, or how well it reflects the reality it claims to represent.
Validity is whether the observed effect is genuine and represents what is actually out there in the world.
- Concurrent validity is the extent to which a psychological measure relates to an existing similar measure and obtains close results. For example, a new intelligence test compared to an established test.
- Face validity : whether the test appears to measure what it is supposed to measure ‘on the face of it’. This is assessed by ‘eyeballing’ the measure or by passing it to an expert to check.
- Ecological validity is the extent to which findings from a research study can be generalized to other settings / real life.
- Temporal validity is the extent to which findings from a research study can be generalized to other historical times.
Features of Science
- Paradigm – A set of shared assumptions and agreed methods within a scientific discipline.
- Paradigm shift – The result of the scientific revolution: a significant change in the dominant unifying theory within a scientific discipline.
- Objectivity – When all sources of personal bias are minimised so as not to distort or influence the research process.
- Empirical method – Scientific approaches that are based on the gathering of evidence through direct observation and experience.
- Replicability – The extent to which scientific procedures and findings can be repeated by other researchers.
- Falsifiability – The principle that a theory cannot be considered scientific unless it admits the possibility of being proved untrue.
Statistical Testing
A significant result is one where there is a low probability that chance factors were responsible for any observed difference, correlation, or association in the variables tested.
If our test is significant, we can reject our null hypothesis and accept our alternative hypothesis.
If our test is not significant, we can accept our null hypothesis and reject our alternative hypothesis. A null hypothesis is a statement of no effect.
In Psychology, we use p < 0.05 (as it strikes a balance between making a type I and II error) but p < 0.01 is used in tests that could cause harm like introducing a new drug.
A type I error is when the null hypothesis is rejected when it should have been accepted (happens when a lenient significance level is used, an error of optimism).
A type II error is when the null hypothesis is accepted when it should have been rejected (happens when a stringent significance level is used, an error of pessimism).
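A rough simulation sketch (my own illustration, not from the text) of what the 0.05 significance level means: when the null hypothesis is actually true, roughly 5% of tests still come out “significant” purely by chance, and each of these is a type I error.

```python
import random
from scipy import stats

# Simulate experiments in which the null hypothesis is true: both groups are
# drawn from the same population, so any significant result is a type I error.
n_experiments = 2000
false_positives = 0
for _ in range(n_experiments):
    group_a = [random.gauss(0, 1) for _ in range(30)]
    group_b = [random.gauss(0, 1) for _ in range(30)]
    _, p = stats.ttest_ind(group_a, group_b)
    if p < 0.05:
        false_positives += 1

print(f"Type I error rate: {false_positives / n_experiments:.3f}")  # close to 0.05
```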
Ethical Issues
- Informed consent is when participants are able to make an informed judgment about whether to take part. However, providing full information may cause them to guess the aims of the study and change their behavior.
- To deal with this, researchers can gain presumptive consent or ask participants to formally indicate their agreement to participate, but this may undermine the purpose of the study, and it is not guaranteed that participants will fully understand.
- Deception should only be used when it is approved by an ethics committee, as it involves deliberately misleading or withholding information. Participants should be fully debriefed after the study but debriefing can’t turn the clock back.
- All participants should be informed at the beginning that they have the right to withdraw if they ever feel distressed or uncomfortable.
- Withdrawal can cause bias, as those who stay may be more obedient, and some participants may not withdraw because they have been given incentives or feel they would be spoiling the study. Researchers can offer the right to withdraw data after participation.
- Participants should all have protection from harm . The researcher should avoid risks greater than those experienced in everyday life and they should stop the study if any harm is suspected. However, the harm may not be apparent at the time of the study.
- Confidentiality concerns the communication of personal information. The researchers should not record any names but use numbers or false names, though full anonymity may not always be possible, as it is sometimes possible to work out who the participants were.
The psychology of experimental psychologists: Overcoming cognitive constraints to improve research: The 47th Sir Frederic Bartlett Lecture
Like many other areas of science, experimental psychology is affected by a “replication crisis” that is causing concern in many fields of research. Approaches to tackling this crisis include better training in statistical methods, greater transparency and openness, and changes to the incentives created by funding agencies, journals, and institutions. Here, I argue that if proposed solutions are to be effective, we also need to take into account human cognitive constraints that can distort all stages of the research process, including design and execution of experiments, analysis of data, and writing up findings for publication. I focus specifically on cognitive schemata in perception and memory, confirmation bias, systematic misunderstanding of statistics, and asymmetry in moral judgements of errors of commission and omission. Finally, I consider methods that may help mitigate the effect of cognitive constraints: better training, including use of simulations to overcome statistical misunderstanding; specific programmes directed at inoculating against cognitive biases; adoption of Registered Reports to encourage more critical reflection in planning studies; and using methods such as triangulation and “pre mortem” evaluation of study design to foster a culture of dialogue and criticism.
Introduction
The past decade has been a bruising one for experimental psychology. The publication of a paper by Simmons, Nelson, and Simonsohn (2011) entitled “False-positive psychology” drew attention to problems with the way in which research was often conducted in our field, which meant that many results could not be trusted. Simmons et al. focused on “undisclosed flexibility in data collection and analysis,” which is now variously referred to as p -hacking, data dredging, noise mining, or asterisk hunting: exploring datasets with different selections of variables and different analyses to attain a p -value lower than .05 and, subsequently, reporting only the significant findings. Hard on the heels of their demonstration came a wealth of empirical evidence from the Open Science Collaboration (2015) . This showed that less than half the results reported in reputable psychological journals could be replicated in a new experiment.
The points made by Simmons et al. (2011) were not new: indeed, they were anticipated in 1830 by Charles Babbage, who described “cooking” of data:
This is an art of various forms, the object of which is to give ordinary observations the appearance and character of those of the highest degree of accuracy. One of its numerous processes is to make multitudes of observations, and out of these to select only those which agree, or very nearly agree. If a hundred observations are made, the cook must be very unhappy if he cannot pick out fifteen or twenty which will do for serving up. (p. 178–179)
P -hacking refers to biased selection of data or analyses from within an experiment. Bias also affects which studies get published in the form of publication bias—the tendency for positive results to be overrepresented in the published literature. This is problematic because it gives an impression that findings are more consistent than is the case, which means that false theories can attain a state of “canonisation,” where they are widely accepted as true ( Nissen, Magidson, Gross, & Bergstrom, 2016 ). Figure 1 illustrates this with a toy simulation of a set of studies testing a difference between means from two conditions. If we have results from a series of experiments, three of which found a statistically significant difference and three of which did not, this provides fairly strong evidence that the difference is real (panel a). However, if we add a further four experiments that were not reported because results were null, the evidence cumulates in the opposite direction. Thus, omission of null studies can drastically alter our impression of the overall support for a hypothesis.
The impact of publication bias demonstrated with plots of cumulative log odds in favour of true versus null effect over a series of experiments. The log odds for each experiment can be computed with knowledge of alpha (.05) and power (.8); 1 denotes an experiment with significant difference between means, and 0, a null result. The starting point is zero, indicating that we assume a 50:50 chance of a true effect. For each significant result, the log odds of it coming from a true effect versus a null effect is log(.8/.05) = 2.77. For a null result, the log odds is log (.2/.95) = −1.55. The selected set of studies in panel (a) concludes with a log odds greater than 3, indicating that the likelihood of a true effect is 20 times greater than a null effect. However, panel (b), which includes additional null results (labelled in grey), leads to the opposite conclusion.
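The arithmetic behind Figure 1 is simple enough to reproduce; the sketch below (illustrative only) uses the alpha of .05 and power of .80 stated in the caption and accumulates the log odds over a sequence of significant (1) and null (0) results.

```python
# Sketch of the Figure 1 computation: cumulative log odds of a true vs null effect,
# assuming alpha = .05 and power = .80 as stated in the caption.
import math

alpha, power = 0.05, 0.80
lr_significant = math.log(power / alpha)        # ≈ +2.77 per significant result
lr_null = math.log((1 - power) / (1 - alpha))   # ≈ -1.55 per null result

def cumulative_log_odds(results, prior_log_odds=0.0):
    """results: sequence of 1 (significant) / 0 (null) outcomes."""
    trajectory = [prior_log_odds]
    for r in results:
        trajectory.append(trajectory[-1] + (lr_significant if r else lr_null))
    return trajectory

print(cumulative_log_odds([1, 1, 1, 0, 0, 0]))              # panel (a): ends above 3
print(cumulative_log_odds([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))  # panel (b): ends below 0
```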
Since the paper by Simmons et al. (2011), there has been a dramatic increase in replication studies. As a result, a number of well-established phenomena in psychology have come into question. Often it is difficult to be certain whether the original reports were false positives, whether the replication was flawed, or whether the effect of interest is only evident under specific conditions—see, for example, Hobson and Bishop (2016) on mu suppression in response to observed actions; Sripada, Kessler, and Jonides (2016) on ego depletion; Lehtonen et al. (2018) on an advantage in cognitive control for bilinguals; O’Donnell et al. (2018) on the professor-priming effect; and Oostenbroek et al. (2016) on neonatal imitation. What is clear is that the size, robustness, and generalisability of many classic effects are lower than previously thought.
Selective reporting, through p-hacking and publication bias, is not the only blight on our science. A related problem is that many editors place emphasis on reporting results in a way that “tells a good story,” even if that means retrofitting our hypothesis to the data, i.e., HARKing or “hypothesising after the results are known” ( Kerr, 1998 ). Oberauer and Lewandowsky (2019) drew parallels between HARKing and p-hacking: in HARKing, there is post hoc selection of hypotheses, rather than selection of results or an analytic method. They proposed that HARKing is most widely used in fields where theories are so underspecified that they can accommodate many hypotheses and where there is a lack of “disconfirmatory diagnosticity,” i.e., failure to support a prediction is uninformative.
A lack of statistical power is a further problem for psychology—one that has been recognised since 1969, when Jacob Cohen exhorted psychologists not to waste time and effort doing experiments that had too few observations to show an effect of interest. In other fields, notably clinical trials and genetics, after a period where non-replicable results proliferated, underpowered studies died out quite rapidly when journals adopted stringent criteria for publication (e.g., Johnston, Lahey, & Matthys, 2013 ), and funders began to require power analysis in grant proposals. Psychology, however, has been slow to catch up.
It is not just experimental psychology that has these problems—studies attempting to link psychological traits and disorders to genetic and/or neurobiological variables are, if anything, subject to greater challenges. A striking example comes from a meta-analysis of links between the serotonin transporter gene, 5-HTTLPR, and depression. This postulated association has attracted huge research interest over the past 20 years, and the meta-analysis included 450 studies. Contrary to expectation, it concluded that there was no evidence of association. In a blog post summarising findings, Alexander (2019) wrote,
. . . what bothers me isn’t just that people said 5-HTTLPR mattered and it didn’t. It’s that we built whole imaginary edifices, whole castles in the air on top of this idea of 5-HTTLPR mattering. We “figured out” how 5-HTTLPR exerted its effects, what parts of the brain it was active in, what sorts of things it interacted with, how its effects were enhanced or suppressed by the effects of other imaginary depression genes. This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.
It is no exaggeration to say that our field is at a crossroads ( Pashler & Wagenmakers, 2012 ), and the 5-HTTLPR story is just a warning sign that practices that lead to bad science are widespread. If we continue to take the well-trodden path, using traditional methods for cooking data and asterisk hunting, we are in danger of losing attention, respect, and funding.
Much has been written about how we might tackle the so-called “replication crisis.” There have been four lines of attack. First, there have been calls for greater openness and transparency ( Nosek et al., 2015 ). Second, a case has been made for better training in methods (e.g., Rousselet, Pernet, & Wilcox, 2017 ). Third, it has been argued we need to change the way research has been conducted to incorporate pre-registration of research protocols, preferably in the format of Registered Reports, which are peer-reviewed prior to data collection ( Chambers, 2019 ). Fourth, it is recognised that for too long, the incentive structure of research has prioritised innovative, groundbreaking results over methodological quality. Indeed, Smaldino and McElreath (2016) suggested that one can model the success of scientists in a field as an evolutionary process, where prestigious publications lead to survival, leaving those whose work is less exciting to wither away and leave science. The common thread to these efforts is that they locate the mechanisms of bad science at the systemic level, in ways in which cultures and institutions reinforce norms and distribute resources. The solutions are, therefore, aimed at correcting these shortcomings by creating systems that make good behaviour easier and more rewarding and make poor behaviour more costly.
My view, however, is that institutional shortcomings are only part of the story: to improve scientific research, we also need to understand the mechanisms that maintain bad practices in individual humans. Bad science is usually done because somebody mistook it for good science. Understanding why individual scientists mistake bad science for good, and helping them to resist these errors, is a necessary component of the movement to improve psychology. I will argue that we need to understand how cognitive constraints lead to faulty reasoning if we are to get science back on course and persuade those who set the incentives to reform. Fortunately, as psychologists, we are uniquely well positioned to tackle this issue.
Experimental psychology has a rich tradition of studying human reasoning and decision-making, documenting the flaws and foibles that lead us to selectively process some types of information, make judgements on the basis of incomplete evidence, and sometimes behave in ways that seem frankly irrational. This line of work has had significant application to economics, politics, business studies, and law, but, with some notable exceptions (e.g., Hossenfelder, 2018 ; Mahoney, 1976 ), it has seldom been considered when studying the behaviour of research scientists. In what follows, I consider how our knowledge of human cognition can make sense of problematic scientific practices, and I propose ways we might use this information to find solutions.
Cognitive constraints that affect how psychological science is done
Table 1 lists four characteristics of human cognition that I focus on: I refer to these as “constraints” because they limit how we process, understand, or remember information, but it is important to note that they include some biases that can be beneficial in many contexts. The first constraint is confirmation bias. As Hahn and Harris (2014) noted, a range of definitions of “confirmation bias” exist—here, I will define it as the tendency to seek out evidence that supports our position. A further set of constraints has to do with understanding of probability. A lack of an intuitive grasp of probability contributes to both neglect of statistical power in study design and p -hacking in data analysis. Third, there is an asymmetry in moral reasoning that can lead us to treat errors of omission as less culpable than errors of commission, even when their consequences are equally serious ( Haidt & Baron, 1996 ). The final constraint featured in Bartlett’s (1932) work: reliance on cognitive schemata to fill in unstated information, leading to “reconstructive remembering,” which imbues memories with meaning while filtering out details that do not fit preconceptions.
Different types of cognitive constraints.
| Cognitive constraint | Description |
|---|---|
| Confirmation bias | Tendency to seek out and remember evidence that supports a preferred viewpoint |
| Misunderstanding of probability | (a) Failure to understand how estimation scales with sample size; (b) failure to understand that probability depends on context |
| Asymmetric moral reasoning | Errors of omission judged less seriously than errors of commission |
| Reliance on schemata | Perceiving and/or remembering in line with pre-existing knowledge, leading to omission or distortion of irrelevant information |
In what follows, I illustrate how these constraints assume particular importance at different stages of the research process, as shown in Table 2 .
Cognitive constraints that operate at different stages of the research process.
| Stage of research | Cognitive constraint |
|---|---|
| Experimental design | Confirmation bias: looking for evidence consistent with theory; statistical misunderstanding: power |
| Data analysis | Statistical misunderstanding: p-hacking; moral asymmetry: omission and “paltering” deemed acceptable |
| Scientific reporting | Confirmation bias in reviewing literature; moral asymmetry: omission and “paltering” deemed acceptable; cognitive schemata: need for narrative, HARKing |
HARKing: hypothesising after the results are known.
Bias in experimental design
Confirmation bias and the failure to consider alternative explanations
Scientific discovery involves several phases: the researcher needs to (a) assemble evidence, (b) look for meaningful patterns and regularities in the data, (c) formulate a hypothesis, and (d) test it empirically by gathering informative new data. Steps (a)–(c) may be designated as exploratory and step (d) as hypothesis testing or confirmatory ( Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012 ). Importantly, the same experiment cannot be used to both formulate and confirm a hypothesis. In practice, however, the distinction between the two types of experiment is often blurred.
Our ability to see patterns in data is vital at the exploratory stage of research: indeed, seeing something that nobody else has observed is a pinnacle of scientific achievement. Nevertheless, new ideas are often slow to be accepted, precisely because they do not fit the views of the time. One such example is described by Zilles and Amunts (2010) : Brodmann’s cytoarchitectonic map of the brain, described in 1909. This has stood the test of time and is still used over 100 years later, but for several decades, it was questioned by those who could not see the fine distinctions made by Brodmann. Indeed, criticisms of poor reproducibility and lack of objectivity were levelled against him.
Brodmann’s case illustrates that we need to be cautious about dismissing findings that depend on special expertise or unique insight of the observer. However, there are plenty of other instances in the history of science where invalid ideas persisted, especially if proposed by an influential or charismatic figure. Entire edifices of pseudoscience have endured because we are very bad at discarding theories that do not work; as Bartlett (1932) would predict, new information that is consistent with the theory will strengthen its representation in our minds, but inconsistent information will be ignored. Examples from the history of science include the rete mirabile , a mass of intertwined arteries that is found in sheep but wrongly included in anatomical drawings of humans for over 1,000 years because of the significance attributed to this structure by Galen ( Bataille et al., 2007 ); the planet Vulcan, predicted by Newton’s laws and seen by many astronomers until its existence was disproved by Einstein’s discoveries ( Levenson, 2015 ); and N-rays, non-existent rays seen by at least 40 people and analysed in 3,090 papers by 100 scientists between 1903 and 1906 ( Nye, 1980 ).
Popper’s (1934/ 1959 ) goal was to find ways to distinguish science from pseudoscience, and his contribution to philosophy of science was to emphasise that we should be bold in developing ideas but ruthless in attempts to falsify them. In an early attempt to test scientists’ grasp of Popperian logic, Mahoney (1976) administered a classic task developed by Wason (1960) to 84 scientists (physicists, biologists, psychologists, and sociologists). In this deceptively simple task, people are shown four cards and told that each card has a number on one side and a patch of colour on the other side. The cards are placed to show number 3, number 8, red, and blue, respectively (see Figure 2 ). The task is to identify which cards need to be turned over to test the hypothesis that if an even number appears on one side, then the opposite side is red. The subject can pick any number of cards. The correct response is to name the two cards that could disconfirm the hypothesis—the number 8 and the blue card. Fewer than 10% of the scientists tested by Mahoney identified both critical cards, more often selecting the number 8 and the red card.
Wason’s (1960) task: The subject is told, “Each card has a number on one side and a patch of colour on the other. You are asked to test the hypothesis that—for these 4 cards—if an even number appears on one side, then the opposite side is red. Which card(s) would you turn over to test the hypothesis?”
Although this study was taken as evidence of unscientific reasoning by scientists, that conclusion has since been challenged by those who have criticised both Popperian logic, in general, and the Wason selection task, in particular, as providing an unrealistic test of human rationality. For a start, the Wason task uses a deterministic hypothesis that can be disproved by a single piece of evidence. This is not a realistic model of biological or behavioural sciences, where we seldom deal with deterministic phenomena. Consider the claim that smoking causes lung cancer. Most of us accept that this is so, even though we know there are people who smoke and who do not get lung cancer and people who get lung cancer but never smoked. When dealing with probabilistic phenomena, a Bayesian approach makes more sense, whereby we consider the accumulated evidence to determine the relative likelihood of one hypothesis over another (as illustrated in Figure 1 ). Theories are judged as more or less probable, rather than true or false. Oaksford and Chater (1994) showed that, from a Bayesian perspective, typical selections made on the Wason task would be rational in contexts where the antecedent and consequent of the hypothesis (an even number and red colour) were both rare. Subsequently, Perfors and Navarro (2009) concluded that in situations where rules are relevant only for a minority of entities, then confirmation bias is an efficient strategy.
This kind of analysis has shifted the focus to discussions about how far, and under what circumstances, people are rational decision-makers. However, it misses a key point about scientific reasoning, which is that it involves an active process of deciding which evidence to gather, rather than merely a passive evaluation of existing evidence. It seems reasonable to conclude that, when presented with a particular set of evidence, people generally make decisions that are rational when evaluated against Bayesian standards. However, history suggests that we are less good at identifying which new evidence needs to be gathered to evaluate a theory. In particular, people appear to have a tendency to accept a hypothesis on the basis of “good enough” evidence, rather than actively seeking evidence for alternative explanations. Indeed, an early study by Doherty, Mynatt, Tweney, and Schiavo (1979) found that, when given an opportunity to select evidence to help decide which of two hypotheses was true (in a task where a fictitious pot had to be assigned as originating from one of the two islands that differed in characteristic features), people seemed unable to identify which information would be diagnostic and tended, instead, to select information that could neither confirm nor disconfirm their hypothesis.
Perhaps the strongest evidence for our poor ability to consider alternative explanations comes from the history of the development of clinical trials. Although James Lind is credited with doing the first trials for treatment of scurvy in 1747, it was only in 1948 that the randomised controlled trial became the gold standard for evaluating medical interventions ( Vallier & Timmerman, 2008 ). The need for controls is not obvious, and people who are not trained in this methodology will often judge whether a treatment is effective on the basis of a comparison on an outcome measure between a pre-treatment baseline and a post-treatment evaluation. The logic is that if a group of patients given the treatment does not improve, the treatment did not work. If they do show meaningful gains, then it did work. And we can even embellish this comparison with a test of statistical significance. This reasoning can be seen as entirely rational, and this can explain why so many people are willing to accept that alternative medicine is effective.
The problem with this approach is that the pre–post intervention comparison allows important confounds to creep in. For instance, early years practitioners argue that we should identify language problems in toddlers so that we can intervene early. They find that if 18-month-old late talkers are given intervention, only a minority still have language problems at 2 years and, therefore, conclude the intervention was effective. However, if an untreated control group is studied over the same period, we find very similar rates of improvement ( Wake et al., 2011 )—presumably due to factors such as spontaneous resolution of problems or regression to the mean, which will lead to systematic bias in outcomes. Researchers need training to recognise causes of bias and to take steps to overcome them: thinking about possible alternative explanations of an observed phenomenon does not come naturally, especially when the preliminary evidence looks strong.
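A toy simulation makes the point about uncontrolled pre–post comparisons; all numbers below are invented for illustration. Children are "selected" for low scores on a noisy baseline measure and then retested after a period of ordinary development, with no treatment effect at all.

```python
# Toy illustration (invented numbers): why pre-post gains alone cannot show that an
# intervention worked. There is no treatment effect here, yet the selected group improves.
import numpy as np

rng = np.random.default_rng(7)
n = 200
true_ability = rng.normal(50, 10, n)              # stable underlying skill
baseline = true_ability + rng.normal(0, 8, n)     # noisy pre-test score
selected = baseline < 40                          # "late talkers": chosen for low scores

natural_gain = 5                                  # spontaneous improvement over time
followup = true_ability + natural_gain + rng.normal(0, 8, n)

# The selected group shows a sizeable pre-post gain even though nothing was treated:
# part is spontaneous improvement, part is regression to the mean.
gain = followup[selected] - baseline[selected]
print(f"Mean pre-post gain in untreated 'late talkers': {gain.mean():.1f} points")
```

The apparent gain is produced entirely by regression to the mean plus spontaneous improvement, which is why an untreated control group is essential before crediting the intervention.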
Intervention studies provide the clearest evidence of what I term “premature entrenchment” of a theory: some other examples are summarised in Table 3 . Note that these examples do not involve poor replicability, quite the opposite. They are all cases where an effect, typically an association between variables, is reliably observed, and researchers then converge on accepting the most obvious causal explanation, without considering lines of evidence that might point to alternative possibilities.
Premature entrenchment: examples where the most obvious explanation for an observed association is accepted for many years, without considering alternative explanations that could be tested with different evidence.
| Observation | Favoured explanation | Alternative explanation | Evidence for alternative explanation |
|---|---|---|---|
| Home literacy environment predicts reading outcomes in children | Access to books at home affects children’s learning to read | Parents and children share genetic risk for reading problems | Children who are poor readers tend to have parents who are poor readers |
| Speech sounds (phonemes) do not have consistent auditory correlates but can be identified by knowledge of articulatory configurations used to produce them | Motor theory of speech perception: we learn to recognise speech by mapping input to articulatory gestures | Correlations between perception and production reflect co-occurrence rather than causation | Children who are congenitally unable to speak can develop good speech perception, despite having no articulatory experience |
| Dyslexics have atypical brain responses to speech when assessed using fMRI | Atypical brain organisation provides evidence that dyslexia is a “real disorder” with a neurobiological basis | Atypical responses to speech in the brain are a consequence of being a poor reader | Adults who had never been taught to read have atypical brain organisation for spoken language |
fMRI: functional magnetic resonance imaging.
Premature entrenchment may be regarded as evidence that humans adopt Bayesian reasoning: we form a prior belief about what is the case and then require considerably more evidence to overturn that belief than to support it. This would explain why, when presented with virtually identical studies that either provided support for or evidence against astrology, psychologists were more critical of the latter ( Goodstein & Brazis, 1970 ). The authors of that study expressed concern about the “double standard” shown by biased psychologists who made unusually harsh demands of research in borderline areas, but from a Bayesian perspective, it is reasonable to use prior knowledge so that extraordinary claims require extraordinary evidence. Bayesian reasoning is useful in many situations: it allows us to act decisively on the basis of our long-term experience, rather than being swayed by each new incoming piece of data. However, it can be disastrous if we converge on a solution too readily on the basis of incomplete or inaccurate information. This will be exacerbated by publication bias, which distorts the evidential landscape.
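In odds form, the asymmetry is easy to see; the sketch below uses invented numbers purely to illustrate how a strong prior absorbs a single piece of contrary evidence.

```python
# Sketch: Bayesian updating with an entrenched prior. With prior odds of 100:1 in favour
# of a belief, one moderately diagnostic contrary result (likelihood ratio 1:4) still
# leaves the posterior firmly in favour of the original belief.
prior_odds = 100            # entrenched belief: 100:1 in favour
likelihood_ratio = 1 / 4    # contrary evidence: data 4x more likely if the belief is false
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)       # 25.0 -- still 25:1 in favour of the entrenched belief
```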
For many years, the only methods available to counteract the tendency for premature entrenchment were exhortations to be self-critical (e.g., Feynman, 1974 ) and peer review. The problem with peer review is that it typically comes too late to be useful, after research is completed. In the final section of this article, I will consider some alternative approaches that bring in external appraisal of experimental designs at an earlier stage in the research process.
Misunderstanding of probability leading to underpowered studies
Some 17 years after Cohen’s seminal work on statistical power, Newcombe (1987) wrote,
Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant. (p. 657)
In clinical medicine, things have changed, and the importance of adequate statistical power is widely recognised among those conducting clinical trials. But in psychology, the “blind hope” has persisted, and we have to ask ourselves why this is.
My evidence here is anecdotal, but the impression is that many psychologists simply do not believe advice about statistical power, perhaps because there are so many underpowered studies published in the literature. When a statistician is consulted about sample size for a study, he or she will ask the researcher to estimate the anticipated effect size. This usually leads to a sample size estimate that is far higher than the researcher anticipated or finds feasible, leading to a series of responses not unlike the first four of the five stages of grief: denial, anger, bargaining, and depression. The final stage, acceptance, may, however, not be reached.
Of course, there are situations where small sample sizes are perfectly adequate: the key issue is how large the effect of interest is in relation to the variance. In some fields, such as psychophysics, you may not even need statistics—the famous “interocular trauma” test (referring to a result so obvious and clear-cut that it hits you between the eyes) may suffice. Indeed, in such cases, recruitment of a large sample would just be wasteful.
There are, however, numerous instances in psychology where people have habitually used sample sizes that are too small to reliably detect an effect of interest: see, for instance, the analysis by Poldrack et al. (2017) of well-known effects in functional magnetic resonance imaging (fMRI) or Oakes (2017) on looking-time experiments in infants. Quite often, a line of research is started when a large effect is seen in a small sample, but over time, it becomes clear that this is a case of “winner’s curse,” a false positive that is published precisely because it looks impressive but then fails to replicate when much larger sample sizes are used. There are some recent examples from studies looking at neurobiological or genetic correlates of individual differences, where large-scale studies have failed to support previously published associations that had appeared to be solid (e.g., De Kovel & Francks, 2019 , on genetics of handedness; Traut et al., 2018 , on cerebellar volume in autism; Uddén et al., 2019 , on genetic correlates of fMRI language-based activation).
A clue to the persistence of underpowered psychology studies comes from early work by Tversky and Kahneman (1971 , 1974 ). They studied a phenomenon that they termed “belief in the law of small numbers,” an exaggerated confidence in the validity of conclusions based on small samples, and showed that even those with science training tended to have strong intuitions about random sampling that were simply wrong. They illustrated this with the following problem:
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days? 1. The large hospital 2. The small hospital 3. About the same (that is, within 5% of each other)
Most people selected Option 3, whereas, as illustrated in Figure 3 , Option 2 is the correct answer—with only 15 births per day, the day-to-day variation in the proportion of boys will be much higher than with 45 births per day, and hence, more days will have more than 60% boys. One reason why our intuitions deceive us is because the sample size does not affect the average percentage of male births in the long run: this will be 50%, regardless of the hospital size. But sample size has a dramatic impact on the variability in the proportion of male births from day to day. More formally, if you have a big and small sample drawn from the same population, the expected estimate of the mean will be the same, but the standard error of that estimate will be greater for the small sample.
Simulated data showing proportions of males born in a small hospital with 15 births per day versus a large hospital with 45 births per day. The small hospital has more days where more than 60% of births are boys (points above red line).
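The simulation behind Figure 3 can be sketched in a few lines, using a simple binomial model of births (illustrative values only).

```python
# Sketch of the hospital problem: with 15 births/day, the daily proportion of boys varies
# more than with 45 births/day, so the small hospital records more days with > 60% boys.
import numpy as np

rng = np.random.default_rng(42)
days = 365
small = rng.binomial(15, 0.5, days) / 15    # daily proportion of boys, small hospital
large = rng.binomial(45, 0.5, days) / 45    # daily proportion of boys, large hospital

print("Days with >60% boys (small hospital):", int((small > 0.6).sum()))
print("Days with >60% boys (large hospital):", int((large > 0.6).sum()))
```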
Statistical power depends on the effect size, which, for a simple comparison of two means, can be computed as the difference in means divided by the pooled standard deviation. It follows that power is crucially dependent on the proportion of variance in observations that is associated with an effect of interest, relative to background noise. Where variance is high, it is much harder to detect the effect, and hence, small samples are often underpowered. Increasing the sample size is not the only way to improve power: other options include improving the precision of measurement, using more effective manipulations, or adopting statistical approaches to control noise ( Lazic, 2018 ). But in many situations, increasing the sample size is the preferred approach to enhance statistical power to detect an effect.
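Power itself is easy to estimate by simulation. The sketch below (illustrative only; it assumes a simple two-group comparison with a "medium" standardised effect of d = 0.5) shows how sharply power depends on the number of participants per group.

```python
# Sketch: estimating statistical power by simulation for a two-group comparison,
# assuming a true standardised effect of d = 0.5. Illustrative values only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d = 0.5   # true difference between means in standard deviation units

def simulated_power(n_per_group, n_sims=5_000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(d, 1.0, n_per_group)
        hits += stats.ttest_ind(a, b).pvalue < alpha
    return hits / n_sims

for n in (20, 64, 128):
    print(f"n = {n:3d} per group: power ≈ {simulated_power(n):.2f}")
```

With 20 per group, power is only around .33 for this effect size; roughly 64 per group are needed to reach the conventional .80.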
Bias in data analysis: p-hacking
P-hacking can take various forms, but they all involve a process of selective analysis. Suppose some researchers hypothesise that there is an association between executive function and implicit learning in a serial reaction time task, and they test this in a study using four measures of executive function. Even if there is only one established way of scoring each task, they have four correlations; this means that the probability that none of the correlations is significant at the .05 level is .95⁴—i.e., .815—and conversely, the probability that at least one is significant is .185. This probability can be massaged to even higher levels if the experimenters look at the data and then select an analytic approach that maximises the association: maybe by dropping outliers, by creating a new scoring method, combining measures in composites, and so on. Alternatively, the experimenters may notice that the strength of the correlation varies with the age or sex of participants and so subdivide the sample to coax at least a subset of data into significance. The key thing about p-hacking is that at the end of the process, the researchers selectively report the result that “worked,” with the implication that the p-value can be interpreted at face value. But it cannot: probability is meaningless if not defined in terms of a particular analytic context. P-hacking appears to be common in psychology ( John, Loewenstein, & Prelec, 2012 ). I argue here that this is because it arises from a conjunction of two cognitive constraints: failure to understand probability, coupled with a view that omission of information when reporting results is not a serious misdemeanour.
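The .185 figure follows directly from the multiplication rule for independent tests, as this short check confirms (illustrative only).

```python
# Sketch: familywise false-positive rate for four independent tests at alpha = .05.
alpha, k = 0.05, 4
p_at_least_one = 1 - (1 - alpha) ** k
print(round(p_at_least_one, 3))   # 0.185
```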
Failure to understand probability
In an influential career guide, published by the American Psychological Association, Bem (2004) explicitly recommended going against the “conventional view” of the research process, as this might lead us to miss exciting new findings. Instead readers were encouraged to
become intimately familiar with . . . the data. Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything—interesting. (p. 2)
For those who were concerned this might be inappropriate, Bem offered reassurance. Everything is fine because what you are doing is exploring your data. Indeed, he implied that anyone who follows the “conventional view” would be destined to do boring research that nobody will want to publish.
Of course, Bem (2004) was correct to say that we need exploratory research. The problem comes when exploratory research is repackaged as if it were hypothesis testing, with the hypothesis invented after observing the data (HARKing), and the paper embellished with p -values that are bound to be misleading because they were p -hacked from numerous possible values, rather than derived from testing an a priori hypothesis. If results from exploratory studies were routinely replicated, prior to publication, we would not have a problem, but they are not. So why did the American Psychological Association think it appropriate to publish Bem’s views as advice to young researchers? We can find some clues in the book overview, which explains that there is a distinction between the “formal” rules that students are taught and the “implicit” rules that are applied in everyday life, concluding that “This book provides invaluable guidance that will help new academics plan, play, and ultimately win the academic career game.” Note that the stated goal is not to do excellent research: it is to have “a lasting and vibrant career.” It seems, then, that there is recognition here that if you do things in the “conventional” way, your career will suffer. It is clear from Bem’s framing of his argument that he was aware that his advice was not “conventional,” but he did not think it was unethical—indeed, he implied it would be unfair on young researchers to do things conventionally as that will prevent them making exciting discoveries that will enable them to get published and rise up the academic hierarchy. While it is tempting to lament the corruption of a system that treats an academic career as a game, it is more important to consider why so many people genuinely believe that p -hacking is a valid, and indeed creative, approach to doing research.
The use of null-hypothesis significance testing has attracted a lot of criticism, with repeated suggestions over the years that p-values be banned. I favour the more nuanced view expressed by Lakens (2019), who suggests that p-values have a place in science, provided they are correctly understood and used to address specific questions. There is no doubt, however, that many people do misunderstand the p-value. There are many varieties of misunderstanding, but perhaps the most common is to interpret the p-value as a measure of strength of evidence that can be attached to a given result, regardless of the context. It is easy to see how this misunderstanding arises: if we hold the sample size constant, then for a single comparison, there will be a monotonic relationship between the p-value and the effect size. However, whereas an effect size remains the same, regardless of the analytic context, a p-value is crucially context-dependent.
Suppose in the fictitious study of executive function described above, the researchers have 20 participants and four measures of executive function (A–D) that correlate with implicit learning with r values of .21, .47, .07, and −.01. The statistics package tells us that the corresponding two-tailed p -values are .374, .037, .769, and .966. A naive researcher may rejoice at having achieved significance with the second correlation. However, as noted above, the probability that at least one correlation of the four will have an associated p -value of less than .05 is 18%, not 5%. If we want to identify correlations that are unlikely under the null hypothesis, then we need to correct the alpha level (e.g., by doing a Bonferroni correction to adjust by the number of tests, i.e., .05/4 = .0125). At this point, the researchers see their significant result snatched from their grasp. This creates a strong temptation to just drop the three non-significant tests and not report them. Alternatively, one sometimes sees papers that report the original p -value but then state that it “did not survive” Bonferroni correction, but they, nevertheless, exhume it and interpret the uncorrected value. Researchers acting this way may not think that they are doing anything inappropriate, other than going against advice of pedantic statisticians, especially given Bem’s (2004) advice to follow the “implicit” rather than “formal” rules of research. However, this is simply wrong: as illustrated above, a p -value can only be interpreted in relation to the context in which it is computed.
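For readers who want to check the worked example, the sketch below recomputes the four p-values from the stated correlations and sample size and applies the Bonferroni correction (illustrative only; it assumes Pearson correlations and two-tailed tests).

```python
# Sketch of the worked example: p-values for four correlations with n = 20, plus the
# Bonferroni-adjusted alpha. Only r = .47 reaches p < .05, and nothing survives correction.
import numpy as np
from scipy import stats

n = 20
r_values = np.array([0.21, 0.47, 0.07, -0.01])

# Two-tailed p-value for a Pearson r via the t distribution with n - 2 df.
t = r_values * np.sqrt(n - 2) / np.sqrt(1 - r_values**2)
p = 2 * stats.t.sf(np.abs(t), df=n - 2)

alpha_corrected = 0.05 / len(r_values)     # Bonferroni: .05 / 4 = .0125
print(np.round(p, 3))                      # ≈ [0.374, 0.037, 0.769, 0.967], matching the text to rounding
print("significant after correction:", p < alpha_corrected)
```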
One way of explaining the notion of p -hacking is to use the old-fashioned method of games of chance. I find this scenario helpful: we have a magician who claims he can use supernatural powers to deal a poker hand of “three of a kind” from an unbiased deck of cards. This type of hand will occur in around 1 of 50 draws from an unbiased deck. He points you to a man who, to his amazement, finds that his hand contains three of a kind. However, you then discover he actually tried his stunt with 50 people, and this man was the only one who got three of a kind. You are rightly disgruntled. This is analogous to p -hacking. The three-of-a-kind hand is real enough, but its unusualness, and hence its value as evidence of the supernatural, depends on the context of how many tests were done. The probability that needs to be computed here is not the probability of one specific result but rather the probability that specific result would come up at least once in 50 trials.
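The card-trick arithmetic works the same way; the figures below are approximate and given purely for illustration.

```python
# Sketch of the card-trick analogy: a "three of a kind" hand is rare on one deal
# (roughly 1 in 50), but the chance that at least one of 50 spectators gets one is high.
p_three_of_a_kind = 0.0211     # approximate probability for a 5-card poker hand
p_at_least_one_in_50 = 1 - (1 - p_three_of_a_kind) ** 50
print(round(p_at_least_one_in_50, 2))   # ≈ 0.66
```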
Asymmetry of sins of omission and commission
According to Greenwald (1975) “[I]t is a truly gross ethical violation for a researcher to suppress reporting of difficult-to-explain or embarrassing data to present a neat and attractive package to a journal editor” (p. 19).
However, this view is not universal.
Greenwald’s focus was on publication bias, i.e., failure to report an entire study, but the point he made about “prejudice” against null results also applies to cases of p -hacking where only “significant” results are reported, whereas others go unmentioned. It is easy to see why scientists might play down the inappropriateness of p -hacking, when it is so important to generate “significant” findings in a world with a strong prejudice against null results. But I suspect another reason why people tend to underrate the seriousness of p -hacking is because it involves an error of omission (failing to report the full context of a p -value), rather than an error of commission (making up data).
In studies of morality judgement, errors of omission are generally regarded as less culpable than errors of commission (see, e.g., Haidt & Baron, 1996 ). Furthermore, p -hacking may be seen to involve a particularly subtle kind of dishonesty because the statistics and their associated p -values are provided by the output of statistics software. They are mathematically correct when testing a specific, prespecified hypothesis: the problem is that, without the appropriate context, they imply stronger evidence than is justified. This is akin to what Rogers, Zeckhauser, Gino, Norton, and Schweitzer (2017) have termed “paltering,” i.e., the use of truthful statements to mislead, a topic they studied in the context of negotiations. An example was given of a person trying to sell a car that had twice needed a mechanic to fix it. Suppose the potential purchaser directly asks “Has the car ever had problems?” An error of commission is to deny the problems, but a paltering answer would be “This car drives very smoothly and is very responsive. Just last week it started up with no problems when the temperature was −5 degrees Fahrenheit.” Rogers et al. showed that negotiators were more willing to palter than to lie, although potential purchasers regarded paltering as only marginally less immoral than lying.
Regardless of the habitual behaviour of researchers, the general public does not find p -hacking acceptable. Pickett and Roche (2018) did an M-Turk experiment in which a community sample was asked to judge the morality of various scenarios, including this one:
A medical researcher is writing an article testing a new drug for high blood pressure. When she analyzes the data with either method A or B, the drug has zero effect on blood pressure, but when she uses method C, the drug seems to reduce blood pressure. She only reports the results of method C, which are the results that she wants to see.
Seventy-one percent of respondents thought this behaviour was immoral, 73% thought the researcher should receive a funding ban, and 63% thought the researcher should be fired.
Nevertheless, although selective reporting was generally deemed immoral, data fabrication was judged more harshly, confirming that in this context, as in those studied by Haidt and Baron (1996) , sins of commission are taken more seriously than errors of omission.
If we look at the consequences of a specific act of p -hacking, it can potentially be more serious than an act of data fabrication: this is most obvious in medical contexts, where suppression of trial results, either by omitting findings from within a study or by failing to publish studies with null results, can provide a badly distorted basis for clinical decision-making. In their simulations of evidence cumulation, Nissen et al. (2016) showed how p -hacking could compound the impact of publication bias and accelerate the premature “canonization” of theories; the alpha level that researchers assume applies to experimental results is distorted by p -hacking, and the expected rate of false positives is actually much higher. Furthermore, p -hacking is virtually undetectable because the data that are presented are real, but the necessary context for their interpretation is missing. This makes it harder to correct the scientific record.
Bias in writing up a study
Most writing on the “replication crisis” focuses on aspects of experimental design and observations, data analysis, and scientific reporting. The resumé of literature that is found in the introduction to empirical papers, as well as in literature review articles, is given less scrutiny. I will make the case that biased literature reviews are universal and have a major role in sustaining poor reproducibility because they lead to entrenchment of false theories, which are then used as the basis for further research.
It is common to see biased literature reviews that put a disproportionate focus on findings that are consistent with the author’s position. Researchers who know an area well may be aware of this, especially if their own work is omitted, but in general, cherry-picking of evidence is hard to detect. I will use a specific paper published in 2013 to illustrate my point, but I will not name the authors, as it would be invidious to single them out when the kinds of bias in their literature review are ubiquitous. In their paper, my attention was drawn to the following statement in the introduction:
Regardless of etiology, cerebellar neuropathology commonly occurs in autistic individuals. Cerebellar hypoplasia and reduced cerebellar Purkinje cell numbers are the most consistent neuropathologies linked to autism. … MRI studies report that autistic children have smaller cerebellar vermal volume in comparison to typically developing children.
I was surprised to read this because a few years ago, I had attended a meeting on neuroanatomical studies of autism and had come away with the impression that there were few consistent findings. I did a quick search for an up-to-date review, which turned up a meta-analysis ( Traut et al., 2018 ) that included 16 MRI studies published between 1997 and 2010, five of which reported larger cerebellar size in autism and one of which found smaller cerebellar size. In the article I was reading, one paper had been cited to support the MRI statement, but it referred to a study where the absolute size of the vermis did not differ from typically developing children but was relatively small in the autistic participants, after the overall (larger) size of the cerebellum had been controlled for.
Other papers cited to support the claims of cerebellar neuropathology included a couple of early post mortem neuroanatomical studies, as well as two reviews. The first of these ( DiCicco-Bloom et al., 2006 ) summarised presentations from a conference and supported the claims made by the authors. The other one, however ( Palmen, van Engeland, Hof, & Schmitz, 2004 ), expressed more uncertainty and noted a lack of correspondence between early neuroanatomical studies and subsequent MRI findings, concluding,
Although some consistent results emerge, the majority of the neuropathological data remain equivocal. This may be due to lack of statistical power, resulting from small sample sizes and from the heterogeneity of the disorder itself, to the inability to control for potential confounding variables such as gender, mental retardation, epilepsy and medication status, and, importantly, to the lack of consistent design in histopathological quantitative studies of autism published to date.
In sum, a confident statement “cerebellar neuropathology commonly occurs in autistic individuals,” accompanied by a set of references, converged to give the impression that there is consensus that the cerebellum is involved in autism. However, when we drill down, we find that the evidence is uncertain, with discrepancies between neuropathological studies and MRI and methodological concerns about the former. Meanwhile, this study forms part of a large body of research in which genetically modified mice with cerebellar dysfunction are used as an animal model of autism. My impression is that few of the researchers using these mouse models appreciate that the claim of cerebellar abnormality in autism is controversial among those working with humans because each paper builds on the prior literature. There is entrenchment of error, for two reasons. First, many researchers will take at face value the summary of previous work in a peer-reviewed paper, without going back to original cited sources. Second, even if a researcher is careful and scholarly and does read the cited work, they are unlikely to find relevant studies that were not included in the literature review.
It is easy to take an example like this and bemoan the lack of rigour in scientific writing, but this is to discount cognitive biases that make it inevitable that, unless we adopt specific safeguards against this, cherry-picking of evidence will be the norm. Three biases lead us in this direction: confirmation bias, moral asymmetry, and reliance on schemata.
Confirmation bias: cherry-picking prior literature
A personal example may serve to illustrate the way confirmation bias can operate subconsciously. I am interested in genetic effects on children’s language problems, and I was in the habit of citing three relevant twin studies when I gave talks on this topic. All these obtained similar results, namely that there was a strong genetic component to developmental language disorders, as evidenced by much higher concordance for disorder in pairs of monozygotic versus dizygotic twins. In 2005 , however, Hayiou-Thomas, Oliver, and Plomin published a twin study with very different findings, with low twin/co-twin concordance, regardless of zygosity. It was only when I came to write a review of this area and I checked the literature that I realised I had failed to mention the 2005 study in talks for a year or two, even though I had collaborated with the authors and was well aware of the findings. I had formed a clear view on heritability of language disorders, and so I had difficulty remembering results that did not agree. Subsequently, I realised we should try to understand why this study obtained different results and found a plausible explanation ( Bishop & Hayiou-Thomas, 2008 ). But I only went back for a further critical look at the study because I needed to make sense of the conflicting results. It is inevitable that we behave this way as we try to find generalisable results from a body of work, but it creates an asymmetry of attention and focus between work that we readily accept, because it fits, and work that is either forgotten or looked at more critically, because it does not.
A particularly rich analysis of citation bias comes from a case study by Greenberg (2009) , who took as his starting point papers concerned with claims that a protein, β amyloid, was involved in causing a specific form of muscle disease. Greenberg classified papers according to whether they were positive, negative, or neutral about this claim and carried out a network analysis to identify influential papers (those with many citations). He found that papers that were critical of the claim received far fewer citations than those that supported it, and this was not explained by lower quality. Animal model studies were almost exclusively justified by selective citation of positive studies. Consistent with the idea of “reconstructive remembering,” he also found instances where cited content was distorted, as well as cases where influential review papers amplified citation bias by focusing attention only on positive work. The net result was an information (perhaps better termed a disinformation) cascade that would lead to a lack of awareness of critical data, which never gets recognised. In effect, when we have agents that adopt Bayesian reasoning, if they are presented with distorted information, this creates a positive feedback loop that leads to increasing bias in the prior. Viewed this way, we can start to see how omission of relevant citations is not a minor peccadillo but a serious contributor to entrenchment of error. Further evidence of the cumulative impact of citation bias is shown in Figure 4 , which uses studies of intervention for depression. Because studies in this area are registered, it is possible to track the fate of unpublished as well as published studies. The researchers showed that studies with null results are far less likely to be published than those with positive findings, but even if the former are published, there is a bias against citing them.
The cumulative impact of reporting and citation biases on the evidence base for antidepressants. (a) displays the initial, complete cohort of trials that were recorded in a registry, while (b) through (e) show the cumulative effect of biases. Each circle indicates a trial, while the colour indicates whether results were positive or negative or were reported to give a misleadingly positive impression (spin). Circles connected by a grey line indicate trials from the same publication. The progression from (a) to (b) shows that nearly all the positive trials but only half of those with null results were published, and reporting of null studies showed (c) bias or (d) spin in what was reported. In (e), the size of the circle indicates the (relative) number of citations received by that category of studies.
Source. Reprinted with permission from De Vries et al. (2018) .
While describing such cases of citation bias, it is worth pausing to consider one of the best-known examples of distorted thinking: experimenter bias. This is similar to confirmation bias, but rather than involving selective attention to specific aspects of a situation that fit with our preconceptions, it has a more active character, whereby the experimenter can unwittingly influence the outcome of a study. The best-known research on this topic was the original Rosenthal and Fode (1963) study, where students were informed that the rats they were studying were “maze-bright” or “maze-dull,” when in fact they did not differ. Nevertheless, the “maze-bright” group learned better, suggesting that the experimenter would try harder to train an animal thought to have potential. A related study by Rosenthal and Jacobson (1968) claimed that if teachers were told that a test had revealed that specific pupils were “ready to bloom,” they would do better on an IQ test administered at the end of the year, even though the children so designated were selected at random.
Both these studies are widely cited. It is less well known that work on experimenter bias was subjected to a scathing critique by Barber and Silver (1968) , entitled “Fact, fiction and the experimenter bias effect,” in which it was noted that work in this area suffered from poor methodological quality, in particular p -hacking. Barber and Silver did not deny that experimenter bias could affect results, but they concluded that these effects were far less common and smaller in magnitude than those implied by Rosenthal’s early work. Subsequently, Barber (1976) developed this critique further in his book Pitfalls in Human Research. Yet Rosenthal’s work is more highly cited and better remembered than that of Barber.
Rosenthal’s work provides a cautionary tale: although confirmation bias helps explain distorted patterns of citation, the evidence for maladaptive cognitive biases has been exaggerated. Furthermore, studies on confirmation bias often use artificial experiments, divorced from real life, and the criteria for deciding that reasoning is erroneous are often poorly justified ( Hahn & Harris, 2014 ). In future, it would be worthwhile doing more naturalistic explorations of people’s memory for studies that do and do not support a position when summarising scientific literature.
On a related point, in using confirmation bias as an explanation for persistence of weak theories, there is a danger that I am falling into exactly the trap that I am describing. For instance, I was delighted to find Greenberg’s (2009) paper, as it chimed very well with my experiences when reading papers about cerebellar deficits in autism. But would I have described and cited it here if it had shown no difference between citations for papers that did and did not support the β amyloid claim? Almost certainly not. Am I going to read all literature on citation bias to find out how common it is? That strategy would soon become impossible if I tried to do it for every idea I touch upon in this article.
Moral asymmetry between errors of omission and commission
The second bias that fortifies the distortions in a literature review is the asymmetry of moral judgement that I referred to when discussing p-hacking. To my knowledge, paltering has not been studied in the context of literature reviews, but my impression is that selective presentation of results that fit, while failing to mention important contextual factors (e.g., the vermis in those with autism is smaller but only when you have covaried for the total cerebellar size), is common. How far this is deliberate or due to reconstructive remembering, however, is impossible to establish.
It would also be of interest to conduct studies on people’s attitudes to the acceptability of cherry-picking of literature versus paltering (misleadingly selective reporting) or invention of a study. I would anticipate that most would regard cherry-picking as fairly innocuous, for several reasons: first, it could be an unintended omission; second, the consequences of omitting material from a review may be seen as less severe than introducing misinformation; and third, selective citation of papers that fit a narrative can have a positive benefit in terms of readability. There are also pragmatic concerns: some journals limit the word count for an introduction or reference list so that full citation of all relevant work is not possible and, finally, sanctioning people for harmful omissions would create apparently unlimited obligations ( Haidt & Baron, 1996 ). Quite simply, there is far too much literature for even the most diligent scholar to read.
Nevertheless, consequences of omission can be severe. The above examples of research on the serotonin transporter gene in depression, or cerebellar abnormality in autism, emphasise how failure to cite earlier null results can lead to a misplaced sense of confidence in a phenomenon, which is wasteful in time and money when others attempt to build on it. And the more we encounter a claim, the more likely it is to be judged as true, regardless of actual accuracy (see Pennycook, Cannon, & Rand, 2018 , for a topical example). As Ingelfinger (1976) put it, “faulty or inappropriate references . . . like weeds, tend to reproduce themselves and so permit even the weakest of allegations to acquire, with repeated citation, the guise of factuality” (p. 1076).
Reliance on schemata
Our brains cannot conceivably process all the information around us: we have to find a way to select what is important to function and survive. This involves a search for meaningful patterns, which once established, allow us to focus on what is relevant and ignore the rest. Scientific discovery may be seen as an elevated version of pattern discovery: we see the height of scientific achievement as discovering regularities in nature that allow us to make better predictions about how the world behaves and to create new technologies and interventions from the basic principles we have discovered.
Scientific progress is not a simple process of weighing up competing pieces of evidence in relation to a theory. Rather than simply choosing between one hypothesis and another, we try to understand a problem in terms of a schema. Bartlett (1932) was one of the first psychologists to study how our preconceptions, or schemata, create distortions in perception and memory. He introduced the idea of “reconstructive remembering,” demonstrating how people’s memory of a narrative changed over time in specific ways, to become more coherent and aligned with pre-existing schemata.
Bartlett’s (1932) work on reconstructive remembering can explain why we not only tend to ignore inconsistent evidence ( Duyx, Urlings, Swaen, Bouter, & Zeegers, 2017 ) but also are prone to distort the evidence that we do include ( Vicente & Brewer, 1993 ). If we put together the combined influence of confirmation bias and reconstructive remembering, it suggests that narrative literature reviews have a high probability of being inaccurate: both types of bias will lead to a picture of research converging on a compelling story, when the reality may be far less tidy ( Katz, 2013 ).
I have focused so far on bias in citing prior literature, but schemata also influence how researchers go about writing up results. If we were simply to present a set of facts that did not cohere, our work would be difficult to understand and remember. As Chalmers, Hedges, and Cooper (2002) noted, this point was made in 1885 by Lord Rayleigh in a presidential address to the British Association for the Advancement of Science:
If, as is sometimes supposed, science consisted in nothing but the laborious accumulation of facts, it would soon come to a standstill, crushed, as it were, under its own weight. The suggestion of a new idea, or the detection of a law, supersedes much that has previously been a burden on the memory, and by introducing order and coherence facilitates the retention of the remainder in an available form. ( Rayleigh, 1885 , p. 20)
Indeed, when we write up our research, we are exhorted to “tell a story,” which achieves the “order and coherence” that Rayleigh described. Since his time, ample literature on narrative comprehension has confirmed that people fill in gaps in unstated information and find texts easier to comprehend and memorise when they fit a familiar narrative structure ( Bower & Morrow, 1990 ; Van den Broek, 1994 ).
This resonates with Dawkins’ (1976) criteria for a meme, i.e., an idea that persists by being transmitted from person to person. Memes need to be easy to remember, understand, and communicate, and so narrative accounts make far better memes than dry lists of facts. From this perspective, narrative serves a useful function in providing a scaffolding that facilitates communication. However, while this is generally a useful, and indeed essential, aspect of human cognition, in scientific communication, it can lead to propagation of false information. Bartlett (1932) noted that remembering is hardly ever really exact, “and it is not at all important that it should be so.” He was thinking of the beneficial aspects of schemata, in allowing us to avoid information overload and to focus on what is meaningful. However, as Dawkins emphasised, survival of a meme does not depend on it being useful or true. An idea such as the claim that vaccination causes autism is a very effective meme, but it has led to a resurgence of diseases that were close to being eradicated.
In communicating scientific results, we need to strike a fine balance between presenting a precis of findings that is easily communicated and moving towards an increase in knowledge. I would argue the pendulum may have swung too far in the direction of encouraging researchers to tell good narratives. Not just media outlets, but also many journal editors and reviewers, encourage authors to tell simple stories that are easy to understand, and those who can produce these may be rewarded with funding and promotion.
The clearest illustration of narrative supplanting accurate reporting comes from the widespread use of HARKing, which was encouraged by Bem (2004) when he wrote,
There are two possible articles you can write: (a) the article you planned to write when you designed your study or (b) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (b).
Of course, formulating a hypothesis on the basis of observed data is a key part of the scientific process. However, as noted above, it is not acceptable to use the same data to both formulate and test the hypothesis—replication in a new sample is needed to avoid being misled by the play of chance and littering the literature with false positives ( Lazic, 2016 ; Wagenmakers et al., 2012 ).
Kerr (1998) considered why HARKing is a successful strategy and pointed out that it allowed the researcher to construct an account of an experiment that fits a good story script:
Positing a theory serves as an effective “initiating event.” It gives certain events significance and justifies the investigators’ subsequent purposeful activities directed at the goal of testing the hypotheses. And, when one HARKs, a “happy ending” (i.e., confirmation) is guaranteed. (p. 203)
In this regard, Bem’s advice makes perfect sense: “A journal article tells a straightforward tale of a circumscribed problem in search of a solution. It is not a novel with subplots, flashbacks, and literary allusions, but a short story with a single linear narrative line.”
We have, then, a serious tension in scientific writing. We are expected to be scholarly and honest, to report all our data and analyses and not to hide inconvenient truths. At the same time, if we want people to understand and remember our work, we should tell a coherent story from which unnecessary details have been expunged and where we cut out any part of the narrative that distracts from the main conclusions.
Kerr (1998) was clear that HARKing has serious costs. As well as translating type I errors into hard-to-eradicate theory, he noted that it presents a distorted view of science as a process which is far less difficult and unpredictable than is really the case. We never learn what did not work because inconvenient results are suppressed. For early career researchers, it can lead to cynicism when they learn that the rosy picture portrayed in the literature was achieved only by misrepresentation.
Overcoming cognitive constraints to do better science
One thing that is clear from this overview is that we have known about cognitive constraints for decades, yet they continue to affect scientific research. Finding ways to mitigate the impact of these constraints should be a high priority for experimental psychologists. Here, I draw together some general approaches that might be used to devise an agenda for research improvement. Many of these ideas have been suggested before but without much consideration of cognitive constraints that may affect their implementation. Some methods, such as training, attempt to overcome the constraints directly in individuals: others involve making structural changes to how science is done to counteract our human tendency towards unscientific thinking. None of these provides a total solution: rather, the goal is to tweak the dials that dictate how people think and behave, to move us closer to better scientific practices.
It is often suggested that better training is needed to improve replicability of scientific results, yet the focus tends to be on formal instruction in experimental design and statistics. Less attention has been given to engendering a more intuitive understanding of probability, or counteracting cognitive biases, though there are exceptions, such as the course by Steel, Liermann, and Guttorp (2018), which starts with a consideration of “How the wiring of the human brain leads to incorrect conclusions from data.” One way of inducing a more intuitive sense of statistics and p-values is by using data simulations. Simulation is not routinely incorporated in statistics training, but free statistical software now brings this within the grasp of all (Tintle et al., 2015). This is a powerful way to experience how easy it is to get a “significant” p-value when running multiple tests. Students are often surprised when they generate repeated runs of a correlation matrix of random numbers with, say, five variables and find at least one “significant” correlation in about one in four runs. Once you understand that there is a difference between the probability associated with getting a specific result on a single test, predicted in advance, versus the probability of that result coming up at least once in a multitude of tests, then the dangers of p-hacking become easier to grasp.
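To make this concrete, here is a minimal simulation sketch of the exercise just described (written in Python with NumPy and SciPy; the sample size, number of runs, and variable names are illustrative assumptions rather than details taken from any published course). It generates repeated correlation matrices from pure noise and counts how often at least one of the ten pairwise correlations comes out “significant” at p < .05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_runs, n_obs, n_vars, alpha = 2000, 20, 5, 0.05   # illustrative settings

runs_with_false_positive = 0
for _ in range(n_runs):
    data = rng.standard_normal((n_obs, n_vars))      # pure noise: no true correlations exist
    r = np.corrcoef(data, rowvar=False)              # 5 x 5 correlation matrix
    rows, cols = np.triu_indices(n_vars, k=1)        # the 10 unique variable pairs
    r_pairs = r[rows, cols]
    t_vals = r_pairs * np.sqrt((n_obs - 2) / (1 - r_pairs ** 2))
    p_vals = 2 * stats.t.sf(np.abs(t_vals), df=n_obs - 2)
    if (p_vals < alpha).any():                       # at least one spurious "hit" in this run
        runs_with_false_positive += 1

print(f"Runs with at least one 'significant' correlation: "
      f"{runs_with_false_positive / n_runs:.0%}")
```

Running this a few times shows that the per-run false-positive rate is far higher than the nominal 5%, which is exactly the intuition the simulation exercise is meant to instil.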
Data simulation could also help overcome the misplaced “belief in the law of small numbers” (Tversky & Kahneman, 1974). By generating datasets with a known effect size, and then taking samples from these and subjecting them to a statistical test, the student can learn to appreciate just how easy it is to miss a true effect (type II error) if the study is underpowered.
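A similarly minimal sketch (again in Python; the effect size of 0.4 and the group size of 20 are illustrative assumptions) draws repeated small samples from two populations that genuinely differ and counts how often a t test fails to detect the difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
true_effect = 0.4     # a genuine standardised difference between the groups (assumed)
n_per_group = 20      # a sample size typical of underpowered studies
n_sims = 5000

misses = 0
for _ in range(n_sims):
    control = rng.standard_normal(n_per_group)
    treatment = rng.standard_normal(n_per_group) + true_effect
    _, p = stats.ttest_ind(treatment, control)
    if p >= 0.05:     # the real effect goes undetected: a type II error
        misses += 1

print(f"Proportion of simulated studies missing the true effect: {misses / n_sims:.0%}")
```

With these settings, most simulated studies fail to reach p < .05, which makes the cost of running underpowered experiments tangible in a way that a formula for statistical power often does not.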
There is a small literature evaluating attempts to inoculate people against specific types of cognitive bias. For instance, Morewedge et al. (2015) developed instructional videos and computer games designed to reduce a series of cognitive biases, including confirmation bias, and found these to be effective over the longer term. Typically, however, such interventions focus on hypothetical scenarios outside the scope of experimental psychology. They might improve the scientific quality of research projects if adjusted to make them relevant to conducting and appraising experiments.
Triangulation of methods in study design
I noted above that for science to progress, we need to overcome a tendency to settle on the first theory that seems “good enough” to account for observations. Any method that forces the researcher to actively search for alternative explanations is, therefore, likely to stimulate better research.
The notion of triangulation ( Munafò & Davey Smith, 2018 ) was developed in the field of epidemiology, where reliance is primarily on observational data, and experimental manipulation is not feasible. Inferring causality from correlational data is hazardous, but it is possible to adopt a strategic approach of combining complementary approaches to analysis, each of which has different assumptions, strengths, and weaknesses. Epidemiology progresses when different explanations for correlational data are explicitly identified and evaluated, and converging evidence is obtained ( Lawlor, Tilling, & Davey Smith, 2016 ). This approach could be extended to other disciplines, by explicitly requiring researchers to use at least two different methods with different potential biases when evaluating a specific hypothesis.
A “culture of criticism”
Smith (2006) described peer review as “a flawed process, full of easily identified defects with little evidence that it works” (p. 182). Yet peer review provides one way of forcing researchers to recognise when they are so focused on a favoured theory that they are unable to break away. Hossenfelder (2018) has argued that the field of particle physics has stagnated because of a reluctance to abandon theories that are deemed “beautiful.” We are accustomed to regarding physicists as superior to psychologists in terms of theoretical and methodological sophistication. In general, they place far less emphasis than we do on statistical criteria for evidence, and where they do use statistics, they understand probability theory and adopt very stringent levels of significance. Nevertheless, according to Hossenfelder, they are subject to cognitive and social biases that make them reluctant to discard theories. She concludes her book with an Appendix on “What you can do to help,” and as well as advocating better understanding of cognitive biases, she recommends some cultural changes to address these. These include building “a culture of criticism.” In principle, we already have this—talks and seminars should provide a forum for research to be challenged—but in practice, critiquing another’s work is often seen as clashing with social conventions of being supportive to others, especially when it is conducted in public.
Recently, two other approaches have been developed, with the potential to make a “culture of criticism” more useful and more socially acceptable. Registered Reports (Chambers, 2019) is an approach that was devised to prevent publication bias, p-hacking, and HARKing. This format moves the peer review process to a point before data collection so that results cannot influence editorial decisions. An unexpected positive consequence is that peer review comes at a point when it can be acted upon to improve the experimental design. Where reviewers of Registered Reports ask “how could we disprove the hypothesis?” and “what other explanations should we consider?” this can generate more informative experiments.
A related idea is borrowed from business practices and is known as the “pre-mortem” approach (Klein, 2007). Project developers gather together and are asked to imagine that a proposed project has gone ahead and failed. They are then encouraged to write down reasons why this has happened, allowing people to voice misgivings that they may have been reluctant to state openly, so they can be addressed before the project has begun. It would be worth evaluating the effectiveness of pre-mortems for scientific projects. We could strengthen this approach by incorporating ideas from Bang and Frith (2017), who noted that group decision-making is most likely to be effective when the group is diverse and people can express their views anonymously. With both Registered Reports and the study pre-mortem, reviewers can have a role as critical friends who can encourage researchers to identify ways to improve a project before it is conducted. This can be a more positive experience for the reviewer, who may otherwise have no option but to recommend rejection of a study with flawed methodology.
Counteracting cherry-picking of literature
Turning to cherry-picking of prior literature, the established solution is the systematic review, where clear criteria are laid out in advance so that a comprehensive search can be made of all relevant studies (Siddaway, Wood, & Hedges, 2019). The systematic review is only as good as the data that go into it, however, and if a field suffers from substantial publication bias and/or p-hacking, then, rather than tackling error entrenchment, it may add to it. Even with the most scrupulous search strategy, relevant papers with null results can be missed because positive results are mentioned in titles and abstracts of papers, whereas null results are not (Lazic, 2016, p. 15). This can mean that, if a study is looking at many possible associations (e.g., with brain regions or with genes), studies that considered a specific association but failed to find support for it will be systematically disregarded. This may explain why it seems to take 30 or 40 years for some erroneous entrenched theories to be abandoned. The situation may improve with increasing availability of open data. Provided data are adequately documented and accessible, the problem of missing relevant studies may be reduced.
Ultimately, the problem of biased reviews may not be soluble just by changing people’s citation habits. Journal editors and reviewers could insist that abstracts follow a structured format and report all variables that were tested, not just those that gave significant results. A more radical approach by funders may be needed to disrupt this wasteful cycle. When a research team applies to test a new idea, they could first be required to (a) conduct a systematic review (unless one has been recently done) and (b) replicate the original findings on which the work is based: this is the opposite to what happens currently, where novelty and originality are major criteria for funding. In addition, it could be made mandatory for any newly funded research idea to be investigated by at least two independent laboratories and using at least two different approaches (triangulation). All these measures would drastically slow down science and may be unfeasible where research needs highly specialised equipment, facilities, or skills that are specific to one laboratory. Nevertheless, slower science may be preferable to the current system where there are so many examples of false leads being pursued for decades, with consequent waste of resources.
Reconciling storytelling with honesty
Perhaps the hardest problem is how to reconcile our need for narrative with a “warts and all” account of research. Consider this advice from Bem (2004) —which I suspect many journal editors would endorse:
Contrary to the conventional wisdom, science does not care how clever or clairvoyant you were at guessing your results ahead of time. Scientific integrity does not require you to lead your readers through all your wrongheaded hunches only to show—voila!—they were wrongheaded. A journal article should not be a personal history of your stillborn thoughts . . . Your overriding purpose is to tell the world what you have learned from your study. If your results suggest a compelling framework for their presentation, adopt it and make the most instructive findings your centerpiece . . . Think of your dataset as a jewel. Your task is to cut and polish it, to select the facets to highlight, and to craft the best setting for it.
As Kerr (1998) pointed out, HARKing gives a misleading impression of what was found, which can be particularly damaging for students, who, on reading the literature, may form the impression that it is normal for scientists to have their predictions confirmed, and may think of themselves as incompetent when their own experiments do not work out that way. One of the goals of pre-registration is to ensure that researchers do not omit inconvenient facts when writing up a study—or if they do, at least make it possible to see that this has been done. In the field of clinical medicine, impressive progress has been made in methodology, with registration now a requirement for clinical trials (International Committee of Medical Journal Editors, 2019). Yet, Goldacre et al. (2019) found that even when a trial was registered, it was common for researchers to change the primary outcome measure without explanation, and it has been similarly noted that pre-registrations in psychology are often too ambiguous to preclude p-hacking (Veldkamp et al., 2018). Registered Reports (Chambers, 2019) adopt stricter standards that should prevent HARKing, but the author may struggle to maintain a strong narrative because messy reality makes a less compelling story than a set of results subjected to Bem’s (2004) cutting and polishing process.
Rewarding credible research practices
A final set of recommendations has to do with changing the culture so that incentives are aligned with efforts to counteract unhelpful cognitive constraints, and researchers are rewarded for doing reproducible, replicable research, rather than for grant income or publications in high-impact journals ( Forstmeier, Wagenmakers, & Parker, 2016 ; Pulverer, 2015 ). There is already evidence that funders are concerned to address problems with credibility of biomedical research ( Academy of Medical Sciences, 2015 ), and rigour and reproducibility are increasingly mentioned in grant guidelines (e.g., https://grants.nih.gov/policy/reproducibility/index.htm ). One funder, Cancer Research UK, is innovating by incorporating Registered Reports in a two-stage funding model ( Munafò, 2017 ). We now need publishers and institutions to follow suit and ensure that researchers are not disadvantaged by adopting a self-critical mind-set and engaging in practices of open and reproducible science ( Poldrack, 2019 ).
Acknowledgments
My thanks to Kate Nation, Matt Jaquiery, Joe Chislett, Laura Fortunato, Uta Frith, Stefan Lewandowsky, and Karalyn Patterson for invaluable comments on an early draft of this manuscript.
Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The author is supported by a Principal Research Fellowship from the Wellcome Trust (programme grant no. 082498) and European Research Council advanced grant no. 694189.
7 Famous Psychology Experiments
Many famous experiments studying human behavior have impacted our fundamental understanding of psychology. Though some could not be repeated today due to breaches in ethical boundaries, that does not diminish the significance of those psychological studies. Some of these important findings include a greater awareness of depression and its symptoms, how people learn behaviors through the process of association and how individuals conform to a group.
Below, we take a look at seven famous psychological experiments that greatly influenced the field of psychology and our understanding of human behavior.
The Little Albert Experiment, 1920
A Johns Hopkins University professor, Dr. John B. Watson, and a graduate student wanted to test a learning process called classical conditioning. Classical conditioning involves learning involuntary or automatic behaviors by association, and Dr. Watson thought it formed the bedrock of human psychology.
A nine-month-old infant, dubbed “Albert B,” was volunteered for Dr. Watson and Rosalie Rayner’s experiment. Albert played with white furry objects, and at first, the infant displayed joy and affection. Over time, as he played with the objects, Dr. Watson would make a loud noise behind the child’s head to frighten him. After numerous trials, Albert was conditioned to be afraid when he saw white furry objects.
The study showed that humans could be conditioned to enjoy or fear something, which many psychologists believe could explain why people have irrational fears and how these may develop early in life. This is a classic example of an experimental study in psychology.
Stanford Prison Experiment, 1971
Stanford professor Philip Zimbardo wanted to learn how individuals conformed to societal roles. He wondered, for example, whether the tense relationship between prison guards and inmates in jails had more to do with the personalities of each or the environment.
During Zimbardo’s experiment, 24 male college students were assigned to be either prisoners or guards. The prisoners were held in a makeshift prison in the basement of Stanford’s psychology department. They went through a standard booking process designed to take away their individuality and make them feel anonymous. Guards were given eight-hour shifts and tasked with treating the prisoners just as guards would in a real prison.
Zimbardo found rather quickly that both the guards and prisoners fully adapted to their roles; in fact, he had to shut down the experiment after six days because it became too dangerous. Zimbardo even admitted he began thinking of himself as a police superintendent rather than a psychologist. The study confirmed that people will conform to the social roles they’re expected to play, especially overly stereotyped ones such as prison guards.
“We realized how ordinary people could be readily transformed from the good Dr. Jekyll to the evil Mr. Hyde,” Zimbardo wrote.
The Asch Conformity Study, 1951
Solomon Asch, a Polish-American social psychologist, was determined to see whether an individual would conform to a group’s decision, even if the individual knew it was incorrect. Conformity is defined by the American Psychological Association as the adjustment of a person’s opinions or thoughts so that they fall closer in line with those of other people or the normative standards of a social group or situation.
In his experiment, Asch selected 50 male college students to participate in a “vision test.” Individuals had to determine which of several lines on a card matched the length of a target line. However, the individuals at the center of the experiment did not know that the other people taking the test were actors following scripts, who at times deliberately selected the wrong answer. Asch found that, on average across 12 critical trials, nearly one-third of the naive participants conformed with the incorrect majority, and only 25 percent never conformed. In the control group, which featured only the participants and no actors, less than one percent of participants ever chose the wrong answer.
Asch’s experiment showed that people conform to groups both to fit in (normative influence) and because they believe the group is better informed than the individual (informational influence). This explains why some people change behaviors or beliefs when in a new group or social setting, even when it goes against past behaviors or beliefs.
The Bobo Doll Experiment, 1961, 1963
Stanford University professor Albert Bandura wanted to put the social learning theory into action. Social learning theory suggests that people can acquire new behaviors “through direct experience or by observing the behavior of others.” Using a Bobo doll , which is a blow-up toy in the shape of a life-size bowling pin, Bandura and his team tested whether children witnessing acts of aggression would copy them.
Bandura and two colleagues selected 36 boys and 36 girls between the ages of 3 and 6 from the Stanford University nursery and split them into three groups of 24. One group watched adults behaving aggressively toward the Bobo doll. In some cases, the adult subjects hit the doll with a hammer or threw it in the air. Another group was shown an adult playing with the Bobo doll in a non-aggressive manner, and the last group was not shown a model at all, just the Bobo doll.
After each session, children were taken to a room with toys and studied to see how their play patterns changed. In a room with aggressive toys (a mallet, dart guns, and a Bobo doll) and non-aggressive toys (a tea set, crayons, and plastic farm animals), Bandura and his colleagues observed that children who watched the aggressive adults were more likely to imitate the aggressive responses.
Unexpectedly, Bandura found that female children acted more physically aggressively after watching a male model and more verbally aggressively after watching a female model. The results of the study highlight how children learn behaviors from observing others.
The Learned Helplessness Experiment, 1965
Martin Seligman wanted to research a different angle related to Dr. Watson’s study of classical conditioning. In studying conditioning with dogs, Seligman made an astute observation : the subjects, which had already been conditioned to expect a light electric shock if they heard a bell, would sometimes give up after another negative outcome, rather than searching for the positive outcome.
Under normal circumstances, animals will always try to get away from negative outcomes. When Seligman tested his experiment on animals who hadn’t been previously conditioned, the animals attempted to find a positive outcome. In contrast, the dogs that had already been conditioned to expect a negative response assumed there would be another negative response waiting for them, even in a different situation.
The conditioned dogs’ behavior became known as learned helplessness, the idea that some subjects won’t try to get out of a negative situation because past experiences have forced them to believe they are helpless. The study’s findings shed light on depression and its symptoms in humans.
The Milgram Experiment, 1963
In the wake of the horrific atrocities carried out by Nazi Germany during World War II, Stanley Milgram wanted to test the levels of obedience to authority. The Yale University professor wanted to study if people would obey commands, even when it conflicted with the person’s conscience.
Participants in the study, 40 males between the ages of 20 and 50, were split into learners and teachers. Though the assignment seemed random, actors were always chosen as the learners, and unsuspecting participants were always the teachers. A learner was strapped to a chair with electrodes in one room while the experimenter (another actor) and a teacher went into another.
The teacher and learner went over a list of word pairs that the learner was told to memorize. When the learner incorrectly paired a set of words together, the teacher would shock the learner. The teacher believed the shocks ranged from mild all the way to life-threatening. In reality, the learner, who intentionally made mistakes, was not being shocked.
As the voltage of the shocks increased and the teachers became aware of the believed pain caused by them, some refused to continue the experiment. After prodding by the experimenter, 65 percent resumed. From the study, Milgram devised the agency theory , which suggests that people allow others to direct their actions because they believe the authority figure is qualified and will accept responsibility for the outcomes. Milgram’s findings help explain how people can make decisions against their own conscience, such as when participating in a war or genocide.
The Halo Effect Experiment, 1977
University of Michigan professors Richard Nisbett and Timothy Wilson were interested in following up a study from 50 years earlier on a concept known as the halo effect . In the 1920s, American psychologist Edward Thorndike researched a phenomenon in the U.S. military that showed cognitive bias. This is an error in how we think that affects how we perceive people and make judgements and decisions based on those perceptions.
In 1977, Nisbett and Wilson tested the halo effect using 118 college students (62 males, 56 females). Students were divided into two groups and were asked to evaluate a male Belgian teacher who spoke English with a heavy accent. Participants were shown one of two videotaped interviews with the teacher on a television monitor. The first interview showed the teacher interacting cordially with students, and the second interview showed the teacher behaving inhospitably. The subjects were then asked to rate the teacher’s physical appearance, mannerisms, and accent on an eight-point scale from appealing to irritating.
Nisbett and Wilson found that on physical appearance alone, 70 percent of the subjects rated the teacher as appealing when he was being respectful and irritating when he was cold. When the teacher was rude, 80 percent of the subjects rated his accent as irritating, as compared to nearly 50 percent when he was being kind.
The updated study on the halo effect shows that cognitive bias isn’t exclusive to a military environment. Cognitive bias can get in the way of making the correct decision, whether it’s during a job interview or deciding whether to buy a product that’s been endorsed by a celebrity we admire.
How Experiments Have Impacted Psychology Today
Contemporary psychologists have built on the findings of these studies to better understand human behaviors, mental illnesses, and the link between the mind and body. For their contributions to psychology, Watson, Bandura, Nisbett, and Zimbardo were all awarded Gold Medals for Life Achievement from the American Psychological Foundation.
Classic Psychology Experiments
The history of psychology is filled with fascinating studies and classic psychology experiments that helped change the way we think about ourselves and human behavior. Sometimes the results of these experiments were so surprising they challenged conventional wisdom about the human mind and actions. In other cases, these experiments were also quite controversial.
Some of the most famous examples include Milgram's obedience experiment and Zimbardo's prison experiment. Explore some of these classic psychology experiments to learn more about some of the best-known research in psychology history.
Harlow’s Rhesus Monkey Experiments
In a series of controversial experiments conducted in the late 1950s and early 1960s, psychologist Harry Harlow demonstrated the powerful effects of love on normal development. By showing the devastating effects of deprivation on young rhesus monkeys , Harlow revealed the importance of love for healthy childhood development.
His experiments were often unethical and shockingly cruel, yet they uncovered fundamental truths that have heavily influenced our understanding of child development.
In one famous version of the experiments, infant monkeys were separated from their mothers immediately after birth and placed in an environment where they had access to either a wire monkey "mother" or a version of the faux mother covered in soft terry cloth. While the wire mother provided food, the cloth mother provided only softness and comfort.
Harlow found that while the infant monkeys would go to the wire mother for food, they vastly preferred the company of the soft and comforting cloth mother. The study demonstrated that maternal bonds were about much more than simply providing nourishment and that comfort and security played a major role in the formation of attachments .
Pavlov’s Classical Conditioning Experiments
The concept of classical conditioning is studied by every entry-level psychology student, so it may be surprising to learn that the man who first noted this phenomenon was not a psychologist at all. Pavlov was actually studying the digestive systems of dogs when he noticed that his subjects began to salivate whenever they saw his lab assistant.
What he soon discovered through his experiments was that certain responses (drooling) could be conditioned by associating a previously neutral stimulus (metronome or buzzer) with a stimulus that naturally and automatically triggers a response (food). Pavlov's experiments with dogs established classical conditioning.
The Asch Conformity Experiments
Researchers have long been interested in the degree to which people follow or rebel against social norms. During the 1950s, psychologist Solomon Asch conducted a series of experiments designed to demonstrate the powers of conformity in groups.
The study revealed that people are surprisingly susceptible to going along with the group, even when they know the group is wrong. In Asch's studies, students were told that they were taking a vision test and were asked to identify which of three lines was the same length as a target line.
When asked alone, the students were highly accurate in their assessments. In other trials, confederate participants intentionally picked the incorrect line. As a result, many of the real participants gave the same answer as the other students, demonstrating how conformity could be both a powerful and subtle influence on human behavior.
Skinner's Operant Conditioning Experiments
Skinner studied how behavior can be reinforced so that it is repeated or weakened so that it is extinguished. He designed the Skinner box, in which an animal, often a rodent, would be given a food pellet or an electric shock. A rat would learn that pressing a lever delivered a food pellet, or the rat would learn to press the lever in order to halt electric shocks.
Then, the animal might learn to associate a light or sound with being able to get the reward or halt the negative stimulus by pressing the lever. Furthermore, Skinner studied whether continuous, fixed-ratio, fixed-interval, variable-ratio, and variable-interval reinforcement led to faster responding or learning.
Milgram’s Obedience Experiments
In Milgram's experiment , participants were asked to deliver electrical shocks to a "learner" whenever an incorrect answer was given. In reality, the learner was actually a confederate in the experiment who pretended to be shocked. The purpose of the experiment was to determine how far people were willing to go in order to obey the commands of an authority figure.
Milgram found that 65% of participants were willing to deliver the maximum level of shocks despite the fact that the learner seemed to be in serious distress or even unconscious.
Why This Experiment Is Notable
Milgram's experiment is one of the most controversial in psychology history. Many participants experienced considerable distress as a result of their participation and in many cases were never debriefed after the conclusion of the experiment. The experiment played a role in the development of ethical guidelines for the use of human participants in psychology experiments.
The Stanford Prison Experiment
Philip Zimbardo's famous experiment cast regular students in the roles of prisoners and prison guards. While the study was originally slated to last 2 weeks, it had to be halted after just 6 days because the guards became abusive and the prisoners began to show signs of extreme stress and anxiety.
Zimbardo's famous study was referred to after the abuses in Abu Ghraib came to light. Many experts believe that such group behaviors are heavily influenced by the power of the situation and the behavioral expectations placed on people cast in different roles.
It is worth noting criticisms of Zimbardo's experiment, however. While the general recollection of the experiment is that the guards became excessively abusive on their own as a natural response to their role, the reality is that they were explicitly instructed to mistreat the prisoners, potentially detracting from the conclusions of the study.
van Rosmalen L, van der Veer R, van der Horst FCP. The nature of love: Harlow, Bowlby and Bettelheim on affectionless mothers. Hist Psychiatry. 2020. doi:10.1177/0957154X19898997
Gantt WH. Ivan Pavlov. Encyclopaedia Britannica.
Jeon HL. The environmental factor within the Solomon Asch Line Test. International Journal of Social Science and Humanity. 2014;4(4):264-268. doi:10.7763/IJSSH.2014.V4.360
Koren M. B.F. Skinner: The man who taught pigeons to play ping-pong and rats to pull levers. Smithsonian Magazine.
B.F. Skinner Foundation. A brief survey of operant behavior.
Gonzalez-Franco M, Slater M, Birney ME, Swapp D, Haslam SA, Reicher SD. Participant concerns for the Learner in a Virtual Reality replication of the Milgram obedience study. PLoS ONE. 2018;13(12):e0209704. doi:10.1371/journal.pone.0209704
Zimbardo PG. Philip G. Zimbardo on his career and the Stanford Prison Experiment's 40th anniversary. Interview by Scott Drury, Scott A. Hutchens, Duane E. Shuttlesworth, and Carole L. White. Hist Psychol. 2012;15(2):161-170. doi:10.1037/a0025884
Le Texier T. Debunking the Stanford Prison Experiment. Am Psychol. 2019;74(7):823-839. doi:10.1037/amp0000401
Perry G. Deception and illusion in Milgram's accounts of the Obedience Experiments. Theoretical & Applied Ethics. 2013;2(2):79-92.
Specter M. Drool: How Everyone Gets Pavlov Wrong. The New Yorker. 2014; November 24.
By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of "The Everything Psychology Book."
The Results of the World’s Largest-Ever Psych Experiment Are In
A study of over 7 billion subjects, completed December 31, 2020.
Posted January 3, 2021 | Reviewed by Gary Drevitch
Pretty much every behavioral science experiment ever conducted, whether on animals or humans, carries with it a probability—not a certainty—that the results are valid. You will usually see, somewhere in the results section of publications describing the study, the notation p<.05 or p<.01, signifying that there is less than a 5% or 1% probability (p), respectively, that results at least this extreme would have arisen from random chance alone rather than from the manipulations that the experimenters applied to subjects in the experimental vs. control group.
Asserting probabilities rather than certainties is necessary because running experiments on entire populations—say, all 7.8 billion humans, or the even larger global population of rats—is impossible, so statistical inference from far smaller samples has always been necessary, which is what produces those 1% or 5% error thresholds.
Impossible, that is, until now.
2020 subjected the entire human population to several experimental manipulations, and, with the year over, the results of those experiments are now in. So here, paying homage to the standard introduction, methods, results, discussion format of science papers, is a description of those experiments, whose margin for error is exactly zero.
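For readers who want to see where those conventional margins of error come from, here is a minimal simulation sketch (in Python; the population mean and standard deviation are invented purely for illustration). It draws many samples of increasing size from a notional population and shows how the typical error of a sample estimate shrinks as the sample grows, which is why inferences about billions of individuals can normally be made, with quantified uncertainty, from far smaller samples:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd = 100.0, 15.0   # hypothetical population parameters, assumed for illustration

# Draw many samples of each size and see how far sample means typically stray from the truth.
for n in (30, 300, 3000):
    sample_means = rng.normal(true_mean, true_sd, size=(2000, n)).mean(axis=1)
    spread = 1.96 * sample_means.std()   # ~95% of sample means fall within this distance of true_mean
    print(f"n={n:5d}: 95% of sample means land within ±{spread:.2f} of the population mean")
```

No finite sample eliminates that uncertainty entirely, which is exactly what made 2020's accidental whole-population manipulation so unusual.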
A highly contagious (R0 > 1) [1], sometimes lethal pathogen was introduced into a dense population center at the beginning of the trial period, and the resulting effects on both individual and group attitudes and behavior were measured. Manipulations were applied to experimental groups, with no manipulation applied to control groups, employing a between-groups design (one group subjected to the experimental measure, one group not). Conditions in test and control populations were as follows:
Experimental treatment group: Strong public policy and pathogen control measures including rigorously enforced, prolonged forced lockdowns, mandatory mask-wearing, mandatory controlled quarantine, and forceful, science-based public health messaging emphasizing the need for drastic changes in behavior to control pathogen spread.
Control group: Weak public policy, including voluntary or absent pathogen control measures and weak or absent enforcement, accompanied by sporadic, inconsistent, or contradictory public health messages and/or random news reports depicting overcrowded ICUs, exhausted health care workers, and peaks in infection rates.
Dependent variables measured were:
- Beliefs and attitudes in target groups regarding reality of pathogen’s existence and need for behavior modification as evidenced by self-report and social media posts.
- Compliance with behavioral modifications espoused in public health messages as evidenced by mobile-device tracking data indicating the presence or absence of social distancing.
- Pathogen mortality rate expressed in deaths/million of population.
Beliefs and attitudes: The treatment groups (e.g., China, New Zealand, South Korea), receiving strong, consistent public policy and health messaging together with consonant, strongly enforced, mandated pathogen control measures, exhibited near-universal belief in both the seriousness of the pathogen and the need for government-mandated control measures. Beliefs and attitudes in the control group were far more heterogeneous, with 50% or more of the control population expressing strong doubt in the very existence of the pathogen and/or the need for aggressive pathogen control measures.
Compliance with public health guidelines—as measured by cellphone proximity tracking data, self-report, and other measures—was greater than 90% in treatment populations, and estimated to be less than 30% across the entire study period (2020) in control populations [2], although control populations sporadically exhibited strong compliance in the early (March/April) interval of the study period.
Pathogen mortality rate (deaths per million of population): treatment group (e.g., China), 3.4 deaths/million [3]; control group (e.g., USA), 921 deaths/million [3].
As compared with the control group, the treatment group showed concomitant beliefs in the seriousness of the pathogen, need for strict measures, compliance with guidelines, and low mortality, suggesting that the treatment conditions were an effective means of controlling the pathogen through modification of attitudes, beliefs, and behaviors.
Possible confounding variables that could cast doubt on this conclusion are the influence on behavior of political and cultural differences in responses to authority. That is, a population's cultural/political acceptance of authoritarian measures and the perceived threat of coercion, rather than the treatment itself, might have caused lower infections in, for example, Asian autocracies vs. Western-style democracies. However, these confounding variables can be ruled out by examination of public attitudes, compliance, and mortality rates in Western-style democracies such as New Zealand (5.1 deaths/million) and South Korea (18.2 deaths/million). [3]
Thus, the results of the experiment support the conclusion that public policy-engendered attitudes and beliefs—and behaviors driven by those beliefs—were the primary determinants of mortality in the experiment. It is also possible that the principle of cognitive dissonance (personal beliefs being driven by personal behavior in the experimental and control groups, rather than the other way around) can explain all or part of the results.
Prior research by Kahneman et al. [4], Ropeik [5], and others, suggesting that human perceptions and beliefs in the control population arose not from assessment of facts, logic, and objective probabilities, but from hardwired and acquired cognitive biases that greatly simplify decision-making while maintaining positive affective states (feeling safe and comfortable when faced with uncertainty and helplessness), has been validated.
More important, the prior work of Kahneman, Tversky, and others demonstrating cognitive biases and endemic errors in human judgment has now, based upon an extremely high (>7.8 billion) sample size, been unequivocally extended to roughly half the entire human population, if not more.
Future Research
None required. For the first time in the history of science, absolute certainty of experimental findings has been achieved.
[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538841/
[2] https://www.unacast.com/covid19/social-distancing-scoreboard
[3] https://coronavirus.jhu.edu/data/mortality
[4] Tversky, A.; Kahneman, D. (1974). "Judgment under Uncertainty: Heuristics and Biases". Science. 185 (4157)
[5] Ropeik, D. Risk: A Practical Guide for Deciding What's Really Safe and What's Really Dangerous in the World Around You, 2002
Eric Haseltine, Ph.D., is a neuroscientist and the author of Long Fuse, Big Bang.
Sample articles
Journal of Experimental Psychology: General®
- The Role of Political Devotion in Sharing Partisan Misinformation and Resistance to Fact-Checking (PDF, 3.3MB) June 2023 by Clara Pretus, Camila Servin-Barthet, Elizabeth A. Harris, William J. Brady, Oscar Vilarroya, and Jay J. Van Bavel
- The Secret to Happiness: Feeling Good or Feeling Right? (PDF, 171KB) October 2017 by Maya Tamir, Shalom H. Schwartz, Shige Oishi, and Min Y. Kim
- Overdistribution Illusions: Categorical Judgments Produce Them, Confidence Ratings Reduce Them (PDF, 182KB) January 2017 by C. J. Brainerd, K. Nakamura, V. F. Reyna, and R. E. Holliday
- Motivated Recall in the Service of the Economic System: The Case of Anthropogenic Climate Change (PDF, 197KB) June 2016 by Erin P. Hennes, Benjamin C. Ruisch, Irina Feygina, Christopher A. Monteiro, and John T. Jost
- Handwriting Generates Variable Visual Output to Facilitate Symbol Learning (PDF, 234KB) March 2016 by Julia X. Li and Karin H. James
- Perceptual Dehumanization of Faces Is Activated by Norm Violations and Facilitates Norm Enforcement (PDF, 262KB) February 2016 by Katrina M. Fincher and Philip E. Tetlock
- Moving While Black: Intergroup Attitudes Influence Judgments of Speed (PDF, 71KB) February 2016 by Andreana C. Kenrick, Stacey Sinclair, Jennifer Richeson, Sara C. Verosky, and Janetta Lun
- Searching for Explanations: How the Internet Inflates Estimates of Internal Knowledge (PDF, 138KB) June 2015 by Matthew Fisher, Mariel K. Goddu, and Frank C. Keil
- Finding a Needle in a Haystack: Toward a Psychologically Informed Method for Aviation Security Screening (PDF, 129KB) February 2015 by Thomas C. Ormerod and Coral J. Dando
- The Myth of Harmless Wrongs in Moral Cognition: Automatic Dyadic Completion From Sin to Suffering (PDF, 197KB) August 2014 by Kurt Gray, Chelsea Schein, and Adrian F. Ward
- Get Excited: Reappraising Pre-Performance Anxiety as Excitement (PDF, 217KB) June 2014 by Alison Wood Brooks
- Set-Fit Effects in Choice (PDF, 93KB) April 2014 by Ellen R. K. Evers, Yoel Inbar, and Marcel Zeelenberg
- I Want to Help You, But I Am Not Sure Why: Gaze-Cuing Induces Altruistic Giving (PDF, 190KB) April 2014 by Robert D. Rogers, Andrew P. Bayliss, Anna Szepietowska, Laura Dale, Lydia Reeder, Gloria Pizzamiglio, Karolina Czarna, Judi Wakeley, and Phillip J. Cowen
- Breaking the Cycle of Mistrust: Wise Interventions to Provide Critical Feedback Across the Racial Divide (PDF, 366KB) April 2014 by David Scott Yeager, Valerie Purdie-Vaughns, Julio Garcia, Nancy Apfel, Patti Brzustoski, Allison Master, William T. Hessert, Matthew E. Williams, and Geoffrey L. Cohen
- Consolidation Power of Extrinsic Rewards: Reward Cues Enhance Long-Term Memory for Irrelevant Past Events (PDF, 87KB) February 2014 by Kou Murayama and Shinji Kitagami
- The Invisible Man: Interpersonal Goals Moderate Inattentional Blindness to African Americans (PDF, 83KB) February 2014 by Jazmin L. Brown-Iannuzzi, Kelly M. Hoffman, B. Keith Payne, and Sophie Trawalter
- Learning to Contend With Accents in Infancy: Benefits of Brief Speaker Exposure (PDF, 145KB) February 2014 by Marieke van Heugten and Elizabeth K. Johnson
- On Feeding Those Hungry for Praise: Person Praise Backfires in Children With Low Self-Esteem (PDF, 102KB) February 2014 by Eddie Brummelman, Sander Thomaes, Geertjan Overbeek, Bram Orobio de Castro, Marcel A. van den Hout, and Brad J. Bushman
- Paying It Forward: Generalized Reciprocity and the Limits of Generosity (PDF, 256KB) February 2014 by Kurt Gray, Adrian F. Ward, and Michael I. Norton
- Shape Beyond Recognition: Form-Derived Directionality and Its Effects on Visual Attention and Motion Perception (PDF, 336KB) February 2014 by Heida M. Sigurdardottir, Suzanne M. Michalak, and David L. Sheinberg
- Consonance and Pitch (PDF, 380KB) November 2013 by Neil McLachlan, David Marco, Maria Light, and Sarah Wilson
- Forgetting Our Personal Past: Socially Shared Retrieval-Induced Forgetting of Autobiographical Memories (PDF, 126KB) November 2013 by Charles B. Stone, Amanda J. Barnier, John Sutton, and William Hirst
- Preventing Motor Skill Failure Through Hemisphere-Specific Priming: Cases From Choking Under Pressure (PDF, 129KB) August 2013 by Jürgen Beckmann, Peter Gröpel, and Felix Ehrlenspiel
- How Decisions Emerge: Action Dynamics in Intertemporal Decision Making (PDF, 206KB) February 2013 by Maja Dshemuchadse, Stefan Scherbaum, and Thomas Goschke
- Improving Working Memory Efficiency by Reframing Metacognitive Interpretation of Task Difficulty (PDF, 110KB) November 2012 by Frédérique Autin and Jean-Claude Croizet
- Divine Intuition: Cognitive Style Influences Belief in God (PDF, 102KB) August 2012 by Amitai Shenhav, David G. Rand, and Joshua D. Greene
- Internal Representations Reveal Cultural Diversity in Expectations of Facial Expressions of Emotion (PDF, 320KB) February 2012 by Rachael E. Jack, Roberto Caldara, and Philippe G. Schyns
- The Pain Was Greater If It Will Happen Again: The Effect of Anticipated Continuation on Retrospective Discomfort (PDF, 148KB) February 2011 by Jeff Galak and Tom Meyvis
- The Nature of Gestures' Beneficial Role in Spatial Problem Solving (PDF, 181KB) February 2011 by Mingyuan Chu and Sotaro Kita
Psychologists commonly distinguish three types of experiments: laboratory, field, and natural experiments. In a laboratory experiment, the researcher manipulates one or more independent variables and measures the effects on the dependent variable under highly controlled conditions.
The experimental method involves manipulating one variable to determine if this causes changes in another variable. This method relies on controlled research methods and random assignment of study subjects to test a hypothesis. For example, researchers may want to learn how different visual patterns may impact our perception.
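To make that logic concrete, here is a minimal sketch in Python of what such a design might look like in simulation. Everything in it is hypothetical: the "visual pattern" conditions, the use of response time as the measured outcome, and the assumed 30 ms effect are illustrative choices, not data from any real study.

```python
import random
import statistics

# Hypothetical two-condition experiment: the independent variable is the
# visual pattern shown ("A" vs. "B"); the dependent variable is response
# time in milliseconds. The 30 ms effect below is an assumption for the demo.
random.seed(42)

def simulated_response_time(pattern: str) -> float:
    baseline = random.gauss(500, 50)          # individual variability
    effect = 30.0 if pattern == "B" else 0.0  # assumed effect of the manipulation
    return baseline + effect

group_a = [simulated_response_time("A") for _ in range(30)]
group_b = [simulated_response_time("B") for _ in range(30)]

print(f"Mean RT, pattern A: {statistics.mean(group_a):.1f} ms")
print(f"Mean RT, pattern B: {statistics.mean(group_b):.1f} ms")
print(f"Observed difference: {statistics.mean(group_b) - statistics.mean(group_a):.1f} ms")
```

The point of the sketch is only to show the moving parts of the method: one variable is deliberately manipulated (the pattern), other influences are left to chance, and the dependent variable is then compared across conditions.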
A classic example is Thorndike's puzzle box studies (1898). Placing cats in a box that could only be escaped by pulling a cord, and taking detailed notes on how long it took them to escape, allowed Edward Thorndike to derive the Law of Effect: actions followed by positive consequences are more likely to occur again, while actions followed by negative consequences become less likely.
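The Law of Effect is easy to express as a toy learning rule. The sketch below is not Thorndike's procedure; it simply assumes a simulated learner with two possible actions (the hypothetical names "pull_cord" and "scratch_door") whose rewarded action is strengthened on every trial.

```python
import random

# Toy illustration of the Law of Effect: the rewarded action becomes more
# likely to be chosen over repeated trials. Names and numbers are illustrative.
random.seed(1)
weights = {"pull_cord": 1.0, "scratch_door": 1.0}

def choose(weights):
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=w, k=1)[0]

for trial in range(200):
    action = choose(weights)
    reward = 1.0 if action == "pull_cord" else 0.0  # only cord-pulling opens the box
    weights[action] += reward                        # strengthen rewarded actions
    # (unrewarded actions are simply not strengthened in this toy version)

p_pull = weights["pull_cord"] / sum(weights.values())
print(f"Probability of pulling the cord after 200 trials: {p_pull:.2f}")
```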
The Journal of Experimental Psychology: General® publishes articles describing empirical work that is of broad interest or bridges the traditional interests of two or more communities of psychology. The work may touch on issues dealt with in JEP: Learning, Memory, and Cognition; JEP: Human Perception and Performance; JEP: Animal Behavior Processes; or JEP: Applied.
Three types of experimental design are commonly used: independent measures, repeated measures, and matched pairs. Independent measures design, also known as a between-groups design, uses different participants in each condition of the independent variable, so each condition of the experiment includes a different group of participants.
Experimental psychology is a method of studying psychological phenomena and processes. The experimental method in psychology attempts to account for the activities of animals (including humans) and the functional organization of mental processes by manipulating variables that may give rise to behaviour; it is primarily concerned with discovering laws that describe manipulable relationships.
Experimental research serves as a fundamental scientific method aimed at unraveling cause-and-effect relationships between variables across disciplines.
The Journal of Experimental Psychology: Applied® publishes original empirical investigations in experimental psychology that bridge practical problems and psychological theory. Review articles may be considered for publication if they contribute significantly to important topics within applied experimental psychology.
Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields. In its strictest sense, random assignment should meet two criteria: each participant has an equal chance of being assigned to each condition, and each participant is assigned to a condition independently of the other participants (see the sketch below).
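A short sketch can make that distinction concrete. The code below is illustrative only, assuming 20 hypothetical participants and two conditions; it contrasts strict random assignment (independent, equal-chance) with a shuffle-based split that balances group sizes at the cost of independence.

```python
import random

# Strict random assignment: each hypothetical participant is assigned
# independently, with an equal chance of each condition, so group sizes
# may differ slightly from run to run.
random.seed(7)
participants = [f"P{i:02d}" for i in range(1, 21)]

assignment = {p: random.choice(["experimental", "control"]) for p in participants}
for condition in ("experimental", "control"):
    members = [p for p, c in assignment.items() if c == condition]
    print(f"{condition}: {len(members)} participants")

# Block randomization, by contrast, shuffles the list and splits it in half.
# Group sizes are guaranteed equal, but assignments are no longer independent
# of one another.
shuffled = participants[:]
random.shuffle(shuffled)
balanced = {"experimental": shuffled[:10], "control": shuffled[10:]}
print({condition: len(group) for condition, group in balanced.items()})
```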
The experimental method in psychology helps us learn more about how people think and why they behave the way they do. Experimental psychologists can research a variety of topics using many different experimental methods. Each one contributes to what we know about the mind and human behavior.
Research methods in psychology are systematic procedures used to observe, describe, predict, and explain behavior and mental processes. They include experiments, surveys, case studies, and naturalistic observations, and they are designed to keep data collection objective and reliable.
The piano stairs experiment, cognitive dissonance experiments, and false memory experiments are further examples. You might not be able to replicate an experiment exactly (many classic psychology experiments have ethical issues that would preclude conducting them today), but you can use well-known studies like these as a basis for inspiration.
Experimental psychologists are interested in exploring theoretical questions, often by creating a hypothesis and then setting out to prove or disprove it through experimentation. They study a wide range of behavioral topics among humans and animals, including sensation, perception, attention, memory, cognition and emotion.
The past decade has been a bruising one for experimental psychology. The publication of a paper by Simmons, Nelson, and Simonsohn (2011) entitled "False-Positive Psychology" drew attention to problems with the way research was often conducted in the field, which meant that many results could not be trusted. Simmons et al. focused on undisclosed flexibility in data collection and analysis.
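One of the flexible practices they highlighted, optional stopping, can be illustrated with a small simulation. The code below is not the authors' analysis and assumes SciPy is available; it repeatedly "runs a study" in which both groups are drawn from the same distribution, tests after every ten participants per group, and stops as soon as the result looks significant.

```python
import random

from scipy import stats  # assumes SciPy is installed

random.seed(0)

def one_study(max_n=100, batch=10, alpha=0.05):
    """Run one simulated study with optional stopping; return True if it
    (falsely) declares a significant group difference."""
    group_a, group_b = [], []
    while len(group_a) < max_n:
        group_a += [random.gauss(0, 1) for _ in range(batch)]
        group_b += [random.gauss(0, 1) for _ in range(batch)]
        # Peek at the data after every batch and stop if p < alpha.
        if stats.ttest_ind(group_a, group_b).pvalue < alpha:
            return True
    return False

false_positives = sum(one_study() for _ in range(1000))
print(f"False-positive rate with optional stopping: {false_positives / 1000:.2%}")
# A single test at a fixed sample size would stay near the nominal 5%.
```

Because the simulated data contain no real effect, every early stop is a false positive, and the rate ends up well above the nominal 5% level, which is the core of the problem Simmons et al. described.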
Watson and Rayner's "Little Albert" study showed that humans can be conditioned to enjoy or fear something, which many psychologists believe could explain why people have irrational fears and how those fears may develop early in life. It remains a classic example of experimental psychology.
The history of psychology is filled with fascinating studies and classic experiments that helped change the way we think about ourselves and human behavior. Sometimes the results were so surprising that they challenged conventional wisdom about the human mind and its workings.