
The Use of Self-Report Data in Psychology


In psychology, a self-report is any test, measure, or survey that relies on an individual's own report of their symptoms, behaviors, beliefs, or attitudes. Self-report data is typically gathered in a paper-and-pencil or electronic format, or sometimes through an interview.

Self-reporting is commonly used in psychological studies because it can yield valuable and diagnostic information to a researcher or a clinician.

This article explores examples of how self-report data is used in psychology. It also covers the advantages and disadvantages of this approach.

Examples of Self-Reports

To understand how self-reports are used in psychology, it can be helpful to look at some examples. Many well-known assessments and inventories rely on self-reporting to collect data.

Minnesota Multiphasic Personality Inventory (MMPI)

One of the most commonly used self-report tools is the Minnesota Multiphasic Personality Inventory (MMPI), used for personality testing. This inventory includes more than 500 questions focused on different areas, including behaviors, psychological health, interpersonal relationships, and attitudes. It is often used as a mental health assessment, but it is also used in legal cases, custody evaluations, and as a screening instrument for some careers.

The 16 Personality Factor Questionnaire (16PF)

This personality inventory is often used as a diagnostic tool to help therapists plan treatment. It can be used to learn more about various individual characteristics, including empathy, openness, attitudes, attachment quality, and coping style.

Myers-Briggs Type Indicator (MBTI)

The MBTI is a popular personality measure that describes personality types in four categories: introversion or extraversion, sensing or intuiting, thinking or feeling, and judging or perceiving. A letter is taken from each category to describe a person's personality type, such as INTP or ESFJ.

Personality inventories and psychology assessments often utilize self-reporting for data collection. Examples include the MMPI, the 16PF Questionnaire, and the MBTI.

Advantages of Self-Report Data

One of the primary advantages of self-reporting is that it can be easy to obtain. It is also an important way that clinicians diagnose their patients—by asking questions. Those making the self-report are usually familiar with filling out questionnaires.

For research, it is inexpensive and can reach many more test subjects than could be analyzed by observation or other methods. It can be performed relatively quickly, so a researcher can obtain results in days or weeks rather than observing a population over the course of a longer time frame.

Self-reports can be made in private and can be anonymized to protect sensitive information and perhaps promote truthful responses.

Disadvantages of Self-Report Data

Collecting information through self-reporting has limitations. People are often biased when they report on their own experiences. For example, many individuals are either consciously or unconsciously influenced by "social desirability." That is, they are more likely to report experiences that are considered to be socially acceptable or preferred.

Self-reports are subject to these biases and limitations:

  • Honesty : Subjects may give the more socially acceptable answer rather than the truthful one.
  • Introspective ability : Subjects may not be able to assess themselves accurately.
  • Interpretation of questions : The wording of a question may be confusing or mean different things to different subjects.
  • Rating scales : A yes-or-no rating can be too restrictive, while numerical scales can be inexact and subject to an individual's inclination to give extreme or middle responses to every question.
  • Response bias : Answers may be influenced by preceding responses, by recent or especially salient experiences, and by other contextual factors.
  • Sampling bias : The people who complete the questionnaire are the sort of people who will complete a questionnaire. Are they representative of the population you wish to study?

Combining Self-Report Data With Other Information

Most experts in psychological research and diagnosis suggest that self-report data should not be used alone, as it tends to be biased. Research is best done when combining self-reporting with other information, such as an individual’s behavior or physiological data.

This “multi-modal” or “multi-method” assessment provides a more global, and therefore more likely accurate, picture of the subject.

The questionnaires used in research should be checked to see whether they produce consistent results over time. They should also be validated against another data source to demonstrate that responses measure what they claim to measure. A good questionnaire should also discriminate clearly between controls and the test group.

How to Create a Self-Report Study

If you are creating a self-report tool for psychology research, there are a few key steps you should follow. First, decide what type of data you want to collect. This will determine the format of your questions and the type of scale you use. 

Next, create a pool of questions that are clear and concise. The goal is to have several items that cover all the topics you wish to address. Finally, pilot your study with a small group to ensure it is valid and reliable.
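On the reliability point, a common pilot-stage check is internal consistency. The sketch below is a minimal Python illustration using invented pilot responses; the function and toy data are hypothetical, not drawn from any published scale.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented pilot data: 6 respondents x 4 Likert items scored 1-5.
pilot = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(pilot):.2f}")
```

By convention, an alpha of roughly .70 or higher is usually treated as acceptable internal consistency for research use.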

When creating a self-report study, determine what information you need to collect and test the assessment with a group of individuals to determine if the instrument is reliable.

Self-reporting can be a useful tool for collecting data. The benefits of self-report data include lower costs and the ability to collect data from a large number of people. However, self-report data can also be biased and prone to errors.


By Kristalyn Salters-Pedneault, PhD. Kristalyn Salters-Pedneault, PhD, is a clinical psychologist and associate professor of psychology at Eastern Connecticut State University.

APS

The Science of Self-Report

BETHESDA, MARYLAND - The accuracy and reliability of reports of one’s own behavior and physical state are at the root of effective medical practice and valid research in health and psychology. To address this crucial element of research, the National Institutes of Health (NIH) held an informative conference here in November, “The Science of Self-report: Implications for Research & Practice,” at which more than 500 researchers and policymakers learned about many of the critical limits of “self-report” as a research tool as well as some of the latest techniques to enhance its effectiveness.

Sponsored by the Office of Behavioral and Social Science Research (OBSSR), the symposium drew participants from virtually every area of health and medicine policy, practice, and research. The issue of self-report as a primary tool in research and as an unavoidable component in health care is of central concern to medical and social science researchers and medical and psychology practitioners, and many other scientists.

Drawing on the expertise of disciplines ranging from anthropology to sociology, the conference’s 32 speakers and introducers featured 10 APS members, including the following NIH staff: Wendy Baldwin (NIH Office of Extramural Research deputy director), Norman Anderson (OBSSR director), Virginia Cain (OBSSR), Howard Kurtzman (National Institute of Mental Health), and Jaylan Turkkan (Program Co-Chair).

Value, Limits, and Improvements

“The issue we have to consider regarding self-report data is not that it should be replaced by external measurements but that we will always need self-report about many behaviors that are simply going to be unobservable by anyone else. We’re going to need it because the interpretation of events may be important, and only the individual can provide those interpretations,” said Baldwin in the opening session initiating the two-day conference. But assessing patient compliance with medical regimens and eliciting medical histories are just two of the particularly important areas in which self-report data is routinely, and perhaps blindly, accepted as reliable in many current medical contexts.

“Consequently, the effort should be placed on improving the self-report measures, as opposed to just looking for weaknesses or how they can be replaced by external measures,” Baldwin emphasized in her comments that set the tone for the exceptionally practical conference. In fact, all speakers at the conference emphasized the invaluable nature of self-report measurements and called for a continual effort to improve their utility. “Where we have other validation, that’s great! But we have a very important job ahead of us to make sure that we can learn why self-report either works well or doesn’t, and when it works well, and when it doesn’t,” said Baldwin. Observational and experimental studies have shown that there are barriers to accuracy at every stage of the autobiographical report process: perception of the state of the self, encoding and storage of memory, understanding the question being asked, recalling the facts, and judging how and what to answer. And one intention of the conference was to systematically review the documented problems across several research and medical contexts.

Reporting Symptoms and Physiology

Psychologist James Pennebaker, Southern Methodist University, presented data from studies on the ability to perceive one’s own physical symptoms and other aspects of physiology such as heart rate.

“People are generally not good at this,” he finds, “but there are interesting sex differences.” In laboratory settings, men are better at perceiving their inner physiological states than are women, but the difference is largely erased when the studies are conducted in a more natural environment. This is because men and women emphasize different sources of information when asked to define their internal states: Men rely more directly on internal bodily cues, while women rely more on situational cues. There is, of course, a lack of normal situational cues in the laboratory setting. One practical application of the skill of defining one’s internal state is that diabetics must be trained to monitor their own blood glucose levels, since resorting to chemical testing for glucose is often impractical.

Reporting Pain

Pain is not a simple sensory event and is not proportional to tissue damage, reported APS Member Francis Keefe, of Duke University Medical Center’s Pain Management Program. In his discussion of the perception of pain, Keefe explained that pain is influenced by psychological, social, and cultural factors, all of which act via a gating mechanism in the spinal cord to influence the perception of pain. Also, the intensity of pain is separate from the degree of unpleasant affect associated with it, and this difference is reflected in pharmacology: While the drug fentanyl reduces the intensity of pain, diazepam reduces its unpleasantness.

Affect, in turn, modifies pain tolerance: A negative mood decreases tolerance for experimental pain in the cold pressor test, and affect at the time of pain influences the later recall of the intensity of the pain. Keefe says some pain specialists have advocated training patients with chronic pain (e.g., cancer patients) to be more emphatic and expressive in describing their pain to their doctors, in order to help ensure that adequate pain relief is prescribed. However, he says, many pain control techniques are effective because they influence affect and mood more than they influence the intensity of the pain per se.

Reporting Data Through High-Tech “Diaries”

In his presentation on high-tech techniques to obtain self-report data, APS Member Saul Shiffman of the University of Pittsburgh’s Department of Psychology indicated that written daily or weekly diaries have not proven themselves very good for accurate recording of simple objective events like smoking. In fact, people often fail even to accurately enter many simple events into their memory, let alone document them on paper. To avoid the problem, he describes the technique of Ecological Momentary Assessment (EMA). EMA requires the subject/patient to carry a custom-designed palm-top computer, which prompts him throughout the day to answer a question (e.g., “Are you smoking right now?”). The question is posed according to the desired sampling, which can be purely random over time or contingent upon various other behaviors (like drinking coffee). By avoiding recall completely, this method can provide a very revealing picture of the subject’s pattern of behavior. It also generates great quantities of data, but the analysis of that data poses unique and controversial statistical problems, because they do not fit into the standard definitions of repeated measures.
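As a rough sketch of the time-based sampling Shiffman describes (random prompts during waking hours), the Python below generates one day's prompt schedule. It is an illustration only: the window, prompt count, and minimum spacing are invented parameters, not details of his EMA system.

```python
import random
from datetime import datetime, timedelta

def ema_schedule(day: datetime, n_prompts: int = 6,
                 start_hour: int = 9, end_hour: int = 21,
                 min_gap_minutes: int = 30) -> list[datetime]:
    """Draw prompt times uniformly at random within waking hours,
    re-drawing until all prompts are at least min_gap_minutes apart."""
    window_start = day.replace(hour=start_hour, minute=0,
                               second=0, microsecond=0)
    window_minutes = (end_hour - start_hour) * 60
    while True:
        offsets = sorted(random.sample(range(window_minutes), n_prompts))
        if all(later - earlier >= min_gap_minutes
               for earlier, later in zip(offsets, offsets[1:])):
            return [window_start + timedelta(minutes=m) for m in offsets]

for t in ema_schedule(datetime(2024, 5, 1)):
    print(t.strftime("%H:%M"), "-", "Are you smoking right now?")
```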

Reporting Temporal Frequencies of Behavior From Memory

Several presenters stressed the problems posed by aspects of the mechanisms of memory encoding and recall. Norman Bradburn of the National Opinion Research Center and the University of Chicago was the first of many speakers to note that remembering is very definitely a reconstructive task. It typically suffers from several distortions, including the bundling of events and the tendency to “telescope” events, or bring them forward in time, when remembering.

Rounding errors are frequent when self-reported time intervals approach conventional discrete units of time (e.g., an hour, a week, a month, a year). Events six or eight days ago tend to be remembered as “one week” ago, and whatever the unit of time appropriate to the interval, errors are made in whole-unit chunks rather than in parts of units. “We are more likely to think in terms of three weeks, than 20 days,” said Bradburn. “Many people do not enumerate events, even when we might expect the question to lead them to do so. Rather, they estimate the number of events on the basis of some rule.”

Sex Differences Reporting Temporal Facts

And, just as many have thought, women do remember dates better than men. To help the respondent reconstruct the past, the interviewer or questionnaire should ask questions that are structured according to the way in which the events are likely to be encoded. Memories are rarely linked to calendar dates but rather to notable life events (e.g., graduation from college). Roger Tourangeau of the National Opinion Research Center further analyzed the distinction between questions designed to encourage estimation and questions designed to encourage recall of individual events.

Decompositional Approach

Geeta Menon of New York University’s Department of Marketing has analyzed the role of the decompositional question in eliciting recall of regular versus sporadic behaviors. Should we simply ask the open-ended question “How many times did you do X last week?” Or, should we ask the same thing using a decompositional approach? For example, “How many times did you do X while driving? While sitting at home? While working? … ”

Menon’s research indicates that the open-ended question (“How many times did you do X in the last month?”) tends to encourage the subject to answer by referring to a “rule,” or an estimate of frequency. For regularly occurring behaviors this elicits accurate answers with a minimum of mental effort. For behaviors that are more sporadic, on the other hand, it is better to ask decompositional questions (i.e., to help the respondent by breaking the problem up into chunks). For irregular behaviors, a rule is less useful, and it is desirable to encourage the subject to recall each instance, using an enumeration strategy.

False and Forgotten Memories

Demonstrations that there can be both false negatives and false positives in memories of events that occurred long ago (or did not occur at all) have a particular relevance to the problem of sexual abuse of children. Speaking on the subject of false positives in memory, APS Fellow Elizabeth Loftus of the Psychology Department at the University of Washington presented findings, demonstrated in many experiments, that it is possible to create false memories. Such “memories” can be induced either by: (1) simply having the subject imagine a scenario vividly, and then later asking them to recount “memories” of similar events, or (2) by frankly telling a subject that a specific event happened and then reinforcing the associated “memory” by attempting to convince the subject of the authenticity of the event (e.g., by coaxing the subject with the question “Can’t you try to remember the time you got lost at the shopping mall?”).

People can import true memories from other events, thereby giving their false event memories seeming credibility; they can forget the source of a memory, wrongly attributing the memory of a fantasy to the memory of a real event; and they can make up completely unfounded facts as well. The confidence one feels in the validity of one’s recall also has little correlation with its accuracy.

Linda Williams of the University of New Hampshire’s Family Research Laboratory has documented the other side of this issue, the false negative for a documented event. In these studies, children who were seen at hospitals for instances of sexual abuse were asked, many years later, to recall any such events. A substantial minority of the children, including those who had findings on physical exam that confirmed the abuse, failed to recall the instances. Interestingly, the forgetting was not correlated with the use of force or coercion by their abuser. The children were, however, more likely to forget abuse at the hands of individuals closest to them (i.e., in terms of familial relation, familiarity, or friendship).

Prolong the Pain

Psychologist Daniel Kahneman of the Woodrow Wilson School at Princeton University studies the memory of pain, as in painful medical procedures. Do we remember the quantity of pain as something like its intensity multiplied by its duration? Not at all. We remember an average of the moment of peak intensity and the pain at the end of the procedure. This has applications to colonoscopy, which is distinctly unpleasant, and for which one would like the subject to return for a repeat test every ten years. Strangely, Kahneman suggested, his research findings may mean that in order to make the long-term memory of the pain less severe, one should extend the time of the procedure, by keeping the colonoscope inserted, but not moving it. The pain is less for those last few minutes, even though we have added several minutes of diminished pain to the end of a painful experience.
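As a toy illustration of that finding, assume the remembered intensity is approximated by the average of the peak moment and the final moment (a simplified reading used here for arithmetic only, not Kahneman's exact published model):

```python
def remembered_pain(ratings: list[float]) -> float:
    """Peak-end rule: retrospective rating ~ mean of the worst moment
    and the final moment, ignoring total duration."""
    return (max(ratings) + ratings[-1]) / 2

short_procedure = [2, 6, 8]           # ends at peak discomfort
extended_procedure = [2, 6, 8, 4, 3]  # extra minutes of milder pain appended
print(remembered_pain(short_procedure))     # 8.0
print(remembered_pain(extended_procedure))  # 5.5 -> remembered as less painful
```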

Mood and Memory

APS Fellow John Kihlstrom of the Department of Psychology at Yale University took a logical and deductive approach to the problem of the influence of affect on memory. Although some experimenters have failed to find a link, he says, others have. There are some robust paradigms of mood-dependent memory. Because memory is reconstructive, not merely a readout of data, it is a cognitive task. Performance on other cognitive tasks is affected by mood, and so we should expect recall to be influenced by mood. For example, many mental patients report being abused in childhood. Is this a causal association or an example of preferential recall of mood congruent memories? What is needed to untangle this link, he says, are prospective studies.

Sensitive Topics

Nora Cate Shaeffer of the Department of Sociology at the University of Wisconsin-Madison addressed the problem of self-report on sensitive topics, such as sexual behavior or drug abuse. People will tend to present themselves in a positive light, sometimes to look good, and sometimes to “please” the researcher. The more serious an illegal behavior (e.g., the “harder” the drug), the less likely people are to report their recent use of it, while events in the distant past are less sensitive, and consequently are less likely to be concealed. Men tend to exaggerate their sexual histories, while women tend to understate them. But in any individual case, one doesn’t know how accurate a source is. Not only do people calculate the risk of revealing sensitive information (e.g., they may ask themselves “Will my spouse find out?” “Will the police find out?”), but they may even reinterpret the question so as to allow themselves to answer evasively. (For example, a respondent may reason: “Well, I did have that abortion, but I’m really not ‘the kind of person’ who would do that normally, so I’ll say ‘never.’” Or: “This interviewer has a hell of a nerve; it’s none of his business, ergo I don’t feel dishonest lying about this.”)

Medical Compliance

Cynthia Rand of the Johns Hopkins University Asthma and Allergy Center discussed the problem of medical noncompliance. This generates a problem for research as well as practice. If everyone in a study takes half as many pills as they say they did, the FDA-approved and officially sanctioned dosage will be twice as high as the dosage that most people reported worked best. (Yes, this suggests that to avoid an overdose of medication, it may be best to be no more compliant than the average participant in the clinical trial that determined the proper dose!) What can be done to increase the honesty of responses? For starters, a physician’s question such as “You’re taking the pills the way I prescribed, aren’t you?” is not likely to uncover any problems with compliance. It is important to discuss the patient’s experience with the regimen in more detail, to reveal possible problems or hidden issues.

Ethics in Self-report

APS Fellow Donald Bersoff of the Villanova University School of Law addressed the knotty problems arising from ethical considerations in asking sensitive questions. If a subject reports self-destructive behavior, should the researcher intervene? Does that violate confidentiality and thereby compromise the autonomy of the subject?

Bersoff implores researchers to at least address these issues before beginning research studies. For example, before undertaking a study on the attitudes of teenagers toward dangerous behaviors, researchers should consider what they will do if they find out that a teenager is contemplating suicide, or is using heroin. “Have a plan, have a policy, discuss the pros and cons of breaking confidentiality before the issue comes up,” said Bersoff. “Too many researchers of sensitive topics don’t even think about what they will do until they have the information in hand, and then they must agonize over their choices.”

Ethnic and Cultural Considerations

In many cases, the accuracy of a subject’s response depends on the understanding of the question. Spero Manson of the Department of Psychiatry at the University of Colorado’s School of Medicine has rewritten surveys specifically for Native American populations and, with sensitivity to cross-cultural issues, is able to raise the consistency of the scores very significantly. He cites one particular Indian culture in which it is considered very important never to give voice to certain negative thoughts; consequently, questions about suicidal ideation are either simply skipped by respondents at very high rates or are not answered frankly.

Efficient Screen for Depression

Ronald Kessler of Harvard Medical School’s Department of Health Care Policy has been developing a short screening test for major depression. A psychiatrist asks questions until he knows the answers he is seeking, but screening tests must be designed for administration by non-specialists, with minimal preparation. Kessler’s test, intended for screening large populations under severe budget constraints, is an extreme version of this problem. The screen must not yield many false positives; it must be understandable by people of widely varying literacy and cultural backgrounds; and 75% of the general population should score zero on the test, meaning that it is sensitive only to the serious cases. Interestingly, out of scores of possible questions, he has been able to narrow the survey to six very robust questions! They will be made available on the worldwide web at URLs http://www.umich.edu/~icpe/ or www.umich.edu/~ncsum/.

If You Can’t Beat Them, Join Them

Douglas Massey of the University of Pennsylvania’s Population Studies Center presented a novel approach to securing sensitive or personal data in his presentation titled “When surveys fail,” addressing the fact that many such research efforts simply demand that the researcher abandon the traditionally administered survey or questionnaire. For a detailed study of undocumented workers from Mexico, for example, he has combined ethnography and surveys into an approach, called “ethnosurvey,” in which anthropologists get to know the members of a Mexican town personally and then travel to a town in the United States where many of the workers go to work.

By demonstrating their involvement in the community and their knowledge of its members and workers’ relatives, Massey and colleagues are able to establish trust, over a period of years, and to get answers about the laborers’ experiences, documenting answers to non-standardized questions in an extensive data recording sheet. But, of course, even ethnosurveys are plagued by the same problems of faulty recall and encoding that researchers using more standard surveys encounter.

Practical Implications for Symptoms, Illness, & Health

Linking the findings from self-report research directly to medical practice, speaker Arthur Barsky of Harvard Medical School’s Division of Psychiatry at Brigham and Women’s Hospital pointed out that there is a very poor correlation between the patient’s report of the seriousness of his symptoms, the medical findings of the presence of a pathological condition, and the patient’s utilization of health care.

Why, then, given the flawed nature of self-report of symptoms, is history-taking so important in medical practice? Several speakers reaffirmed the dogma that history-taking must come first. The implication would seem to be that the real skill of history-taking is in the ability to get useful information about the patient, despite the fact that his/her self-report is probably riddled with factual errors. As other speakers stated repeatedly during these two days, the respondent is always telling us something important. It just isn’t always the answer to the question we thought we were asking!



Self-Report Measures

Molly E. Zimmerman, Albert Einstein College of Medicine

A common form of data collection in clinical research and practice is administration of a self-report measure in which an individual is asked to provide information about their subjective experiences and behaviors. Although less common, an individual’s caregiver may also be asked to complete self-report measures on that individual’s behalf, such as when a caregiver for a patient with dementia completes an activities of daily living questionnaire.

Current Knowledge

Self-report measures are usually completed in a paper-and-pencil format, although computer administration is utilized with increasing frequency. The scope of self-report measures may vary from assessment of a broad range of symptoms (e.g., childhood behavior disorders) to targeted assessment of specific symptoms (e.g., obsessive compulsive disorder). The selection of an appropriate self-report measure is dependent on the goal of the clinician’s assessment and is aided by consideration of the reliability and...



Source: Zimmerman, M.E. (2011). Self-Report Measures. In: Kreutzer, J.S., DeLuca, J., Caplan, B. (eds) Encyclopedia of Clinical Neuropsychology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-79948-3_1244



Measuring bias in self-reported data

Robert Rosenman

School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA

Vidhura Tennekoon

School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA. vidhura@wsu.edu

Laura G. Hill

Department of Human Development, Washington State University, 523 Johnson Tower, Pullman WA 99164, USA. laurahill@wsu.edu

Response bias shows up in many fields of behavioural and healthcare research where self-reported data are used. We demonstrate how to use stochastic frontier estimation (SFE) to identify response bias and its covariates. In our application to a family intervention, we examine the effects of participant demographics on response bias before and after participation; gender and race/ethnicity are related to magnitude of bias and to changes in bias across time, and bias is lower at post-test than at pre-test. We discuss how SFE may be used to address the problem of ‘response shift bias’ – that is, a shift in metric from before to after an intervention which is caused by the intervention itself and may lead to underestimates of programme effects.

1 Introduction

In this paper, we demonstrate the potential of a common econometric tool, stochastic frontier estimation (SFE), to measure response bias and its covariates in self-reported data. We illustrate the approach using self-reported measures of parenting behaviours before and after a family intervention. We demonstrate that in addition to affecting targeted behaviours, an intervention may also affect any bias associated with self-assessment of those behaviours. We show that SFE can be used to identify and correct for bias in self-assessment both before and after treatment, resulting in more accurate estimates of treatment effects.

Response bias is a widely discussed phenomenon in behavioural and healthcare research where self-reported data are used; it occurs when individuals offer self-assessed measures of some phenomenon. There are many reasons individuals might offer biased estimates of self-assessed behaviour, ranging from a misunderstanding of what a proper measurement is to social-desirability bias, where the respondent wants to ‘look good’ in the survey, even if the survey is anonymous. Response bias itself can be problematic in programme evaluation and research, but is especially troublesome when it causes a recalibration of bias after an intervention. Recalibration of standards can cause a particular type of measurement bias known as ‘response-shift bias’ ( Howard, 1980 ). Response-shift bias occurs when a respondent's frame of reference changes across measurement points, especially if the changed frame of reference is a function of treatment or intervention, thus, confounding the treatment effect with bias recalibration. More specifically, an intervention may change respondents’ understanding or awareness of the target concept and the estimation of their level of functioning with respect to the concept ( Sprangers and Hoogstraten, 1989 ), thus changing the bias at each measurement point. In fact, some treatments or interventions are intended to change how respondents look at the target concept. Further complicating matters is that an intervention may affect not only a respondent's metric for targeted behaviours across time points (resulting in response shift bias) but may also affect other types of response bias. For example, social desirability bias may decrease over the course of an intervention as respondents come to know and trust a service provider. Thus, it is necessary to understand the degree and type of response bias at both pretest and posttest in order to determine whether response shift has occurred.

When there is a potential for confusing bias recalibration with treatment outcomes, statistical approaches may be useful ( Schwartz and Sprangers, 1999 ). In recent years, researchers have applied structural equation modelling (SEM) to the problem of decomposing error in order to identify response shift bias ( Oort, 2005 ; Oort et al., 2005 ). In this paper, we suggest a different statistical approach which reveals response bias at a single time point as well as differences in bias across time points. Perhaps more importantly, it identifies covariates of these differences. When applied before and after an intervention, it reveals differences related to changes in respondents’ frame of reference. Thus, it can be used to decompose errors so that recalibration of the bias occurring across time points can be distinguished from simple response bias within each time point. The suggested approach is based on SFE ( Aigner et al., 1977 ; Battese and Coelli, 1995 ; Meeusen and van den Broeck, 1977 ), a technique widely used in economics and operational research.

Our approach has two significant advantages over that proposed by Oort et al. (2005) . Their approach reveals only aggregate changes in the responses and requires a minimum of two temporal sets of observations on the self-rating of interest as well as multiple measures of the item to be rated. SFE, to its credit, can identify response differences across individuals (as opposed to simply aggregate response shifts) with a single temporal observation and a single measure, so is much less data intensive. Moreover, since it identifies differences at the individual level, it allows the analyst to identify not only that responses differ by individual, but what characteristics are at the root of the differences. Thus, as long as more than one temporal observation is available for respondents, SFE can be used to systematically identify different types of response recalibration by looking at the changes at the individual level, and aggregating them. SFE again has an advantage because the causes of both bias and recalibration can be identified at the individual level.

What may superficially be seen as two disadvantages of SFE when compared to SEM approaches are actually common to both methods. First, both measure response (and therefore response shift) against a common subjective metric established by the norm of the data. In fact, any systematic difference by an individual from this norm is how we measure ‘response bias’. With both SEM and SFE, if an objective metric exists, the difference between the self-rating and the objective measure is easily established. A second apparent disadvantage is that SFE requires a specific assumption of a truncated distribution of the bias (although it is possible to test this assumption statistically). While SEM can reveal response shift on individual bias without such a strong assumption, aggregate changes become manifest only if “many respondents experience the same shift in the same direction” (Oort, 2005, p.595). Hence, operationally the assumptions are nearly equivalent.

In the next section, we explain how we model response bias and response recalibration within the SFE framework. In Section 3, we present our empirical application, including the results of our baseline model and a model with heteroscedastic errors as a robustness check. In Section 4, we discuss the relative merits of the method we propose, together with its limitations, and offer some conclusions.

2 Response bias and SFE

We are concerned with situations where individuals do not have an objective measure of some variable of interest, which we denote $Y^*_{it}$, and we have to use a subjective measure (denoted $Y_{it}$) as a proxy instead. An unbiased estimate of the variable of interest $Y^*_{it}$ can be defined as one satisfying

$$E(Y_{it} \mid Y^*_{it}, Z_{it}) = E(Y_{it} \mid Y^*_{it}) = Y^*_{it} \qquad (1)$$

where $Y_{it}$ denotes the observed measurement, $Y^*_{it}$ is the true attribute being measured, and $Z_{it}$ represents variables other than $Y^*_{it}$. When $Y_{it}$ is self-reported, $Z_{it}$ includes (often unobserved) variables affecting the frame of reference used by respondents for measuring $Y^*_{it}$, and (1) is not assured. Within this context, response bias is simply the case that $E(Y_{it} \mid Y^*_{it}, Z_{it}) \neq E(Y_{it} \mid Y^*_{it})$. The bias is upward if $E(Y_{it} \mid Y^*_{it}, Z_{it}) > E(Y_{it} \mid Y^*_{it})$ and downward if the inequality goes the other way.

Our approach for measuring response bias and bias recalibration (change in response bias between two time periods) is based on the Battese and Coelli (1995) adaptation of the stochastic frontier model (SFE) independently proposed by Aigner et al. (1977) and Meeusen and van den Broeck (1977). Let

$$Y^*_{it} = \beta_0 T + X_{it}\beta_t + \varepsilon_{it} \qquad (2)$$

where $Y^*_{it}$ is the true (latent) outcome, $T$ denotes some treatment or intervention, 1 $X_{it}$ are variables other than the treatment that explain the outcome, and $\varepsilon_{it}$ is a random error term. For identification, we assume that $\varepsilon_{it}$ is distributed iid $N(0, \sigma_\varepsilon^2)$. The observed self-reported outcome is a combination of the true outcome and the response bias $Y^R_{it}$:

$$Y_{it} = Y^*_{it} - Y^R_{it} \qquad (3)$$

We consider the specific case that the bias term $Y^R_{it}$ has a truncated-normal distribution,

$$Y^R_{it} = u_{it}, \qquad u_{it} \sim N^+(\mu_{it}, \sigma_u^2) \qquad (4)$$

where $u_{it}$ is a random variable which accounts for response shift away from a subjective norm response level (usually called the ‘frontier’ in SFE) and is distributed $N(\mu_{it}, \sigma_u^2)$, truncated below at zero, independent of $\varepsilon_{it}$. Moreover,

$$\mu_{it} = \delta_0 T + z_{it}\delta \qquad (5)$$

where the vector $z_{it}$ includes variables (other than the treatment) that explain the specific deviation from the response frontier. Subscript $i$ indexes the individual observation and subscript $t$ denotes time. 2 Substituting (2), (4) and (5) in (3), we can write the expected observed outcome as

$$E(Y_{it} \mid T, X_{it}, z_{it}) = \beta_0 T + X_{it}\beta_t - \delta_0 T - z_{it}\delta - \sigma_u\,\frac{\phi(\mu_{it}/\sigma_u)}{\Phi(\mu_{it}/\sigma_u)} \qquad (6)$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal probability density and cumulative distribution functions, respectively. Any treatment effect is given by $\beta_0$ in equation (6). The normal relationship between the $X$s and $Y$ is given by $\beta_t$. The last three terms on the right-hand side represent the observation-specific response bias from this normal relationship. Treatment can affect both the maximum possible value of the measured outcome of a given individual (as defined by $X_{it}\beta_t$) and the response bias. If treatment changes the response bias, it will be indicated by the term $\delta_0$, and the bias recalibration is given by

$$E(u_{it} \mid T = 1) - E(u_{it} \mid T = 0), \qquad E(u_{it}) = \mu_{it} + \sigma_u\,\frac{\phi(\mu_{it}/\sigma_u)}{\Phi(\mu_{it}/\sigma_u)} \qquad (7)$$

The estimated $\delta_0$ coefficient on treatment indicates how treatment has changed response bias. If $\delta_0 = 0$, there is no recalibration, and the response bias, if it exists, is not affected by the treatment. Cross terms of treatment and other variables (that is, slope dummy variables) may be used if the treatment is thought to change the general way these other variables interact with functioning.

Recalibration can occur independently of the treatment effect. In fact, recalibration is sometimes a goal of the treatment or intervention in addition to the targeted outcome, which means a desired outcome is that $\delta_0 \neq 0$ and $E(Y_{i1} \mid Y^*_{i1}) \neq E(Y_{i2} \mid Y^*_{i2})$. In other words, there is a change in the individual measurement scale caused (and intended) by the intervention.
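To make the likelihood concrete, the following is a self-contained Python sketch of a normal/truncated-normal frontier model in the spirit of equations (2) to (6), reduced to one cross-section with one covariate and simulated data. The paper estimates its models with Stata's xtfrontier; this code and every name in it are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)

# Simulate self-reports: y = b0*T + b1*x - u + eps, with downward bias u >= 0
# whose pre-truncation mean falls with treatment (delta0 < 0).
n = 2000
T = rng.integers(0, 2, n).astype(float)   # treatment indicator
x = rng.normal(size=n)                    # one stand-in covariate (X_it)
mu = 0.8 - 0.5 * T                        # mu_it = d_const + d_T * T
su, sv = 0.6, 0.4                         # sigma_u, sigma_eps
u = truncnorm.rvs(-mu / su, np.inf, loc=mu, scale=su, random_state=rng)
y = 0.3 * T + 1.0 * x - u + rng.normal(0.0, sv, n)

def nll(p):
    """Negative log-likelihood: normal noise plus truncated-normal bias
    (Stevenson-type frontier, as in Battese and Coelli, 1995)."""
    b0, b1, d_const, d_T, ln_su2, ln_sv2 = p
    su2, sv2 = np.exp(ln_su2), np.exp(ln_sv2)
    s2 = su2 + sv2
    s, su_, sstar = np.sqrt(s2), np.sqrt(su2), np.sqrt(su2 * sv2 / s2)
    e = y - b0 * T - b1 * x                # composed residual eps - u
    mu_i = d_const + d_T * T               # equation (5) with one z variable
    mu_star = (mu_i * sv2 - e * su2) / s2  # location of u given e
    ll = (norm.logpdf((e + mu_i) / s) - np.log(s)
          + norm.logcdf(mu_star / sstar) - norm.logcdf(mu_i / su_))
    return -np.sum(ll)

res = minimize(nll, x0=np.array([0, 0, 0.5, 0, -1.0, -1.0]), method="BFGS")
print(res.x)  # roughly [0.3, 1.0, 0.8, -0.5, log(0.36), log(0.16)]
```

Given the fitted parameters, observation-level bias can be predicted from the conditional mean of $u$ given the composed residual, $E(u_i \mid e_i) = \mu_{*,i} + \sigma_*\,\phi(\mu_{*,i}/\sigma_*)/\Phi(\mu_{*,i}/\sigma_*)$ (a Jondrow et al., 1982-style estimator), which is the kind of quantity behind the per-group bias averages reported below.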

3 An application to evaluation of a family intervention

We applied SFE to examine response bias and recalibration in programme evaluations of a popular, evidence-based family intervention (the Strengthening Families Program for Parents and Youth 10–14, or SFP) ( Kumpfer et al., 1996 ). Families attend SFP once a week for seven weeks and engage in activities designed to improve family communication, decrease harsh parenting practices, and increase parents’ family management skills. At the beginning and end of a programme, parents report their level of agreement with various statements related to skills and behaviours targeted by the intervention (e.g., ‘I have clear and specific rules about my child's association with peers who use alcohol’). Consistent with the literature on response shift, we hypothesised that non-random bias would be greater at pretest than at posttest as parents changed their standards about intervention-targeted behaviours and became more conservative in their self-ratings. In other words, we expected that after the intervention parents would recalibrate their self-ratings downward, resulting in an underestimate of the programme's effects.

3.1 Participants

Our data consisted of 1437 parents who attended 94 SFP cycles in Washington State and Oregon from 2005 through 2009. 25% of the participants identified themselves as male, 72% as female, and 3% did not report gender. 27% of the participants identified themselves as Hispanic/Latino, 60% as White, 2% as Black, 4% as American Indian/Alaska Native, 3% as other or multiple race/ethnicity, and 3% did not report race/ethnicity. Almost 74% of the households included a partner or spouse of the attending parent, and 19% reported not having a spouse or partner. For almost 8% of the sample, the presence of a partner or spouse is unknown. Over 62% of our observations are from Washington State, with the remainder from Oregon.

3.2 Measures

The outcome measure consisted of 13 items assessing parenting behaviours targeted by the intervention, including communication about substance use, general communication, involvement of children in family activities and decisions, and family conflict. Items were designed by researchers of the programme's efficacy trial and information about the scale has been reported on elsewhere ( Spoth et al., 1995 ; Spoth et al., 1998 ). Cronbach's alpha (a measure of internal consistency) in the current data was .85 at both pretest and posttest. Items were scored on a 5-point Likert-type scale ranging from 1 (‘strongly disagree’) to 5 (‘strongly agree’).

Variables used in the analysis, including definitions and summary statistics, are presented in Table 1. The average family-functioning score, as measured by self-assessed parenting behaviours, increased from 3.98 at pretest to 4.27 at posttest, after participation in SFP.

Table 1 Variable names, descriptions and summary statistics

3.3 Procedure

Pencil-and-paper pretests were administered as part of a standard, ongoing programme evaluation on the first night of the programme, before programme content was delivered; posttests were administered on the last night of the programme. All data are anonymous; names of programme participants are not linked to programme evaluations and are unknown to researchers. The Institutional Review Board of Washington State University issued a Certificate of Exemption for the procedures of the current study.

We used SFE to estimate (pre- and post-treatment) family functioning scores as a function primarily of demographic characteristics. Based on previous literature (Howard and Dailey, 1979), we hypothesised that the one-sided errors (response bias) would be downward, and preliminary analysis supported that hypothesis. 3 Additional preliminary analysis of which variables to include among $z_i$ (including a model using all the explanatory variables) led us to conclude that three variables determined the level of bias in the family functioning assessment: age, Latino/Hispanic ethnicity, and whether the functioning measure was a pretest or posttest assessment. We used the ‘xtfrontier’ routine in Stata to estimate the parameters of our models. Unlike applications of SFE to technical efficiency estimation, our model does not require log-transforming the dependent variable.

3.4 The baseline model

The results of the baseline SFE model are shown in Table 2. The Wald $\chi^2$ statistic indicated that the regression was highly significant. Several demographic variables were found to influence the assessment of family functioning with conventional statistical significance. Males gave lower estimates of family functioning than did females and those with unreported gender. All non-White ethnic groups (and those with unreported race/ethnicity) assessed their family's functioning more highly than did White respondents. Participation in the Strengthening Families Program increased individuals’ assessments of their family's functioning.

Table 2 SFE: total effects model

We assessed bias, and its change, from the coefficient estimates for the $\delta$ parameters, where $\mu_i = z_i\delta$. Our first overall question was whether, in fact, there was a one-sided error. Three measures of unexplained variation are shown in Table 2: $\sigma^2 = E[(\varepsilon_i - u_i)^2]$ is the variance of the total error, which can be broken down into the component parts $\sigma_u^2 = E(u_i^2)$ and $\sigma_\varepsilon^2 = E(\varepsilon_i^2)$. The statistic $\gamma = \sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)$ gives the percentage of total unexplained variation attributable to the one-sided error. To ensure $0 \le \gamma \le 1$, the model was parameterised in terms of the logit of $\gamma$, reported as ilgtgamma. Similarly, the model estimated the natural log of $\sigma^2$, reported as lnsigma2, and used these estimates to derive $\sigma^2$, $\sigma_\varepsilon^2$, $\sigma_u^2$ and $\gamma$. As seen in the table, the estimate for ilgtgamma was highly significant, but the estimate for lnsigma2 had a p-value of 0.317, which means we cannot reject the hypothesis that all of the variation in the responses is due to respondent-specific bias. Hence, we found strong support for the one-sided variation that we call bias, and we saw that by far the most substantial portion of the unexplained variation in our data came from that source.
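As a purely numerical illustration of this parameterisation (toy variance components, not the paper's estimates):

```python
import numpy as np

sigma_u2, sigma_eps2 = 0.30, 0.05        # hypothetical variance components
sigma2 = sigma_u2 + sigma_eps2           # total unexplained variance
gamma = sigma_u2 / sigma2                # share due to the one-sided bias (~0.86)
ilgtgamma = np.log(gamma / (1 - gamma))  # logit of gamma: the estimated parameter
lnsigma2 = np.log(sigma2)                # log of total variance
print(f"gamma={gamma:.3f}  ilgtgamma={ilgtgamma:.3f}  lnsigma2={lnsigma2:.3f}")
```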

Three variables explained the level of bias. Latino/Hispanic respondents on average had more biased estimates of their family functioning. Looking again at equation (3) , we see that this means they, relative to other ethnic groups, underestimated their family functioning. However, we found that older participants had smaller biases, thus giving closer estimates of their family's relative functioning. Of primary interest is the estimate of the treatment effect. Participation in SFP strongly lowered the bias, on average.

3.5 Decomposing the measured change in functioning

The total change in the functioning score averaged 0.295. This total change consisted of two parts as indicated by the following:

Total change = Measured postscore − Measured prescore
= (Real postvalue − Postvalue bias) − (Real prevalue − Prevalue bias)
= Real change − (Postvalue bias − Prevalue bias)

The term in parentheses is negative (the estimation indicates that treatment lowered the bias). Thus, the total change in the family functioning score underestimated the improvement due to SFP, although the measured post-treatment family functioning was not as large as it would seem from the reported family functioning scores, on average. Table 3 shows the average estimated bias pre- and post-treatment and the average change in bias, which was –0.133. Thus, the average improvement in family functioning was underestimated by this amount.

Table 3 Averages of bias and change

Table 4 shows the results of a regression on bias change and demographic and other characteristics. Males and Black respondents had marginally larger bias changes, while those with race/ethnicity unreported had smaller bias changes. Since the bias change was measured as postscore bias minus prescore bias, this means that the bias changed less, on average, for male and Black respondents, but more, on average, for those whose race was unreported.

Table 4 Regression of bias change

3.6 The SFE model with heteroscedastic error

One alternative to our baseline model (known as the total effects model in SFE terminology), which generated the results in Table 2, is a SFE model which allows for heteroscedasticity in $\varepsilon_i$, $u_i$, or both. More precisely, for this model we maintained equation (3) but had $E(\varepsilon_i^2) = w_i\omega_\varepsilon$ and $E(u_i^2) = w_i\omega_u$, where $\omega_\varepsilon$ and $\omega_u$ are parameters to be estimated and $w_i$ are variables that explain the heteroscedasticity. We note that $w_i$ need not be the same in the two expressions, but since elements of $\omega_\varepsilon$ and $\omega_u$ can be zero, we lose no generality by showing it as we do; in fact, in our application we used the same variables in both expressions, those that we used to explain $\mu$ in the first model. Table 5 reports the results of such a model. In this case, the one-sided error we ascribe to bias is evident from statistically significant parameters in the explanatory expressions for $\sigma_u^2$.

Table 5 SFE with heteroscedasticity
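In terms of the earlier Python sketch, heteroscedasticity of this kind means letting both variance components vary by observation inside the likelihood. The fragment below uses a log-linear form for positivity, so it is an illustrative stand-in for, not a replication of, the paper's linear-in-$w_i$ specification:

```python
import numpy as np
from scipy.stats import norm

def nll_het(p, y, T, x, w):
    """Variant of nll() above in which both error variances depend on w_i."""
    b0, b1, d_const, d_T, ou0, ou1, ov0, ov1 = p
    su2 = np.exp(ou0 + ou1 * w)            # sigma_u,i^2: bias variance
    sv2 = np.exp(ov0 + ov1 * w)            # sigma_eps,i^2: noise variance
    s2 = su2 + sv2
    s, su_, sstar = np.sqrt(s2), np.sqrt(su2), np.sqrt(su2 * sv2 / s2)
    e = y - b0 * T - b1 * x                # composed residual eps - u
    mu_i = d_const + d_T * T
    mu_star = (mu_i * sv2 - e * su2) / s2
    ll = (norm.logpdf((e + mu_i) / s) - np.log(s)
          + norm.logcdf(mu_star / sstar) - norm.logcdf(mu_i / su_))
    return -np.sum(ll)
```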

We note first that the estimates in the main body of the equation were quantitatively and qualitatively very similar to those for the non-heteroscedastic SFE model. The only substantive change is that age was no longer significant at an acceptable p-value, and race unreported had a p-value of 0.1. All signs and magnitudes were similar. Once again, results indicated that participation in SFP (treatment) strongly improved functioning. Additionally, treatment lowered the variability of both sources of unexplained variation across participants. The decreased unexplained variation due to $\varepsilon$ is likely explained by individuals having a better idea of the constructs assessed by scale items. For our purposes, the key statistic here is the coefficient of treatment explaining $\sigma_u^2$. The estimated parameter was negative and significant, with a p-value of 0.03. Since the bias was one-sided, we can clearly conclude that going through SFP lowered the variability of the bias significantly. Moreover, these estimates can be used to predict the bias of each observation; with this model the average bias fell from 0.545 to 0.492, so while the biases were larger with this model, the decrease in the average (–0.053) was about one-half the decrease we saw in the first model.

4 Discussion and conclusions

As we noted earlier, bias in self-rating is of concern in a variety of research areas. In particular, the potential for recalibration of self-rating bias as a function of material or skills learned in an intervention has long been a concern to programme evaluators as it may result in underestimates of programme effectiveness ( Howard and Dailey, 1979 ; Norman, 2003 ; Pratt et al., 2000 ; Sprangers, 1989 ). However, in the absence of an objective performance measurement, it has not been possible to determine whether lower posttest scores truly represent response-shift bias or instead an actual decrement in targeted behaviours or knowledge (i.e., an iatrogenic effect of treatment). By allowing evaluators to test for a decrease in response bias from pretest to posttest, SFE provides a means of resolving this conundrum.

The SFE method, however, is not without problems. The main limitation is that the estimates rely on assumptions about the distributions of the two error components. Model identification requires that one of the error terms, the bias term in our application, be one-sided. This, however, is not as strong an assumption as it looks, for two reasons. First, there is often prior information or theory that indicates the most likely direction of the bias. Second, the validity of the assumption can be tested statistically.

We presented SFE as a method to identify response bias, and changes in response bias, within the context of self-reported measurements at individual and aggregate levels. Even though we proposed a novel application, the technique is not new and has been widely used in economics and operational research for over three decades. The procedure is easily adopted by researchers, since it is already supported by several statistical packages, including Stata (StataCorp., 2009) and LIMDEP (Econometric Software, Inc., 2009).

Response bias has long been a key issue in psychometrics, with response shift bias a particular concern in programme evaluation. However, almost all statistical attempts to address the issue have been confined to using SEM to test for response shift bias at the aggregate level. As noted in the introduction, our approach has three significant advantages over SEM techniques that try to measure response bias. SEM requires more data – multiple time periods and multiple measures – and measures bias only in the aggregate. SFE can identify bias with a single time period (although multiple observations are needed to identify bias recalibration) and identifies response biases across individuals. Perhaps the biggest advantage over SEM approaches is that SFE not only identifies bias but also provides information about its root causes. SFE allows simultaneous analysis of treatment effectiveness, causal factors of outcomes, and covariates of the bias, improving the statistical efficiency of the analysis over traditional SEM, which often cannot identify causal factors and covariates of bias and, when it can, requires two-step procedures. And since SFE allows the researcher to identify bias and causal factors at the individual level, it expands our ability to identify, understand, explain, and potentially correct for response shift bias. Of course, bias at the individual level can be aggregated to measures comparable to what is learned through SEM approaches.

Acknowledgements

The authors would like to thank the anonymous referees. This study was supported in part by the National Institute on Drug Abuse (grants R21 DA025139-01A1 and R21 DA19758-01). We thank the programme providers and families who participated in the programme evaluation.

Robert Rosenman is a Professor of Economics in the School of Economic Sciences at Washington State University. His current research aims to develop new approaches to measuring the economic benefits of substance abuse prevention programmes. His research has appeared in journals such as the American Economic Review, Health Economics, Clinical Infectious Diseases and Health Care Management Science.

Vidhura Tennekoon is a graduate student in the School of Economic Sciences at Washington State University. His research interests are in health economics, applied econometrics and prevention science, with a current research focus on dealing with misclassification in survey data.

Laura G. Hill is a Psychologist and Associate Professor of Human Development at Washington State University. Her research focuses on the translation of evidence-based prevention programmes from research to practice and measurement of programme effectiveness in uncontrolled settings.

Reference to this paper should be made as follows: Rosenman, R., Tennekoon, V. and Hill, L.G. (2011) ‘Measuring bias in self-reported data’, Int. J. Behavioural and Healthcare Research, Vol. 2, No. 4, pp. 320–332.

1 We present a single model that allows for pre- and post-intervention measurement of the outcome of interest and bias. If the self-reported data are not related to an intervention, β₀ and δ₀ (below) are identically 0 and there is only one time period, t.

2 Due to symmetry of the normal distribution, without loss of generality we can also assume that the bias distribution is right truncated.

3 When we tried to estimate the parameters of a model with one-sided errors upward, the maximisation procedure failed to converge. A specification with one-sided errors upward but without a constant term converged, but a null hypothesis that there is a one-sided error term was rejected with near certainty, indicating that there is no sizable upward response bias. A similar analysis, but with the one-sided upward errors completely random (rather than dependent on treatment and other variables), was also rejected, again with near certainty. Thus, upward bias was robustly rejected.

Contributor Information

Robert Rosenman, School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA.

Vidhura Tennekoon, School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA. vidhura@wsu.edu.

Laura G. Hill, Department of Human Development, Washington State University, 523 Johnson Tower, Pullman WA 99164, USA. laurahill@wsu.edu.

  • Aigner D, Lovell CAK, Schmidt P. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics. 1977;6(1):21–37.
  • Battese GE, Coelli TJ. A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical Economics. 1995;20(2):325–332.
  • Econometric Software, Inc. LIMDEP Version 9.0 [Computer Software]. Econometric Software, Inc.; Plainview, NY: 2009.
  • Howard GS. Response-shift bias: a problem in evaluating interventions with pre/post self-reports. Evaluation Review. 1980;4(1):93–106. DOI: 10.1177/0193841x8000400105.
  • Howard GS, Dailey PR. Response-shift bias: a source of contamination of self-report measures. Journal of Applied Psychology. 1979;64(2):144–150.
  • Kumpfer KL, Molgaard V, Spoth R. The strengthening families program for the prevention of delinquency and drug use. In: Peters RD, McMahon RJ, editors. Preventing Childhood Disorders, Substance Abuse, and Delinquency, Banff International Behavioral Science Series. Vol. 3. Sage Publications, Inc.; Thousand Oaks, CA, USA: 1996. pp. 241–267.
  • Meeusen W, van den Broeck J. Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review. 1977;18(2):435–444.
  • Norman G. Hi! How are you? Response shift, implicit theories and differing epistemologies. Quality of Life Research. 2003;12(3):239–249.
  • Oort FJ. Using structural equation modeling to detect response shifts and true change. Quality of Life Research. 2005;14(3):587–598.
  • Oort FJ, Visser MRM, Sprangers MAG. An application of structural equation modeling to detect response shifts and true change in quality of life data from cancer patients undergoing invasive surgery. Quality of Life Research. 2005;14(3):599–609.
  • Pratt CC, McGuigan WM, Katzev AR. Measuring program outcomes: using retrospective pretest methodology. American Journal of Evaluation. 2000;21(3):341–349.
  • Schwartz CE, Sprangers MAG. Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Social Science & Medicine. 1999;48(11):1531–1548.
  • Spoth R, Redmond C, Shin C. Direct and indirect latent-variable parenting outcomes of two universal family-focused preventive interventions: extending a public health-oriented research base. Journal of Consulting and Clinical Psychology. 1998;66(2):385–399. DOI: 10.1037/0022-006x.66.2.385.
  • Spoth R, Redmond C, Haggerty K, Ward T. A controlled parenting skills outcome study examining individual difference and attendance effects. Journal of Marriage and Family. 1995;57(2):449–464. DOI: 10.2307/353698.
  • Sprangers M. Subject bias and the retrospective pretest in retrospect. Bulletin of the Psychonomic Society. 1989;27(1):11–14.
  • Sprangers M, Hoogstraten J. Pretesting effects in retrospective pretest-posttest designs. Journal of Applied Psychology. 1989;74(2):265–272. DOI: 10.1037/0021-9010.74.2.265.
  • StataCorp. Stata Statistical Software: Release 11 [Computer Software]. StataCorp LP; College Station, TX: 2009.


Self-Report Techniques

Last updated 22 Mar 2021


Self-report techniques describe methods of gathering data where participants provide information about themselves without interference from the experimenter.

Such techniques can include questionnaires, interviews, or even diaries, and ultimately will require giving responses to pre-set questions.

Evaluation of self-report methods

Strengths:

- Participants can be asked about their feelings and cognitions (i.e. thoughts), which can be more useful than simply observing behaviour alone.

- Scenarios can be asked about hypothetically without having to physically set them up and observe participants’ behaviour.

Weaknesses:

- Gathering information about thoughts or feelings is only useful if participants are willing to disclose them to the experimenter.

- Participants may try to give the ‘correct’ responses they think researchers are looking for (or deliberately do the opposite), or try to come across in the most socially acceptable way (i.e. social desirability bias), which can lead to untruthful responses.



Measurement Problems in Criminal Justice Research: Workshop Summary (2003)

3 Comparison of Self-Report and Official Data for Measuring Crime

Terence P. Thornberry and Marvin D. Krohn

There are three basic ways to measure criminal behavior on a large scale. The oldest method is to rely on official data collected by criminal justice agencies, such as data on arrests or convictions. The other two rely on social surveys. In one case, individuals are asked if they have been victims of crime; in the other, they are asked to self-report their own criminal activity. This paper reviews the history of the third method—self-report surveys—assesses its validity and reliability, and compares results based on this approach to those based on official data. The role of the self-report method in the longitudinal study of criminal careers is also examined.

HISTORICAL OVERVIEW

The development and widespread use of the self-report method of collecting data on delinquent and criminal behavior together were one of the most important innovations in criminology research in the twentieth century. This method of data collection is used extensively both in the United States and abroad (Klein, 1989). Because of its common use, we often lose sight of the important impact that self-report studies have had on the study of the distribution and patterns of crime and delinquency, the etiological study of criminality, and the study of the juvenile justice and criminal justice systems.

This study was supported by the National Consortium on Violence Research.

Sellin made the simple but critically important observation that “the value of a crime rate for index purposes decreases as the distance from the crime itself in terms of procedure increases” (1931:337). Thus, prison data are less useful than court or police data as a measure of actual delinquent or criminal behavior. Moreover, the reactions of the juvenile and criminal justice systems often rely on information from victims or witnesses of crime. It does not take an expert on crime to recognize that a substantial amount of crime is not reported and, if reported, is not officially recorded. Thus, reliance on official sources introduces a number of layers of potential bias between the actual behavior and the data. Yet, through the first half of the twentieth century, our understanding of the behavior of criminals and those who reacted to crime was based almost entirely on official data.

While researchers were aware of many of these limitations, the dilemma they faced was how to obtain valid information on crime that was closer to the source of the behavior. Observing the behavior taking place would be one method of doing so, but given the illegal nature of the behavior and the potential consequences if caught committing the behavior, participants in crime are reluctant to have their behavior observed. Even when observational studies have been conducted—for example, gang studies (e.g., Thrasher, 1927)—researchers could observe only a very small portion of the crimes that took place. Hence, observational studies had limited utility in describing the distribution and patterns of criminal behavior.

If one could not observe the behavior taking place, self-reports of delinquent and criminal behavior would be the data source nearest to the actual behavior. There was great skepticism, however, about whether respondents would be willing to tell researchers about their participation in illegal behaviors. Early studies (Porterfield, 1943; Wallerstein and Wylie, 1947) found that not only were respondents willing to self-report their delinquency and criminal behavior, they did so in surprising numbers.

Since those very early studies, the self-report methodology has become much more sophisticated in design, making it more reliable and valid and extending its applicability to myriad issues. Much work has been done to improve the reliability and validity of self-reports, including the introduction of specialized techniques intended to enhance the quality of self-report data. These developments have made self-report studies an integral part of the way crime and delinquency are studied.

Although the self-report method began with the contributions of Porterfield (1943, 1946) and Wallerstein and Wylie (1947), the work of Short and Nye (1957, 1958) “revolutionized ideas about the feasibility of using survey procedures with a hitherto taboo topic” and changed how the discipline thought about delinquent behavior itself (Hindelang et al., 1981:23). Short and Nye’s research is distinguished from previous self-report measures by its attention to methodological issues, such as scale construction, reliability and validity, and sampling, and by its explicit focus on the substantive relationship between social class and delinquent behavior. A 21-item list of criminal and antisocial behaviors was used to measure delinquency, although in most of their analyses a scale comprising a subset of only seven items was employed. Focusing on the relationship between delinquent behavior and the socioeconomic status of the adolescents’ parents, Nye et al. (1958) found that relatively few of the differences in delinquent behavior among the different socioeconomic status groups were statistically significant.

Short and Nye’s work stimulated much interest in both use of the self-report methodology and the relationship between some measure of social status (socioeconomic status, ethnicity, race) and delinquent behavior. The failure to find a relationship between social status and delinquency served at once to question extant theories built on the assumption that an inverse relationship did in fact exist and to suggest that the juvenile justice system may be using extra-legal factors in making decisions concerning juveniles who misbehave. A number of studies in the late 1950s and early 1960s used self-reports to examine the relationship between social status and delinquent behavior (Akers, 1964; Clark and Wenninger, 1962; Dentler and Monroe, 1961; Empey and Erickson, 1966; Erickson and Empey, 1963; Gold, 1966; Reiss and Rhodes, 1959; Slocum and Stone, 1963; Vaz, 1966; Voss, 1966). These studies advanced the use of the self-report method by applying it to different, more ethnically diverse populations (Clark and Wenninger, 1962; Gold, 1966; Voss, 1966), attending to issues concerning validity and reliability (Clark and Tifft, 1966; Dentler and Monroe, 1961; Gold, 1966), and constructing measures of delinquency that specifically addressed issues regarding offense seriousness and frequency (Gold, 1966). These studies found that, while most juveniles engaged in some delinquency, relatively few committed serious delinquency repetitively. With few exceptions, these studies supported the general conclusion that, if there were any statistically significant relationship between measures of social status and self-reported delinquent behavior, it was weak and clearly did not mirror the findings of studies using official data sources.

During this period of time researchers began to recognize the true potential of the self-report methodology. By including questions concerning other aspects of an adolescent’s life as well as a delinquency scale on the same questionnaire, researchers could explore a host of etiological issues. Theoretically interesting issues concerning the family (Dentler and Monroe, 1961; Gold, 1970; Nye et al., 1958; Stanfield, 1966; Voss, 1964), peers (Erickson and Empey, 1963; Gold, 1970; Matthews, 1968; Reiss and Rhodes, 1964; Short, 1957; Voss, 1964), and school (Elliott, 1966; Gold, 1970; Kelly, 1974; Polk, 1969; Reiss and Rhodes, 1963) emerged as the central focus of self-report studies. The potential of the self-report methodology in examining etiological theories of delinquency was perhaps best displayed in Hirschi’s (1969) Causes of Delinquency .

The use of self-report studies to examine theoretical issues continued throughout the 1970s. In addition to several partial replications of Hirschi’s arguments (Conger, 1976; Hepburn, 1976; Hindelang, 1973; Jensen and Eve, 1976), other theoretical perspectives such as social learning theory (Akers et al., 1979), self-concept theory (Jensen, 1973; Kaplan, 1972), strain theory (Elliott and Voss, 1974; Johnson, 1979), and deterrence theory (Anderson et al., 1977; Jensen et al., 1978; Silberman, 1976; Waldo and Chiricos, 1972) were evaluated using data from self-report surveys.

Another development during this period was the introduction of national surveys on delinquency and drug use. Williams and Gold (1972) conducted the first nationwide survey, with a probability sample of 847 boys and girls 13 to 16 years old. Monitoring the Future (Johnston et al., 1996) is a national survey on drug use that has been conducted annually since 1975. It began as an in-school survey of a nationally representative sample of high school seniors and was expanded to include eighth- and tenth-grade students.

One of the larger undertakings on a national level is the National Youth Survey (NYS), conducted by Elliott and colleagues (1985). The NYS began in 1976 by surveying a national probability sample of 1,725 youth ages 11 through 17. The survey design was sensitive to a number of methodological deficiencies of prior self-report studies and has been greatly instrumental in improving the self-report method. The NYS is also noteworthy because it is a panel design, having followed the original respondents into their thirties.

Despite the expanding applications of this methodology, questions remained about what self-report instruments measure. The discrepancy in findings regarding the relationship between social status and delinquency based on self-report data versus official (and victim) data continued to perplex scholars. Early on, self-reports came under heavy criticism on a number of counts, including the selection of respondents and the selection of delinquency items. Nettler (1978:98) stated that “an evaluation of these unofficial ways of counting crime does not fulfill the promise that they would provide a better enumeration of offensive activity.” Gibbons (1979:84) was even more critical in his summary evaluation, stating:

The burst of energy devoted to self-report studies of delinquency has apparently been exhausted. This work constituted a criminological fad that has waned, probably because such studies have not fulfilled their early promise.

Two studies were particularly instrumental at that time in pointing to flaws in self-report measures. Hindelang and colleagues (1979) illustrated the problems encountered when comparing the results from studies using self-reports and those using official data or victimization data by comparing characteristics of offenders across the three data sources. They observed more similarity in those characteristics between victimization and Uniform Crime Reports data than between self-report data and the other two sources. They argued that self-report instruments did not include the more serious crimes for which people are arrested and that are included in victimization surveys. Thus, self-reports tap a different, less serious domain of behaviors than either of the other two sources, and discrepancies in observed relationships when using self-reports should not be surprising. The differential domain of crime tapped by early self-report measures could also explain the discrepancy in findings regarding the association between social status and delinquency.

Elliott and Ageton (1980) also explored the methodological shortcomings of self-reports. They observed that a relatively small number of youth commit a disproportionate number of serious offenses. However, most early self-report instruments failed to include serious offenses in the inventory and truncated the response categories for the frequency of offenses. In addition, many of the samples did not include enough high-rate offenders to clearly distinguish them from other delinquents. By allowing respondents to report the number of delinquent acts they committed rather than specifying an upper limit (e.g., 10 or more) and by focusing on high-rate offenders, Elliott and Ageton found relationships between engaging in serious delinquent behavior and race and social class that are more consistent with results from studies using official data.

Hindelang and colleagues (1979) and Elliott and Ageton (1980) suggested designing self-report studies so that they would acquire sufficient data from those high-rate, serious offenders who would be most likely to come to the attention of the authorities. They also suggested a number of changes in the way in which self-report data are measured, so that the data reflect the fact that some offenders contribute disproportionately to the rate of serious and violent delinquent acts.

The development of instruments to better measure serious offenses and the suggestion to acquire data from high-rate offenders coincided with a substantive change in the 1980s in the focus of much criminology work on the etiology of offenders. The identification of a relatively small group of offenders who commit a disproportionate amount of crime and delinquency led to a call to focus research efforts on “chronic” or “career” criminals (Blumstein et al., 1986; Wolfgang et al., 1972, 1987). Blumstein et al.’s observation that we need to study the careers of criminals, including early precursors of delinquency, maintenance through the adolescent years, and later consequences during the adult years, was particularly important in recognizing the need for examining the life-course development of high-rate offenders with self-report methodology.

The self-report methodology continues to advance in terms of both its application to new substantive areas and the improvement of its design. Gibbons’s (1979) suggestion that self-reports were just a fad, likely to disappear, is clearly wrong. Rather, with improvements in question design, administration technique, reliability and validity, and sample selection, this technique is being used in the most innovative research on crime and delinquency. The sections that follow describe the key methodological developments that have made such applications possible.

DEVELOPMENT OF THE SELF-REPORT METHOD

Self-report measures of delinquent behavior have advanced remarkably in the 30-odd years since their introduction by Short and Nye (1957). Considerable attention has been paid to the development and improvement of their psychometric properties. The most sophisticated and influential work was done by Elliott and colleagues (Elliott and Ageton, 1980; Elliott et al., 1985; Huizinga and Elliott, 1986) and by Hindelang, Hirschi, and Weis (1979, 1981). From their work a set of characteristics for acceptable (i.e., reasonably valid and reliable) self-report scales has emerged. Five of the most salient of these characteristics are the inclusion of (1) a wide array of offenses, including serious offenses; (2) frequency response sets; (3) screening for trivial behaviors; (4) application to a wider age range; and (5) the use of longitudinal designs. Each is discussed below.

  • Inclusion of a wide array of delinquency items. The domain of crime covers a wide range of behaviors, from petty theft to aggravated assault and homicide. If the general domain of delinquent and criminal behavior is to be represented in a self-report scale, it is necessary for the scale to cover that same wide array of human activity. Simply asking about a handful of these behaviors does not accurately represent the theoretical construct of crime. In addition, empirical evidence suggests that crime does not have a clear unidimensional structure that would facilitate the sampling of a small number of items from a theoretically large pool to adequately represent the entire domain.

These considerations suggest that an adequate self-report scale for delinquency will be relatively lengthy. Many individual items are required to represent the entire domain of delinquent behavior, to represent each of its subdomains, and to ensure that each subdomain (e.g., violence, drug use) is itself adequately represented.

In particular, it is essential that a general self-reported delinquency scale tap serious as well as less serious behaviors. Early self-report scales tended to ignore serious criminal and delinquent events and concentrated almost exclusively on minor forms of delinquency. Failure to include serious offenses misrepresents the domain of delinquency and contaminates comparisons with other data sources. In addition, it misrepresents the dependent variable of many delinquency theories (e.g., Elliott et al., 1985; Thornberry, 1987) that attempt to explain serious, repetitive delinquency.

  • Inclusion of frequency response sets. Many early self-report studies relied on response sets with a relatively small number of categories, thus tending to censor high-frequency responses. For example, Short and Nye (1957) used a four-point response with the highest category being “often.” Aggregated over many items, these limited response sets had the consequence of lumping together occasional and high-rate delinquents, rather than discriminating between these behaviorally different groups.

  • Screening for trivial behaviors. Self-report questions have a tendency to elicit reports of trivial acts that are very unlikely to elicit official reactions and even acts that are not violations of the law. This occurs more frequently with the less serious offenses but also plagues responses to serious offenses. For example, respondents have included as thefts such pranks as hiding a classmate’s books in the respondent’s locker between classes, or as serious assault events that are really roughhousing between siblings.

Some effort must be made to adjust or censor the data to remove these events if the delinquency of the subjects is to be reflected properly and if the rank order of subjects with respect to delinquency is to be portrayed properly. Two strategies are generally available. First, one can ask a series of follow-up questions designed to elicit more information about an event, such as the value of stolen property, the extent of injury to the victim, and the like. Second, one can use an open-ended question asking the respondent to describe the event and then probe to obtain the information necessary to classify the act. Both strategies have been used with some success.

  • Application to a wider age range. With increasing emphasis on the study of crime across the entire life course, self-report surveys have had to be developed to take into account both the deviant behavior of very young children and the criminal behavior of older adults. The behavioral manifestations of illegal behaviors or the precursors of such behavior can change depending on the stage in the life course at which the assessment takes place. For the very young child, measures have been developed that are administered to parents to assess antisocial behavior such as noncompliance, disobedience, and aggression (Achenbach, 1992). For the school-age child, Loeber and colleagues (1993) have developed a checklist that expands the range of antisocial behaviors to include such behaviors as stubbornness, lying, bullying, and other externalizing problems.

There has been less development of instruments targeted at adults. Weitekamp (1989) has criticized self-report studies for being primarily concerned with the adolescent years and simply using the same items for adults. This is particularly crucial given the concern over the small but very significant problem of chronic violent offenders.

  • Use of longitudinal designs. Perhaps the most significant development in the application of the self-report methodology is its use in following the same subjects over time in order to account for changes in their criminal behavior. This has enabled researchers to examine the effect of age of onset, to track the careers of offenders, to study desistance, and to apply developmental theories to study both the causes and consequences of criminal behavior over the life course.

While broadening the range of issues that can be examined, application of the self-report technique within longitudinal panel designs introduces potential threats to the reliability and validity of the data. In addition to concern over construct continuity in applying the technique to different-aged respondents, researchers need to consider the possibility of panel or testing effects.

All of these newer procedures are likely to improve the validity, and to some extent the reliability, of self-report scales since they improve our ability to identify delinquents and to discriminate among different types of delinquents. These are clearly desirable qualities.

To gain these desirable qualities, however, requires a considerable expansion of the self-report schedule. This can be illustrated by describing the major components of the index currently being used in the Rochester Youth Development Study (Thornberry et al., in press) as well as the other two projects of the Program of Research on the Causes and Correlates of Delinquency (see Browning et al., 1999). The inventory includes 32 items that tap general delinquency and 12 that tap drug use, for a total of 44 items. For each item the subjects are asked if they committed the act since the previous interview. For all items to which they respond in the affirmative, a series of follow-up questions are asked, such as whether they had been arrested. In addition, for the most serious instance of each type of delinquency reported in the past six months, subjects are asked to describe the event by responding to the question: “Could you tell me what you did?” If that open-ended question does not elicit the information needed to describe the event adequately, a series of questions, which vary from 2 to 14 probes depending on the offense, are asked.

Although most of these specific questions are skipped for most subjects since delinquency remains a rare event, this approach to measuring self-reported delinquency is a far cry from the initial days of the method, when subjects used a few categories to respond to a small number of trivial delinquencies with no follow-up items. Below we evaluate the adequacy of this approach for measuring delinquent and criminal behavior.

RELIABILITY AND VALIDITY

For any measure to be scientifically worthwhile it must possess both reliability and validity. Reliability is the extent to which a measuring procedure yields the same result on repeated trials. No measure is absolutely, perfectly reliable. Repeated use of a measuring instrument will always produce some variation from one application to another. That variation can be very slight or quite large. So the central question in assessing the reliability of a measure is not whether it is reliable but how reliable it is; reliability is always a matter of degree.

Validity is a more abstract notion. A measure is valid to the extent to which it measures the concept you set out to measure and nothing else. While reliability focuses on a particular property of the measure—namely, its stability over repeated uses—validity concerns the crucial relationship between the theoretical concept one is attempting to measure and what one actually measures. As is true with the case of reliability, the assessment of validity is not an either/or proposition. There are no perfectly valid measures, but some measures are more valid than others. We now turn to an assessment of whether self-reported measures of delinquency are psychometrically acceptable.

Assessing Reliability

There are two classic ways to assess the reliability of social science measures: test-retest reliability and internal consistency. Huizinga and Elliott (1986) make a convincing case that the test-retest approach is fundamentally more appropriate for assessing self-reported measures of delinquency.

Internal consistency means that multiple items measuring the same underlying concept should be highly intercorrelated. Although a reasonable expectation for attitudinal measures, this expectation is less reasonable for behavioral inventories such as self-report measures of delinquency. Current self-report measures typically include 30 to 40 items measuring a wide array of delinquent acts. Just because someone was truant is no reason to expect that they would be involved in theft or vandalism. Similarly, if someone reports that they have been involved in assaultive behavior, there is no reason to assume they have been involved in drug sales or loitering. Indeed, given the relative rarity of involvement in delinquent acts, it is very likely that most people will respond in the negative to most items and in the affirmative to only a few items. This is especially the case if we are asking about short reference periods (e.g., the past year or past six months). There is no strong underlying expectation that the responses will be highly intercorrelated, and therefore an internal consistency approach to assessing reliability may not be particularly appropriate. (See Huizinga and Elliott, 1986, for a more formal discussion of this point.)

Some theories of crime (e.g., Gottfredson and Hirschi, 1990; Jessor et al., 1991) assume there is an underlying construct, such as self-control, that generates versatility in offending. If so, there should be high internal consistency among self-reported delinquency items. While this result may be supportive of the theoretical assumption, it is not necessarily a good indicator of the reliability of the measures. If internal consistency were low, it may not have any implication for reliability but may simply mean that this particular theoretical assumption was incorrect. Nevertheless, we do note that studies that have examined the internal consistency of self-report measures generally find acceptable alpha coefficients. For example, Hindelang and colleagues report alphas between 0.76 and 0.93 for various self-report measures (1981:80).
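For concreteness, here is a minimal sketch of the internal-consistency computation behind coefficients like those just cited. The respondent-by-item matrix is simulated, and all names are illustrative.

```python
# Minimal sketch: Cronbach's alpha for a respondents-by-items matrix of
# 0/1 self-report endorsements (simulated data, illustrative only).
import numpy as np

def cronbach_alpha(items):
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
propensity = rng.normal(size=500)                    # latent tendency
p = 1 / (1 + np.exp(-(propensity[:, None] - 1.5)))   # rare endorsements
items = (rng.random((500, 20)) < p).astype(float)
print(f"alpha = {cronbach_alpha(items):.2f}")
```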

We will focus our attention on the test-retest method of assessing reliability. This approach is quite straightforward. A sample of respondents is administered a self-reported delinquency inventory (the test) and then, after a short interval, the same inventory is readministered (the retest). In doing this the same questions and the same reference period should be used at both times.

The time lag between the test and the retest is also important. If it is too short, it is likely the answers provided on the retest will be a function of memory. If so, estimates of reliability would be inflated. If the time period between the test and the retest is too great, it is likely the responses given on the retest would be less accurate than those given on the test because of memory decay. In this case the reliability of the scale would be underestimated. There is no hard-and-fast rule for assessing the appropriateness of this lag, but somewhere in the range of one to four weeks appears to be optimal.

A number of studies have assessed the test-retest reliability of self-reported delinquency measures. In general, the results indicate that these measures are acceptably reliable. The reliability coefficients vary somewhat depending on the number and types of delinquent acts included in the index and the scoring procedures used (e.g., simple frequencies or ever-variety scores), but scores well above 0.80 are common. In summarizing much of the previous literature in this area, Huizinga and Elliott (1986:300) state:

Test-retest reliabilities in the 0.85–0.99 range were reported by several studies employing various scoring schemes and numbers of items and using test-retest intervals of from less than one hour to over two months (Kulik et al., 1968; Belson, 1968; Hindelang et al., 1981; Braukmann et al., 1979; Patterson and Loeber, 1982; Skolnick et al., 1981; Clark and Tifft, 1966; Broder and Zimmerman, 1978).

Perhaps the most comprehensive assessment of the psychometric properties of the self-report method was conducted by Hindelang and his colleagues (1981). Their self-report inventory was quite extensive, consisting of 69 items divided into the following major subindices: official contact index, serious crime index, delinquency index, drug index, and school and family offenses index. While mindful of the limitations of internal consistency approaches, Hindelang and colleagues (1981) reported Cronbach’s alpha coefficients for a variety of demographic subgroups and for the ever-variety, last-year variety, and last-year frequency scores. The coefficients range from 0.76 to 0.93. Most of the coefficients are above 0.8, and 8 of the 18 coefficients are above 0.9.

Hindelang and colleagues (1981) also estimated test-retest reliabilities for these three self-report measures for each of the demographic subgroups. Unfortunately, only 45 minutes elapsed between the test and the retest, so it is quite possible the retest responses were strongly influenced by memory effects. Nevertheless, most of the test-retest correlations are above 0.9.

Hindelang et al. point out that reliability scores of this magnitude are higher than those typically associated with many attitudinal measures and conclude that “the overall implication is that in many of the relations examined by researchers, the delinquency dimension is more reliably measured than are many of the attitudinal dimensions studied in the research” (p. 82).

The other major assessment of the psychometric properties of the self-report method was conducted by Huizinga and Elliott (1986) using data from the NYS. At the fifth NYS interview, 177 respondents were randomly selected and reinterviewed approximately four weeks after their initial assessment. Based on these data, Huizinga and Elliott estimated test-retest reliability scores for the general delinquency index and a number of subindices. They also estimated reliability coefficients for frequency scores and variety scores.

The general delinquency index appears to have an acceptable level of reliability. The test-retest correlation for the frequency score is 0.75 and for the variety score, 0.84. For the various subindices—ranging from public disorder offenses to the much more serious index offenses—the reliabilities vary from a low of 0.52 (for the frequency measure of felony theft) to a high of 0.93 (for the frequency measure of illegal services). In total, Huizinga and Elliott report 22 estimates of test-retest reliability (across indices and across frequency and variety scores), and the mean reliability coefficient is 0.74.

Another way Huizinga and Elliott assessed the level of test-retest reliability is by estimating the percentage of the sample that changed frequency responses by two or less. If the measure is highly reliable, few changes would be expected over time. For most subindices there appears to be acceptable reliability based on this measure. For example, for index offenses 97 percent of respondents changed their answers by two delinquent acts or less. Huizinga and Elliott (1986:303) summarized these results as follows:

Scales representing more serious, less frequently occurring offenses (index offenses, felony assault, felony theft, robbery) have the highest precision, with 96 to 100 percent agreement, followed by the less serious offenses (minor assault, minor theft, property damage), with 80 to 95 percent agreement. The public disorder and status scales have lower reliabilities (in the 40 to 70 percent agreement range), followed finally by the general SRD [self-reported delinquency] scale, which, being a composite of the other scales, not surprisingly has the lowest test-retest agreement.

Huizinga and Elliott did not find any consistent differences across sex, race, class, place of residence, or delinquency level in terms of test-retest reliabilities (see also Huizinga and Elliott, 1983).
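The two test-retest summaries used above — the test-retest correlation and the percentage of respondents whose reported frequencies change by two or fewer acts — are straightforward to compute; a minimal sketch with simulated responses (standing in for the NYS data) follows.

```python
# Minimal sketch: test-retest correlation and percent agreement within +/-2
# delinquent acts, using simulated frequency reports (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
rate = rng.gamma(shape=0.5, scale=2.0, size=177)  # skewed: most report little
test = rng.poisson(rate)                          # first administration
retest = rng.poisson(rate)                        # same construct, ~4 weeks on

r = np.corrcoef(test, retest)[0, 1]
within_two = np.mean(np.abs(test - retest) <= 2) * 100
print(f"test-retest r = {r:.2f}; changed by two or fewer: {within_two:.0f}%")
```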

Assessing Validity

There are several ways to assess validity. We concentrate on three: content validity, construct validity, and criterion validity.

Content Validity

Content validity is a subjective or logical assessment of the extent to which a measure adequately reflects the full domain, or full content, that is contained in the concept being measured. To argue that a measure has content validity, the following three criteria must be met. First, the domain of the concept must be defined clearly and fully. Second, questions or items must be created that cover the whole range of the concept under investigation. Third, items or questions must be sampled from that range so that the ones that appear on the test are representative of the underlying concept.

A reasonable definition of delinquency and crime is the commission of behaviors that violate criminal law and that place the individual at some risk of arrest if the behavior were known to the police. Can a logical case be made that self-report measures of delinquency are valid in this respect?

As noted above, the earlier self-report inventories contained relatively few items to measure the full range of delinquent behaviors. For example, Short and Nye’s (1957) inventory contains only 21 items, and most of their analysis was conducted with a 7-item index. Similarly, Hirschi’s self-report measure (1969) is based on only 6 items. More importantly, the items included in these scales are clearly biased toward the minor or trivial end of the continuum.

The more recent self-report measures appear to be much better in this regard. For example, the Hindelang and colleagues (1981) index includes 69 items that range from status offenses, such as skipping class, to violent crimes, like serious assault and armed robbery. The NYS index (Elliott et al., 1985) has 47 items designed to measure all but one (homicide) of the Uniform Crime Reports Part I offenses, 60 percent of the Part II offenses, and offenses that juveniles are particularly likely to commit. The self-report inventory used by the three projects of the Program of Research on the Causes and Correlates of Delinquency has 32 items that measure delinquent behavior and 12 that measure substance use.

These more recent measures, while not perfect, tap into a much broader range of delinquent and criminal behaviors. As a result, they appear to have reasonable content validity.

Construct Validity

Construct validity refers to the extent to which the measure being validated is related in theoretically expected ways to other concepts or constructs. In our case the key question is: Are measures of delinquency based on the self-report method correlated in expected ways with variables expected to be risk factors for delinquency?

In general, self-report measures of delinquency and crime, especially the more recent longer inventories, appear to have a high degree of construct validity. They are generally related in theoretically expected ways to basic demographic characteristics and to a host of theoretical variables drawn from various domains such as individual attributes, family structure and processes, school performance, peer relationships, neighborhood characteristics, and so forth. Hindelang and colleagues (1981) offer one of the clearer assessments of construct validity. They correlate a number of etiological variables with different self-report measures collected under different conditions. With a few nonsystematic exceptions, the correlations are in the expected direction and of the expected magnitude.

Overall, construct validity may offer the strongest evidence for the validity of self-report measures of delinquency and crime. Indeed, if one examines the general literature on delinquent and criminal behavior, it is surprising how few theoretically expected relationships are not observed for self-reported measures of delinquency and crime. It is unfortunate that this approach is not used to assess validity more formally and more systematically.

Criterion Validity for Delinquency and Crime

Criterion validity “refers to the relationship between test scores and some known external criterion that adequately indicates the quantity being measured” (Huizinga and Elliott, 1986:308). There is a fundamental difficulty in assessing the criterion validity of self-reported measures of delinquency and crime and for that matter all measures of delinquency and crime. Namely, there is no gold standard by which to judge the self-report measure. That is, there is no fully accurate assessment that can be used as a benchmark. In contrast, to test the validity of self-reports of weight, people could be asked to self-report their weight and each respondent could then be weighed on an accurate scale—the external criterion. Given the secretive nature of criminal behavior, however, there is nothing comparable to a scale in the world of crime. As a result, the best that can be done is to compare different flawed measures of criminal involvement to see if there are similar responses and results. If so, the similarity across different measurement strategies heightens the probability that the various measures are tapping into the underlying concept of interest. While not ideal, this is the best that can be done in this area of inquiry.

There are several ways to assess criterion validity. One of the simplest is called known group validity. In this approach one compares scores for groups of people who are likely to differ in terms of their underlying involvement in delinquency. For example, the delinquency scores of seminarians would be expected to be lower than the delinquency scores of street gang members.

Over the years a variety of group comparisons have been made to assess the validity of self-report measures. They include comparisons between individuals with and without official arrest records, between individuals convicted and not convicted of criminal offenses, and between institutionalized adolescents and high school students. In all cases these comparisons indicate that the group officially involved with the juvenile justice system self-reports substantially more delinquent acts than the other group. (See, for example, the work by Erickson and Empey, 1963; Farrington, 1973; Hardt and Petersen-Hardt, 1977; Hindelang et al., 1981; Hirschi, 1969; Kulik et al., 1968; Short and Nye, 1957; and Voss, 1963.)

While comparisons across known groups are helpful, they offer a minimal test of criterion validity. The real issue is not whether groups differ but the extent to which individuals have similar scores on the self-report measure and on other measures of criminal behavior. A variety of external criteria have been used (see the discussion in Hindelang et al., 1981). The two most common approaches are to compare self-reported delinquency scores with official arrest records and self-reports of arrest records with official arrest records.

We can begin by examining the correlation between self-reported official contacts and official measures of delinquency. These correlations are quite high in the Hindelang et al. study, ranging from 0.70 to 0.83. Correlations of this magnitude are reasonably large for this type of data. 1

The most recent investigation of this topic is by Maxfield, Weiler, and Widom (2000), using Widom’s (1989) sample of child maltreatment victims and their matched controls. Unlike most studies in this area, the respondents were adults (mean age = 28). They were interviewed only once, so all of the self-reported arrest data are retrospective, with relatively long recall periods. Nevertheless, the concordance between having an official arrest and a self-report of being arrested is high. Of those arrested, 73 percent reported an arrest. Maxfield et al. noted lower levels of reporting for females than males and for blacks than whites. The gender differences were quite persistent, but the race differences were more pronounced for less frequent offenders and diminished considerably for more frequent offenders.

Maxfield et al. also studied “positive bias,” the self-reporting of arrests that are not found in official records. They found that 21 percent of respondents with no arrest history self-reported being arrested. Positive bias was higher for males than females, but there were no race differences. It is not clear whether this is a problem with the self-reports (i.e., positive bias) or with the official records, such as sealed records, sloppy record keeping, use of aliases, and so forth. This is an understudied topic that needs greater investigation.

The generally high level of concordance between self-reports of being arrested or having a police contact and having an official record has been observed in other studies as well (Hardt and Petersen-Hardt, 1977; Hathaway et al., 1960; Rojek, 1983). When convictions are examined, even higher concordance rates are reported (Blackmore, 1974; and Farrington, 1973).

It appears that survey respondents are quite willing to self-report their involvement with the juvenile justice and criminal justice systems. Are they also willing to self-report their involvement in undetected delinquent behavior? This is the central question. The best way to examine this is to compare self-reported delinquent behavior and official measures of delinquency. If these measures are valid, a reasonably large positive correlation between them would be expected.

Hindelang and colleagues (1981) presented correlations using a number of different techniques for scoring the self-report measures, but here we focus on the average correlation across these different measures and on the correlation based on the ever-variety scores, as presented in their Figure 3. Overall, these correlations are reasonably high, somewhere around 0.60 for all subjects. The most important data, though, are presented for race-by-gender groups. For white and African American females and for white males, the correlations range from 0.58 to 0.65 when the ever-variety score is used; for the correlations that are averaged across the different self-report measures, the magnitudes range from 0.50 to 0.60. For African American males, however, the correlation is at best moderate. For the ever-variety self-reported delinquency score, the correlation is 0.35, and the average across the other self-reported measures is 0.30.

Huizinga and Elliott (1986), using data from the NYS, also examined the correspondence between self-reports of delinquent behavior and official criminal histories. They recognized that there can be considerable slippage between these two sources of data even when the same event is actually contained in both data sets. For example, while an adolescent can self-report a gang fight, it may be recorded in the arrest file as disturbing the peace, or an arrest for armed robbery can be self-categorized as a mugging or theft by the individual. Because of this, Huizinga and Elliott provided two levels of matching, “tight matches” and “broad matches.” The analysis provides information on both the percentage of people who provide tight and broad matches to their arrest records and the percentage of arrests that are matched by self-reported behavior.

For the tight matches, almost half of the respondents (48 percent) concealed or forgot at least some of their offensive behavior, and about a third (32 percent) of all the offenses were not reported. When the broad matches are used, the percentage of respondents concealing or forgetting some of their offenses dropped to 36 percent and the percentage of offenses not self-reported to 22 percent. While the rates of underreporting are substantial, it should be noted that the majority of individuals who have been arrested self-report their delinquent behavior, and the majority of offenses they commit also are reported.

The reporting rates for gender, race, and social class groupings are quite comparable to the overall rates, with one exception. As was the case with the Seattle data, African American males substantially underreported their involvement in delinquency.

Farrington and colleagues (1996), using data from the middle and oldest cohorts of the Pittsburgh Youth Study, also examined this issue. The Pittsburgh study, as one of three projects in the Program of Research on the Causes and Correlates of Delinquency, uses the same self-reported delinquency index as described earlier for the Rochester Youth Development Study. Farrington et al. classified each of the boys in the Pittsburgh study into one of four categories based on the seriousness of their self-reported delinquency: no delinquency, minor delinquency only, moderate delinquency only, and serious delinquency. They then used juvenile court petitions as an external criterion to assess the validity of the self-reported responses. Both concurrent validity and predictive validity were assessed.

Overall, this analysis suggests that there is a substantial degree of criterion validity for the self-report inventory used in the Program of Research on the Causes and Correlates of Delinquency. Respondents who are in the most serious category based on their self-report responses are significantly more likely to have juvenile court petitions, both concurrently and predictively. For example, the odds ratio of having a court petition for delinquency is about 3.0 for the respondents in the most serious self-reported delinquency category versus the other three.

African American males are no more or less likely to self-report delinquent behavior than white males. With few exceptions, the odds ratios comparing self-reported measures and official court petitions are significant for both African Americans and whites; in some cases the odds ratios are higher for whites, and in some cases they are higher for African Americans.

These researchers also compared the extent to which boys with official court petitions self-reported being apprehended by the police. Overall, about two-thirds of the boys with court petitions answered in the affirmative. Moreover, there was no evidence of differential validity. Indeed, the African American respondents were more likely to admit being apprehended by the police than were the white respondents. Farrington and his colleagues (1996:509) concluded that “concurrent validity for admitting offenses was higher for Caucasians but concurrent validity for admitting arrests was higher for African Americans. There were no consistent ethnic differences in predictive validity.”

Finally, Farrington and colleagues (2000) used data from the Seattle Social Development Project to assess the concurrent and predictive validity of self-report data. They compared self-report responses for a variety of indices and offense types to the odds of having a court referral. For the general delinquency index the concurrent odds ratio was 2.8 and the predictive odds ratio was 2.2. Validity was highest for self-reports of drug involvement and lowest for property offenses, with violent offenses falling in the middle.

Putting all this together leads to a somewhat mixed assessment of the validity of self-report measures. On the one hand, it seems that the overall validity of self-report data is in the moderate-to-strong range, especially for self-reports of being arrested. For the link between self-reported delinquent behavior and official measures of delinquency, the only link based on independent sources of data, the overall correlations are somewhat smaller but still quite acceptable. On the other hand, looking at the issue of differential validity, there are some disturbing differences by race. It is hard to determine whether this is a problem with the self-report measures, the official measures, or both. We will return to a discussion of this issue after additional data are presented.

Criterion Validity for Substance Use

The previous studies focused on delinquent or criminal behavior where, as mentioned earlier, there is no true external criterion for evaluating validity. There is an external criterion for one class of deviant behavior—substance use. Physiological data (e.g., from saliva or urine) can be used to independently assess recent use of various substances. The physiological data can then be compared to self-reports of substance use to assess the validity of the self-report instruments. A few examples of this approach can be offered.

We begin with a study of a minor form of deviant behavior—adolescent tobacco use. Akers and colleagues (1983) examined tobacco use among a sample of junior and senior high school students in Muscatine, Iowa. The respondents provided saliva samples that were used to detect nicotine use by the level of salivary thiocyanate. The students also self-reported whether they smoked and how often. The self-report data showed very low levels of both underreporting and overreporting of tobacco use. Overall, the study estimated that 95 to 96 percent of the self-reported responses were accurate and valid.

The Drug Use Forecasting (DUF) project (1990), sponsored by the National Institute of Justice, is an ongoing assessment of the extensiveness of drug use for samples of arrestees in cities throughout the country. Individuals who have been arrested and brought to central booking stations are interviewed and asked to provide urine specimens. Both the urine samples and the interviews are provided voluntarily, and there is an 80 percent cooperation rate for the urine samples and a 90 percent cooperation rate for the interviews. The urine specimens are tested for 10 different drugs, and in some of the interviews there is a self-reported drug use inventory. Assuming the urine samples provide a reasonably accurate estimate of actual drug use, they can be used to validate self-reported information.

The results vary considerably by type of drug. There is generally a fairly high concordance for marijuana use. For example, in 1990 in New York City 28 percent of the arrestees self-reported marijuana use and 30 percent tested positive for marijuana use. Similarly, in Philadelphia 28 percent self-reported marijuana use and 32 percent tested positive. The worst comparison in this particular examination of the Drug Use Forecasting data was from Houston, where 15 percent of arrestees self-reported marijuana use and 43 percent tested positive.

For more serious drugs, the level of underreporting is much more severe. For example, 47 percent of the New York City arrestees self-reported cocaine use and 74 percent tested positive. Very similar numbers were generated in Philadelphia, where 41 percent self-reported cocaine use but 72 percent tested positive. Similar levels of underreporting were observed for other hard drugs such as heroin and in other cities.

The data collected in the Drug Use Forecasting project are obviously quite different from those collected in typical self-report surveys. The sample is limited to people who have just been arrested, and they are asked to provide self-incriminating evidence to a research team while in a central booking station. It is not entirely clear how this setting affects the results. On the one hand, individuals may be reluctant to provide additional self-incriminating evidence after having just been arrested. On the other hand, if one has just been arrested for a serious crime like robbery or burglary, admitting to recent drug use may not be considered a big deal. In any event, caution is needed in using these data to generalize to the validity of typical self-report inventories.

We have examined three different approaches to assessing the validity of self-reported measures of delinquency and crime: content, construct, and criterion validity. Several conclusions, especially for the more recent self-report inventories, appear warranted.

The self-report method for measuring this rather sensitive topic—undetected criminal behavior—appears to be reasonably valid. The content validity of the recent inventories is acceptable, the construct validity is quite high, and the criterion validity appears to be in the moderate-to-strong range. Putting this all together, it could be concluded that for most analytical purposes, self-reported measures are acceptably accurate and valid.

Despite this general conclusion, there are still several substantial issues concerning the validity of self-report measures. First, the validity of the earlier self-report scales, and the results based on them, are at best questionable. Second, based on the results of the tests of criterion validity, there appears to be a substantial degree of either concealing or forgetting past criminal behavior. While the majority of individual respondents report their offenses and the majority of all offenses are reported, there is still a good deal of underreporting.

Third, there is an unresolved issue of differential validity. Some studies have found that, compared with other race-gender groups, the responses provided by African American males appear to have lower levels of validity (Hindelang et al., 1981; Huizinga and Elliott, 1986). More recently, however, Farrington et al. (1996) and Maxfield et al. (2000) found no evidence of differential validity by race, although Maxfield and colleagues (2000) did find lower reporting for females than for males. The level of differential validity is one of the most important methodological issues confronting the self-report method and should be a high priority for future research efforts.

Fourth, based on studies of self-reported substance use, there is some evidence that validity may be less for more serious types of offenses. In the substance use studies, the concordance between the self-report and physiological measures was strongest for adolescent tobacco use, then for marijuana use, and it was weakest for hard drugs such as cocaine and heroin. A similar pattern is seen for several studies of self-reported delinquency and crime (e.g., Elliott and Voss, 1974; Huizinga and Elliott, 1986).

What then can be said about the psychometric properties of self-reported measures of delinquency and crime? With respect to reliability, this approach to measuring involvement in delinquency and crime appears to be acceptable. Most estimates of reliability are quite high, and there is no evidence of differential reliability. With respect to validity, the conclusion is a little murkier. There is a considerable amount of underreporting, and there is also the potential problem of differential validity. Nevertheless, content validity and construct validity appear to be quite high, and an overall estimate of criterion validity would be in the moderate-to-strong range. Perhaps the conclusion reached by Hindelang and colleagues (1981:114) is still the most reasonable:

The self-report method appears to behave reasonably well when judged by standard criteria available to social scientists. By these criteria, the difficulties in self-report instruments currently in use would appear to be surmountable; the method of self-reports does not appear from these studies to be fundamentally flawed. Reliability measures are impressive and the majority of studies produce validity coefficients in the moderate to strong range.

SPECIALIZED RESPONSE TECHNIQUES

Because of the sensitive nature of this area—asking people to report previously undetected criminal behavior—there has always been concern about how best to ask such questions to maximize the accuracy of the responses. Some early self-report researchers favored self-administered questionnaires while others favored more personal face-to-face interviews. Similarly, some argued that anonymous responses were inherently better than nonanonymous responses. In their Seattle study, Hindelang and his colleagues (1981) directly tested these concerns by randomly assigning respondents to one of four conditions: nonanonymous questionnaire, anonymous questionnaire, nonanonymous interview, and anonymous interview.

Their results indicate that there is no strong method effect in producing self-report responses, and that no one approach is consistently better than the other approaches. Similar results were reported by Krohn and his colleagues (1974). Some research, especially in the alcohol and drug use area, has found a method effect. For example, Aquilino (1994) found that admission of alcohol and drug use is lowest in telephone interviews, somewhat higher in face-to-face interviews, and highest in self-administered questionnaires (see also Aquilino and LoSciuto, 1990; Turner et al., 1992). While evident, the effect size is typically not very large.

Although basic method effects do not appear to be very strong, there is still concern that in all of these approaches to the collection of survey data, respondents will feel vulnerable about reporting sensitive information. Because of that, a variety of more specialized techniques have been developed to protect the individual respondent’s confidentiality, in the hope of increasing the level of reporting.

Randomized Response Technique

The randomized response technique assumes that the basic problem with the validity of self-reported responses is that respondents are trying to conceal sensitive information; that is, they are unwilling to report undetected criminal behavior as long as there is any chance of others, including the researchers, linking the behavior to them. Randomized response techniques allow respondents to conceal what they really did while at the same time providing useful data to researchers. There are various ways to accomplish this, and how the basic process works can be illustrated with a simple example of measuring the prevalence of marijuana use. The basic question is: “Have you ever smoked marijuana?”

Imagine an interview setting in which there is a screen between the interviewer and respondent so that the interviewer cannot see what the respondent is doing. The interviewer asks a sensitive question (e.g., “Have you ever smoked marijuana?”) with the following special instruction: Before answering, please flip a coin. If the coin lands on heads, please answer “yes” regardless of whether or not you smoked marijuana. If the coin lands on tails, please tell me the truth. It is impossible for the interviewer to know whether a “yes” response is produced by the coin or by the fact that the respondent actually smoked marijuana. In this way the respondent can admit to sensitive behavior but other people, including the interviewer, do not know if the admission is truthful or not.

From the resulting data the prevalence of marijuana use can be estimated quite easily. Say we receive 70 “yes” responses in a sample of 100 respondents. Fifty of those would be produced by the coin landing on heads, and these 50 respondents can simply be ignored. Of the remaining 50 respondents though, 20 said “yes” because they smoked marijuana, so the prevalence of marijuana use is 20 out of 50, or 40 percent.

This technique is not limited to “yes” or “no” questions or to flipping coins. Any random process can be used as long as the probability distribution of bogus versus truthful responses is known. From these data, prevalence, variety, and frequency scores and means and variances can be estimated, and the information can be correlated with other variables, just as is done with regular self-report data.
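The estimate in the worked example above follows directly from the known probability of a forced “yes.” Here is a minimal sketch of that estimator, assuming the fair-coin forced-response design described above; the function name and interface are ours, not drawn from any of the studies cited.

```python
def rr_prevalence(n_yes, n_total, p_forced_yes=0.5):
    """Estimate true prevalence under a forced-response design: with
    probability p_forced_yes the respondent must say "yes"; otherwise
    they answer truthfully.

    Observed P(yes) = p_forced_yes + (1 - p_forced_yes) * prevalence,
    so prevalence = (P(yes) - p_forced_yes) / (1 - p_forced_yes).
    """
    p_yes = n_yes / n_total
    return (p_yes - p_forced_yes) / (1 - p_forced_yes)

# The worked example from the text: 70 "yes" answers from 100 respondents
# with a fair coin yields an estimated prevalence of 40 percent.
print(rr_prevalence(70, 100))  # 0.4
```

With a different randomizing device, only `p_forced_yes` changes; the same algebra applies as long as the probability of a bogus response is known.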

Weis and Van Alstyne (1979) tested a randomized response procedure in the Seattle study. They concluded that the randomized response approach is no more efficient in eliciting positive responses to sensitive items than are traditional methods of data collection. This finding is consistent with the overall conclusion in the Seattle study that the method of administration is relatively unimportant.

The other major assessment of the randomized response technique was conducted by Tracy and Fox (1981). They sampled people who had been arrested in Philadelphia and then went to their homes to interview them. Respondents were asked if they had been arrested and, if so, how many times. There were two methods of data collection: a randomized response procedure and a regular self-report interview.

The results indicated that the randomized response approach does make a difference. For all respondents there was about 10 percent less error in the randomized response technique. For respondents who had been arrested only once, the randomized response approach actually increased the level of error. But for recidivists the randomized response technique reduced the level of error by about 74 percent.

Also, the randomized response technique generated random errors; that is, the errors were not correlated with other important variables. The regular self-reported interview, however, generated systematic error or bias. In this approach, underreporting was related to females, African American females, respondents with a high need for approval, lower-income respondents, and persons with a larger number of arrests.

Overall, it is not clear to what extent a randomized response approach actually generates more complete and accurate reporting. The two major studies of this topic produced different results: Weis and Van Alstyne (1979) reported no effect, and Tracy and Fox (1981) reported sizable positive effects.

Computer-Assisted Interviewing

Advances in both computer hardware and software have made the introduction of computers in the actual data collection process not only a possibility but, according to Tourangeau and Smith (1996:276), “perhaps the most commonly used method of face-to-face data collection today.” The use of computers in the data collection process began in the 1970s with computer-assisted telephone surveys (Saris, 1991). The technology was soon adapted to the personal interview setting, with either the interviewer administering the schedule (the computer-assisted personal interview) or the respondent self-administering the schedule by reading the questions on the computer screen and entering the responses (the computer-assisted self-administered interview, or CASI). It is also possible to have an audio version in which the questions are recorded and the respondent listens to them on headphones rather than having them read aloud by the interviewer. This is called an audio computer-assisted self-administered interview (ACASI).

One reason for the use of computer-assisted data collection that is particularly relevant for this paper is its potential for collecting sensitive information in a manner that increases the confidentiality of responses. Another advantage is that it allows for the incorporation of complex branching patterns (Beebe et al., 1998; Saris, 1991; Tourangeau and Smith, 1996; Wright et al., 1998). Computer software can be programmed to include skip patterns and increase the probability that the respondent will answer all appropriate questions. An added advantage of computer-assisted presentation is that the respondent does not see the implication of answering in the affirmative to questions with multiple follow-ups.
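As an illustration of such programmed branching, here is a minimal sketch of a skip pattern. The item wording and function names are invented; real instruments carry many more items and follow-ups.

```python
# Minimal sketch of a programmed skip pattern in a self-report module.
def ask(prompt):
    return input(prompt + " ").strip().lower()

def theft_module():
    answers = {}
    answers["theft_ever"] = ask(
        "Have you ever taken something worth more than $50? (yes/no)")
    if answers["theft_ever"] == "yes":
        # Follow-ups are reached only through the screener, so the
        # respondent never sees items that do not apply; paper forms
        # rely on error-prone "if yes, skip to..." instructions instead.
        answers["theft_times"] = ask("How many times in the past 12 months?")
        answers["theft_caught"] = ask("Were you ever caught for this? (yes/no)")
    return answers

if __name__ == "__main__":
    print(theft_module())
```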

ACASI has two additional advantages. First, it circumvents the potential problem of literacy; the respondent does not have to read the questions. Second, in situations where other people might be nearby, the questions and responses are not heard by anyone but the respondent. Hence, the respondent can be more assured that answers to sensitive questions will remain private.

While computer-assisted administration of sensitive questions provides obvious advantages in terms of efficiency of presentation and data collection, the key question is the difference in the responses that are elicited when such technology is used. Tourangeau and Smith (1996) reviewed 18 studies that compared different modes of data collection. The types of behavior examined included health problems (e.g., gastrointestinal problems), sexual practices, abortion, and alcohol and drug use. Tourangeau and Smith indicate that techniques that are self-administered generally elicit higher rates of problematic behaviors than those administered by an interviewer. Moreover, CASIs elicit higher rates than either self-administered questionnaires or paper-and-pencil interviews administered by an interviewer. Also, ACASIs elicit higher rates than CASIs.

Estimates of prevalence rates of illegal and embarrassing behaviors appear to be higher when computer-assisted techniques, particularly those involving self-administration, are used. The higher prevalence rates need to be externally validated. The added benefits of providing for schedule complexity and consistency in responses make these techniques attractive, and it is clear that they will continue to be used with increasing frequency.

SELF-REPORT MEASURES ACROSS THE LIFE COURSE

One of the most exciting developments in criminology over the past 15 years has been the emergence of a life-course or developmental focus (Farrington, 1986; Jessor, 1998; Thornberry, 1997; Thornberry and Krohn, 2001; Weitekamp, 1989). Theoretical work has expanded from a narrow focus on the adolescent years to encompass the entire criminal careers of individuals, from the precursors of delinquency that are manifest in early childhood (Moffitt, 1997; Tremblay et al., 1999) through the high-delinquency years of middle and late adolescence, on into adulthood when most, but not all, offenders decrease their participation in illegal behaviors (Loeber et al., 1998; Moffitt, 1997; Sampson and Laub, 1990; Thornberry and Krohn, 2001). Research on criminal careers (Blumstein et al., 1986) has documented the importance of examining such issues as the age of onset (Krohn et al., 2001) and the duration of criminal activity (Wolfgang et al., 1987).

In addition, a growing body of research has demonstrated that antisocial behavior is rather stable from childhood to adulthood (Farrington, 1989a; Huesmann et al., 1984; Moffitt, 1993; Olweus, 1979). Much of this work has relied on official data. However, criminological research increasingly relies on longitudinal panel designs using self-report measures of antisocial behavior to understand the dynamics of criminal careers. Nevertheless, relatively little attention has been paid to the use of self-report techniques in longitudinal studies over the life course, even though this introduces a number of interesting measurement issues. Several of these issues are discussed in this section. Some of them involve the construction of valid measures at different developmental stages; others involve the consequences of repeated measures.

Construct Continuity

While many underlying theoretical constructs such as involvement in crime remain constant over time, their behavioral manifestations can change as subjects age. Failure to adapt measures to account for these changes may lead to age-inappropriate measures with reduced validity and reliability. To avoid this, measures need to adapt to the respondent’s developmental stage to reflect accurately the theoretical constructs of interest (Campbell, 1990; LeBlanc, 1989; Patterson, 1993; Weitekamp, 1989). In some cases this may mean defining the concept at a level to accommodate the changing contexts in which people at different ages act. In other cases it may mean recognizing that different behaviors at different ages imply consistency in behavioral style (Campbell, 1990).

Construct continuity creates a difficult design dilemma. If the measure does not change to reflect the developmental stage, the accuracy of the measure is likely to deteriorate and the study of change is compromised. Changing the measure over time, however, creates its own set of problems, especially for the study of change. If change is observed, is it a function of changes in the person’s behavior or of changes in the measure?

Relatively little attention has been paid to this issue in the study of criminal careers and, in particular, the study of self-report measures. At a more practical level, several studies have adapted self-report measures to both childhood and adulthood.

Self-Report Measures for Children

Antisocial behavior has been likened to a chimera (Patterson, 1993) with manifestations that change and accumulate with age. At very young ages (2 to 5 years) behavioral characteristics such as impulsivity, noncompliance, disobedience, and aggression are seen as early analogs of delinquent behavior. At these young ages, self-report instruments are not practical. Rather, researchers have measured these key indicators through either parental reports or observational ratings. Many studies of youngsters at these ages have used Achenbach’s (1992) Child Behavior Checklist (CBCL), which is a parent-completed inventory and has versions for children as young as 2 to assess “externalizing” problem behaviors. Studies using either the CBCL, some other parental or teacher report of problem behaviors, or observational ratings have demonstrated that there is a relationship between these early manifestations of problem behavior and antisocial behavior in school-age children (Belsky et al., 1996; Campbell, 1987; Richman et al., 1982; Shaw and Bell, 1993).

Starting at school age, the range of antisocial behaviors expands to include stubbornness, lying, bullying, and other externalizing problems (Loeber et al., 1993). School-age children, even those as young as first grade, begin to participate in delinquent behaviors. However, self-report instruments of delinquent behavior have rarely been administered to preteenage children (Loeber et al., 1989). Some studies have administered self-report instruments to 10 or 11 year olds, slightly modifying the standard delinquency items (Elliott et al., 1985).

Loeber et al. (1989) provide one of the few attempts not only to gather self-report information from children under 10 but also to examine the reliability of those reports. They surveyed a sample of 849 first-grade and 868 fourth-grade boys using a 33-item self-reported antisocial behavior scale. This is a younger-age version of the self-reported delinquency index used by the three projects of the Program of Research on the Causes and Correlates of Delinquency. Items that were age appropriate were selected, and some behaviors were placed in a number of different contexts in order to make them less abstract for the younger children. A special effort was made to ensure that each child understood the question by preceding each behavior with a series of questions to ascertain whether the respondent knew the meaning of the behavior. If the child did not understand the question, the interviewer gave an example and then asked the child to do the same. If the child still did not understand the question, the item was skipped. The parents and teachers of these children also were surveyed using a combination of the appropriate CBCL and delinquency items.

Loeber and colleagues reported that the great majority of boys understood most of the items. First-grade boys did have problems understanding the items regarding marijuana use and sniffing glue, and fourth-grade boys had difficulty understanding the question regarding sniffing glue.

To assess the validity of self-reported delinquent behavior among elementary school children, Loeber and his colleagues compared the children’s self-reports to parental reports about similar behaviors. They found a surprisingly high degree of concordance between children’s and parents’ reports about the prevalence of delinquent behavior. This is especially true for behaviors that are likely to come to the attention of parents, such as aggressive behaviors and school suspension. Concordance was higher for first graders than fourth graders, which Loeber et al. suggest would be expected since parents would be more likely to know about any misbehavior that takes place at younger ages. These findings are encouraging and suggest that self-report instruments, if administered with concern for the age of the respondents, can be used for very young children.

Self-Report Measures for Adults

Interest in assessing antisocial behavior across the life span has also led to an increasing number of longitudinal surveys that have followed respondents from their adolescent years into early adulthood (e.g., Elliott, 1994; Farrington, 1989b; Hawkins et al., 1992; Huizinga et al., 1998; LeBlanc, 1989; Loeber et al., 1998; Krohn et al., 1997). The concern in constructing self-report instruments for adults is to include items that take into account the different contexts in which crime occurs at these ages (e.g., work instead of school), the opportunities for different types of offenses (e.g., domestic violence, fraud), the inappropriateness or inapplicability of offenses that appear on adolescent self-report instruments (e.g., status offenses), and the potential for very serious criminal behaviors, at least among a small subset of chronic violent offenders.

Weitekamp (1989) has criticized self-report studies not only for being predominantly concerned with the adolescent years but also, when covering the adult years, for using the same items used for juveniles. He argues that even such studies as the NYS (Elliott, 1994) do not include many items that are more serious, and therefore more appropriate for adults, than the items included in the original Short and Nye study (1957). Weitekamp asserts that different instruments need to be used during different life stages. Doing so, however, raises questions about construct continuity. If researchers want to document the change in the propensity to engage in antisocial behavior throughout the life course, it must be assumed that different items used to measure antisocial behavior at different ages do indeed measure the same underlying construct. LeBlanc (1989) suggests that a strategy of including different but overlapping items on instruments covering different ages across the life span is the best compromise.

There have been relatively few assessments of the validity of self-report data collected from adults. The data that are available, however, suggest that the validity of adult self-report data is not fundamentally different from that of adolescent self-report data. For example, the validity data from Maxfield and his colleagues (2000) and from the DUF project presented above are from adult samples. Their estimates of validity are in the same range as those of most adolescent surveys. Elliott (1994) has presented information from the NYS that suggests adult self-report data are more congruent with adult arrests than juvenile self-report data are with juvenile arrests.

Panel or Testing Effects

Developments in self-report methods have improved the quality of data collected and have expanded their applicability to the study of antisocial behavior throughout the life course. While these advances are significant, they have increased the potential for the data to be contaminated by testing or panel effects. Testing effects are any alterations of a respondent’s response to an item or scale that is caused by the prior administration of the same item or scale (Thornberry, 1989).

Improvements in self-report instruments have led to the inclusion of a longer list of items in order to tap more serious offenses, and often a number of follow-up questions are asked. The more acts that a respondent admits to, the longer the overall interview will take. The concern is that this approach will make respondents increasingly unwilling to admit to delinquent acts because those responses will increase the overall length of the interview. This effect likely would be unequally distributed across respondents because those who had the most extensive involvement in delinquency would have the most time to lose by answering affirmatively to the delinquency items.

It is also possible that the simple fact that a respondent is reinterviewed may create a generalized fatigue and lead to decreased willingness by the respondent to respond to self-report items. Research using the National Crime Victimization Survey found that the reduction in reporting was due more to the number of prior interviews than to the number of victimizations reported in prior interviews (Lehnen and Reiss, 1978).

Three studies have examined testing effects in the use of self-report measures; all are based on data from the NYS (Elliott et al., 1985). They were conducted by Thornberry (1989), Menard and Elliott (1993), and Lauritsen (1998). The NYS surveyed a nationally representative sample of 1,725 youth ages 11 to 17 in 1976. The same subjects were reinterviewed annually through 1981. These data allow researchers to examine age-specific prevalence rates by the number of times a respondent was interviewed. For example, some respondents were 14 at the time of their first interview; some were 14 at their second interview (the original 13-year-old cohort); some were 14 at their third interview (the original 12-year-old cohort); and so forth. Because of this, a 14-year-old prevalence rate can be calculated from data collected when the respondents were interviewed for the first time, from data collected when they were interviewed a second time, and so on. If a testing or panel effect plays a role in response rates, the more frequently respondents are interviewed, the lower the age-specific rates should be.
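The sequential cohort logic lends itself to a simple computation: group person-wave records by age and by interview number, then compare prevalence within an age across interview numbers. A minimal sketch, with invented column names and toy data (the NYS files are organized differently):

```python
import pandas as pd

# Hypothetical long-format panel: one row per respondent-wave, with the
# respondent's age at that wave and whether any offense was self-reported.
df = pd.DataFrame({
    "person":   [1, 1, 2, 2, 3, 3],
    "wave":     [1, 2, 1, 2, 1, 2],
    "age":      [14, 15, 13, 14, 14, 15],
    "offended": [1, 0, 1, 1, 0, 0],
})

# Age-specific prevalence broken out by interview number. A testing
# effect would show up as the rate for a given age (the rows) falling
# as the wave number (the columns) rises.
rates = df.groupby(["age", "wave"])["offended"].mean().unstack("wave")
print(rates)
```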

Thornberry (1989) analyzed these rates for 17 NYS self-report items representing the major domains of delinquency and, for the most part, the most frequently occurring items. The overall trend suggests a panel effect. For all offenses except marijuana use, comparisons between adjacent waves indicated that age-specific prevalence rates decreased more often than they increased. For example, comparing the rate of gang fights from wave to wave, Thornberry found that for 67 percent of the comparisons there was a decrease in age-specific prevalence rates, whereas there was an increase in only 20 percent of the comparisons and no change in 13 percent. The magnitude of the changes was substantial in many cases. For example, for stealing something worth $5 to $50, the rate for 15-year-olds dropped by 50 percent from wave 1 to wave 4.

The NYS did not introduce detailed follow-up questions to the delinquency items until the fourth wave of data collection. The data analyzed by Thornberry show that the decline in reporting occurred across all waves. Hence, it appears that the panel design itself, rather than the design of the specific questions, had the effect of decreasing prevalence rates. The observed decline in age-specific rates could be due to an underlying secular drop in offenses during these years (1976-1981). Cross-sectional trend data from the Monitoring the Future (MTF) study, which cannot be influenced by a testing effect, do not indicate any such secular decline (see Thornberry, 1989).

Menard and Elliott (1993) reexamined this issue using both NYS and MTF data. They rightfully pointed out that comparisons between these studies need to be undertaken cautiously because of differences in samples, design features, item wording, and similar concerns. Menard and Elliott’s analysis also showed that at the item level, declining trends are more evident in the NYS data than the MTF data. Most of these year-to-year changes are not statistically significant, however. Menard and Elliott then used a modified Cox-Stuart trend test to examine short-term trends in delinquency and drug use. Overall, the trends for 81 percent of the NYS offenses are not statistically significant, as is the case for about half of the MTF trends. But an examination of the trends for the 16 items included in their Table 2 indicates that there are more declining trends in the NYS data (9 of 16 for the 1976-1980 comparisons and 7 of 16 for the 1976-1983 comparisons) than in the MTF data (3 of 16 in both cases). Menard and Elliott focus on the statistically significant effects, which do indicate fewer declining trends in the NYS than is evident when one focuses on all trends, regardless of the magnitude of the change.
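The Cox-Stuart test referenced here is a simple sign test for monotone trend: pair each observation in the first half of a series with its counterpart in the second half and ask whether the later values are systematically higher or lower. The sketch below implements the textbook version (Menard and Elliott used a modified form, whose details are not reproduced here); the prevalence series is invented.

```python
from scipy.stats import binomtest

def cox_stuart(series):
    """Plain Cox-Stuart sign test for a monotone trend. Pairs each
    early observation with its counterpart in the later half of the
    series and applies a two-sided sign test to the differences."""
    n = len(series)
    half = n // 2
    first, last = series[:half], series[n - half:]
    diffs = [b - a for a, b in zip(first, last) if b != a]  # drop ties
    n_pos = sum(d > 0 for d in diffs)
    return binomtest(n_pos, len(diffs), 0.5).pvalue

# Hypothetical annual prevalence rates for one offense item:
print(cox_stuart([0.21, 0.19, 0.18, 0.15, 0.14, 0.12]))  # p = 0.25
```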

More recently, Lauritsen (1998) examined this topic using hierarchical linear models to estimate growth curve models for general delinquency and serious delinquency. She limited her analysis to four of the seven cohorts in the NYS, those who were 11, 13, 15, and 17 years old at wave 1. For those who were 13, 15, or 17 at the start of the NYS, involvement in both general delinquency and serious delinquency decreased significantly over the next four years. For the 11-year-old cohort, the rate of change was also negative but not statistically significant. This downward trajectory in the rate of delinquent behavior for all age cohorts is not consistent with theoretical expectations or with what is generally known about the age-crime curve. Also, as Lauritsen points out, it is not consistent with other data on secular trends for the same time period (see also Thornberry, 1989; Osgood et al., 1989).

Finally, Lauritsen examined whether this testing effect is due to the introduction of detailed follow-up questions during wave 4 of the NYS or whether it appeared to be produced by general panel fatigue. Her analysis of individual growth trajectories indicates that the decline is observed across all waves. Thus she concludes, as Thornberry did, that the reduced reporting is unlikely to have been produced by the addition of follow-up questions.

Overall, Lauritsen offers two explanations for the observed testing effects. One concerns generalized panel fatigue, suggesting that as respondents are asked the same inventory at repeated surveys they become less willing to respond affirmatively to screening questions. The second explanation concerns a maturation effect in which there is change in the content validity of the self-report questions with age. For example, how respondents interpret a question on simple assault and the type of behavior they consider relevant for responding to the question may be quite different for 11 and 17 year olds. This would not account for the drop in the age-specific rates observed by Thornberry (1989), however.

The studies by Thornberry and Lauritsen suggest that it is likely there is some degree of panel bias in self-report data collected in longitudinal panel studies. The analysis by Menard and Elliott indicates that this is indeed just a suggestion at this point, as the necessary comparisons between panel studies and cross-sectional trend studies are severely hampered by the lack of comparability in item wording, administration, and other methodological differences. Also, if there are testing effects, neither Thornberry nor Lauritsen argues that they are unique to the NYS. It just so happens that the sequential cohort design of the NYS makes it a good vehicle for examining this issue. The presumption, unfortunately, is that if testing effects interfere with the validity of the NYS data, they also interfere with the validity of other longitudinal data containing self-report information. This is obviously a serious matter because etiological research has focused almost exclusively on longitudinal designs during the past 20 years. Additional research to identify the extensiveness of testing effects, their sources, and ways to remedy them is certainly a high priority.

Validity of Self-Reports Across Developmental Stages

Earlier we reviewed the literature that assessed the criterion validity of self-report data. Almost all of those studies assess criterion validity at a single point in time. There has been little systematic investigation of validity at different ages, especially for the same subjects followed over time. Because of that, we have begun to assess this issue using the self-report and official data collected in the Rochester Youth Development Study. As in previous studies, two comparisons can be made: (1) the prevalence of self-reported arrests versus the prevalence of official arrests and (2) the prevalence of self-reported delinquency and drug use versus the prevalence of official arrests. We combine the delinquency and drug use items into one self-report inventory since youth can be, and are, arrested for this full range of illegal behaviors. We expect, of course, positive correlations across these alternative measures of involvement in crime.

Table 3-1 presents the results for the total Rochester sample at each of 11 waves of data, waves 2 through 12. This allows us to assess criterion validity from early adolescence (the average age at wave 2 is 14) to early adulthood (the average age at wave 12 is 22). The self-reported arrest measure asks respondents if they had been arrested or picked up by the police since the last interview. The self-reported delinquency data are for our general delinquency index, which includes a variety of offenses from trivial to serious. The official arrest file contains information on arrests and official warnings during the juvenile years and arrests during the adult years. Rochester city, Monroe County, and New York state files were searched. Each arrest was assigned to an interview wave.

TABLE 3-1 Yule’s Q Comparing the Prevalence of Self-Reported and Official Data, Rochester Youth Development Study, Total Panel

There is a high degree of concordance between the official arrest histories and the self-reported arrest histories for the Rochester subjects. We use Yule’s Q, a standard measure of association for two-by-two contingency tables, which ranges from -1 to +1 (Christensen, 1997). The average Yule’s Q is 0.81 across the 11 waves, and the range is from 0.68 to 0.89. Subjects who have an official contact or arrest were, generally speaking, willing to report that to their interviewers. There does not appear to be a strong developmental trend in the validity of these data.
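Yule’s Q is a simple transformation of the odds ratio of a two-by-two table. A minimal sketch follows, with cell counts invented to land near the coefficients just reported; they are not the Rochester data.

```python
def yules_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]], where, e.g.,
    a = both sources record an arrest and d = neither source does.
    Equivalent to (OR - 1) / (OR + 1), so it ranges from -1 to +1."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts for one wave: self-reported arrest (yes/no) crossed
# with official arrest (yes/no), chosen to give Q near 0.81.
print(round(yules_q(50, 30, 35, 200), 2))  # 0.81
```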

The second panel in Table 3-1 presents the association between official arrests and self-reported general delinquency and drug use. If the self-report data are valid, it can be expected that subjects who report offending will be more apt to have an official record than subjects who do not. This is generally what we see, although consistent with the literature, these coefficients are somewhat lower than those in the top panel.

The average Yule’s Q across the 11 waves is 0.50, with a range between 0.41 and 0.64. Here there does seem to be a slight downward drift in the size of the relationship over time. During the first few waves, the coefficients are in the 0.50 to 0.60 range, but by the last four waves they are in the 0.40 to 0.45 range. The coefficients for the early waves are similar to those reported in previous studies of adolescents (e.g., Hindelang et al., 1981).

It is not yet clear why these coefficients decline over time. The drop in the validity estimates for self-reported delinquency is consistent with a testing effect, although the major decline does not occur until the last few waves. The absence of a strong trend in the self-reported arrest data argues against a testing effect, however, since for most waves these questions were embedded in the self-report follow-up questions. An alternative explanation concerns the changing nature of criminal behavior. It is possible that offenses committed at these ages (early 20s) are less public and therefore somewhat less well correlated with arrest data.

A major question about the validity of self-report data concerns differential levels of reporting by race/ethnicity and gender. Table 3-2 presents comparisons for male and female respondents separately. Overall, there is a somewhat higher degree of validity for the female respondents than the males. The average Yule’s Q for the comparison between official arrests and self-reported arrests is 0.74 for males and 0.84 for females. There is no evidence of a strong developmental trend for these data. For the comparison between self-reported delinquency/drug use and official arrests, the average association is 0.43 for the males and 0.52 for the females. For the male respondents, the size of the coefficients tails off somewhat at the older ages. The results for females are unstable, probably because of the low number of females who were arrested at these six-month intervals.

Table 3-3 presents the results by race/ethnicity. When attention focuses on the association between self-reported arrests and official arrests, there is no evidence of differential validity. The mean Yule’s Q for African Americans is 0.82, for Hispanics 0.80, and for whites 0.83. There are no strong developmental trends across time for any of the three groups.

The comparison between self-reported delinquency/drug use and official arrests is hampered by our inability to estimate Yule’s Q for the white subjects. At 9 of the 11 waves there are empty cells and/or expected cell frequencies of less than 5. Nevertheless, there does seem to be some evidence of differential validity across racial groups. The mean for African Americans is 0.47; for Hispanics, 0.67.

Overall, when self-reported arrests and official arrests are compared, there is little evidence of differential validity by gender or race/ethnicity, and all the coefficients are reasonably high. For the comparison between self-reported delinquency/drug use and official arrests, however, validity is lower for African Americans than for Hispanics. This finding is consistent with previous research and must be taken into account when using self-report data.

Similarity of Results

In the past quarter century criminological research has increasingly relied on longitudinal studies to describe and explain patterns of criminal behavior. Much of this research, especially the descriptive studies, has used official measures of crime, but there has been growing use of self-report data, especially in etiological studies. An important but understudied topic is the extent to which these two measures provide the same or different results with respect to key criminal career parameters.

Farrington and colleagues (2000) have begun to address this issue with data from the Seattle Social Development Project. Focusing on the juvenile years, ages 11 to 17, they compared results based on self-reports to those based on court referrals. There was a good deal of similarity across the methods. In particular, similar patterns were found for variations in prevalence by age, the level of continuity in commission of offenses, and the relationship between age of onset and later frequency of committing offenses. There were also some notable differences. “In self-reports, prevalence and individual offending frequency were higher, the age of onset was earlier, and the concentration of offending was greater” (Farrington et al., 2000:21). Also, there was less variation in the individual offending rate by age for the official data compared to self-reports.

While this study is a good first step to take in exploring the issue, it is not yet clear whether the glass is half empty or half full. Additional investigation is needed to identify which criminal career parameters are similar and which are different, across a variety of data sets.

Longitudinal research has demonstrated a substantial degree of continuity in offending. Past offending is related to future offending in both official and self-report data (Farrington et al., 2000). A current controversy in the criminological literature is the source of this continuity. Some argue that it is generated by static, time-stable characteristics (persistent population heterogeneity); others argue that it is generated by dynamic, time-varying processes (state dependence). A number of studies have empirically examined whether the association between past and future offending persists after stable individual differences are taken into account (Nagin and Paternoster, 1991, 2000). If it does, we assume the association is due in part to dynamic processes. Previous studies have used both self-reported data and official data, but typically not on the same individuals. Unfortunately, the results vary somewhat by type of data. Studies based on self-reports are more apt to find a state dependence effect than are studies based on official data.

To examine this issue more systematically, Brame, Bushway, Paternoster, and Thornberry (2001) used both self-report data and official data on subjects in the Rochester Youth Development Study. Separate models were estimated for violent and property offenses, for self-report and official data, and for the younger (<13) and older (>14) groups in the Rochester sample, yielding a total of eight models.

TABLE 3-2 Yule’s Q Comparing the Prevalence of Self-Reported and Official Data, Rochester Youth Development Study, by Gender

TABLE 3-3 Yule’s Q Comparing the Prevalence of Self-Reported and Official Data, Rochester Youth Development Study, by Race/Ethnicity

In all but one case (violent offenses for the older group measured by official data) there is a positive effect of past offenses on future offenses after unobserved heterogeneity is held constant. In the exceptional case the number of arrests for violent offenses is so sparse over time that we do not think the estimate is reliable.

Overall, therefore, these results based on the same subjects suggest that self-report and official data yield the same substantive conclusion on this central issue. Both data sources indicate there are both static and dynamic processes at work that produce the observed association between past and future offenses.

CONCLUSIONS

The self-report method for measuring crime and delinquency has developed substantially since it was introduced a half century ago. It is now one of the fundamental ways to scientifically measure criminality, and it forms the bedrock of etiological studies. The challenges confronting this approach to measurement are daunting; after all, individuals are asked to tell about their own undetected criminality. Despite this fundamental challenge, the technique seems to be successful and capable of producing valid and reliable data.

Early self-report scales had substantial weaknesses, containing few items and producing an assessment of only minor forms of offending. Gradually, as the underlying validity of the approach became evident, the scales expanded in terms of breadth, seriousness, and comprehensiveness. Contemporary measures typically cover a wide portion of the behavioral domain included under the construct of crime and delinquency. These scales are able to measure serious as well as minor forms of crime, major subdomains (such as violence, property crimes, and drug use), and different parameters of criminal careers (such as prevalence, frequency, and seriousness) and identify high-rate as well as low-rate offenders. This is substantial progress for a measurement approach that began with a half dozen items and a four-category response set.

The self-report approach to measuring crime has acceptable, albeit far from perfect, reliability and validity. Of these two basic psychometric properties, the evidence for reliability is stronger. There are no fundamental challenges to the reliability of these data. Test-retest measures (and internal consistency measures) indicate that self-reported measures of delinquency are as reliable as, if not more reliable than, most social science measures.

Validity, as noted above, is much harder to assess as there is no gold standard by which to judge self-reports. Nevertheless, current scales seem to have acceptable levels of content and construct validity. The evidence for criterion validity is less clear-cut. At an overall level, criterion validity seems to be in the moderate-to-strong range. While there is certainly room for improvement, the validity appears acceptable for most analytical tasks. At a more specific level, however, there is a potentially serious problem with differential validity in that African American males have lower validity than do Hispanic males. Additional research on this topic is imperative.

While basic self-report surveys appear to be reliable and valid, researchers have experimented with a variety of data collection methods to improve the quality of reporting. Several of these attempts have produced ambiguous results; for example, there is no clear-cut benefit to the mode of administration (interview vs. questionnaire) or the use of randomized response techniques. There is one approach that appears to hold great promise—audio-assisted computerized interviews, which produce increased reporting of many sensitive topics, including delinquency and drug use. Greater use of this approach is warranted.

In the end, the available data indicate that the self-report method is an important and useful way to collect information about criminal behavior. The skepticism of early critics like Nettler (1978) and Gibbons (1979) has not been realized. Nevertheless, the self-report technique can clearly be improved. The final topic addressed in this chapter concerns suggestions for future research.

Future Directions

Much of our research on reliability and validity simply assesses these characteristics; there is much less research on improving their levels. For example, it is likely that both validity and reliability would be improved if we experimented with alternative items for measuring the same behavior and identified the strongest ones. Similarly, reliability and validity vary across subscales (e.g., Huizinga and Elliott, 1986); improving the subscales will improve not only those subscales but also the overall scale into which they are aggregated.

This chapter raised the issue of differential validity for African American males. It is crucial that more is learned about the magnitude of this bias and, if it exists, its source. Future research should address this issue directly and attempt to identify techniques for eliminating it. These research efforts should not lose sight of the fact that the problem may be with the criterion variable (official records) and not the self-reports.

The self-report method was developed in and for cross-sectional studies. Using it in longitudinal studies, especially ones that cover major portions of the life course, creates a new set of challenges. Maintaining the age appropriateness of the items while at the same time ensuring content validity is a knotty problem that we have just begun to address. There is some evidence that repeated measures may create testing effects. More research is needed to measure the size of this effect and its sources and to identify methods to reduce its threat to the validity of self-report data in the longitudinal studies so crucial to etiological investigation.

The similarities and differences in our understanding of criminal career parameters in self-report data and official data are just beginning to be investigated. Criminal career research began with official data but increasingly relies on self-report data. It is important that we understand more about the validity of both types of data for these purposes.

Finally, we recommend that methodological studies be done in a cross-cutting fashion so that several of these issues—reliability and validity, improved item selection, assessing panel bias—can be investigated simultaneously. In particular it is important to examine all of these methodological issues when data are collected using audio-assisted computerized interviewing. For example, studies that have found differential validity or testing effects have all used paper-and-pencil interviews. Whether these same problems are evident under the enhanced confidentiality of audio interviews is an open question. It is clearly a high-priority one as well.

There is no dearth of work that can be done to assess and improve the self-report method. If the progress of the past half century is any guide, we are optimistic that the necessary studies will be conducted and that they will improve this basic way of collecting data on criminal behavior.

Achenbach, T.M. 1992 Manual for the Child Behavior Checklist/2-3 and 1992 Profile. Burlington: University of Vermont.

Akers, R.L. 1964 Socio-economic status and delinquent behavior: A retest. Journal of Research in Crime and Delinquency 1:38-46.

Akers, R.L., M.D. Krohn, L. Lanza-Kaduce, and M. Radosevich 1979 Social learning and deviant behavior: A specific test of a general theory. American Sociological Review 44:636-655.

Akers, R.L., J. Massey, W. Clarke, and R.M. Lauer 1983 Are self-reports of adolescent deviance valid? Biochemical measures, randomized response, and the bogus pipeline in smoking behavior. Social Forces 62(September):234-251.

Anderson, L.S., T.G. Chiricos, and G.P. Waldo 1977 Formal and informal sanctions: A comparison of deterrent effects. Social Problems 25:103-112.

Aquilino, W.S. 1994 Interview mode effects in surveys of drug and alcohol use. Public Opinion Quarterly 58:210-240.

Aquilino, W.S., and L. LoSciuto 1990 Effects of interview mode on self-reported drug use. Public Opinion Quarterly 54:362-395.

Beebe, T.J., P.A. Harrison, J.A. McRae, Jr., R.E. Anderson, and J.A. Fulkerson 1998 An evaluation of computer-assisted self-interviews in a school setting. Public Opinion Quarterly 62:623-632.

Belsky, J., S. Woodworth, and K. Crnic 1996 Troubled family interaction during toddlerhood. Development and Psychopathology 8:477-495.

Belson, W.A. 1968 The extent of stealing by London boys and some of its origins. Advancement of Science 25:171-184.

Blackmore, J. 1974 The relationship between self-reported delinquency and official convictions amongst adolescent boys. British Journal of Criminology 14:172-176.

Blumstein, A., J. Cohen, J.A. Roth, and C.A. Visher 1986 Criminal Careers and Career Criminals. Washington, DC: National Academy Press.

Brame, R., S. Bushway, R. Paternoster, and T.P. Thornberry 2001 Temporal Linkages in Violent and Nonviolent Criminal Activity. Unpublished manuscript, University of South Carolina, Columbia.

Braukmann, C.J., K.A. Kirigin, and M.M. Wolf 1979 Social Learning and Social Control Perspectives in Group Home Delinquency Treatment Research. Paper presented to the American Society of Criminology, Philadelphia.

Broder, P.K., and J. Zimmerman 1978 Establishing the Reliability of Self-Reported Delinquency Data. Williamsburg, VA: National Center for State Courts.

Browning, K., D. Huizinga, R. Loeber, and T.P. Thornberry 1999 Causes and Correlates of Delinquency Program, Fact Sheet. Washington, DC: U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention.

Campbell, S.B. 1987 Parent-referred problem three-year-olds: Developmental changes in symptoms. Journal of Child Psychology and Psychiatry 28:835-845.

Campbell, S.B. 1990 Behavioral Problems in Preschool Children: Clinical and Developmental Issues . New York: Guilford Press.

Christensen, R. 1997 Log-Linear Models and Logistic Regression , 2nd ed. New York: Springer-Verlag.

Clark, J.P., and L.L. Tifft 1966 Polygraph and interview validation of self-reported delinquent behavior. American Sociological Review 31:516-523.

Clark, J.P., and E.P. Wenninger 1962 Socioeconomic class and area as correlates of illegal behavior among juveniles. American Sociological Review 28:826-834.

Conger, R. 1976 Social control and social learning models of delinquency: A synthesis. Criminology 14:17-40.

Drug Use Forecasting 1990 Drug Use Forecasting Annual Report: Drugs and Crime in America . Washington, DC: U.S. Department of Justice.

Dentler, R.A., and L.J. Monroe 1961 Social correlates of early adolescent theft. American Sociological Review 26:733-743.

Elliott, D.S. 1966 Delinquency school attendance and dropout. Social Problems 13:306-318.

1994 Serious violent offenders: Onset, developmental course, and termination. Criminology 32:1-21.

Elliott, D.S., and S.S. Ageton 1980 Reconciling race and class differences in self-reported and official estimates of delinquency. American Sociological Review 45:95-110.

Elliott, D.S., and H.L. Voss 1974 Delinquency and Dropout . Lexington, MA: D.C. Heath.

Elliott, D.S., D. Huizinga, and S.S. Ageton 1985 Explaining Delinquency and Drug Use . Beverly Hills, CA: Sage.

Empey, L.T., and M. Erickson 1966 Hidden delinquency and social status. Social Forces 44(June):546-554.

Erickson, M. and L.T. Empey 1963 Court records, undetected delinquency and decision-making. Journal of Criminal Law, Criminology, and Police Science 54:456-469.

Farrington, D.P. 1973 Self-reports of deviant behavior: Predictive and stable? Journal of Criminal Law and Criminology 64:99-110.

1986 Stepping Stones to Adult Criminal Careers. In Development of Antisocial and Prosocial Behavior , D. Olweus, J. Block, and M.R. Yarrow, eds. New York: Academic Press.

1989a Early predictors of adolescent aggression and adult violence. In Violence and Victims . Washington, DC: Springer.

1989b Self-reported and official offending from adolescence to adulthood. In Cross-

National Research in Self-Reported Crime and Delinquency , M.W. Klein, ed. Los Angeles: Kluwer Academic Publishers.

Farrington, D.P., R. Loeber, M. Stouthamer-Loeber, W.B. Van Kammen, and L. Schmidt 1996 Self-reported delinquency and a combined delinquency seriousness scale based on boys, mothers, and teachers: Concurrent and predictive validity for African-American and Caucasians. Criminology 34:493-517.

Farrington, D.P., D. Jolliffe, J.D. Hawkins, R.F. Catalano, K.G. Hill, and R. Kosterman 2000 Comparing Delinquency Careers in Court Records and Self-reports. Unpublished manuscript, Cambridge University, United Kingdom.

Gibbons, D.C. 1979 The Criminological Enterprise: Theories and Perspectives . Englewood Cliffs, NJ: Prentice-Hall.

Gold, M. 1966 Undetected delinquent behavior. Journal of Research in Crime and Delinquency 3:27-46.

1970 Delinquent Behavior in an American City . Belmont, CA: Brooks/Cole.

Gottfredson, M.R., and T. Hirschi 1990 A General Theory of Crime . Stanford, CA: Stanford University Press.

Hardt, R.H., and S. Petersen-Hardt 1977 On determining the quality of the delinquency self-report method. Journal of Research in Crime and Delinquency 14:247-261.

Hathaway, R.S., E.D. Monachesi, and L.A. Young 1960 Delinquency rates and personality. Journal of Criminal Law, Criminology, and Police Science 50:433-440.

Hawkins, J.D., R.F. Catalano, and J.Y. Miller 1992 Risk and protective factors for alcohol and other drug problems in adolescence and early adulthood: Implications for substance abuse prevention. Psychological Bulletin 112:64-105.

Hepburn, J.R. 1976 Testing alternative models of delinquency causation. Journal of Criminal Law and Criminology 67:450-460.

Hindelang, M.J. 1973 Causes of delinquency: A partial replication and extension. Social Problems 20: 471-487.

Hindelang, M.J., T. Hirschi, and J.G. Weis 1979 Correlates of delinquency: The illusion of discrepancy between self-report and official measures. American Sociological Review 44:995-1014.

1981 Measuring Delinquency . Beverly Hills, CA: Sage.

Hirschi, T. 1969 Causes of Delinquency . Berkeley: University of California Press.

Huesmann, L.R., L.D. Eron, M.M. Lefkowitz, and L.O. Walder 1984 The stability of aggression over time and generations. Developmental Psychology 20:1120-1134.

Huizinga, D., and D.S. Elliott 1983 A Preliminary Examination of the Reliability and Validity of the National Youth

Survey Self-Reported Delinquency Indices . National Youth Survey Project Report 27. Boulder, CO: Behavioral Research Institute.

1986 Reassessing the reliability and validity of self-report delinquent measures. Journal of Quantitative Criminology 2:293-327.

Huizinga, D., A.W. Weiher, S. Menard, R. Espiritu, and F.A. Esbensen 1998 Some not so boring findings from the Denver Youth Survey. Paper presented at the annual meeting of American Society of Criminology, Washington, D.C.

Jensen, G.F. 1973 Inner containment and delinquency. Journal of Criminal Law and Criminology 64:464-470.

Jensen, G.F., and R. Eve 1976 Sex differences in delinquency. Criminology 13:427-448.

Jensen, G.F., M.L. Erickson, and J.P. Gibbs 1978 Perceived risk of punishment and self-reported delinquency. Social Forces 57:57-78.

Jessor, R. 1998 New Perspectives on Adolescent Risk Behavior . New York: Cambridge University Press.

Jessor, R., J.E. Donovan, and F.M. Costa 1991 Beyond Adolescence: Problem Behavior and Young Adult Development . Cambridge, England: Cambridge University Press.

Johnson, R.E. 1979 Juvenile Delinquency and Its Origins . Cambridge, England: Cambridge University Press.

Johnston, L.D., P.M. O’Malley, and J.G. Bachman 1996 National Survey Results on Drug Use from the Monitoring the Future Study, 1975-1995 . Washington, DC: U.S. Government Printing Office.

Kaplan, H.B. 1972 Toward a general theory of psychosocial deviance: The case of aggressive behavior. Social Science and Medicine 6:593-617.

Kelly, D.H. 1974 Track position and delinquent involvement: A preliminary analysis. Sociology and Social Research 58:380-386.

Klein, M.W., ed. 1989 Cross-National Research in Self-Reported Crime and Delinquency . Los Angeles: Kluwer Academic Publishers.

Krohn, M.D., G.P. Waldo, and T.G. Chiricos 1974 Reported delinquency: A comparison of structured interviews and self-administered check-lists. Journal of Criminal Law and Criminology 65:545-553.

Krohn, M.D., A.J. Lizotte, and C.M. Perez 1997 The interrelationship between substance use and precocious transitions to adult statutes. Journal of Health and Social Behavior 38:87-103.

Krohn, M.D., T.P. Thornberry, C. Rivera, and M. LeBlanc 2001 Later delinquency careers. In Child Delinquents: Development, Intervention, and Service Needs , R. Loeber and D.P. Farrington, eds. Thousand Oaks, CA: Sage.

Kulik, J.A., K.B. Stein, and T.R. Sarbin 1968 Disclosure of delinquent behavior under conditions of anonymity and nonanonymity. Journal of Consulting and Clinical Psychology 32:506-509.

Lauritsen, J.L. 1998 The age-crime debate: Assessing the limits of longitudinal self-report data. Social Forces 76:1-29.

LeBlanc, M. 1989 Designing a self-report instrument for the study of the development of offending from childhood to adulthood: Issues and problems. Pp. 371-398 in Cross-National Research in Self-Reported Crime and Delinquency , M.W. Klein, ed. Los Angeles: Kluwer Academic Publishers.

Lehnen, R.G., and A.J. Reiss 1978 Response effects in the National Crime Survey. Victimology: An International Journal 3:110-124.

Loeber, R., M. Stouthamer-Loeber, W.B. Van Kammen, and D.P. Farrington 1989 Development of a new measure of self-reported antisocial behavior for young children: Prevalence and reliability. Pp. 203-225 in Cross-National Research in Self-Reported Crime and Delinquency , M.W. Klein, ed. Los Angeles: Kluwer Academic Publishers.

Loeber, R., P. Wung, K. Keenan, B. Giroux, and M. Stouthamer-Loeber 1993 Developmental pathways in disruptive child behavior. Development and Psychopathology 5:101-133.

Loeber, R., D.P. Farrington, M. Stouthamer-Loeber, T.E. Moffitt, and A. Caspi 1998 The development of male offending: Key findings from the first decade of the Pittsburgh Youth Study. Studies on Crime and Crime Prevention 7:1-31.

Matthews, V.M. 1968 Differential identification: An empirical note. Social Problems 14:376-383.

Maxfield, M.G., B.L. Weiler, and C.S. Widom 2000 Comparing self-reports and official records of arrests. Journal of Quantitative Criminology 16:87-100.

Menard, S. and D.S. Elliott 1993 Data set comparability and short-term trends in crime and delinquency. Journal of Criminal Justice 21:433-445.

Moffitt, T.E. 1993 Life-course-persistent and adolescence-limited antisocial behavior: A developmental taxonomy. Psychological Review 100:674-701.

1997 Adolescence-limited and life-course-persistent offending: A complementary pair of developmental theories. Pp. 11-54 in Developmental Theories of Crime and Delinquency, Volume 7: Advances in Criminological Theory , T.P. Thornberry, ed. New Brunswick, NJ: Transaction Publishers.

Nagin, D.S., and R. Paternoster 1991 On the relationship of past and future participation in delinquency. Criminology 29:163-190.

2000 Population heterogeneity and state dependence: State of the evidence and directions for future research. Journal of Quantitative Criminology 16:117-145 .

Nettler, G. 1978 Explaining Crime . New York: McGraw-Hill.

Nye, F.I., J.F. Short, and V.J. Olson 1958 Socioeconomic status and delinquent behavior. American Journal of Sociology 63:381-389.

Olweus, D. 1979 Stability and aggressive reaction patterns in males: A review. Psychological Bulletin 86:852-875.

Osgood, D.W., P. O’Malley, J. Bachman, and L. Johnston 1989 Time trends and age trends in arrests and self-reported illegal behavior. Criminology 27:389-418.

Patterson, G.R. 1993 Orderly change in a stable world: The antisocial trait as a chimera. Journal of Consulting and Clinical Psychology 61:911-919.

Patterson, G.R., and R. Loeber 1982 The Understanding and Prediction of Delinquent Child Behavior . Research proposal to National Institute of Mental Health from the Oregon Social Learning Center, Eugene.

Polk, K. 1969 Class strain and rebellion among adolescents. Social Problems 17:214-224.

Porterfield, A.L. 1943 Delinquency and outcome in court and college. American Journal of Sociology 49:199-208.

1946 Youth in Trouble . Fort Worth, TX: Leo Potishman Foundation.

Reiss, A.J., Jr., and A.L. Rhodes 1959 A Socio-Psychological Study of Adolescent Conformity and Deviation . Washington, DC: U.S. Office of Education.

1963 Status deprivation and delinquent behavior. Sociological Quarterly 4:135-149.

1964 An empirical test of differential association theory. Journal of Research in Crime and Delinquency 1:5-18.

Richman, N., J. Stevenson, and P.J. Graham 1982 Preschool to School: A Behavioural Study . London: Academic Press.

Rojek, D.G. 1983 Social status and delinquency: Do self-reports and official reports match? In Measurement Issues in Criminal Justice , G.P. Waldo, ed. Beverly Hills, CA: Sage.

Sampson, R.J., and J.H. Laub 1990 Crime and deviance over the life course: The salience of adult social bonds. American Sociological Review 55:609-627.

Saris, W.E. 1991 Computer-Assisted Interviewing . Beverly Hills, CA: Sage.

Sellin, T. 1931 The basis of a crime index. Journal of Criminal Law and Criminology 22:335-356.

Shaw, D.S., and R.Q. Bell 1993 Developmental theories of parental contributors to antisocial behavior. Journal of Abnormal Child Psychology 21:35-49.

Short, J.F. 1957 Differential association and delinquency. Social Problems 4:233-239.

Short, J.F., Jr., and F.I. Nye 1957 Reported behavior as a criterion of deviant behavior. Social Problems 5:207-213.

1958 Extent of unrecorded juvenile delinquency: Tentative conclusions. Journal of Criminal Law and Criminology 49:296-302.

Silberman, M. 1976 Toward a theory of criminal deterrence. American Sociological Review 41:442-461. Slocum, W.L., and C.L. Stone

1963 Family culture patterns and delinquent-type behavior. Marriage and Family Living 25:202-208.

Skolnick, J.V., C.J. Braukmann, M.M. Bedlington, K.A. Kirigin, and M.M. Wolf 1981 Parent-youth interaction and delinquency in group homes. Journal of Abnormal Child Psychology 9:107-119.

Stanfield, R. 1966 The interaction of family variables and gang variables in the aetiology of delinquency. Social Problems 13:411-417.

Thornberry, T.P. 1987 Toward an interactional theory of delinquency. Criminology 25:863-891.

1989 Panel effects and the use of self-reported measures of delinquency in longitudinal studies. Pp. 347-369 in Cross-National Research in Self-Reported Crime and Delinquency , M.W. Klein, ed. Los Angeles: Kluwer Academic Publishers.

Thornberry, T.P., ed. 1997 Developmental Theories of Crime and Delinquency . New Brunswick, NJ: Transaction Publishers.

Thornberry, T.P., and M.D. Krohn 2001 The development of delinquency: An interactional perspective. Pp. 289-305 in Handbook of Law and Social Science: Youth and Justice , S.O. White, ed. New York: Plenum.

Thornberry, T.P., M.D. Krohn, A.J. Lizotte, C.A. Smith, and K. Tobin In Press The Toll of Gang Membership: Gangs and Delinquency in Developmental Perspective . New York: Cambridge University Press.

Thrasher, F. 1927 The Gang: A Study of 1,313 Gangs in Chicago . Chicago: University of Chicago Press.

Tourangeau, R. and T.W. Smith 1996 Asking sensitive questions: The impact of data collection, mode, question format, and question context. Public Opinion Quarterly 60:275-304.

Tracy, P.E., and J.A. Fox 1981 The validity of randomized response for sensitive measurements. American Sociological Review 46:187-200.

Tremblay, R.E., C. Japel, D. Perusse, P. McDuff, M. Boivin, M. Zoccolillo, and J. Montplaisir 1999 The search for the age of ‘onset’ of physical aggression: Rousseau and Bandura revisited. Criminal Behaviour and Mental Health 9:8-23.

Turner, C.F., J.T. Lessler, and J. Devore 1992 Effects of mode of administration and wording on reporting of drug use. Pp. 177-220 in Survey Measurement of Drug Use: Methodological Studies , C.F. Turner, J.T. Lessler, and J.C. Gfroerer, eds. Washington, DC: U.S. Department of Health and Human Services.

Vaz, E.W. 1966 Self-reported juvenile delinquency and social status. Canadian Journal of Corrections 8:20-27.

Voss, H.L. 1963 Ethnic differentials in delinquency in Honolulu. Journal of Criminal Law and Criminology 54:322-327.

1964 Differential association and reported delinquent behavior: A replication. Social Problems 12:78-85.

1966 Socio-economic status and reported delinquent behavior. Social Problems 13:314-324.

Waldo, G.P., and T.G. Chiricos 1972 Perceived penal sanction and self-reported criminality: A neglected approach to deterrence research. Social Problems 19:522-540.

Wallerstein, J.S., and C.J. Wylie 1947 Our law-abiding law-breakers. Probation 25:107-112.

Weis, J.G., and D.V. Van Alstyne 1979 The Measurement of Delinquency by the Randomized Response Method. Paper presented at the meeting of the American Society of Criminology, Philadelphia.

Weitekamp, E. 1989 Some problems with the use of self-reports in longitudinal research. Pp. 329-346 in Cross-National Research in Self-Reported Crime and Delinquency , M.W. Klein, ed. Los Angeles: Kluwer Academic Publishers.

Widom, C.S. 1989 Child abuse, neglect, and violent criminal behavior. Criminology 27:251-271.

Williams, J.R., and M. Gold 1972 From delinquent behavior to official delinquency. Social Problems 20(2):209-229.

Wolfgang, M.E., R.M. Figlio, and T. Sellin 1972 Delinquency in a Birth Cohort . Chicago: University of Chicago Press.

Wolfgang, M.E., T.P. Thornberry, and R.M. Figlio 1987 From Boy to Man, From Delinquency to Crime . Chicago: University of Chicago Press.

Wright, D.L., W.S. Aquilino and A.J. Supple 1998 A comparison of computer-assisted and paper-and-pencil self-administered questionnaires in a survey on smoking, alcohol, and drug use. Public Opinion Quarterly 62:331-353.

Most data on major crime in this country emanate from two sources. The FBI's Uniform Crime Reports have collected information on crimes known to the police, and on arrests, from local and state jurisdictions throughout the country. The National Crime Victimization Survey, a general population survey designed to cover the extent, nature, and consequences of criminal victimization, has been conducted annually since the early 1970s. This workshop was designed to consider similarities and differences in the methodological problems encountered by the survey and criminal justice research communities, and to identify the most productive focus for the research community. In addition to comparing and contrasting the methodological issues associated with self-report surveys and official records, the workshop explored methods for obtaining accurate self-reports on sensitive questions about crime events, estimating crime and victimization in rural counties and townships, and developing unbiased prevalence and incidence rates for rare events among population subgroups.



On the advantages and disadvantages of choice: future research directions in choice overload and its moderators

Raffaella Misuraca

  • 1 Department of Political Science and International Relations (DEMS), University of Palermo, Palermo, Italy
  • 2 Atkinson Graduate School of Management, Willamette University, Salem, OR, United States
  • 3 Department of Psychology, Educational Science and Human Movement, University of Palermo, Palermo, Italy

Researchers investigating the psychological effects of choice have provided extensive empirical evidence that having choice comes with many advantages, including better performance, more motivation, and greater life satisfaction, as well as disadvantages, such as avoidance of decisions and regret. When the difficulty of the decision task exceeds the natural cognitive resources of the human mind, the possibility to choose becomes more a source of unhappiness and dissatisfaction than an opportunity for greater well-being, a phenomenon referred to as choice overload. More recently, internal and external moderators that affect when choice overload occurs have been identified. This paper reviews seminal research on the advantages and disadvantages of choice and provides a systematic qualitative review of the research examining moderators of choice overload, laying out multiple critical paths forward for needed research in this area. We organize this literature review using two categories of moderators: the choice environment or context of the decision as well as the characteristics of the decision-maker.

Introduction

The current marketing orientation adopted by many organizations is to offer a wide range of options that differ in only minor ways. For example, a typical western grocery store contains 285 types of cookies, 120 different pasta sauces, 175 salad dressings, and 275 types of cereal ( Botti and Iyengar, 2006 ). However, research in psychology and consumer behavior has demonstrated that when the number of alternatives to choose from becomes excessive (that is, when it exceeds the decision-maker's cognitive resources), choice is mostly a disadvantage to both the seller and the buyer. This phenomenon has been called choice overload, and it refers to a variety of negative consequences stemming from having too many choices, including increased choice deferral, switching likelihood, or decision regret, as well as decreased choice satisfaction and confidence (e.g., Chernev et al., 2015 ). Choice overload has been replicated in numerous field and laboratory settings, with different items (e.g., jellybeans, pens, coffee, chocolates, etc.), actions (reading, completing projects, and writing essays), and populations (e.g., Chernev, 2003 ; Iyengar et al., 2004 ; Schwartz, 2004 ; Shah and Wolford, 2007 ; Mogilner et al., 2008 ; Fasolo et al., 2009 ; Misuraca and Teuscher, 2013 ; Misuraca and Faraci, 2021 ; Misuraca et al., 2022 ; see also Misuraca, 2013 ). Over time, we have gained insight into numerous moderators of the choice overload phenomenon, including aspects of the context or choice environment as well as the individual characteristics of the decision-maker (for a detailed review see Misuraca et al., 2020 ).

The goal of this review is to summarize important research findings that drive our current understanding of the advantages and disadvantages of choice, focusing on the growing body of research investigating moderators of choice overload. Following a discussion of the advantages and disadvantages of choice, we review the existing empirical literature examining moderators of choice overload. We organize this literature review using two categories of moderators: the choice environment or context of the decision as well as the decision-maker characteristics. Finally, based on this systematic review of research, we propose a variety of future research directions for choice overload investigators, ranging from exploring underlying mechanisms of choice overload moderators to broadening the area of investigation to include a robust variety of decision-making scenarios.

Theoretical background

The advantages of choice

Decades of research in psychology have demonstrated the many advantages of choice. Indeed, increased choice options are associated with increased intrinsic motivation ( Deci, 1975 ; Deci et al., 1981 ; Deci and Ryan, 1985 ), improved task performance ( Rotter, 1966 ), enhanced life satisfaction ( Langer and Rodin, 1976 ), and improved well-being ( Taylor and Brown, 1988 ). Increased choice options also have the potential to satisfy heterogeneous preferences and produce greater utility ( Lancaster, 1990 ). Likewise, economic research has demonstrated that larger assortments provide a higher chance of finding an option that perfectly matches individual preferences ( Baumol and Ide, 1956 ). In other words, with larger assortments it is easier to find what a decision-maker wants.

The impact of increased choice options extends into learning, internal motivation, and performance. Zuckerman et al. (1978) asked college students to solve puzzles. Half of the participants could choose the puzzle they would solve from six options; for the other half, the puzzle was imposed by the researchers. The group free to choose the puzzle was more motivated, more engaged, and exhibited better performance than the group that could not choose. In similar research, Schraw et al. (1998) asked college students to read a book. Participants were assigned to either a choice condition, in which they were free to choose the book to read, or a non-choice condition, in which the books to read were externally imposed according to a yoked procedure. Results demonstrated that the group free to make decisions was more motivated to read, more engaged, and more satisfied compared to the group that was not allowed to choose the book to read ( Schraw et al., 1998 ).

These effects remain consistent with children and when choice options are constrained to incidental aspects of the learning context. In the study by Cordova and Lepper (1996) , elementary school children played a computer game designed to teach arithmetic and problem-solving skills. One group could make decisions about incidental aspects of the learning context, including which spaceship was used and its name, whereas another group could not make any choice (all the choices about the game’s features were externally imposed by the experimenters). The results demonstrated that the first group was more motivated to play the game, more engaged in the task, learned more of the arithmetical concepts involved in the game, and preferred to solve more difficult tasks compared to the second group.

Extending the benefits of choice into health consequences, Langer and Rodin (1976) examined the impact of choice on nursing home patients. In this context, giving patients the possibility to make decisions about apparently irrelevant aspects of their life (e.g., at what time to watch a movie; how to arrange the furniture in their bedrooms, etc.) increased psychological and physiological well-being. The lack of choice resulted, instead, in a state of learned helplessness, as well as deterioration of physiological and psychological functions.

The above studies lead to the conclusion that choice has important advantages over no choice and, to some extent, over limited choice options. It seems that providing more choice options is an improvement: it will be more motivating, more satisfying, and yield greater well-being. In line with this conclusion, the current orientation in marketing is to offer a huge variety of products that differ only in small details (e.g., Botti and Iyengar, 2006 ). However, research in psychology and consumer behavior has demonstrated that when the number of alternatives to choose from exceeds the decision-makers' cognitive resources, choice can become a disadvantage.

The disadvantages of choice

A famous field study conducted by Iyengar and Lepper (2000) in a Californian supermarket demonstrated that too much choice decreases customers' motivation to buy as well as their post-choice satisfaction. Tasting booths were set up in two different areas of the supermarket, one displaying 6 different jars of jam and the other displaying 24 options, with customers free to taste any of the flavors. As expected, the larger assortment attracted more passers-by than the smaller assortment: 60% of passers-by stopped at the table displaying 24 options, whereas only 40% stopped at the table displaying the small variety of 6 jams. This finding was expected, given that more choice options are appealing. However, of the 60% of passers-by who stopped at the table with more choices, only 3% decided to buy jam. Conversely, 30% of the consumers who stopped at the table with only 6 jars decided to purchase at least one. Additionally, these customers expressed a higher level of satisfaction with their choices compared to those who purchased a jar from the larger assortment. In other words, too much choice is initially more appealing (it attracts more customers), but it decreases the motivation to choose and the post-choice satisfaction.
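
To see the asymmetry in a single number, the stop rates and purchase rates can be combined into overall conversion rates. The short sketch below does that arithmetic; the percentages are those reported in the study, while the base of 1,000 passers-by is a hypothetical figure chosen purely for illustration.

```python
# Conversion-rate arithmetic for the jam field study.
# Percentages are those reported by Iyengar and Lepper (2000);
# the base of 1,000 passers-by is hypothetical, for illustration only.

passers_by = 1000

# Large assortment (24 jams): 60% stop, 3% of those buy.
stopped_large = passers_by * 0.60    # 600 people stop
bought_large = stopped_large * 0.03  # 18 purchases

# Small assortment (6 jams): 40% stop, 30% of those buy.
stopped_small = passers_by * 0.40    # 400 people stop
bought_small = stopped_small * 0.30  # 120 purchases

print(f"24 jams: {bought_large:.0f} buyers ({bought_large / passers_by:.1%} of all passers-by)")
print(f" 6 jams: {bought_small:.0f} buyers ({bought_small / passers_by:.1%} of all passers-by)")
# Output: 1.8% versus 12.0% -- the small assortment converts more
# than six times better despite attracting less traffic.
```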

This classic and seminal example of choice overload was quickly followed by many replications that expanded the findings from simple purchasing decisions into other realms of life. For example, Iyengar and Lepper (2000) asked college students to write an essay. Participants were randomly assigned to one of two experimental conditions: a limited-choice condition, in which they could choose from a list of 6 topics for the essay, and an extensive-choice condition, in which they could choose from a list of 30 different topics. A higher percentage of college students turned in the essay in the limited-choice condition (74%) than in the extensive-choice condition (60%). Moreover, the essays written by students in the limited-choice condition were evaluated as higher quality than the essays written by students in the extensive-choice condition. In a separate study, college students were asked to choose one chocolate from two randomly assigned choice conditions with either 6 or 30 different chocolates. Participants in the limited-choice condition reported being more satisfied with their choice and more willing to purchase chocolates at the end of the experiment, compared to participants who chose from the larger assortment ( Iyengar and Lepper, 2000 ).

In the field of financial decision-making, Iyengar et al. (2004) analyzed 800,000 employees’ decisions about their participation in 401(k) plans that offered from a minimum of 2 to a maximum of 59 different fund options. The researchers observed that as the fund options increased, the participation rate decreased. Specifically, plans offering less than 10 options had the highest participation rate, whereas plans offering 59 options had the lowest participation rate.

The negative consequences of having too much choice are driven by cognitive limitations. Simon (1957) noted that decision-makers have a bounded rationality: the human mind cannot process an unlimited amount of information. Individuals' working memory has a span of about 7 (plus or minus 2) items ( Miller, 1956 ), which means that of all the options to choose from, individuals can mentally process only about 7 alternatives at a time. Because of these cognitive limitations, when the number of choices becomes too high, the comparison of all the available items becomes cognitively unmanageable and, consequently, decision-makers feel overwhelmed, confused, less motivated to choose, and less satisfied (e.g., Iyengar and Lepper, 2000 ). However, more recent meta-analytic work [ Chernev et al., 2015 ; see also Misuraca et al. (2020) ] has shown that choice overload occurs only under certain conditions. Many moderators that mitigate the phenomenon have been identified by researchers in psychology and consumer behavior (e.g., Mogilner et al., 2008 ; Misuraca et al., 2016a ). In the next sections, we describe our review methodology and provide a detailed discussion of the main external and internal moderators of choice overload.
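
One way to make this unmanageability concrete is to count the pairwise comparisons that an exhaustive evaluation would require, a number that grows quadratically with assortment size. The sketch below is our own back-of-the-envelope illustration, not a model drawn from the cited studies; the assortment sizes echo those in the studies discussed above.

```python
from math import comb

# Exhaustively comparing every option against every other option
# requires n * (n - 1) / 2 pairwise comparisons.
for n in (6, 24, 59):  # assortment sizes from the studies cited above
    print(f"{n:>2} options -> {comb(n, 2):>4} pairwise comparisons")

# Output:
#  6 options ->   15 pairwise comparisons
# 24 options ->  276 pairwise comparisons
# 59 options -> 1711 pairwise comparisons
# Against a working-memory span of about 7 +/- 2 items (Miller, 1956),
# exhaustive comparison quickly becomes infeasible.
```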

Literature search and inclusion criteria

Our investigation consisted of a literature review of peer-reviewed empirical research examining moderators of choice overload. We took several steps to locate and identify eligible studies. First, we sought to establish a list of moderators examined in the choice overload literature. For this, we referenced the reviews conducted by Chernev et al. (2015) , McShane and Böckenholt (2017) , and Misuraca et al. (2020) , and we reviewed the reference sections of the identified articles to locate additional studies. Using the list of moderators generated from this examination, we conducted a literature search using PsycInfo (Psychological Abstracts), EBSCO, and Google Scholar. This search included specific terms such as choice set complexity, visual preference heuristic, and choice preference uncertainty, as well as broad searches for 'choice overload' and 'moderator'.

We used several inclusion criteria to select relevant articles. First, the article had to note that it was examining the choice overload phenomenon; studies examining other theories and/or related variables were excluded. Second, to ensure that we were including high-quality research methods that have been evaluated by scholars, only peer-reviewed journal articles were included. Third, the article had to include primary empirical data (qualitative or quantitative); studies that were conceptual in nature were excluded. This process yielded 49 articles for the subsequent review.

Moderators of choice overload

Choice environment and context

Regarding external moderators of choice overload, several aspects of the choice environment are particularly relevant. These include the perceptual attributes of the information, the complexity of the set of options, the difficulty of the decision task, and the presence of brand names.

Perceptual characteristics

As Miller (1956) noted, humans have a "channel capacity" for information processing, and it differs across stimuli: for tastes, we have the capacity to accommodate four; for tones, the capacity increases to six; and for visual stimuli, we have the capacity for 10–15 items. Accordingly, the perceptual attributes of choice options are an important moderator of choice overload, with visual presentation being one of the most important perceptual attributes ( Townsend and Kahn, 2014 ). The visual preference heuristic refers to the tendency to prefer a visual rather than verbal representation of choice options, regardless of assortment size ( Townsend and Kahn, 2014 ). However, despite this preference, visual presentations of large assortments lead to suboptimal decisions compared to verbal presentations, as visual presentations activate a less systematic decision-making approach ( Townsend and Kahn, 2014 ). Visual presentation of large choice sets is also associated with increased perceptions of complexity and a higher likelihood of decision deferral. Visual representations are particularly effective with small assortments, as they increase consumers' perception of variety, improve the likelihood of making a choice, and reduce the time spent examining options ( Townsend and Kahn, 2014 ).

Choice set complexity

Choice set complexity refers to a wide range of aspects of a decision task that affect the value of the available choice options without influencing the structural characteristics of the decision problem ( Payne et al., 1993 ). Thus, choice set complexity does not influence aspects such as the number of options, number of attributes of each option, or format in which the information is presented. Rather, choice set complexity concerns factors such as the attractiveness of options, the presence of a dominant option, and the complementarity or alignability of the options.

Choice set complexity increases when the options include higher-quality, more attractive options ( Chernev and Hamilton, 2009 ). Indeed, when the variability in the relative attractiveness of the choice alternatives increases, certainty about the choice and satisfaction with the task increase ( Malhotra, 1982 ). Accordingly, when the options are attractive, increasing their number leads to a decline in consumer satisfaction and in the likelihood that a decision is made, whereas when the options are unattractive, increasing their number raises satisfaction and reduces decision deferral ( Dhar, 1997 ). This occurs when increased choice options make the strengths and weaknesses of attractive and unattractive options more salient ( Chan, 2015 ).

Similarly, the presence of a dominant option simplifies large choice sets and increases the preference for the chosen option; the opposite effect occurs in small choice sets ( Chernev, 2003 ). Choice sets containing an ideal option have been associated with increased brain activity in the areas involved in reward and value processing as well as in the integration of costs and benefits (the striatum and the anterior cingulate cortex; Reutskaja et al., 2018 ), which could explain why larger choice sets are not always associated with choice overload. As Misuraca et al. (2020 , p. 639) noted, "the benefits of having an ideal item in the set might compensate for the costs of overwhelming set size in the bounded rational mind of humans."

Finally, choice set complexity is impacted by the alignability and complementarity of the attributes that differentiate the options ( Chernev et al., 2015 ). When unique attributes of options exist within a choice set, complexity and choice overload increase, as the unique attributes make comparisons more difficult and trade-offs more salient. Indeed, feature alignability and complementarity (meaning that the options have additive utility and need to be co-present to fully satisfy the decision-maker's need) have been associated with decision deferral ( Chernev, 2005 ; Gourville and Soman, 2005 ) and changes in satisfaction ( Griffin and Broniarczyk, 2010 ).

Decision task difficulty

Decision task difficulty refers to the structural characteristics of a decision problem; unlike choice set complexity, decision task difficulty does not influence the value of the choice options ( Payne et al., 1993 ). Decision task difficulty is influenced by the number of attributes used to describe available options, decision accountability, time constraints, and presentation format.

The number of attributes used to describe the available options within an assortment influences decision task difficulty and choice overload ( Hoch et al., 1999 ; Chernev, 2003 ; Greifeneder et al., 2010 ), such that choice overload increases with the number of dimensions upon which the options differ. With each additional dimension, decision-makers have another piece of information that must be attended to and evaluated. Along with increasing the cognitive complexity of the choice, additional dimensions likely increase the odds that each option is inferior to other options on one dimension or another (e.g., Chernev et al., 2015 ).

When individuals have decision accountability, or are required to justify their choice of an assortment to others, they tend to prefer larger assortments; however, when individuals must justify their particular choice from an assortment to others, they tend to prefer smaller choice sets ( Ratner and Kahn, 2002 ; Chernev, 2006 ; Scheibehenne et al., 2009 ). Indeed, decision accountability is associated with decision deferral when choice sets are large rather than small ( Gourville and Soman, 2005 ). Thus, decision accountability influences decision task difficulty differently depending on whether an individual is selecting an assortment or choosing an option from an assortment.

Time pressure is an important contextual factor for decision task difficulty, choice overload, and decision regret ( Payne et al., 1993 ). Time pressure affects the strategies that are used to make decisions as well as the quality of the decisions made. When confronted with time pressure, decision-makers tend to speed up information processing, which they may accomplish by limiting the amount of information that they process and use ( Payne et al., 1993 ; Pieters and Warlop, 1999 ; Reutskaja et al., 2011 ). Decision deferral becomes a more likely outcome, as does choosing at random and regretting the decision later ( Inbar et al., 2011 ).

The physical arrangement and presentation of options and information affect information perception, processing, and decision-making. This moderates the effect of choice overload because these aspects facilitate or inhibit decision-makers' ability to process a greater information load (e.g., Chernev et al., 2015 ; Anderson and Misuraca, 2017 ). The location of options and the structure of the presented information support the retrieval of information about the options, allowing choosers to distinguish and evaluate the various options (e.g., Chandon et al., 2009 ). Specifically, organizing information into "chunks" facilitates information processing ( Miller, 1956 ) as well as the perception of greater variety in large choice sets ( Kahn and Wansink, 2004 ). Interestingly, these "chunks" do not have to be informative; Mogilner et al. (2008) found that choice overload was mitigated to the same extent when large choice sets were grouped into generic categories (i.e., A, B, etc.) as when the categories were meaningful descriptions of characteristics.

Beyond organization, the presentation order can facilitate or inhibit decision-makers' cognitive processing ability. Levav et al. (2010) found that choice overload decreased and choice satisfaction increased when smaller choice sets were followed by larger choice sets, compared to the opposite order of presentation. When sets are highly varied, Huffman and Kahn (1998) found that decision-makers were more satisfied and willing to make a choice when information was presented about attributes (i.e., price and characteristics) rather than available alternatives (i.e., images of options). Finally, presenting information simultaneously, rather than sequentially, increases decision satisfaction ( Mogilner et al., 2013 ), likely because decision-makers choose among an available set rather than comparing each option to an imagined ideal option.

Brand names

The presence of brand names is an important moderator of choice overload. As recently demonstrated by researchers in psychology and consumer behavior, choice overload occurs only when options are not associated with brands ( Misuraca et al., 2019 , 2021a ). When choosing between 6 or 24 different mobile phones, choice overload did not occur when the phones were associated with well-known brands (i.e., Apple, Samsung, Nokia, etc.), although it did occur when the same phones were displayed without information about their brand. These findings have been replicated with a population of adolescents ( Misuraca et al., 2021a ).

Decision-maker characteristics

Beyond the choice environment and context, individual differences in decision-maker characteristics are significant moderators of choice overload. Several critical characteristics include the decision goal as well as an individual’s preference uncertainty, affective state, decision style, and demographic variables such as age, gender, and cultural background (e.g., Misuraca et al., 2021a ).

Decision goal

A decision goal refers to the extent to which a decision-maker aims to minimize the cognitive resources spent making a decision ( Chernev, 2003 ). Decision goals have been associated with choice overload, with choice overload increasing along with the number of choice set options, likely due to decision-makers' unwillingness to make trade-offs among the various options. As a moderator of choice overload, several factors impact the effect of decision goals, including decision intent (choosing or browsing) and decision focus (choosing an assortment or an option) ( Misuraca et al., 2020 ).

Decision intent varies between choosing, with the goal of making a decision among the available options, and browsing, with the goal of learning more about the options. Cognitive overload is more likely to occur when decision-makers' goal is choosing rather than browsing. With choosing goals, decision-makers need to make trade-offs among the pros and cons of the options, something that demands more cognitive resources. Accordingly, decision-makers whose goal is browsing, rather than choosing, are less likely to experience cognitive overload when facing large assortments ( Chernev and Hamilton, 2009 ). Furthermore, when decision-makers have a goal of choosing, brain research reveals an inverted-U-shaped function, with neither too much nor too little choice providing optimal cognitive net benefits ( Reutskaja et al., 2018 ).
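
That inverted U can be illustrated with a toy net-value model in which the benefit of variety grows with diminishing returns while comparison costs grow roughly quadratically. The functional forms and coefficients below are hypothetical, chosen only to reproduce the qualitative shape reported by Reutskaja et al. (2018) , not estimated from any data.

```python
import math

def net_value(n, benefit_scale=10.0, cost_per_comparison=0.05):
    """Toy net value of an n-option choice set: logarithmic benefit
    (diminishing returns to variety) minus a quadratic comparison cost."""
    benefit = benefit_scale * math.log(1 + n)
    cost = cost_per_comparison * n * (n - 1) / 2
    return benefit - cost

# The curve rises, peaks, and falls: an inverted U.
best = max(range(1, 60), key=net_value)
print(f"Net value peaks at {best} options")
for n in (2, 6, 12, 24, 59):
    print(f"{n:>2} options -> net value {net_value(n):6.1f}")
```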

Decision focus can target selecting an assortment or selecting an option from an assortment. When selecting an assortment, cognitive overload is less likely to occur, likely due to the lack of individual option evaluation and trade-offs ( Chernev et al., 2015 ). Thus, when choosing an assortment, decision-makers tend to prefer larger assortments that provide more variety. Conversely, decision-makers focused on choosing an option from an assortment report increased decision difficulty and tend to prefer smaller assortments ( Chernev, 2006 ). Choice overload is further moderated by the order of decision focus: Scheibehenne et al. (2010) found that when decision-makers first decide on an assortment, they are more likely to choose an option from that assortment rather than an option from an assortment they did not first select.

Preference uncertainty

Decision-makers vary in the degree to which they understand and can prioritize the costs and benefits of the choice options, a characteristic referred to as preference uncertainty ( Chernev, 2003 ). Preference uncertainty is influenced by decision-maker expertise and by an articulated ideal option, which indicates well-defined preferences. When decision-makers have limited expertise, larger choice sets are associated with weaker preferences as well as increased choice deferral and choice overload compared to smaller choice sets. Conversely, high-expertise decision-makers experience weaker preferences and increased choice deferral in the context of smaller choice sets compared to larger ones ( Mogilner et al., 2008 ; Morrin et al., 2012 ). Likewise, an articulated ideal option, which implies that the decision-maker has already engaged in trade-offs, is associated with reduced decision complexity. This effect is more pronounced in larger choice sets than in smaller ones ( Chernev, 2003 ).

Positive affect

Positive affect tends to moderate the impact of choice overload on decision satisfaction. Indeed, Spassova and Isen (2013) found that decision-makers reporting positive affect did not report experiencing dissatisfaction when choosing from larger choice sets, while those with neutral affect reported being more satisfied when choosing from smaller choice sets. This effect may be associated with the affect heuristic, a cognitive shortcut that enables efficient decisions based on the immediate emotional response to a stimulus ( Slovic et al., 2007 ).

Decision-making tendencies

Satisfaction with extensive choice options may depend on whether one is a maximizer or a satisficer. Maximizing refers to the tendency to search for the best option. Maximizers approach decision tasks with the goal of finding the absolute best ( Carmeci et al., 2009 ; Misuraca et al., 2015 , 2016b , 2021b ; Misuraca and Fasolo, 2018 ). To do that, they tend to process all the information available and try to compare all the possible options. Conversely, satisficers are decision-makers whose goal is to select an option that is good enough, rather than the best choice. To find such an option, satisficers evaluate a smaller range of options and choose as soon as they find one alternative that surpasses their threshold of acceptability ( Schwartz, 2004 ). Given the different approaches of maximizers and satisficers when choosing, it is easy to see why choice overload represents more of a problem for maximizers than for satisficers: if the number of choices exceeds the individual's cognitive resources, maximizers more than satisficers will feel overwhelmed, frustrated, and dissatisfied, because evaluating all the available options to select the best one is cognitively impossible.
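
The two strategies are straightforward to state algorithmically, which makes their different search costs easy to see. The sketch below contrasts them under simplifying assumptions of our own (option values drawn uniformly at random, a fixed acceptability threshold of 0.8); it is an illustration of the strategies, not a model taken from the cited studies.

```python
import random

def maximize(options):
    """Examine every option and take the best."""
    return max(options), len(options)

def satisfice(options, threshold):
    """Take the first option that clears the acceptability threshold;
    if none does, fall back to the best option seen."""
    for examined, value in enumerate(options, start=1):
        if value >= threshold:
            return value, examined
    return max(options), len(options)

random.seed(42)
options = [random.random() for _ in range(30)]  # 30 options, values in [0, 1]

max_value, max_cost = maximize(options)
sat_value, sat_cost = satisfice(options, threshold=0.8)

print(f"Maximizer:  value {max_value:.2f} after examining {max_cost} options")
print(f"Satisficer: value {sat_value:.2f} after examining {sat_cost} options")
# The maximizer's pick is at least as good, but its search cost grows
# with assortment size -- the gap that makes large choice sets
# disproportionately taxing for maximizers.
```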

Maximizers have attracted considerable attention from researchers because of the paradoxical finding that even though they make objectively better decisions than satisficers, they report greater regret and dissatisfaction. Specifically, Iyengar et al. (2006) analyzed the job search outcomes of college students during their final college year and found that maximizer students selected jobs with 20% higher salaries compared to satisficers, but felt less satisfied and happy, as well as more stressed, frustrated, anxious, and regretful, than students who were satisficers. The reason for these negative feelings lies in maximizers' tendency to believe that a better option lurks among those that they could not evaluate, given their time and cognitive limitations.

Choosing for others versus oneself

When decision-makers must make a choice for someone else, choice overload does not occur ( Polman, 2012 ). When making choices for others (about wines, ice-cream flavors, school courses, etc.), decision makers reported greater satisfaction when choosing from larger assortments rather than smaller assortments. However, when choosing for themselves, they reported higher satisfaction after choosing from smaller rather than larger assortments.

Demographics

Demographic variables such as gender, age, and cultural background moderate reactions concerning choice overload. Regarding gender, men and women may often employ different information-processing strategies, with women being more likely than men to attend to and use details (e.g., Meyers-Levy and Maheswaran, 1991 ). Gender differences also arise in the desire for variety and in satisfaction, depending on choice type. Women were more satisfied with their choice of gift boxes regardless of assortment size, but became more selective than men when speed-dating with larger groups of speed daters compared to smaller groups ( Fisman et al., 2006 ).

Age moderates the choice overload experience such that, when choosing from an extensive array of options, adolescents and adults suffer similar negative consequences (i.e., greater difficulty and dissatisfaction), while children and seniors suffer fewer negative consequences (i.e., less difficulty and dissatisfaction than adolescents and adults) ( Misuraca et al., 2016a ). This could be associated with decision-making tendencies. Indeed, adults and adolescents tend to adopt maximizing approaches ( Furby and Beyth-Marom, 1992 ). This maximizing tendency aligns with their greater perceived difficulty and post-choice dissatisfaction when facing a high number of options ( Iyengar et al., 2006 ). Seniors tend to adopt a satisficing approach when making decisions ( Tanius et al., 2009 ), to become overconfident in their judgments ( Stankov and Crawford, 1996 ), and to focus on positive information ( Mather and Carstensen, 2005 ). Taken together, these tendencies could explain why the negative consequences of too many choice options were milder among seniors. Finally, children tend to approach decisions in an intuitive manner and quickly develop strong preferences ( Schlottmann and Wilkening, 2011 ), which mitigates the negative consequences of choice overload for this age group.

Finally, decision-makers from different cultures have different preferences for variety (e.g., Iyengar, 2010 ). Eastern Europeans report greater satisfaction with larger choice sets than Western Europeans ( Reutskaja et al., 2022 ). Likewise, cultural differences in perception may impact how choice options affect decision-makers from Western and non-Western cultures (e.g., Miyamoto et al., 2006 ).

Future research directions

As researchers continue to investigate the choice overload phenomenon, future investigations can provide a deeper understanding of the underlying mechanisms that influence when and how individuals experience the negative impacts of choice overload as well as illuminate how this phenomenon can affect people in diverse contexts (such as hiring decisions, sports, social media platforms, streaming services, etc.).

For instance, the visual preference heuristic indicates, and subsequent research supports, the human tendency to prefer visual rather than verbal representations of choice options ( Townsend and Kahn, 2014 ). However, in Huffman and Kahn's (1998) research, decision-makers preferred written information (e.g., the characteristics of a sofa) rather than visual representations of the alternatives. Future researchers can investigate the circumstances that determine when individuals prefer detailed written or verbal information as opposed to visual images.

Furthermore, future researchers can examine the extent to which the mechanisms underlying the impact of chunking align with those underlying the effect of brand names. Research has supported that chunking information reduces choice overload, regardless of the sophistication of the categories ( Kahn and Wansink, 2004 ; Mogilner et al., 2008 ). The presence of a brand name has a seemingly similar effect ( Misuraca et al., 2019 , 2021a ). The extent to which the cognitive processes underlying these two areas of research are similar, as well as the ways in which they might differ, can provide valuable insights for researchers and practitioners.

More research is needed that considers the role of the specific culture and cultural values of the decision-maker in choice overload. Indeed, the traditional studies on the choice overload phenomenon mentioned above predominantly focused on western cultures, which are known for being individualistic. Future research should explore whether choice overload replicates in collectivistic cultures, which place a different value on making personal decisions than individualistic cultures do. Additional cultural values, such as long-term or short-term time orientation, may also affect decision-makers and the extent to which they experience choice overload ( Hofstede and Minkov, 2010 ).

While future research that expands our understanding of the currently known and identified moderators of choice overload can critically inform our understanding of when and how this phenomenon occurs, there are many new and exciting directions into which researchers can expand.

For example, traditional research on choice overload focused on choice scenarios where decision-makers had to choose only one option out of either a small or a large assortment of options. This is clearly an important scenario, yet it represents only one of many scenarios that choice overload may impact. Future research could investigate when and how this phenomenon occurs in a wide variety of scenarios that are common in the real-world but currently neglected in classical studies on choice overload. These could include situations in which the individual can choose more than one option (e.g., more than one type of ice cream or cereal) (see Fasolo et al., 2024 ).

Historically, a significant amount of research on choice overload has focused on purchasing decisions. Some evidence also indicates that the phenomenon occurs in a variety of situations (e.g., online dating, career choices, retirement planning, travel and tourism, and education), potentially hindering decision-making processes and outcomes. Future research should further investigate how choice overload impacts individuals in a variety of untested situations. For instance, how might choice overload affect a hiring manager with a robust pool of qualified applicants? How would the occurrence of choice overload in a hiring situation affect the quality of the decision, such as the odds of making an optimal hire? Likewise, does choice overload play a role in procrastination? When confronted with an overwhelming number of task options, does choice overload contribute to decision deferral? It could be that similar cognitive processes underlie deferring a choice on a purchase and deferring a choice on a to-do list. Research is needed to understand how choice overload (and its moderators) may differ across these scenarios.

Finally, as society continues to adapt and develop, future research will be needed to evaluate how technological and sociological changes affect individual decision-makers. The technology we interact with has become substantially more sophisticated and omnipresent, particularly in the form of artificial intelligence (AI). As AI is adopted into our work, shopping, and online experiences, future researchers should investigate whether AI and interactive decision aids (e.g., Anderson and Misuraca, 2017) can be leveraged to reduce the negative consequences of having too many alternatives without impairing decision-makers' sense of freedom.
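
As one way such a study could operationalize an interactive decision aid, the sketch below (our illustration, not a tool from the cited work) prunes a large option set with a simple weighted additive preference model; the attribute names, weights, and top-k cutoff are all hypothetical choices an experimenter would vary.

```python
# Illustrative sketch of an interactive decision aid that shortlists options
# using a weighted additive preference model. Attribute names, weights, and
# the top-k cutoff are hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    attributes: dict  # attribute name -> score in [0, 1]

def score(option, weights):
    """Weighted additive utility: sum of attribute scores times user weights."""
    return sum(weights.get(attr, 0.0) * value
               for attr, value in option.attributes.items())

def shortlist(options, weights, k=5):
    """Return the top-k options, shrinking the set the user must compare."""
    return sorted(options, key=lambda o: score(o, weights), reverse=True)[:k]

random.seed(0)
# Hypothetical assortment: 100 sofas rated on two attributes.
sofas = [Option(f"sofa-{i}", {"comfort": random.random(),
                              "price_value": random.random()})
         for i in range(100)]
# Elicited (hypothetical) weights: comfort matters twice as much as price.
user_weights = {"comfort": 2.0, "price_value": 1.0}
for opt in shortlist(sofas, user_weights):
    print(opt.name, round(score(opt, user_weights), 3))
```

Whether such pruning reduces overload while preserving the sense of freedom the authors highlight is exactly the kind of empirical question this line of research would need to answer.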

As with technological advancements, future research could examine how new sociological roles contribute to or mitigate choice overload. For example, a social media influencer may reduce the complexity of a decision that involves a large number of options. If social media influencers do have such an impact, is it consistent across age groups and across culturally diverse individuals? Deepening our understanding of how historical and sociological events have shaped decision-makers, along with the cultural differences in perception noted above, could provide a rich and needed area of future research.

Discussion and conclusion

Research in psychology has demonstrated the advantages of being able to choose from a variety of alternatives, particularly when compared with having no choice at all. Having the possibility to choose enhances individuals' feelings of self-determination, motivation, performance, well-being, and satisfaction with life (e.g., Zuckerman et al., 1978; Cordova and Lepper, 1996). As the world continues to globalize through sophisticated supply chains and seemingly infinite online shopping options, our societies have become characterized by a proliferation of choice options. Today, not only stores but also universities, hospitals, financial advisors, sport centers, and many other organizations offer a huge number of options from which to choose. The variety on offer is often so large that decision-makers can become overwhelmed when trying to compare and evaluate all the potential options, and so experience choice overload (Iyengar and Lepper, 2000).

Rather than forfeit the benefits associated with choice, researchers and practitioners should understand and leverage the many moderators that affect the occurrence of choice overload. The findings presented in this review indicate that choice overload is influenced by several factors, including perceptual attributes, choice set complexity, decision task difficulty, and brand association. Understanding these moderators can aid in designing choice environments that optimize decision-making processes and alleviate choice overload: for instance, organizing options effectively and leveraging brand associations can enhance decision satisfaction. Additionally, considering individual differences such as decision goals, preference uncertainty, affective state, decision-making tendencies, and demographics can help tailor decision environments to the needs and preferences of individuals, ultimately improving decision outcomes.

Future research is needed to fully understand the role of the many variables that might be responsible for the negative consequences of choice overload and to clarify the conditions under which the phenomenon occurs.

Author contributions

RM: Writing – review & editing, Conceptualization, Data curation, Investigation, Methodology, Writing – original draft. AN: Writing – review & editing. SM: Writing – review & editing. GD: Methodology, Writing – review & editing. CS: Writing – review & editing, Supervision.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. For example, gloves and socks have complementary features, in that they provide warmth to different parts of the body.

References

Anderson, B. F., and Misuraca, R. (2017). Perceptual commensuration in decision tables. Q. J. Exp. Psychol. 70, 544–553. doi: 10.1080/17470218.2016.1139603


Baumol, W., and Ide, E. A. (1956). Variety in retailing. Manag. Sci. 3, 93–101. doi: 10.1287/mnsc.3.1.93

Botti, S., and Iyengar, S. S. (2006). The dark side of choice: when choice impairs social welfare. J. Public Policy Mark. 25, 24–38. doi: 10.1509/jppm.25.1.24

Carmeci, F., Misuraca, R., and Cardaci, M. (2009). A study of temporal estimation from the perspective of the mental clock model. J. Gen. Psychol. 136, 117–128. doi: 10.3200/GENP.136.2.117-128


Chan, E. Y. (2015). Attractiveness of options moderates the effect of choice overload. Int. J. Res. Mark. 32, 425–427. doi: 10.1016/j.ijresmar.2015.04.001

Chandon, P., Hutchinson, J. W., Bradlow, E. T., and Young, S. H. (2009). Does in-store marketing work? Effects of the number and position of shelf facings on brand attention and evaluation at the point of purchase. J. Mark. 73, 1–17. doi: 10.1509/jmkg.73.6.1

Chernev, A. (2003). When more is less and less is more: the role of ideal point availability and assortment in consumer choice. J. Consum. Res. 30, 170–183. doi: 10.1086/376808

Chernev, A. (2005). Feature complementarity and assortment in choice. J. Consum. Res. 31, 748–759. doi: 10.1086/426608

Chernev, A. (2006). Decision focus and consumer choice among assortments. J. Consum. Res. 33, 50–59. doi: 10.1086/504135

Chernev, A., Böckenholt, U., and Goodman, J. (2015). Choice overload: a conceptual review and meta-analysis. J. Consum. Psychol. 25, 333–358. doi: 10.1016/j.jcps.2014.08.002

Chernev, A., and Hamilton, R. (2009). Assortment size and option attractiveness in consumer choice among retailers. J. Mark. Res. 46, 410–420. doi: 10.1509/jmkr.46.3.410

Cordova, D. I., and Lepper, M. R. (1996). Intrinsic motivation and the process of learning: beneficial effects of contextualization, personalization, and choice. J. Educ. Psychol. 88, 715–730. doi: 10.1037/0022-0663.88.4.715

Deci, E. (1975). Intrinsic motivation. New York, NY, London: Plenum Press.


Deci, E. L., Nezlek, J., and Sheinman, L. (1981). Characteristics of the rewarder and intrinsic motivation of the rewardee. J. Pers. Soc. Psychol. 40, 1–10. doi: 10.1037/0022-3514.40.1.1

Deci, E. L., and Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Berlin: Springer Science & Business Media.

Dhar, R. (1997). Context and task effects on choice deferral. Mark. Lett. 8, 119–130. doi: 10.1023/A:1007997613607

Fasolo, B., Carmeci, F. A., and Misuraca, R. (2009). The effect of choice complexity on perception of time spent choosing: when choice takes longer but feels shorter. Psychol. Mark. 26, 213–228. doi: 10.1002/mar.20270

Fasolo, B., Misuraca, R., and Reutskaja, E. (2024). Choose as much as you wish: freedom cues in the marketplace help consumers feel more satisfied with what they choose and improve customer experience. J. Exp. Psychol. Appl. 30, 156–168. doi: 10.1037/xap0000481

Fisman, R., Iyengar, S. S., Kamenica, E., and Simonson, I. (2006). Gender differences in mate selection: evidence from a speed dating experiment. Q. J. Econ. 121, 673–697. doi: 10.1162/qjec.2006.121.2.673

Furby, L., and Beyth-Marom, R. (1992). Risk taking in adolescence: a decision-making perspective. Dev. Rev. 12, 1–44. doi: 10.1016/0273-2297(92)90002-J

Gourville, J. T., and Soman, D. (2005). Overchoice and assortment type: when and why variety backfires. Mark. Sci. 24, 382–395. doi: 10.1287/mksc.1040.0109

Greifeneder, R., Scheibehenne, B., and Kleber, N. (2010). Less may be more when choosing is difficult: choice complexity and too much choice. Acta Psychol. 133, 45–50. doi: 10.1016/j.actpsy.2009.08.005

Griffin, J. G., and Broniarczyk, S. M. (2010). The slippery slope: the impact of feature alignability on search and satisfaction. J. Mark. Res. 47, 323–334. doi: 10.1509/jmkr.47.2.323

Hoch, S. J., Bradlow, E. T., and Wansink, B. (1999). The variety of an assortment. Mark. Sci. 18, 527–546. doi: 10.1287/mksc.18.4.527

Hofstede, G., and Minkov, M. (2010). Long- versus short-term orientation: new perspectives. Asia Pac. Bus. Rev. 16, 493–504. doi: 10.1080/13602381003637609

Huffman, C., and Kahn, B. E. (1998). Variety for sale: mass customization or mass confusion? J. Retail. 74, 491–513. doi: 10.1016/S0022-4359(99)80105-5

Inbar, Y., Botti, S., and Hanko, K. (2011). Decision speed and choice regret: when haste feels like waste. J. Exp. Soc. Psychol. 47, 533–540. doi: 10.1016/j.jesp.2011.01.011

Iyengar, S. S. (2010). The art of choosing. London: Little Brown.

Iyengar, S. S., Huberman, G., and Jiang, W. (2004). "How much choice is too much? Contributions to 401(k) retirement plans" in Pension design and structure: new lessons from behavioral finance. Oxford: Oxford University Press, 83–95.

Iyengar, S. S., and Lepper, M. R. (2000). When choice is demotivating: can one desire too much of a good thing? J. Pers. Soc. Psychol. 79, 995–1006. doi: 10.1037/0022-3514.79.6.995

Iyengar, S. S., Wells, R. E., and Schwartz, B. (2006). Doing better but feeling worse: looking for the 'best' job undermines satisfaction. Psychol. Sci. 17, 143–150. doi: 10.1111/j.1467-9280.2006.01677.x

Kahn, B. E., and Wansink, B. (2004). The influence of assortment structure on perceived variety and consumption quantities. J. Consum. Res. 30, 519–533. doi: 10.1086/380286

Lancaster, K. (1990). The economics of product variety: a survey. Mark. Sci. 9, 189–206. doi: 10.1287/mksc.9.3.189

Langer, E. J., and Rodin, J. (1976). The effects of choice and enhanced personal responsibility for aged: a field experiment in an institutional setting. J. Pers. Soc. Psychol. 34, 191–198. doi: 10.1037/0022-3514.34.2.191

Levav, J., Heitmann, M., Herrmann, A., and Iyengar, S. S. (2010). Order in product customization decisions: evidence from field experiments. J. Polit. Econ. 118, 274–299. doi: 10.1086/652463

Malhotra, N. K. (1982). Information load and consumer decision making. J. Consum. Res. 8, 419–430. doi: 10.1086/208882

Mather, M., and Carstensen, L. L. (2005). Aging and motivated cognition: the positivity effect in attention and memory. Trends Cogn. Sci. 9, 496–502. doi: 10.1016/j.tics.2005.08.005

McShane, B. B., and Böckenholt, U. (2017). Multilevel multivariate Meta-analysis with application to choice overload. Psychometrika 83, 255–271. doi: 10.1007/s11336-017-9571-z

Meyers-Levy, J., and Maheswaran, D. (1991). Exploring differences in males' and females' processing strategies. J. Consum. Res. 18, 63–70. doi: 10.1086/209241

Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81–97. doi: 10.1037/h0043158

Misuraca, R. (2013). Do too many choices have negative consequences? An empirical review [Troppa scelta ha veramente conseguenze negative? Una rassegna di studi empirici]. G. Ital. Psicol. 35, 129–154.

Misuraca, R., Ceresia, F., Nixon, A. E., and Scaffidi Abbate, C. (2021a). When is more really more? The effect of brands on choice overload in adolescents. J. Consum. Mark. 38, 168–177. doi: 10.1108/JCM-08-2020-4021

Misuraca, R., Ceresia, F., Teuscher, U., and Faraci, P. (2019). The role of the brand on choice overload. Mind Soc. 18, 57–76. doi: 10.1007/s11299-019-00210-7

Misuraca, R., and Faraci, P. (2021). Choice overload: a study on children, adolescents, adults and seniors [L'effetto del sovraccarico di scelta: un'indagine su bambini, adolescenti, adulti e anziani]. Ricerche Psicol. 43, 835–847.

Misuraca, R., Faraci, P., Gangemi, A., Carmeci, F. A., and Miceli, S. (2015). The decision-making tendency inventory: a new measure to assess maximizing, satisficing, and minimizing. Personal. Individ. Differ. 85, 111–116. doi: 10.1016/j.paid.2015.04.043

Misuraca, R., Faraci, P., Ruthruff, E., and Ceresia, F. (2021b). Are maximizers more normative decision-makers? An experimental investigation of maximizers' susceptibility to cognitive biases. Personal. Individ. Differ. 183:111123. doi: 10.1016/j.paid.2021.111123

Misuraca, R., Faraci, P., and Scaffidi-Abbate, C. (2022). Maximizers’ susceptibility to the effect of frequency vs. percentage format in risk representation. Behav. Sci. 12:496. doi: 10.3390/bs12120496

Misuraca, R., and Fasolo, B. (2018). Maximizing versus satisficing in the digital age: disjoint scales and the case for “construct consensus”. Personal. Individ. Differ. 121, 152–160. doi: 10.1016/j.paid.2017.09.031

Misuraca, R., Reutskaja, E., Fasolo, B., and Iyengar, S. S. (2020). "How much choice is 'good enough'? Moderators of information and choice overload" in Routledge handbook of bounded rationality. ed. R. Viale (Abingdon, UK: Routledge).

Misuraca, R., and Teuscher, U. (2013). Time flies when you maximize – maximizers and satisficers perceive time differently when making decisions. Acta Psychol. 143, 176–180. doi: 10.1016/j.actpsy.2013.03.004

Misuraca, R., Teuscher, U., and Carmeci, F. A. (2016b). Who are maximizers? Future oriented and highly numerate individuals. Int. J. Psychol. 51, 307–311. doi: 10.1002/ijop.12169

Misuraca, R., Teuscher, U., and Faraci, P. (2016a). Is more choice always worse? Age differences in the overchoice effect. J. Cogn. Psychol. 28, 242–255. doi: 10.1080/20445911.2015.1118107

Miyamoto, Y., Nisbett, R. E., and Masuda, T. (2006). Culture and the physical environment: holistic versus analytic perceptual affordances. Psychol. Sci. 17, 113–119. doi: 10.1111/j.1467-9280.2006.01673.x

Mogilner, C., Rudnick, T., and Iyengar, S. S. (2008). The mere categorization effect: how the presence of categories increases choosers’ perceptions of assortment variety and outcome satisfaction. J. Consum. Res. 35, 202–215. doi: 10.1086/588698

Mogilner, C., Shiv, B., and Iyengar, S. S. (2013). Eternal quest for the best: sequential (vs. simultaneous) option presentation undermines choice commitment. J. Consum. Res. 39, 1300–1312. doi: 10.1086/668534

Morrin, M., Broniarczyk, S. M., and Inman, J. J. (2012). Plan format and participation in 401 (k) plans: the moderating role of investor knowledge. J. Public Policy Mark. 31, 254–268. doi: 10.1509/jppm.10.122

Payne, J. W., Bettman, J. R., and Johnson, E. J. (1993). The adaptive decision maker. Cambridge: Cambridge University Press.

Pieters, R., and Warlop, L. (1999). Visual attention during brand choice: the impact of time pressure and task motivation. Int. J. Res. Mark. 16, 1–16. doi: 10.1016/S0167-8116(98)00022-6

Polman, E. (2012). Effects of self-other decision making on regulatory focus and choice overload. J. Pers. Soc. Psychol. 102, 980–993. doi: 10.1037/a0026966

Ratner, R. K., and Kahn, B. E. (2002). The impact of private versus public consumption on variety-seeking behavior. J. Consum. Res. 29, 246–257. doi: 10.1086/341574

Reutskaja, E., Cheek, N. N., Iyengar, S., and Schwartz, B. (2022). Choice deprivation, choice overload, and satisfaction with choices across six nations. J. Int. Mark. 30, 18–34. doi: 10.1177/1069031X211073821

Reutskaja, E., Lindner, A., Nagel, R., Andersen, R. A., and Camerer, C. F. (2018). Choice overload reduces neural signatures of choice set value in dorsal striatum and anterior cingulate cortex. Nat. Hum. Behav. 2, 925–935. doi: 10.1038/s41562-018-0440-2

Reutskaja, E., Nagel, R., Camerer, C. F., and Rangel, A. (2011). Search dynamics in consumer choice under time pressure: an eye-tracking study. Am. Econ. Rev. 101, 900–926. doi: 10.1257/aer.101.2.900

Rotter, J. B. (1966). Generalized expectancies for internal versus external control of reinforcement. Psychol. Monogr. Gen. Appl. 80, 1–28. doi: 10.1037/h0092976

Scheibehenne, B., Greifeneder, R., and Todd, P. M. (2009). What moderates the too-much-choice effect? Psychol. Mark. 26, 229–253. doi: 10.1002/mar.20271

Scheibehenne, B., Greifeneder, R., and Todd, P. M. (2010). Can there ever be too many options? A meta-analytic review of choice overload. J. Consum. Res. 37, 409–425.

Schlottmann, A., and Wilkening, F. (2011). Judgment and decision making in young children. Cambridge: Cambridge University Press.

Schraw, G., Flowerday, T., and Reisetter, M. F. (1998). The role of choice in reader engagement. J. Educ. Psychol. 90, 705–714. doi: 10.1037/0022-0663.90.4.705

Schwartz, B. (2004). The paradox of choice: why more is less. New York, NY: Ecco.

Shah, A. M., and Wolford, G. (2007). Buying behavior as a function of parametric variation of number of choices. Psychol. Sci. 18, 369–370. doi: 10.1111/j.1467-9280.2007.01906.x

Simon, H. A. (1957). Models of man: social and rational. Oxford: John Wiley & Sons.

Slovic, P., Finucane, M. L., Peters, E., and MacGregor, D. G. (2007). The affect heuristic. Eur. J. Oper. Res. 177, 1333–1352. doi: 10.1016/j.ejor.2005.04.006

Spassova, G., and Isen, A. M. (2013). Positive affect moderates the impact of assortment size on choice satisfaction. J. Retail. 89, 397–408. doi: 10.1016/j.jretai.2013.05.003

Stankov, L., and Crawford, J. D. (1996). Confidence judgments in studies of individual differences. Personal. Individ. Differ. 21, 971–986. doi: 10.1016/S0191-8869(96)00130-4

Tanius, B. E., Wood, S., Hanoch, Y., and Rice, T. (2009). Aging and choice: applications to Medicare part D. Judgm. Decis. Mak. 4, 92–101. doi: 10.1017/S1930297500000735

Taylor, S. E., and Brown, J. D. (1988). Illusion and well-being: a social psychological perspective on mental health. Psychol. Bull. 103, 193–210. doi: 10.1037/0033-2909.103.2.193

Townsend, C., and Kahn, B. E. (2014). The “visual preference heuristic”: the influence of visual versus verbal depiction on assortment processing, perceived variety, and choice overload. J. Consum. Res. 40, 993–1015. doi: 10.1086/673521

Zuckerman, M., Porac, J., Lathin, D., Smith, R., and Deci, E. L. (1978). On the importance of self-determination for intrinsically motivated behavior. Personal. Soc. Psychol. Bull. 4, 443–446. doi: 10.1177/014616727800400317

Keywords: choice-overload, decision-making, choice set complexity, decision task difficulty, decision goal, decision-making tendency

Citation: Misuraca R, Nixon AE, Miceli S, Di Stefano G and Scaffidi Abbate C (2024) On the advantages and disadvantages of choice: future research directions in choice overload and its moderators. Front. Psychol. 15:1290359. doi: 10.3389/fpsyg.2024.1290359

Received: 07 September 2023; Accepted: 24 April 2024; Published: 09 May 2024.

Copyright © 2024 Misuraca, Nixon, Miceli, Di Stefano and Scaffidi Abbate. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Raffaella Misuraca, [email protected]
