
10 Influential Memory Theories and Studies in Psychology

Discover the experiments and theories that shaped our understanding of how we develop and recall memories.


How do our memories store information? Why is it that we can recall a memory at will from decades ago, and what purpose does forgetting information serve?

Human memory was the subject of investigation for many 20th-century psychologists and remains an active area of study for today’s cognitive scientists. Below we take a look at some of the most influential studies, experiments and theories that continue to guide our understanding of the function of memory.

1 Multi-Store Model

(Atkinson & Shiffrin, 1968)

An influential theory of memory known as the multi-store model was proposed by Richard Atkinson and Richard Shiffrin in 1968. This model suggested that information exists in one of three states of memory: the sensory, short-term and long-term stores. Information passes from one stage to the next the more we rehearse it in our minds, but can fade away if we do not pay enough attention to it.

Information enters the memory from the senses - for instance, the eyes observe a picture, olfactory receptors in the nose might smell coffee or we might hear a piece of music. This stream of information is held in the sensory memory store, and because it consists of a huge amount of data describing our surroundings, we only need to remember a small portion of it. As a result, most sensory information ‘decays’ and is forgotten after a short period of time. A sight or sound that we might find interesting captures our attention, and our contemplation of this information - known as rehearsal - leads to the data being promoted to the short-term memory store, where it will be held for a few hours or even days in case we need access to it.

The short-term memory gives us access to information that is salient to our current situation, but is limited in its capacity.

Therefore, we need to further rehearse information in the short-term memory to remember it for longer. This may involve merely recalling and thinking about a past event, or remembering a fact by rote - by thinking or writing about it repeatedly. Rehearsal then further promotes this significant information to the long-term memory store, where Atkinson and Shiffrin believed that it could survive for years, decades or even a lifetime.

Key information regarding people that we have met, important life events and other important facts makes it through the sensory and short-term memory stores to reach the long-term memory.
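Atkinson and Shiffrin’s flow - sensory input, attention, rehearsal, promotion - can be pictured as a simple state machine. The minimal Python sketch below is our own illustration of that flow; the class, the rehearsal threshold and the method names are assumptions made for demonstration, not part of the 1968 model.

```python
# Toy sketch of the multi-store model's flow. All names and the
# promotion threshold are illustrative assumptions.

STM_REHEARSALS_TO_LTM = 3   # assumed rehearsals needed for promotion to LTM

class MultiStoreMemory:
    def __init__(self):
        self.short_term = {}     # item -> rehearsal count
        self.long_term = set()

    def perceive(self, item, attended):
        """Sensory input decays unless attention moves it to the short-term store."""
        if attended:
            self.short_term[item] = 0

    def rehearse(self, item):
        """Each rehearsal strengthens the trace; enough rehearsal reaches the LTM."""
        if item in self.short_term:
            self.short_term[item] += 1
            if self.short_term[item] >= STM_REHEARSALS_TO_LTM:
                self.long_term.add(item)

memory = MultiStoreMemory()
memory.perceive("smell of coffee", attended=False)   # decays from the sensory store
memory.perceive("new phone number", attended=True)   # enters the short-term store
for _ in range(3):
    memory.rehearse("new phone number")              # rote rehearsal
print(memory.long_term)                              # {'new phone number'}
```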


2 Levels of Processing

(Craik & Lockhart, 1972)

Fergus Craik and Robert Lockhart were critical of the explanation of memory provided by the multi-store model, so in 1972 they proposed an alternative known as the levels of processing effect. According to this model, memories do not reside in three stores; instead, the strength of a memory trace depends upon the quality of processing, or rehearsal, of a stimulus. In other words, the more we think about something, the more long-lasting the memory we have of it (Craik & Lockhart, 1972).

Craik and Lockhart distinguished between two types of processing that take place when we make an observation: shallow and deep processing. Shallow processing - considering the overall appearance or sound of something - generally leads to a stimulus being forgotten. This explains why we may walk past many people in the street on a morning commute, but not remember a single face by lunchtime.

Deep (or semantic) processing, on the other hand, involves elaborative rehearsal - focusing on a stimulus in a more considered way, such as thinking about the meaning of a word or the consequences of an event. For example, merely reading a news story involves shallow processing, but thinking about the repercussions of the story - how it will affect people - requires deep processing, which increases the likelihood of details of the story being memorized.

In 1975, Craik and another psychologist, Endel Tulving, published the findings of an experiment which sought to test the levels of processing effect.

Participants were shown a list of 60 words, each accompanied by a question that required either shallow processing or more elaborative rehearsal. When the original words were placed amongst a longer list of words, participants who had conducted deeper processing of words and their meanings were able to pick them out more efficiently than those who had processed the mere appearance or sound of words (Craik & Tulving, 1975).


3 Working Memory Model

(Baddeley & Hitch, 1974)

Whilst the Multi-Store Model (see above) provided a compelling insight into how sensory information is filtered and made available for recall according to its importance to us, Alan Baddeley and Graham Hitch viewed the short-term memory (STM) store as over-simplified, and proposed a working memory model (Baddeley & Hitch, 1974) to replace it.

The working memory model proposed two components - a visuo-spatial sketchpad (the ‘inner eye’) and an articulatory-phonological loop (the ‘inner ear’) - each of which focuses on a different type of sensory information. Both work independently of one another, but are regulated by a central executive, which collects and processes information from the other components similarly to how a computer processor handles data held separately on a hard disk.

According to Baddeley and Hitch, the visuo-spatial sketchpad handles visual data - our observations of our surroundings - and spatial information - our understanding of objects’ size and location in our environment and their position in relation to ourselves. This enables us to interact with objects: to pick up a drink or avoid walking into a door, for example.

The visuo-spatial sketchpad also enables a person to recall and consider visual information stored in the long-term memory. When you try to recall a friend’s face, your ability to visualize their appearance involves the visuo-spatial sketchpad.

The articulatory-phonological loop handles the sounds and voices that we hear. Auditory memory traces are normally forgotten but may be rehearsed using the ‘inner voice’; a process which can strengthen our memory of a particular sound.
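Read as an information-processing architecture, the model is a coordinator plus two modality-specific buffers. The toy Python sketch below illustrates that routing; the class and method names and the routing rule are our own illustrative assumptions, not Baddeley and Hitch’s specification.

```python
# Toy sketch of the working memory model: a central executive routing
# input to two modality-specific subsystems. Names are illustrative.

class VisuoSpatialSketchpad:
    """The 'inner eye': holds visual and spatial information."""
    def __init__(self):
        self.contents = []
    def hold(self, visual_item):
        self.contents.append(visual_item)

class PhonologicalLoop:
    """The 'inner ear': holds sounds and voices."""
    def __init__(self):
        self.contents = []
    def hold(self, sound):
        self.contents.append(sound)
    def rehearse(self):
        """The 'inner voice' refreshes auditory traces that would otherwise fade."""
        return list(self.contents)

class CentralExecutive:
    """Coordinates the subsystems, much as a processor coordinates storage."""
    def __init__(self):
        self.sketchpad = VisuoSpatialSketchpad()
        self.loop = PhonologicalLoop()
    def attend(self, item, modality):
        target = self.sketchpad if modality == "visual" else self.loop
        target.hold(item)

wm = CentralExecutive()
wm.attend("friend's face", modality="visual")
wm.attend("phone number read aloud", modality="auditory")
print(wm.loop.rehearse())   # ['phone number read aloud']
```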


4 Miller’s Magic Number

(Miller, 1956)

Prior to the working memory model, U.S. cognitive psychologist George A. Miller questioned the limits of the short-term memory’s capacity. In a renowned 1956 paper published in the journal Psychological Review, Miller cited the results of previous memory experiments, concluding that people are typically able to hold only around seven chunks of information (plus or minus two) in the short-term memory before needing to further process them for longer storage. For instance, most people would be able to remember a 7-digit phone number but would struggle to remember a 10-digit number. This led to Miller describing the number 7 +/- 2 as a “magical” number in our understanding of memory.

But why are we able to remember the whole sentence that a friend has just uttered, when it consists of dozens of individual chunks in the form of letters? With a background in linguistics, having studied speech at the University of Alabama, Miller understood that the brain was able to ‘chunk’ items of information together and that these chunks counted towards the 7-chunk limit of the STM. A long word, for example, consists of many letters, which in turn form numerous phonemes. Instead of only being able to remember a 7-letter word, the mind “recodes” it, chunking the individual items of data together. This process allows us to boost the limits of recollection to a list of 7 separate words.
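Recoding is easy to demonstrate concretely. The short sketch below groups a 10-digit string into three-digit chunks so that the list falls back inside the 7 +/- 2 span; the chunk size of three is an arbitrary illustrative choice, not a figure from Miller’s paper.

```python
def chunk(items, size):
    """Group a flat sequence into chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

digits = "4155550123"          # 10 separate digits: beyond the ~7-item span
chunks = chunk(digits, 3)      # 'recode' the digits into three-digit groups
print(chunks)                  # ['415', '555', '012', '3']
print(len(digits), "raw items ->", len(chunks), "chunks")
```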

Miller’s understanding of the limits of human memory applies to both the short-term store in the multi-store model and Baddeley and Hitch’s working memory. Only through sustained effort of rehearsing information are we able to memorize data for longer than a short period of time.


5 Memory Decay

(Peterson & Peterson, 1959)

Following Miller’s ‘magic number’ paper regarding the capacity of the short-term memory, Peterson and Peterson set out to measure memories’ longevity: how long does a memory last without being rehearsed before it is forgotten completely?

In an experiment employing a Brown-Peterson task, participants were given a list of trigrams - meaningless strings of three letters (e.g. GRT, PXM, RBZ) - to remember. After the trigrams had been shown, participants were asked to count down from a number, and then to recall the trigrams after intervals of varying lengths.

The use of such trigrams made it impractical for participants to assign meaning to the data to help encode them more easily, while the counting task prevented rehearsal, enabling the researchers to measure the duration of short-term memories more accurately.

Whilst almost all participants were initially able to recall the trigrams, after 18 seconds recall accuracy fell to around just 10%. Peterson and Peterson’s study demonstrated the surprising brevity of memories in the short-term store, before decay affects our ability to recall them.
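If we assume a simple exponential decay curve - a modelling convenience on our part, not a claim made by Peterson and Peterson - a half-life of roughly 5.4 seconds reproduces their headline figure of about 10% recall at 18 seconds:

```python
def recall_probability(t_seconds, half_life=5.4):
    """Assumed exponential decay from perfect recall at t = 0; the
    half-life is tuned so recall falls to ~10% by 18 seconds."""
    return 0.5 ** (t_seconds / half_life)

for t in (0, 3, 6, 9, 12, 15, 18):
    print(f"{t:2d}s: {recall_probability(t):.0%}")   # 100%, 68%, ..., 10%
```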


6 Flashbulb Memories

(Brown & Kulik, 1977)

There are particular moments in living history of which vast numbers of people seem to hold vivid recollections, and you can likely recall such an event in unusual detail yourself. When many people learned that JFK, Elvis Presley or Princess Diana had died, or heard of the terrorist attacks taking place in New York City in 2001, a detailed memory seems to have formed of what they were doing at the particular moment that they heard the news.

Psychologists Roger Brown and James Kulik recognized this memory phenomenon as early as 1977, when they published a paper describing flashbulb memories - vivid and highly detailed snapshots, often (but not necessarily) created at times of shock or trauma.

We are able to recall minute details of our personal circumstances whilst engaging in otherwise mundane activities when we learnt of such events. Moreover, we do not need to be personally connected to an event for it to affect us, and for it to lead to the creation of a flashbulb memory.


7 Memory and Smell

The link between memory and sense of smell helps many species - not just humans - to survive. The ability to remember and later recognize smells enables animals to detect the nearby presence of members of the same group, potential prey and predators. But how has this evolutionary advantage survived in modern-day humans?

Researchers at the University of North Carolina tested the effects of smell on memory encoding and retrieval in a 1989 experiment. Male college students were shown a series of slides of pictures of females, whose attractiveness they were asked to rate on a scale. Whilst viewing the slides, the participants were exposed to either the pleasant odor of aftershave or an unpleasant smell. Their recollection of the faces in the slides was later tested in an environment containing either the same or a different scent.

The results showed that participants were better able to recall memories when the scent at the time of encoding matched that at the time of recall (Cann and Ross, 1989). These findings suggest that a link between our sense of smell and memories remains, even if it provides less of a survival advantage than it did for our more primitive ancestors.

8 Interference

Interference theory postulates that we forget memories due to other memories interfering with our recall. Interference can be either retroactive or proactive: new information can interfere with older memories (retroactive interference), whilst information we already know can affect our ability to memorize new information (proactive interference).

Both types of interference are more likely to occur when two memories are semantically related, as demonstrated in a 1960 experiment in which participants were given a list of word pairs to remember, so that they could recall the second ‘response’ word when given the first as a stimulus. One group learnt only this list, whilst a second group was afterwards asked to memorize a further list of word pairs. When both groups were asked to recall the words from the first list, those who had learnt only that list were able to recall more words than those who had learnt a second list (Underwood & Postman, 1960). This supported the concept of retroactive interference: the second list impacted upon memories of words from the first list.

Interference also works in the opposite direction: existing memories sometimes inhibit our ability to memorize new information. This might occur when you receive a work schedule, for instance. When you are given a new schedule a few months later, you may find yourself adhering to the original times. The schedule that you already knew interferes with your memory of the new schedule.

9 False Memories

Can false memories be implanted in our minds? The idea may sound like the basis of a dystopian science fiction story, but evidence suggests that memories that we already hold can be manipulated long after their encoding. Moreover, we can even be coerced into believing invented accounts of events to be true, creating false memories that we then accept as our own.

Cognitive psychologist Elizabeth Loftus has spent much of her life researching the reliability of our memories, particularly in circumstances where their accuracy has wider consequences, such as the testimony of eyewitnesses in criminal trials. Loftus found that the phrasing of questions used to extract accounts of events can lead witnesses to attest to events inaccurately.

In one experiment, Loftus showed a group of participants a video of a car collision, in which the vehicle was travelling at one of a variety of speeds. She then asked them to estimate the car’s speed, using questions whose depiction of the crash was adjusted from mild to severe through the choice of verb. Loftus found that when the question suggested that the crash had been severe, participants disregarded their video observation and attested that the car had been travelling faster than when the question suggested a gentler bump (Loftus and Palmer, 1974). The use of such framed questions, as demonstrated by Loftus, can retroactively interfere with existing memories of events.

James Coan (1997) demonstrated that false memories can even be produced of entire events. He produced booklets detailing various childhood events and gave them to family members to read. The booklet given to his brother contained a false account of him being lost in a shopping mall, being found by an older man and then finding his family. When asked to recall the events, Coan’s brother believed the ‘lost in a mall’ story to have actually occurred, and even embellished the account with his own details (Coan, 1997).


10 The Weapon Effect on Eyewitness Testimonies

(Johnson & Scott, 1976)

A person’s ability to memorize an event inevitably depends not just on rehearsal but also on the attention paid to it at the time it occurred. In a situation such as a bank robbery, you may have other things on your mind besides memorizing the appearance of the perpetrator. Indeed, a witness’s ability to produce a testimony can be affected by whether or not a weapon was involved in the crime. This phenomenon is known as the weapon effect: when a witness is involved in a situation in which a weapon is present, they have been found to remember details less accurately than in a similar situation without a weapon.

The weapon effect on eyewitness testimonies was the subject of a 1976 experiment in which participants situated in a waiting room watched as a man left a room carrying a pen in one hand. Another group of participants heard an aggressive argument, and then saw a man leave a room carrying a blood-stained knife.

Later, when asked to identify the man in a line-up, participants who saw the man carrying a weapon were less able to identify him than those who had seen the man carrying a pen (Johnson & Scott, 1976). Witnesses’ focus of attention had been distracted by a weapon, impeding their ability to remember other details of the event.



News Release

Tuesday, June 8, 2021

Study shows how taking short breaks may help our brains learn new skills

NIH scientists discover that the resting brain repeatedly replays compressed memories of what was just practiced.


In a study of healthy volunteers, National Institutes of Health researchers have mapped out the brain activity that flows when we learn a new skill, such as playing a new song on the piano, and discovered why taking short breaks from practice is a key to learning. The researchers found that during rest the volunteers’ brains rapidly and repeatedly replayed faster versions of the activity seen while they practiced typing a code. The more a volunteer replayed the activity the better they performed during subsequent practice sessions, suggesting rest strengthened memories.

“Our results support the idea that wakeful rest plays just as important a role as practice in learning a new skill. It appears to be the period when our brains compress and consolidate memories of what we just practiced,” said Leonardo G. Cohen, M.D., senior investigator at the NIH’s National Institute of Neurological Disorders and Stroke (NINDS) and the senior author of the study published in Cell Reports. “Understanding this role of neural replay may not only help shape how we learn new skills but also how we help patients recover skills lost after neurological injury like stroke.”

The study was conducted at the NIH Clinical Center. Dr. Cohen’s team used a highly sensitive scanning technique, called magnetoencephalography, to record the brain waves of 33 healthy, right-handed volunteers as they learned to type a five-digit test code with their left hands. The subjects sat in a chair, under the scanner’s long, cone-shaped cap. An experiment began when a subject was shown the code “41234” on a screen and asked to type it out as many times as possible for 10 seconds and then take a 10-second break. Subjects were asked to repeat this cycle of alternating practice and rest sessions a total of 35 times.

During the first few trials, the speed at which subjects correctly typed the code improved dramatically and then leveled off around the 11th cycle. In a previous study, led by former NIH postdoctoral fellow Marlene Bönstrup, M.D., Dr. Cohen’s team showed that most of these gains happened during short rests, and not when the subjects were typing. Moreover, the gains were greater than those made after a night’s sleep and were correlated with a decrease in the size of brain waves, called beta rhythms. In this new report, the researchers searched for something different in the subjects’ brain waves.

“We wanted to explore the mechanisms behind memory strengthening seen during wakeful rest. Several forms of memory appear to rely on the replaying of neural activity, so we decided to test this idea out for procedural skill learning,” said Ethan R. Buch, Ph.D., a staff scientist on Dr. Cohen’s team and leader of the study.

To do this, Leonardo Claudino, Ph.D., a former postdoctoral fellow in Dr. Cohen’s lab, helped Dr. Buch develop a computer program which allowed the team to decipher the brain wave activity associated with typing each number in the test code.

The program helped them discover that a much faster version - about 20 times faster - of the brain activity seen during typing was replayed during the rest periods. Over the course of the first 11 practice trials, these compressed versions of the activity were replayed many times - about 25 times - per rest period. This was two to three times more often than the activity seen during later rest periods or after the experiments had ended.
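The team’s actual decoding pipeline is more sophisticated than anything shown here, but the core idea - matching a time-compressed template of practice activity against rest-period activity - can be sketched in a few lines of Python. The compression method, correlation threshold and function names below are illustrative assumptions, not the study’s method.

```python
import numpy as np

def compress(template, factor):
    """Crudely resample a 1-D activity template to ~1/factor of its length."""
    n_out = max(2, len(template) // factor)
    idx = np.linspace(0, len(template) - 1, n_out)
    return np.interp(idx, np.arange(len(template)), template)

def count_replay_events(rest_signal, template, factor=20, threshold=0.8):
    """Count rest-period windows that correlate strongly with the
    time-compressed practice template."""
    probe = compress(np.asarray(template, dtype=float), factor)
    rest = np.asarray(rest_signal, dtype=float)
    n, events = len(probe), 0
    for start in range(len(rest) - n + 1):
        r = np.corrcoef(rest[start:start + n], probe)[0, 1]
        if r > threshold:
            events += 1
    return events

# Toy usage: embed one compressed copy of a practice template in noisy rest data.
rng = np.random.default_rng(1)
template = np.sin(np.linspace(0, 8 * np.pi, 500))   # mock practice-period activity
rest = rng.standard_normal(2500)                    # mock rest-period activity
rest[1000:1025] += 3 * compress(template, 20)       # one embedded 20x-compressed replay
print(count_replay_events(rest, template))          # detects the embedded event
```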

Interestingly, they found that the frequency of replay during rest predicted memory strengthening. In other words, the subjects whose brains replayed the typing activity more often showed greater jumps in performance after each trial than those who replayed it less often.

“During the early part of the learning curve we saw that wakeful rest replay was compressed in time, frequent, and a good predictor of variability in learning a new skill across individuals,” said Dr. Buch. “This suggests that during wakeful rest the brain binds together the memories required to learn a new skill.”

As expected, the team discovered that the replay activity often happened in the sensorimotor regions of the brain, which are responsible for controlling movements. However, they also saw activity in other brain regions, namely the hippocampus and entorhinal cortex.

“We were a bit surprised by these last results. Traditionally, it was thought that the hippocampus and entorhinal cortex may not play such a substantive role in procedural memory. In contrast, our results suggest that these regions are rapidly chattering with the sensorimotor cortex when learning these types of skills,” said Dr. Cohen. “Overall, our results support the idea that manipulating replay activity during waking rest may be a powerful tool that researchers can use to help individuals learn new skills faster and possibly facilitate rehabilitation from stroke.”

This study was supported by the NIH Intramural Research Program at the NINDS.

NINDS  is the nation’s leading funder of research on the brain and nervous system. The mission of NINDS is to seek fundamental knowledge about the brain and nervous system and to use that knowledge to reduce the burden of neurological disease.

About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit www.nih.gov .

NIH…Turning Discovery Into Health ®

Buch et al., Consolidation of human skill linked to waking hippocampo-neocortical replay, Cell Reports, June 8, 2021, DOI: 10.1016/j.celrep.2021.109193


Journal of Law and the Biosciences

Evidence of memory from brain data

Emily R. D. Murphy and Jesse Rissman


PhD, JD. Associate Professor of Law, University of California Hastings College of the Law. Dr Murphy is a neuroscientist and law professor specializing in the intersection of law, policy, brain, and behavior. She writes about brain-based technologies as evidence as well as the implications of advances in neuroscience and understanding of human behavior on various aspects of law and policy.

PhD. Associate Professor, Psychology and Psychiatry and Biobehavioral Sciences, University of California, Los Angeles. Dr Rissman is a psychologist whose research focuses on the cognitive and neural mechanisms of human memory. He has pioneered novel memory research techniques using functional magnetic resonance imaging and machine learning, and also writes and teaches about the intersection of neuroscience, ethics, and policy.

Received 2020 May 14; Revised 2020 May 14; Accepted 2020 Jul 31; Collection date 2020 Jan-Dec.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Much courtroom evidence relies on assessing witness memory. Recent advances in brain imaging analysis techniques offer new information about the nature of autobiographical memory and introduce the potential for brain-based memory detection. In particular, the use of powerful machine-learning algorithms reveals the limits of technological capacities to detect true memories and contributes to existing psychological understanding that all memory is potentially flawed. This article first provides the conceptual foundation for brain-based memory detection as evidence. It then comprehensively reviews the state of the art in brain-based memory detection research before establishing a framework for admissibility of brain-based memory detection evidence in the courtroom and considering whether and how such use would be consistent with notions of justice. The central question that this interdisciplinary analysis presents is: if the science is sophisticated enough to demonstrate that accurate, veridical memory detection is limited by biological, rather than technological, constraints, what should that understanding mean for broader legal conceptions of how memory is traditionally assessed and relied upon in legal proceedings? Ultimately, we argue that courtroom admissibility is presently a misdirected pursuit, though there is still much to be gained from advancing our understanding of the biology of human memory.

Keywords: brain, court, evidence, fMRI, machine learning, memory detection

I. INTRODUCTION

In 2008, Aditi Sharma was convicted by an Indian court of killing her fiancé, Udit Bharati. 1 The court relied in part on evidence derived from the so-called Brain Electrical Oscillations Signature (BEOS) test. 2 To take this test, Sharma sat alone in a room wearing a skullcap with protruding wires that measured brain activity under her scalp. She listened to a series of statements detailing some aspects of the murder that were based on the investigators’ understanding. Throughout the test, she said not a word. But when she heard statements such as ‘I had an affair with Udit’, ‘I got arsenic from the shop’, ‘I called Udit’, ‘I gave him the sweets mixed with arsenic’, and ‘The sweets killed Udit’, the computer analyzing her brain activity purported to detect ‘experiential knowledge’ in her brain. 3 In the subsequent 6 months, the same lab provided evidence in the murder convictions of two more people. 4

Recent advances in peer-reviewed research on memory and its detection present an important question for evidence law: what would it mean to be able to detect the contents of a person’s memory? If a brain-based approach were scientifically reliable, would it be admitted as courtroom evidence? Admissibility in court and the persuasiveness (or prejudicial effect) of evidence are often the focus of legal analysis of new brain science technology. 5 Indeed, some memory detection researchers perceive courtroom admissibility to be the sine qua non for forensic applications of memory detection technology. 6

Both the inventors of BEOS and some US-based researchers insist that their brain-based memory detection technologies are not ‘lie detection’ and should not be painted with the same brush of unreliability and thus inadmissibility. 7 Others have claimed that memory detection technology will soon be admissible as evidence. 8 But ‘admissibility’ is not an inherent quality of novel technology, but rather a complicated legal, factual, and scientific question in a particular case. Because admissibility is a recurrent focal point for some researchers and advocates, and because brain-based memory detection technology has been admitted in some jurisdictions, 9 we offer a framework for the assessment of evidentiary use of brain-based memory detection in trial proceedings. The primary focus for this analysis will be on memories of witnesses and suspects as the most likely targets of forensic applications. 10

The admissibility of memory detection evidence is a complex legal question not just because it requires careful judicial gatekeeping of complex scientific evidence, but also because it directly concerns one of the core ‘testimonial capacities’ of a witness, and thus directly bears on the jury’s task of assessing witness credibility. Part I introduces the conceptual framework for memory detection research being translated to forensic practice. Part II provides an overview of the current state of the technology, with further details for scientific readers published elsewhere. 11 Part III sets up the framework for assessing courtroom admissibility, establishing the nature of the factual and legal issues raised by the current technological capabilities and scientific understanding. Memory detection may also implicate individual constitutional and compulsory process rights, limiting or enabling its use as courtroom evidence. 12 Part IV then analyzes the potential objections to memory detection as courtroom evidence, starting with those that are surmountable challenges before moving onto stronger objections and normative considerations. Finally, Part V concludes by considering the use of the technology outside of the adversarial context. The question that this analysis presents is: if the science is in fact sophisticated enough to demonstrate that accurate, veridical memory detection is limited by biological, rather than technological, constraints, what should that understanding mean for broader legal conceptions of how memory is traditionally assessed and relied upon in legal proceedings? Ultimately, we argue that courtroom admissibility is presently a misdirected pursuit, though there is still much to be gained from advancing our understanding of the biology of human memory.

II. MEMORY DETECTION: THEORY TO FORENSIC PRACTICE

Human memory plays a critical role in many different aspects of law and legal proceedings, not the least of which is witness testimony. For example, decades of scientific research into the nature of memory have recently supported significant structural legal developments, such as changes in pretrial motion practice and jury instructions about evaluating eyewitness identification testimony. 13 Thus, periodic assessment of what we know about memory and can do in terms of its detection is important for legal and interdisciplinary scholars. Parts I and II synthesize several lines of research centering around the question: is it possible to detect the presence (or absence) of a specific memory? The answer depends both on advances in measurement (in technology and behavioral test design) and also on the better understanding of the biological (including psychological) nature and limitations of human memory—an ongoing subject of basic science research.

Significant technological advances have been made in neuroimaging since this literature was last comprehensively reviewed for an interdisciplinary audience, 14 and a key question now is whether remaining limitations are fundamentally technological or biological problems. Technological problems are of the kind that permits researchers and commentators to say ‘in the future, we may be able to…’ based on advances in technology. Biological problems, in contrast, may be true boundary conditions on what type of detection or characterization may be possible. As technology advances, what seemed like biological problems may yield to advanced scrutiny.

II.A. What Is (and Is Not) Memory Detection?

What do we mean when we refer to ‘memory detection’? Let us first distinguish memory detection from lie detection, or ‘truth verification’. 15 The existence (or absence) of a memory trace 16 could theoretically be detected regardless of whether the subject is affirmatively misrepresenting or concealing that information. 17 This article evaluates work aimed at detecting the presence or absence of recognition of some sort of stimuli, rather than deception per se.

Researcher Peter Rosenfeld and colleagues explain that a canonical version of the ‘guilty knowledge test’ used in most memory detection protocols ‘actually does not claim or aim to detect lies; it is instead aimed at detecting whether or not a suspect recognizes information ordinarily known only by guilty perpetrators and, of course, enforcement authorities’. 18 That is, present forms of memory detection require the human designing the test to know something about the ground truth of interest, and obtain or design stimuli and testing protocols to determine whether the subject also has that knowledge. Presently, no brain-based memory detection technology functions as one might imagine ‘mind-reading’ to work, via uncued reconstruction of the subjective contents of a subject’s memory. 19

In theory—and consistent with lived experience—our brains are generally able to distinguish autobiographical memories (such as having personally witnessed or participated in an event) from other sources of event knowledge (such as knowing details of people’s lives that we have read or heard about, or knowing the mere fact that an event occurred at a given time and place). 20 That is, we know the stories of our own lives and can generally tell the difference between our own lives and the events and information in the world around us. Of course, our brains are not perfect at this task, and normal people experience spontaneous memory errors (such as the powerful experience of déjà vu, the subjective feeling of having previously lived through a current, novel experience) 21 as well as imagined or suggested memory errors (sometimes referred to as ‘source confusions’ because we misattribute the source of our event knowledge). 22

Exactly how the brain distinguishes autobiographical memories from other memories, and the base rates of inaccuracies or distortions, is the subject of ongoing memory research. For example, very recent research suggests that different kinds of episodic memories—varying in their degree of autobiographical content—may be neurobiologically distinguishable. 23 A neurobiological distinction would not be surprising. Rather, it would be expected that distinct biological mechanisms underlie how these different types of memories are subjectively experienced. These findings, explored further below in Part II, may have significant import for assumptions about the ecological validity of laboratory-based memory creation and detection (even using mock crime scenarios) relative to real-world memory creation and detection. Each of these characteristics contributes to a technique’s false positive and false negative rate, which are critical measures for any test used to classify an outcome as present/absent.

As with methods of ‘truth verification’, a potential forensic appeal of memory detection is based on an essentialization—the assumption that certain brain activity is more automatic, less under conscious control, and less subject to fabrication, reinterpretation, or concealment than subjective reports or even physiological measurements of the body such as skin conductance, heart rate, breathing rate, and even eye movements. 24 One way this assumption has been tested is through more rigorous attempts to quantify how vulnerable different kinds of memory detection are to countermeasures—deliberately applied behavioral or cognitive strategies for ‘beating the test’ or manipulating the test results. But active countermeasures are only one form of potential distortion; others come from the innate imperfections of human memory. 25 How these ‘sins’ such as transience, absent-mindedness, misattribution, and subsequent misinformation constrain memory detection is a question now squarely raised by the most recent research on functional magnetic resonance imaging (fMRI)-based memory detection, discussed in Part II.

Some proponents of forensic memory detection tend to approach the problem as one of technological limitations, and indeed the literature reviewed below is organized primarily by methodology. More complex technology (fMRI and machine-learning data analysis) has enabled major advances in memory detection and scientific knowledge about the nature of memory. But it has also brought us closer to asking whether the constraints on memory detection may be ‘biological problems’ as well as technological problems. That is, which limitations on memory detection come from the nature of memory itself?

Most obviously, for many aspects of our experiences, a memory beyond a brief sensory trace never gets formed. As an a priori theoretical matter, it will not be possible to detect a memory that was never formed because the information or stimulus was never attended to. But as an empirical matter, it is extremely difficult to test for the true absence of something, and thus it would be extremely difficult for a negative result from a memory detection test to be interpreted as the true absence of a memory and thus a true lack of personal experience. Conversely, lots of things we do not pay attention to may nonetheless get stored into memory at some level. These unattended stimuli are less likely to be consciously remembered and will sometimes be completely forgotten, but other times will influence behavior in subtle ways. 26

A second challenge is that, even if a memory were formed, it may be degraded or forgotten to a degree that it is no longer detectable. This boundary condition is at least theoretically easier to experimentally investigate—a memory trace could be detected at time one, and then be absent when probed at a later time. But the mere act of testing for the memory could, itself, modify or strengthen the memory, thus preventing ‘normal’ degradation and forgetting. 27 The reactivation of a memory through questioning or testing procedures could strengthen a degraded recognition, and/or a subject’s confidence in their recall. 28 How such influences impact the detectability of a memory will have a critical impact upon the marginal utility of brain-based memory detection versus testimony of compliant, but perhaps forgetful, witnesses.

II.B. What Kinds of Memories Are to Be Detected?

The next important concept is to define precisely what kind of memories we are most interested in detecting and what kind of memories various tests actually detect or could detect. For laypersons without a psychology background, this may seem obvious: we would like to detect true/veridical memories, rather than false, inaccurate, or distorted memories. False memories in legal proceedings are an important area of psychological study with many legal implications; detailed consideration of which has been undertaken elsewhere. 29 For our purposes, it is important to hone in on what kinds of memories forensic applications would likely be most interested in detecting: objectively true, autobiographical memories.

Broadly speaking, autobiographical memories can be semantic (facts about yourself, such as knowing the places you have lived and the names of your family members and friends) or episodic (recollection of events that happened to you at a specific time and place). 30 Both are a type of ‘explicit’ or ‘declarative’ memory: generally speaking, those that can be consciously recalled and described verbally. These may be contrasted with ‘implicit’ memories, where current behavior is influenced by prior learning in a non-conscious manner, such as the procedural skill of riding a bike. Although some researchers in this field have limited the categories of forensic interest to episodic memories, there may be types of semantic autobiographical memories of forensic interest, such as names of criminal associates whom one has never personally met, targets of attack one has only been told about, or one’s true name or place of birth. 31

The taxonomy just described is a drastic simplification of an active and ongoing area of research—just ‘how’ memory should be classified and operationalized—and one that does not seem to be in active dialogue with memory detection researchers. A recent review described a categorization of ‘personal semantic’ (PS) memory—knowledge of one’s past—as an intermediate form of memory between semantic and episodic memory. 32 PS memory has been ‘assumed to be a form of semantic memory, [thus] formal studies of PS have been rare and PS is not well integrated into existing models of semantic memory and knowledge’. 33 The authors describe four ways in which PS memory has been operationalized as autobiographical facts, self-knowledge, repeated events, and autobiographically significant concepts. Depending upon how PS memory is operationalized, in different studies it appears neurobiologically more similar to general semantic memory or episodic memory, meaning that tasks activating each show similar patterns of neural activation on functional brain imaging. Though a sophisticated field of research with a voluminous literature, the difficulties in coming to a consensus about the taxonomic nature of memory are in part because memory is extremely difficult to study. 34

One final thing must be said about a type of autobiographical memory that is not the subject of memory detection studies, despite being extremely relevant as courtroom evidence: memories, or beliefs, about one’s past mental states. Memories of ‘events that happened’ are the subject of brain-based memory detection research. This is not the case for memories of ‘past subjective mental state’, notwithstanding the concept’s obvious relevance to legal issues of mens rea. Of course, a past mens rea could potentially be inferred from the detection of a memory of a specific event or item. For instance, if brain-based memory detection could establish that a defendant had actually seen bullets in a gun’s chamber, such evidence might weigh against the defendant’s claim that he believed the gun was unloaded. But direct detection of past mental states, even if considered as a subtype of autobiographical memory, has not been attempted, and nor is it clear how it could be. Although a single recent study suggested that brain imaging could distinguish between present, active mental states of legal relevance (knowing vs. reckless, to the extent those legal concepts can be operationalized for behavioral study), 35 the ‘time travel’ problem will not likely be resolved by increasing technological sophistication. 36

III. STATE OF THE ART IN BRAIN-BASED MEMORY DETECTION

With this conceptual framework in mind, brain-based memory detection has biological and technological axes to examine. 37 The biological axis considers and explores how memory actually works in the brain. The technological axis is concerned with the tools used to explore that biology. Advances in understanding on one axis reciprocally inform the other.

Organization of this part by technological method allows understanding of technological improvements and approaches to the challenges discussed above, which helps set the stage for understanding where limitations are biological, rather than technological, in nature. We start with electroencephalography (EEG)-based technology, which uses electrodes to detect electrical brain activity through a subject’s scalp. We then discuss fMRI, which uses powerful magnetic fields to non-invasively detect correlates of brain activity. Those readers who are interested in the high-level summary and analysis may skip ahead to Part II.C.

III.A. EEG-Based Technology

There is now a substantial body of research using EEG-based techniques to detect memories, reporting impressive results on accuracy of detection, often exceeding 90 per cent. EEG-based technologies are deployed in various commercial efforts 38 and are in forensic use internationally. 39 This section will first review the basic technique and theory behind EEG-based memory detection, and then report on recent findings and remaining limitations on forensic use. 40 Notably, a small handful of research groups have dominated the EEG-based memory detection research efforts. The large bulk of peer-reviewed publications comes from the lab of Peter Rosenfeld et al at Northwestern University. 41

EEG-based memory detection protocols measure electrical brain activity from dozens of small electrodes placed on the scalp as a subject is presented with a series of stimuli—typically words or pictures shown on a computer screen, or sounds played through headphones or speakers. EEG-based memory detection protocols vary, and subtle variations in those protocols are the subject of intense research because small changes in testing protocol have large impacts on results.

The fundamental concept relies on a test framework exploiting the brain’s different responses to personally meaningful versus non-meaningful stimuli, called the ‘orienting response.’ 42 The logic of EEG-based memory detection protocols is that the person administering the EEG can measure the subject’s brain responses to different stimuli and use them to discriminate between meaningful or non-meaningful information. In this context, a stimulus that is ‘meaningful’ is one that would be salient or significant only to someone with prior knowledge or experience with that stimulus, such that its ‘meaningfulness’ is at least partially based on recognition memory. Thus, the nature of the information presented in the stimuli—which must be carefully selected beforehand by the test administrator—is the most important methodological factor in EEG-based memory detection.

Nearly all EEG-based memory detection protocols attempt to detect a characteristic waveform of electrical activity evoked by meaningful/familiar/salient stimuli, and in most research this waveform is the P300 ‘event related potential’, or ERP. 43 An ERP is a particular spike or dip in brain voltage activity in response to a discrete event (such as a bright light or loud noise). The P300 is a positive ERP occurring between 300 and 1000 milliseconds after the onset of events that are both infrequent and meaningful, with a maximal effect typically observed for those electrodes situated over the parietal lobe. 44 A shorthand way to think of what the P300 represents is a response to stimuli that are ‘rare, recognized, and meaningful’. 45

As mentioned above, EEG-based detection relies on the comparison of brain responses between different categories of stimuli. The majority of protocols are based on the concealed information test (CIT; also known as the ‘guilty knowledge test’) and use three types of stimuli: (i) infrequent ‘probe’ stimuli with information relevant to the event of interest that would only be recognized by someone with prior knowledge (that is, meaningful to some people but meaningless to others); (ii) infrequent ‘target’ stimuli that subjects are explicitly instructed to monitor for and respond to with a unique button-press—these are stimuli that the subject is expected to know, recognize, or be familiar with (meaningful to everyone); and (iii) frequent ‘irrelevant’ stimuli, of no relevance to the event of interest nor of any particular importance to the subject (meaningless to everyone). For all subjects, the target stimuli should elicit a greater P300 response amplitude than the irrelevant stimuli, providing an internal check that the procedure is working and the data are of sufficient quality. Most critically, for ‘guilty’ examinees, probe stimuli should elicit a P300 similar in magnitude to that observed for target stimuli, indicating that their brain ‘knows’ the probe stimuli are not irrelevant. Conversely, the brains of ‘innocent’ examinees should show little to no P300 response to the probes, such that this waveform will look more similar to that observed for irrelevant stimuli than for target stimuli. 46
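Stated as a decision rule, the comparison logic might look like the hedged sketch below. The microvolt values and the midpoint criterion are invented for illustration; published protocols decide with statistical tests on amplitude differences rather than a fixed cut-off.

```python
# Illustrative CIT decision rule: where does the probe response fall
# between the irrelevant (baseline) and target (known-meaningful) responses?
# Amplitudes are in microvolts; all numbers are made up for demonstration.

def classify_examinee(probe_uv, irrelevant_uv, target_uv):
    if target_uv <= irrelevant_uv:
        return "invalid test"   # internal check failed: targets must exceed irrelevants
    position = (probe_uv - irrelevant_uv) / (target_uv - irrelevant_uv)
    return "knowledgeable" if position > 0.5 else "not knowledgeable"

print(classify_examinee(probe_uv=9.0, irrelevant_uv=3.0, target_uv=10.0))  # knowledgeable
print(classify_examinee(probe_uv=3.5, irrelevant_uv=3.0, target_uv=10.0))  # not knowledgeable
```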

An important point to appreciate is that for EEG-based memory detection, subtle variations in methods can substantially affect the results. The past several years of research have largely been dedicated to these methodological challenges. One methodological constraint of much of EEG-based memory detection deserves particular attention, which is that most studies report results on ‘blocks’ of trials for a given condition, rather than a single trial. Generally speaking, the reason for this is that EEG data are extremely noisy, such that a meaningful ‘signal’ can only be pulled out from background ‘noise’ when averaging responses over several trials of a given condition. 47 As such, to detect the ERP, researchers typically average the EEG samples of repeated presentations of the same stimulus, or of several stimuli in the same category. That is, relatively smooth-looking graphs of P300 responses should be understood to be averages of multiple repeated trials. Indeed, typically several hundred presentations of test items are required for reliable recording of the P300 ERP. 48
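The averaging step itself is simple to illustrate with synthetic data. In the sketch below, an idealized P300 waveform buried in heavy single-trial noise becomes recoverable once 100 epochs are averaged; the sampling rate, noise level and window bounds are our assumptions, not parameters from any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250                                         # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)                    # 1-second epoch from stimulus onset
p300 = 5e-6 * np.exp(-((t - 0.4) ** 2) / 0.01)   # idealized P300 peaking ~400 ms

# 100 noisy single trials, each containing the same underlying ERP
trials = p300 + 20e-6 * rng.standard_normal((100, t.size))

erp = trials.mean(axis=0)                        # averaging cancels trial-by-trial noise
window = (t >= 0.3) & (t <= 1.0)                 # 300-1000 ms P300 search window
peak_uv = erp[window].max() * 1e6
peak_ms = t[window][erp[window].argmax()] * 1000
print(f"estimated P300 peak: {peak_uv:.1f} uV at {peak_ms:.0f} ms")
```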

Researchers have been studying the use of the P300 ERP to detect ‘concealed information’ since the 1980s. The first P300-based protocol was based on the CIT. 49 Eventually, the standard CIT task was determined to be vulnerable to countermeasures, 50 and Rosenfeld and colleagues developed a new countermeasure-resistant variant, which they dubbed the ‘Complex Trial Protocol’ (CTP). The CTP is designed to enhance the difficulty of the standard CIT and reduce the likelihood of success that a subject could secretly designate a category of ‘irrelevant’ probes as ‘relevant’, thus reducing the difference in P300 amplitudes between ‘probes’ and secretly relevant ‘irrelevants’ for guilty subjects—essentially beating the test by creating their own false positives. 51 The reasoning behind the countermeasure vulnerability of the older three-stimulus protocol is that when covert target and probe stimuli compete for attentional resources, the P300 response is reduced. This proposed mechanism highlights a particular analytic weakness of needing to use an internal comparison between different categories of stimuli to detect whether one subset is recognized or recalled. 52

As discussed, much of the recent research in EEG-based memory detection has focused on subtle changes in protocol design. Those that are particularly significant for purposes of assessing forensic capacity and potential admissibility are methodological developments focused on more sophisticated countermeasure vulnerability, issues of ecological validity, and, most recently, the application of pattern recognition techniques to analyze EEG data. The highlights of those areas of research, and consideration of the outstanding limitations for forensic use, are briefly discussed below.

III.A.1. Sophisticated countermeasure vulnerability

Alternative mental actions as countermeasures. Several studies have shown that subtle physical countermeasures, such as instructing participants to wiggle a toe or finger during the presentation of certain stimuli (as a means to increase their salience), can be highly effective at preventing experimenters from obtaining an accurate assessment of concealed knowledge. 53 But countermeasures could be even more covert. Mental countermeasures such as silently ‘saying’ one’s own name in response to different ‘irrelevant’ stimuli would be nearly impossible to detect through even the closest observation of a test subject. 54

But does using mental countermeasures slow down task performance, such that countermeasure use could be detected? Early work with the CTP found that slower reaction times indicated countermeasure use. 55 This telltale slowing seems to work only when subjects execute the covert countermeasure separately from a required ‘I saw it’ motor response. If participants are trained to execute both at the same time, this ‘lumping’ strategy can eliminate the ability to use reaction time as an indication of countermeasure use, though the P300 ERP signal can still detect more than 80 per cent of the ‘guilty’ test subjects who were trained to use this strategy (note that this is still a significant reduction from reported detection thresholds exceeding 90 per cent). 56 That is, there may be countermeasures to countermeasure detection strategies, possibly able to characterize mendacious subjects, though without high degrees of certainty.

Voluntary suppression of memory. Can someone beat a memory detection test by voluntarily suppressing their memories? Two papers in 2013 and 2015 reported that the P300 signal for episodic information could be voluntarily suppressed. 57 But it may be possible to design the task in a slightly easier way so as to protect against this countermeasure. 58 Moreover, it may not be possible to suppress responses to probes targeting semantic memories, such as the knowledge of one’s true name. 59 At present, the effect of countermeasures on well-constructed CTP applications appears to be modest and perhaps available only to highly trained subjects. Nevertheless, a complete understanding of potential countermeasure vulnerability is critical to accurate forensic application of any P300-based memory detection protocol, especially research that can differentiate the effects of task demands from mechanisms of countermeasure deployment.

III.A.2. Ecological validity

Ecological validity is the extent to which laboratory conditions and results translate to those in the real world. A canonical memory detection study with low ecological validity is having a subject study a list of random words, then asking her to respond whether or not certain words were on that list, while attempting to conceal the actual studied list with false responses. Studying a list of random words is not a task that resembles real world or forensically relevant events, and it is unwise to generalize from this kind of task to a forensic task that involves presenting a subject with a list of words that may be relevant to a crime being investigated. Alternatively, studies aiming for greater ecological validity may engage their subjects in mock crime scenarios, such as ‘stealing’ a particular item and then attempting to conceal their ‘crime’ from the examiners. These types of studies began with the earliest efforts at EEG-based memory detection. 60 But with all studies using volunteers, the genuine motivation to engage in and then conceal truly criminal or antisocial actions cannot be replicated, as of course all laboratory ‘thefts’ are designed and sanctioned by researchers.

Use of real-world scenarios and detecting autobiographical memory in individuals. A fundamental question for forensic application of memory detection technology is the extent to which lab-created memories—even those with more realistic mnemonic content than a memorized word list, such as a mock crime—are similar to or meaningfully different from real-world memories. There are many reasons such memories could differ, including the amount of attention a subject allocates to a lab-based, instructed task such as stealing a document, versus going about the normal routines of the day when something unexpected happens.

A recent study tried to address the detectability of incidentally acquired, real-world memories. 61 Meixner and Rosenfeld had subjects wear a small video camera for 4 hours as they went about their normal daily routine. The next day, subjects returned to the lab and, while hooked up to the EEG, were shown words associated with activities they had experienced the previous day (such as ‘grocery store’), as well as irrelevant words of the same category that did not relate to the subject’s personal activities (such as ‘movie theater’ or ‘mall’). 62 The authors reported that the EEG data could be used to perfectly discriminate between the 12 ‘knowledgeable’ subjects who viewed words related to their personal activities and the 12 ‘non-knowledgeable’ subjects who viewed only irrelevant items. Notably, this was the first P300 study to examine whether autobiographical memory could be detected at an individual level; most psychological studies of autobiographical memory use group-averaged data, which is not helpful for forensic purposes. But it still leaves open the question of whether comparisons between subjects are necessary to make a decision about an individual’s ‘knowledgeable’ status. More importantly, the study design does not quite get at the question of whether real-life memories differ, at some important neural level, from lab-created memories (which most studies investigate)—it simply suggests that this particular tool can discriminate between subjects who did or did not have discrete real-life experiences.
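
Individual-level decisions in this literature often rest on a bootstrapped comparison of P300 amplitudes evoked by probe versus irrelevant items. The sketch below illustrates that general logic only; the amplitudes, trial counts, and 90 per cent criterion are assumptions for illustration, not the values used by Meixner and Rosenfeld.

```python
# Illustrative bootstrapped amplitude-difference test for a single subject.
# All amplitudes (microvolts), trial counts, and the 90% decision criterion
# are assumptions, not parameters from the cited study.
import numpy as np

rng = np.random.default_rng(1)

probe_trials = rng.normal(loc=8.0, scale=4.0, size=60)        # personally relevant item
irrelevant_trials = rng.normal(loc=5.5, scale=4.0, size=240)  # foil items

n_boot, wins = 1000, 0
for _ in range(n_boot):
    probe_mean = rng.choice(probe_trials, size=probe_trials.size, replace=True).mean()
    irrel_mean = rng.choice(irrelevant_trials, size=irrelevant_trials.size, replace=True).mean()
    wins += probe_mean > irrel_mean

# Call the subject 'knowledgeable' only if the probe amplitude exceeds the
# irrelevant amplitude in at least 90% of bootstrap resamples.
print(f"probe > irrelevant in {wins / n_boot:.1%} of resamples")
print("knowledgeable" if wins / n_boot >= 0.90 else "not classified as knowledgeable")
```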

Innocent-but-informed participants. Other work assessing the ecological validity of a P300-based CIT demonstrated that prior knowledge of crime details affects detection rates, illustrating the potential risks of using probes that may have become known to an innocent subject (such as someone who received instructions to steal a ‘ring’ but did not go through with the crime). 63 These ‘innocent-but-informed’ participants were ‘essentially indistinguishable’ from those who actually committed the mock crime: ‘Simple knowledge of probe items was sufficient to induce a high rate of false positives in innocent participants’. 64 Rosenfeld and colleagues counsel that crime details must remain secret, known only to police, perpetrators, eyewitnesses, and victims. But how feasible is it for investigators to know for certain that the probed details are, indeed, secret? This may be easier said than done, and it is not always possible to know the degree to which critical details about the event in question have been inadvertently disclosed. Further relevant to forensic application, the studies reviewed in this section have not distinguished between witnesses (who may be innocent-but-informed) and crime participants (‘guilty’ subjects), such that variables such as time delays and quality of encoding are poorly understood for detection of witness (rather than suspect) memory.

Time delay between event and test. In lab studies using mock crimes, participants are often tested immediately or a few days after the incident. In the real world, interrogation about an event may come weeks, months, or even years later. One study attempting to quantify the impact of a delay in testing asked students to ‘steal’ an exam copy from a professor’s mailbox. 65 Some students were tested using a P300-based CTP procedure immediately after the theft, whereas others returned to the lab a month later. Researchers found no difference in detection efficiency. This is an encouraging result, but more evidence is needed. The test used only a single probe item (the stolen exam), and—as with other mock crime scenarios—subjects were explicitly instructed to engage in the theft, likely heightening the salience of the central detail. What is presently unknown is how P300-based detection fares over time for peripheral crime details, which may be less robustly encoded but more likely to be uniquely known to a perpetrator, and thus useful for minimizing false positives.

Quality of encoding and incidentally acquired information. Not all information is equally well remembered. This is, of course, an important feature of human memory: it would be highly inefficient for a lawyer to remember where she parked her car two Tuesdays ago as well as she remembers the legal standard for summary judgment. How well information is remembered has substantial implications for how well it may be detected. Yet behavioral tests of people’s memory for incidental details of real-world experiences show that sometimes surprisingly little is retained. 66 Rosenfeld’s research group acknowledges that the sensitivity of their P300-based tests is lower with incidentally acquired information than with well-rehearsed information. 67 Strategies under investigation to improve the sensitivity of detection of incidentally acquired information include providing feedback to focus a participant’s attention on the probe, 68 using an additional ERP component, 69 and combining separately administered testing with the CTP. 70 This is a critically important problem for memory detection, as incident-relevant details important for determining guilt may be only ‘incidentally encoded’, particularly under conditions of stress. Is this a problem that can be addressed with technical advances, such as those explored by Rosenfeld et al? Or will the technical improvements simply reveal the outer boundaries of conditions under which subtle but critical details can be recalled and detected, given that they were not central to the incident but may be necessary for avoiding a false positive? It is well known that stimuli that are unattended during learning are often only weakly remembered, or not remembered at all, on a later recognition test. 71

Existing research has focused on how the use of countermeasures by a guilty subject may lead to missed detection—a false negative caused by countermeasures. What has not been firmly established is how often a guilty subject will fail to show a P300 response to a probe stimulus for another reason: for example, because intoxication, darkness, mental illness, impulsivity, or stress (conditions that may be more prevalent in a criminal defendant population than in a research population) prevented him from encoding the particular details of a weapon used. 72

Quality of retrieval environment. A final concern for ecological validity is the effect of stress or other contextual factors on memory ‘retrieval’ as well as encoding. 73 Most P300 studies using volunteer subjects—even those who participate in a mock crime—cannot fully replicate the stress of a real-world, high-stakes memory probe of a suspect (or witness). Although studies generally suggest that stress has a negative effect on episodic memory retrieval, 74 the effect of real-world stress on EEG-based memory detection has not been adequately studied.

III.B. fMRI-Based Technology

Functional magnetic resonance imaging is a safe, non-invasive, and widely used research and clinical tool. The details of how it works have been reviewed extensively elsewhere in the legal literature, so only a high-level orientation is provided here for the unfamiliar reader. 75 Functional magnetic resonance imaging uses powerful magnetic fields and precisely tuned radio waves to detect small differences in blood oxygenation that serve as a proxy for neural activity. When a population of neurons becomes active, the brain’s vascular system quickly delivers a supply of richly oxygenated blood to replenish the metabolic needs of those neurons. At present, some fMRI scanners are capable of collecting a snapshot of the blood oxygenation level-dependent signal across the entire human brain every 1 or 2 seconds at reasonably high spatial resolution (ie the images are composed of 2–3 mm cubes called ‘voxels’). 76 Subjects must lie with their head very still in a large, loud tube, but can perform behavioral tasks during scanning by looking at visual stimuli presented via a projection screen (or a virtual reality headset), 77 listening to audio stimuli over headphones, and/or making responses using a keypad or button-box.
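
To give a rough sense of the data volumes involved, the toy sketch below counts voxels and samples for an assumed scan; the grid size and sampling rate are illustrative, as actual dimensions vary by scanner and protocol.

```python
# Toy illustration of fMRI data dimensions; all sizes are assumptions.
import numpy as np

x, y, z = 64, 64, 40              # voxel grid covering the brain
tr_seconds, scan_minutes = 2, 10  # one whole-brain volume every 2 seconds

n_volumes = scan_minutes * 60 // tr_seconds
bold = np.zeros((x, y, z, n_volumes))  # the 4D BOLD time series

print(f"{x * y * z:,} voxels per volume x {n_volumes} volumes = "
      f"{bold.size:,} measurements per scan")
```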

Two attributes in particular distinguish fMRI-based memory detection from EEG-based memory detection. The first is technological access to more and different biological sources: fMRI can provide data from the entire brain. Multiple, interconnected brain regions are involved in forming and retrieving memories. One physiological limitation of EEG-based memory detection technologies is that EEG predominantly measures cortical electrical responses—from the outer covering of the brain closest to the scalp—but cannot measure signals from ‘deeper’ brain structures such as the hippocampus. Although fMRI offers slower temporal resolution than EEG (that is, brain activity levels are sampled on the order of seconds in fMRI, rather than milliseconds in EEG), fMRI offers substantially greater spatial resolution across the entire brain.

The second attribute is technological with respect to analysis capability: the most advanced and interesting fMRI studies of memory detection leverage the massive amounts of data obtained from brain scans to assess complex network connections, and use machine-learning algorithms to recognize subtle patterns in networks rather than activation in local areas. 78 These analytic techniques, combined with experimental paradigms and research questions aimed directly at ecological validity concerns, represent significant advancement in the technological aspects of memory detection. These methodological advances permit assessment of where the biological constraints lie that place ultimate boundaries on forensic use.

III.B.1. Early fMRI work on true and false memories

Early fMRI work examining neural correlates of true and false memories reported dissociations—that is, unique differences—between brain areas activated in response to test stimuli in recognition tasks asking subjects to remember which stimuli they had previously studied. 79 A recent review of this literature identified as a ‘ubiquitous’ finding the ‘considerable overlap in the neural networks mediating both true and false memories’ for recognition responses to items that share the same ‘gist’ as items that were actually studied during encoding. 80 Many studies also find differences—notably, increased activity in regions related to sensory processing for true as compared to false retrieval, leading to the ‘sensory reactivation hypothesis’ that true memories are associated with the bringing back to mind of more sensory and perceptual details than false memories. 81 Overall, empirical support for the intuitively appealing sensory reactivation hypothesis is mixed, and other dissociations, such as different patterns of activity in the medial temporal lobe and prefrontal cortices, have been reported. 82 Other work testing episodic memory retrieval in an fMRI scanner a week after subjects viewed a narrative documentary movie reported that coactivations of certain brain areas were greater when subjects responded correctly to factually accurate statements about the movie, but that such coactivations did not differentiate responses to inaccurate statements about the movie. 83 Collectively, this work provided some suggestion that activation patterns could differentiate between true and false recognitions, based on distinct memory processes, though no clear potential for diagnostic assessment of true or false memories emerged. That is, there is no particular ‘spot’ in the brain that serves as a litmus test for whether a memory is true or false.

III.B.2. Newer fMRI methods: advanced techniques reveal the biological limitations of memory detection

A limitation of the classic fMRI analysis paradigms is that memories do not exist in discrete regions of the brain; memories are encoded and stored in networks of brain regions. 84 Moreover, EEG studies using event-related potentials and fMRI studies using classic ‘univariate’ contrasts of brain activity are only equipped to gauge the relative level of activation across different regions of the brain, and the level of activation provides only rudimentary information about one’s memory state. But newer fMRI analysis methods can make use of the massive amounts of data to assess complex network connections and use machine-learning algorithms to recognize subtle activity patterns in networks, rather than activation in local areas.

This is a substantially more powerful way to analyze complex data, and the remainder of this review will focus on fMRI studies that employ such methods. These methods offer the best hope for reliable forensic memory detection, the greatest insights into the biological limitations of memory detection, and the subtlest challenges for a fact finder to assess the credibility of the techniques used to make claims about the presence or absence of a memory of interest.

III.C. Multi-Voxel Pattern Analysis and Machine-Learning Classifiers

Classic fMRI analysis looks at chunks of the brain: clusters of voxels and regions of interest. By analyzing each voxel separately, this ‘univariate’ approach ignores the rich information encoded in the spatial topography of distributed activation patterns—that is, it spotlights a particular area but misses patterns in the broader network of brain activity. Yet the brain is highly interconnected, not organized as a localized series of components.

A newer ‘multivariate’ technique, multi-voxel pattern analysis (MVPA), exploits the information represented in distributed patterns throughout a brain region or even across the entire brain. It is a more sensitive method of analysis because it is more adept at detecting distributed networks of processing. MVPA techniques use machine-learning algorithms to train classifiers on data patterns from test subjects. The classifier learns the distributed ‘neural signatures’ that differentiate unique mental states or behavioral conditions. Once adequately trained, the classifier is then tested on new fMRI data (data it has not been trained on) to determine whether it can accurately classify the condition of a subject’s brain on a given trial based solely on the brain data. 85 Simply put, there is more informational content in fMRI activity ‘patterns’ than is typically detected with conventional fMRI analyses. The accuracy with which the classifier can discriminate trials from Condition A and Condition B gives a quantitative assessment of how reliably two putatively distinct mental states are differentiated by their brain activity patterns.
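
The train-then-test logic can be sketched in a few lines. The following is a minimal illustration using synthetic data and scikit-learn’s regularized logistic regression; the dimensions and signal strength are assumptions, and real pipelines add preprocessing, brain masking, and more careful cross-validation.

```python
# Minimal MVPA-style sketch: train a regularized logistic regression on
# simulated trial-by-voxel patterns, then score it on held-out trials.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_voxels = 200, 5000

labels = rng.integers(0, 2, size=n_trials)   # Condition A vs Condition B
signal = rng.normal(size=n_voxels) * 0.2     # weak, distributed 'neural signature'
patterns = rng.normal(size=(n_trials, n_voxels)) + np.outer(labels, signal)

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
scores = cross_val_score(clf, patterns, labels, cv=5)  # test folds never seen in training
print(f"mean cross-validated accuracy: {scores.mean():.2%}")
```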

MVPA has enabled significant advances in memory detection research. In 2010, the first paper to apply an MVPA approach to memory detection in fMRI ‘evaluated whether individuals’ subjective memory experiences, as well as their veridical experiential history, can be decoded from distributed fMRI activity patterns evoked in response to individual stimuli’. 86 Before entering the scanner, participants studied 200 faces for 4 seconds each. In the scanner about an hour later, they were presented with the 200 studied faces interspersed with 200 unstudied faces and pressed a button to indicate whether or not they recognized a given face. Participants were accurate about 70 per cent of the time, giving researchers the ability to examine their brain activity both when they were correct and when their memory failed them. The classifier was first trained to differentiate brain patterns responding to old faces that subjects correctly recognized from brain patterns responding to new faces that they correctly judged to be novel—both situations in which the subjective experience and objective reality of the response were identical. The classifier performed well above chance, with a mean classification accuracy of 83 per cent, rising to 95 per cent if only the classifier’s ‘most confident’ guesses were considered. But since subjects did not perform perfectly—sometimes misidentifying new faces as having been previously seen, or rejecting old faces as novel—the classifier could also be tested on the ‘subjective’ mnemonic experience. In those scenarios, the classifier performed relatively poorly—near chance—when applied to detect the ‘true’ experiential history of a given stimulus on those trials for which participants made memory errors. That is, the classifier proved to be very good at decoding a participant’s ‘subjective’ memory state, but not nearly as good at detecting the true, veridical, ‘objective’ experiential history of a given stimulus. Subjective recognition, of course, can be susceptible to memory interference—resulting in commonly experienced, but false, memories. 87
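
The ‘most confident guesses’ analysis corresponds to scoring accuracy only on trials where the classifier’s predicted probability is extreme. A minimal sketch, again with synthetic data and an assumed 0.9 confidence cutoff:

```python
# Confidence-thresholded classification: score only extreme predictions.
# Synthetic data; the 0.9/0.1 cutoffs are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_trials, n_voxels = 400, 2000
labels = rng.integers(0, 2, size=n_trials)
patterns = rng.normal(size=(n_trials, n_voxels)) + np.outer(labels, rng.normal(size=n_voxels) * 0.15)

X_train, X_test, y_train, y_test = train_test_split(patterns, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
overall = (clf.predict(X_test) == y_test).mean()
confident = (proba > 0.9) | (proba < 0.1)
print(f"accuracy on all held-out trials: {overall:.2%}")
if confident.any():
    subset = ((proba[confident] > 0.5) == y_test[confident].astype(bool)).mean()
    print(f"accuracy on the {confident.sum()} most confident trials: {subset:.2%}")
```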

Three other aspects of this study deserve separate mention, as they are particularly relevant to assessing the forensic capabilities and limitations of fMRI/MVPA-based memory detection. First, though many stimuli were shown to subjects in order to train the classifier, once trained it could be applied to single trials—that is, a face shown just one time provided sufficient neural information for the classifier to make a categorization decision. 88 This is a significant advantage over EEG-based memory detection paradigms that require multiple presentations to detect an event-related potential, as well as ‘classic’ univariate fMRI analyses that assess averages of trials across conditions. A subsequent study confirmed that the MVPA classifier could decode the memory status of an individual retrieval trial, and assessed the vulnerability of this single-trial assessment to countermeasures by instructing participants who had studied a set of faces to conceal their true memory state. 89 Participants were instructed to feign the subjective experience of novelty for any faces they actually recognized and to feign the experience of recognition for any faces they did not recognize (eg by recalling someone that the novel face reminded them of). Using only this easy-to-implement countermeasure strategy, participants were able to prevent the experimenters from accurately differentiating brain responses to studied versus novel faces, with the mean classification accuracy dropping to chance level. 90 That is, although MVPA classification of fMRI data can enable single-trial memory assessment, it may still be vulnerable to simple mental countermeasures. 91

Second, the researchers specifically considered whether the same classifier would work across subjects by training it on data from some individuals, but then testing it on data from others. This would be an important feature of any technology to be forensically applied, but depends on an assumption of some unknown degree of consistency across the brains of different people. The classifier worked well across individuals, ‘suggesting high across-participant consistency in memory-related activation patterns’. 92 This is a significant result in that it suggests that, biologically, different people’s brains may be similar enough in how they process memories for technological solutions to be somewhat standardized.

Third, recall that the ‘classifier’ is not a person making a judgment call—it is a machine-learning algorithm, in this case based on regularized logistic regression 93 —assessing patterns of neural data. What the 2010 study suggests is that memories that feel true but are objectively false may have neural signatures quite similar to memories that feel true and are true. 94 If that finding holds, it has substantial implications for the ability to detect true memories and avoid detection of false memories.

III.D. Advances in Experimental Paradigms with MVPA: Real-World Life Tracking, Single-Trial Detection, and Boundary Conditions Revealed

Lab-created memories such as studying a series of faces may be relatively impoverished, from a neural data perspective, as compared to real-life memories that have potential for context and meaning. It is possible that where MVPA-based detection met limits for lab-created memories (in particular, the inability to distinguish objectively false but subjectively experienced memories, and vulnerability to simple countermeasures), additional information available to the classifier from a richer memory set may make real-life memories more distinguishable. This enrichment of the memory experience, and delays between the experience and time of test, also address concerns about ecological validity of research for potential forensic application. 95

To address this, Rissman and colleagues had participants wear a necklace camera for a three-week period while going about their daily lives before returning a week later to be scanned while making memory judgments about sequences of photos from their own life or from others’ lives. 96 After viewing a short sequence of four photographs depicting one event, participants made a self/other judgment and then indicated how strong their experience of recollection or familiarity was for photos judged to be from their own life, or their degree of certainty about photos judged to be from someone else’s life.

Behaviorally, participants were quite good at this task, successfully judging whether events were from their own life or someone else’s on around 80 per cent of trials, with the remaining 20 per cent split between incorrect and ‘unsure’ responses. Indeed, participants performed so well that there was not sufficient data to assess whether the classifier could distinguish objectively true from objectively false (but subjectively experienced as true) memories. That is, the 2016 study was not helpful in distinguishing false from true memories, because there was not enough false-memory data to work with. This is perhaps a side effect of the fact that real-life memories are richer than lab-created memories such as pictures of faces or words, and perhaps less susceptible to spontaneous false-memory effects. Nevertheless, an fMRI classifier trained to distinguish one’s own life events from others’ life events performed extremely well, classifying the self/other status of individual events correctly 91 per cent of the time on average, with no subject’s classification performance below 80 per cent. 97 When the classifier was required to distinguish trials where participants reported recollecting specific details from those in which they reported only familiarity with the event, it still performed well above chance, with a mean accuracy of 72 per cent.

One feature of regularized logistic regression classifiers is the ability to create ‘importance maps’, which permit researchers to assess which voxels (and thus, which networks of brain areas) drive the classifier’s decision. Based on these importance maps, self/other classifications relied on brain areas associated with mnemonic evidence accumulation and decision processes, whereas recollection/familiarity distinctions showed a very different pattern, involving brain regions associated with the retrieval of contextual details about an event.
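
For a linear classifier, an importance map can be read out of the learned voxel weights. The sketch below uses one common importance measure (weight times the class difference in mean activity) on synthetic data; it is schematic, not the cited studies’ exact computation.

```python
# Schematic 'importance map' for a linear classifier on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_trials, n_voxels = 200, 1000
labels = rng.integers(0, 2, size=n_trials)
patterns = rng.normal(size=(n_trials, n_voxels)) + np.outer(labels, rng.normal(size=n_voxels) * 0.3)

clf = LogisticRegression(max_iter=1000).fit(patterns, labels)

# Importance = learned weight x mean class difference in activity, per voxel.
class_diff = patterns[labels == 1].mean(axis=0) - patterns[labels == 0].mean(axis=0)
importance = clf.coef_[0] * class_diff
top_voxels = np.argsort(np.abs(importance))[-10:]
print("ten most decision-relevant voxels:", top_voxels)
```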

This study design also addressed the issue of retention interval—that is, the amount of time that elapses between the formation of a memory and the probing of that memory in the magnetic resonance imaging (MRI) scanner. A classifier tested on memories formed 1–2 weeks before the scan performed just as well as a classifier tested on memories formed 3–4 weeks before the scan, at the beginning of the period when participants wore the cameras. This suggests that recent memories are not necessarily easier to decode than more remote memories. 98 However, this study only assessed memories that were at most 1 month old. One earlier fMRI study that also used wearable cameras compared memories that were 5 months old to those that were only 36 hours old and found that the older memories evoked a somewhat different profile of brain activity, including less activation of the hippocampus and surrounding medial temporal lobe structures commonly associated with episodic recollection. 99 Given that some, if not most, potential forensic applications will involve the need to probe memories that are months or years old, more research is needed to determine how reliably classifier-based memory detection will work on older memories, as well as whether classifiers are capable of estimating the age of a probed memory.

As with the previous study, the researchers also confirmed that a classifier trained on some individuals performs well when tested on different participants, suggesting that the underlying brain activity patterns are fairly consistent across subjects. Moreover, the researchers attempted to directly address the question of whether lab-based memories differ from autobiographical memories by using the classifier from the 2010 face memory study on brain data from the 2016 self/other life photograph study, and vice versa. In both situations, the classifier still succeeded at predicting a subject’s mnemonic judgment over two-thirds of the time, well above chance. 100
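
Across-subject generalization of this kind is typically evaluated by training on all subjects but one and testing on the held-out subject. A minimal leave-one-subject-out sketch with synthetic data (subject counts and the assumed shared signal are illustrative):

```python
# Leave-one-subject-out cross-validation on synthetic multi-subject data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(5)
n_subjects, trials_per_subject, n_voxels = 8, 50, 1000

groups = np.repeat(np.arange(n_subjects), trials_per_subject)  # subject ID per trial
labels = rng.integers(0, 2, size=n_subjects * trials_per_subject)
shared_signal = rng.normal(size=n_voxels) * 0.2  # pattern assumed common across brains
patterns = rng.normal(size=(labels.size, n_voxels)) + np.outer(labels, shared_signal)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, patterns, labels, groups=groups, cv=LeaveOneGroupOut())
print("accuracy per held-out subject:", np.round(scores, 2))
```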

Although the 2016 Rissman et al study could not answer whether real-world true and false memories differ in neural activation patterns, a 2015 fMRI study from a separate research group used MVPA classification methods and a mock crime scenario similar to the EEG-based CIT to approach the issue. 101 One group of subjects (guilty intention) planned a realistic mock crime (a theft of money and a CD containing important study information) but did not actually commit it. Another group (guilty action) planned and executed the ‘crime’. And a third group (informed innocent) was informed of half of the relevant details in a neutral context, but engaged in neither planning nor execution. Subjects were scanned during a CIT behavioral task. The MVPA analyses showed that although it was possible to reliably determine whether or not individual subjects possessed knowledge of crime-relevant details, the classifier was far less accurate (indeed, not significantly better than chance) at discriminating between subjects in the three groups. 102 In other words, the classifier could not tell whether crime-related memories had been acquired through crime execution, crime planning, or merely reading about the crime-relevant details. Thus, much like the comparable EEG study discussed above, 103 even the informational richness of whole-brain fMRI activity patterns may be insufficient to prevent the risk of false positive identifications of innocent-but-informed individuals.

Of course, researching real-life memories in a laboratory setting is methodologically complex; what about the fact that looking at pictures is itself an autobiographical experience? How do ‘laboratory memories’ really diverge from ‘real-world memories’, and can a classifier tell the difference? 104 Recent work by the Rissman group manipulated the self/other task (using photographs from necklace-mounted cameras) by permitting subjects to preview a portion of the photographs one day before scanning. 105 Essentially, the study asked whether there is a detectable neural difference between the experience of viewing a photograph and the experience of actually living a particular experience. This is indeed what was found; the classification analyses revealed a dissociation between the diagnostic power of each of two different large-scale brain networks. Specifically, activity patterns within the ‘autobiographical memory network’ were significantly more diagnostic than those within the ‘laboratory-based network’ as to whether photographs depicted one’s own personal experience, regardless of whether they had been viewed before scanning. In contrast, activity patterns within the laboratory-based memory network were significantly more diagnostic than those within the autobiographical memory network as to whether photographs had been previewed, regardless of whether they were from the participant’s own life. This dissociation provides some evidence for separate neural processes for retrieval of firsthand experience versus secondhand knowledge—a finding that has significant implications for how, in a forensic context, stimuli are selected and whether or not they can or should be previewed to subjects.

III.E. So Where Is Brain-Based Memory Detection Now?

The foregoing covered a lot of science, even while skipping over critical technical details such as receiver-operating characteristic curves, parametric versus non-parametric statistical testing, and a heavy amount of advanced math, not to mention the wide range of technical details about individual scanner parameters, data processing, and data analysis packages. 106 Without suggesting that these details are unimportant—their accessibility is in fact critically important should this technology ever be introduced in court, as discussed below—we provide here a high-level summary of the state of the art of brain-based memory detection technology in the context of what it might mean for legal applications.

The most advanced brain-based memory detection work leverages algorithmic classification of rich networks of brain activity. This work is substantially advancing the basic science research in memory studies—including helping determine what kinds of subtly different cognitive processes have distinct neural substrates. This work also leverages real-life experiences, is able to assess subjective mnemonic status at a single-trial, individual level, and can use a classifier trained on data other than from a subject of interest, indicating some conservation of neural networks for particular memory tasks across people. In short, the ‘technological’ aspects of the work are so sophisticated that we may start to be confident that MVPA-based memory detection research is beginning to reveal the ‘biological’ limitations of memory detection.

This is not to say that further technological developments could not reveal clear biological distinctions between true, false, or modified memories. Indeed, the understanding of which machine-learning classifiers perform better when applied to different brain areas and networks is rapidly evolving, 107 and may give further insight into the underlying nature of autobiographical memory. But at this point, the technology confirms biological processes of memory congruent with our best understanding of encoding and retrieval from years of psychological research—fundamentally, that memory is an inherently reconstructive process.

At present, the most salient limitations are these: even with sophisticated technology able to detect distinctions among different types of autobiographical or episodic memory processes, there may be no way for a brain scanner and machine-learning algorithm to (i) distinguish between a false but subjectively believed memory and an objectively true memory, (ii) detect, on a single-trial level, the deployment of simple mental countermeasures, and (iii) distinguish someone who has knowledge of, or even intention toward, a particular event from someone who actually participated in it. If these findings reflect ‘biological’ truths rather than ‘technological’ limitations of detection, there is a serious boundary condition on the utility of brain-based memory detection to contribute to accurate fact-finding, if the issue is what really happened rather than what a subject thinks or believes.

Moreover, in all research conducted thus far, researchers had perfect access to the veridical truth about the world—the memory of which is assessed in the scanner—such as by controlling the mock crime scenario or selecting the photographs from the wearable cameras. What remains unknown is how such technologies would work when investigators have varying degrees of uncertainty about which stimuli should match a person’s recollection or trigger recognition. For example, would photos of a terrorist training camp trigger recognition or recollection if they were taken from a vantage point that a suspect had never seen? 108 Would a years-old photo of an associate’s face, with a different hairstyle, eyewear, and countenance, elicit recognition? Were details known only to a crime participant and investigators, such as the paisley-patterned couch the victim was found on, really encoded by the perpetrator? 109 What if someone had committed a past burglary, but not the burglary under investigation, and a probe item such as a burglary tool elicited an essentially false positive recognition? 110 Or might memories be more reliably detected for crimes like trafficking and financial crimes, where a perpetrator has repeated exposures to a particular place, face, or set of documents? Also unknown, yet critical to determining the ‘accuracy’ (as in positive and negative predictive value) of any diagnostic test, are real-world base rates of inaccurate memories, such as false positives (memory detected, but event not actually experienced) and false negatives (event experienced, but memory not present or detected). These ‘false memory’ base rates might even vary between the different types of factual scenarios in which one might want to apply memory detection forensically, discussed below.
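
The dependence of a test’s predictive value on base rates is simple arithmetic. The worked example below uses illustrative numbers; the sensitivity, specificity, and base rate are assumptions, since none are known for real-world memory detection.

```python
# Why base rates matter: even a sensitive, specific test produces many false
# positives when the probed memory is rare. All values are assumptions.
sensitivity = 0.90   # P(test positive | memory present)
specificity = 0.95   # P(test negative | memory absent)
base_rate = 0.10     # P(memory present) -- unknown in real-world populations

p_positive = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
ppv = sensitivity * base_rate / p_positive              # positive predictive value
npv = specificity * (1 - base_rate) / (1 - p_positive)  # negative predictive value

print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
# With these numbers PPV is only ~66.7%: one 'detected' memory in three is a
# false positive, despite 90% sensitivity and 95% specificity.
```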

The residual uncertainty about accuracy in real-world applications is a familiar gap between research science and forensic science. Because of the inherently reconstructive nature of human memory, 111 it is not a gap likely to be completely bridged even by accessing the brain doing the reconstruction. Nevertheless, advances in the ability to use brain-based measures to detect real-life memories have potential value for deployment in legal and social contexts. But that value must be carefully assessed if the technology is to be appropriately deployed in context.

IV. THE DOCTRINAL PATH TOWARDS COURTROOM USE: RELEVANCE, RELIABILITY, AND CREDIBILITY ASSESSMENT

The law relies on memory in myriad ways. 112 Some are about fact-finding: most obviously, memory is often the primary—or only—source of information about past events that have legal relevance. Others are assumptions about how memory works that are entrenched in doctrine and legal standards. 113 A memory detection technique that could confirm the presence or true absence of memories of disputed facts would have a massive impact on many types of legal proceedings, well beyond trial testimony.

Evidence law is of course focused on filtering the testimonial and other evidence that reaches a jury. Because ‘admissibility’ is not an inherent property of a particular technology, but rather a mixed question of facts, law, and science dependent on the purpose for which the evidence is offered, this Part lays the doctrinal path toward courtroom use of brain-based memory detection. The questions raised by potential courtroom applications are slightly different from the issues raised by the technological and biological limitations described above, chiefly along the dimension that in different courtroom applications, varying degrees of knowledge about ground truth and certainty in assessment may be acceptable, depending on who is offering the memory detection evidence and for what purpose. Ultimately, this Part and the next argue that courtroom admissibility should not be the focus of memory detection technology development, for reasons both pragmatic and doctrinal.

IV.A. Judicial Gatekeeping: Relevance, Reliability, and ‘Fit’ of Brain-Based Memory Detection

Next, we map the steps toward memory detection evidence being admitted in court: relevance, reliability, and the ‘fit’ for the offered purpose. The admissibility of memory detection evidence will be subject to judicial gatekeeping because it will be the subject of expert testimony. 114 As a preliminary matter, admissible expert evidence must be relevant. Relevance is a low threshold; as long as the memory detection evidence tends to make a fact in issue more or less probable, it would be considered relevant. 115 A memory need not be disputed to be in issue, but if testimony about a memory were not in doubt 116 and did not pertain to a central fact in issue in the case, it is more likely that a judge would exclude the cumulative evidence on the grounds of wasting time. 117

For memory detection evidence to be relevant, we must consider what kind of memory it is capable of detecting, and thus what legal scenarios it applies to. As described above, brain-based memory detection can potentially detect autobiographical memories of an act or an event, or semantic memory such as unique factual knowledge that proves identity. What was not investigated in the work described above is the ability of brain-based technologies to detect past intent or past mental state; this is presently not possible, and may never be—though this is an issue going to inherent reliability rather than relevance per se. 118 But the current state of knowledge means that memory detection evidence is probably not relevant in cases where the factual issue is proving mental state or intent, which is almost certainly more frequently disputed than the act or event of ‘what exactly happened’ or ‘who dun it’. 119

Still, memory detection evidence is likely relevant in civil cases where the disputed issue is ‘what happened?’—that is, what are the objective facts about past events—perhaps encompassing many torts, as well as employment harassment or discrimination cases. Memory detection of prior knowledge of a patent or prior art may be relevant in claims of willful infringement or inequitable conduct. 120 In criminal cases, the relevance and utility of memory detection may be more limited than proponents might think because memory detection cannot directly assess mens rea. If memory detection can only access memory of autobiographical events or semantic knowledge, it would be most useful in criminal cases for: corroboration or impeachment purposes, in the same way that some states permit polygraphs for such purposes; 121 eyewitness identification, where someone has been misidentified and the witness’s incorrect memory can be shown to correspond to recognition of someone else; alibis lacking corroboration, where a defendant says ‘it wasn’t me!’ but cannot offer proof of being elsewhere other than experiential memory from the same time period; and possibly cases where a defendant claims a confession was coerced, in that they admitted to something they did not do and would have no experiential memory of (though this would require confidence that the absence of a memory was truly evidence of the absence of the experience). And even in these hypotheticals where it might be relevant, there are of course reliability issues with whether any brain-based memory detection technology can ever distinguish between memories that are objectively true and those that are false but subjectively believed to be true.

What about reliability? Memory detection evidence would be offered via the opinion of an expert, with reliability assessed under Daubert and its progeny, Federal Rule of Evidence 702, or state law equivalents, discussed next. But imagine for a moment a world where memory detection evidence was offered directly by a party, because the technology had been around long enough that an expert’s specialized knowledge was no longer ‘helpful’ to a jury. 122 Direct evidence of memory detection would pose an analytical challenge fascinating to hearsay scholars. 123 If such evidence were scientifically valid and reliable, nearly to the point of infallible assessment of a veridical, objective truth, it could be considered free from one of the major sources of unreliability that normally infects hearsay—faulty memory that cannot be revealed by cross-examination and juror assessment of witness credibility. Arguably, if memory detection were capable of validating a veridical, objective truth, it would also address the other worried-about sources of unreliability: misperception, sincerity, and narration. 124 A perfect version of memory detection could solve all of those problems, eliminating the reliability concerns that underscore the hearsay prohibition, but squarely presenting the question of what would be left for the jury. As a thought experiment, memory detection as a ‘hearsay solution’ raises interesting but familiar questions about the conflicting values, in addition to a search for accuracy, in our commitment to a jury system. These we address below in Part V.

In terms of the mechanics of assessing reliability, memory detection evidence would be offered as the opinion of an expert, probably the person who designed and administered the test. An expert (or two) would have to testify on two issues: whether the test is reliable and valid for the use to which it is being put (closely tied to the relevance considerations above), and if so, whether it was reliably employed in the given case to produce an accurate result. Expert testimony would be a necessary vehicle for memory detection test results, given the scientific and technological expertise necessary to understand the psychological theory, choose appropriate stimuli, and perform the technological assessment of brain-based memory detection. Potential hearsay concerns about the machine’s direct output are for now set aside; 125 an expert opinion can be based on evidence that would otherwise be inadmissible as hearsay. 126 In the federal system and many states, the Daubert trilogy of cases and Federal Rule of Evidence 702 (or state analogs) govern the method by which judges must make an initial, gatekeeping assessment of the admissibility of an expert’s testimony.

Daubert analyses of brain-based deception (lie) detection techniques abound in the literature, nearly all agreeing that such applications fail to clear the admissibility threshold for lack of understood error rates. 127 Courts that have considered brain-based deception detection technology have agreed, finding it inadmissible at present. 128 But some commentators argue that the distinctions between tests that purport to detect deception and those that simply detect recognition should ‘radically affect’ the admissibility analysis under Daubert. 129 This is analytically incorrect, for reasons explained infra Part IV.B. To the contrary, the Daubert or Frye standards should at present continue to exclude expert testimony opining that brain-based memory detection proves the presence (or absence) of a particular memory in a given subject. 130 With respect to ‘general acceptance’ standards, leading researchers and many recent papers in the fMRI-based memory detection space caution that, because of the limitations in existing studies and the nature of the findings about inherent limits on memory detection, much more research is needed before forensic applications are pursued. 131 There is far from ‘general acceptance’ of these technologies for forensic application at present.

Reliability will vary as a function of the type of memory being assessed. Memory detection will work best in situations where a subject has had a repeated experience resulting in a sturdy, non-fragile memory. Based on what we know about the nature of memory, it is virtually certain that there would be different base rates of memory inaccuracy depending on the type of memory and a number of factors about its encoding and retrieval. The phenomenon of memory contamination or false memories is well studied, but incomplete as an epidemiological field of study. For example, we now know that even just ‘imagining an event that might have occurred in someone’s past can increase confidence or belief that the event actually occurred, lead individuals to claim that they performed actions that they in fact only imagined or result in the production of specific and detailed false memories of events that never actually happened’. 132 We also know that normal people, describing non-traumatic life events over successive interviews, show high degrees of variability in their autobiographical memory. 133 What we do not really know are the relevant base rates—that is, how often false or inaccurate memories occur in day-to-day life. 134 Not only are the error rates of memory detection technologies unknown, but these base rates of different kinds of memory inaccuracy are also unknown for real-world applications and may never be ascertained.

Nevertheless, validation studies will accumulate, and error rates will be proposed. Would they be sufficient to cross the admissibility threshold? Daubert is focused on reliability through validation. 135 But this standard is vulnerable to misinterpretation and thus insufficient scrutiny at the admissibility stage because validation studies for forensic memory detection would be inherently limited for at least two reasons.

First, they are necessarily conducted under idealized conditions with respect to the person being examined. In the research context, investigators can control for individual subject variables that may impact reliability such as mental health status, head injury status, intoxication status, stress level, and demographic features such as level of education. 136 Second, as with forensic DNA evaluation or other forensic machine evidence in the form of statistical estimates and predictive scores, validation studies are potentially an incomplete method of ensuring accuracy. 137 Real validation is feasible in laboratory procedures that show that a measured physical quantity, such as a concentration, consistently lies within an acceptable range of error relative to the true concentration. But where the ‘true concentration’ cannot be known—because investigators do not know the ground truth, or because individual memories are inherently reconstructed—there is no underlying ‘true’ value that exists. 138 This problem veers into the epistemic—how can we ever know what is true? But it is precisely because brain-based memory detection’s appeal depends upon an assumption of essentialization of lived experience to empirical truth 139 that this type of evidence deserves epistemological scrutiny as part of its validation.

That is, imperfect validation studies may be enough under existing law to get brain-based memory detection evidence admitted, but they may be insufficient to permit the jury to decide, in a nuanced way, whether to believe it and how much weight to assign to it. Even if the Daubert–Frye prerequisites for brain-based memory detection were arguably met, it is important to recognize that a Daubert–Frye analysis does not, on its own, provide sufficient information for a fact finder to perform an adequate ‘credibility’ analysis, discussed infra Part IV.B and Part V, though it may supply the veneer of such scrutiny. 140

Finally, the outcome of reliability gatekeeping is framed by which party is offering the technology, and for what purpose—the issue of ‘fit’. Because criminal defendants have a constitutional right to compulsory process, such that evidentiary rules that would bar admission must sometimes yield, 141 a defendant may be able to offer brain-based memory detection in support of his defense that is perhaps less reliable than what the prosecution would have to put forward. When the Supreme Court considered lie detection technology in United States v. Scheffer, it held that a defendant does not have a right to present polygraph evidence, although the jury-as-lie-detector rationale, advanced by Justice Thomas, commanded only a plurality. 142 In a dissent largely driven by due process concerns—given that Scheffer wanted to offer his polygraph results in his own defense—Justice Stevens wrote that ‘evidence that tends to establish either a consciousness of guilt or a consciousness of innocence may be of assistance to the jury in making such [credibility] determinations’, and argued that juries could follow the instructions of a trial judge concerning the credibility of an expert witness. The polygraph’s bid for admissibility failed in Scheffer because it was ‘too’ unreliable to overcome constitutional due process concerns. 143 But if brain-based memory detection technology can do better than the polygraph in terms of accuracy and reliability, 144 it might be admissible if offered by a criminal defendant.

In contrast, memory detection technology may effectively be held to a higher standard of reliability if offered by the state in a criminal prosecution. Yet other constitutional due process concerns arise when the state is the proponent. Although a brain-examined witness must consent to have the state physically examine their brain activity via skull-cap electrodes or by lying perfectly still in an MRI machine, memory detection technology does not necessarily require any overt response from the subject, spoken or otherwise. 145 Completely unresolved is the constitutional question of whether the output of a memory detection device would be considered simply physical evidence, or ‘testimonial’ for purposes of the Confrontation Clause of the Sixth Amendment. 146 This is a philosophical and doctrinal problem beyond the scope of this article, but suffice it to say that a defendant might raise a Confrontation Clause challenge to memory detection evidence from a state witness unavailable for cross-examination at trial.

It is possible to imagine that a highly accurate, highly reliable brain-based memory detection device could clear the judicial gatekeeping hurdle of admissibility. It would necessarily have to be a good ‘fit’, in terms of its relevance and reliability, in the factual and legal context in which it is offered. At this stage of our analysis, it seems those contexts would be rather limited, and thus a project of ‘admissible’ memory detection technology might not be pragmatic or efficient. 147 Moreover, constitutional due process and compulsory process concerns frame the ultimate question. What remains to be considered is that brain-based memory detection is essentially evidence of a witness’s credibility, and what this means for its courtroom use.

IV.B. Credibility Assessment: Brain-Based Memory Detection Is Evidence about Credibility, and Biological Limitations Mean that Brain-Based Memory Detection Tests Are Really Best at Assessing Witness Sincerity—Just Like Lie Detectors

Even if a gatekeeping judge could be satisfied that brain-based memory detection evidence survives the doctrinally required scrutiny for reliability and relevance, the task for the jury is not a binary one of accepting or rejecting the evidence. The jury must assign weight to the conclusions of the expert. For brain-based memory detection, this task would implicate two layers of credibility analysis, complicating the task of separately and appropriately weighing each layer.

Credibility, in the context of evidence law, means simply whether a source of information is worthy of being believed 148 —not merely whether a witness is lying. 149 That is, credibility assessment, properly understood, is not limited to just the ‘hearsay danger’ of insincerity. 150 The other ‘testimonial capacities’ of ambiguity, memory loss, and misperception are tested in human witnesses by oath, physical confrontation, and cross-examination. Were brain-based memory detection admitted in court, via an expert witness, it should be recognized to be ‘double-credibility dependent proof’. 151 That is, a fact finder could be led to draw the wrong inference about the content of a person’s memory, and ultimately a relevant fact, because of potential infirmities of both sources: the memory itself (that is, witness credibility), and undetected ‘black box’ dangers leading to imprecise or ambiguous outputs and incorrect inferences from memory detection technology. 152

This ‘double credibility’ analysis is not sufficiently scrutinized by existing Daubert and Frye reliability requirements for expert methods. 153 Furthermore, the ‘credibility’—not just the sincerity—of human witnesses is canonically the province of the jury. Outsourcing credibility assessment to a brain-based memory detection technology and expert witness reporting raises serious, but familiar, issues about the role of a jury system. 154 These potential objections to the use of brain-based memory detection as courtroom evidence are addressed in Part V.

This section argues that in many applied contexts, memory detection is probably not practically distinguishable from lie detection—and thus is subject to the same objections regarding the role of the jury as the ultimate assessor of credibility. The most advanced scientific and technological work in memory detection reviewed above presently suggests that no machine, no matter how sophisticated, could detect a false but subjectively believed memory—that is, an inaccurate memory that someone truly believes they experienced. 155 This is not of small consequence in a forensic context; the phenomenon of memory contamination or false memories is well studied. 156 As discussed above, what we do not really know are the relevant base rates—that is, how often false or inaccurate memories happen in day-to-day life. 157

The key point is that, depending upon the situation at hand, brain-based memory detection may offer little to no probative value in assessing the ‘accuracy’ of a witness’s memory due to the biological properties of memory itself. If that finding reflects a scientific truth—that brain activity for false and true memories cannot be reliably distinguished—then the utility of brain-based memory detection remains even more firmly in the zone of assessing the testimonial capacity of witness ‘sincerity’—based on a match or discrepancy between the results of the test and any other witness statements—and subject to the same objections as lie detector tests that it impermissibly impinges upon the role of the jury by bolstering or impeaching a witness’s credibility.

Not all commentators agree; John Meixner distinguishes ‘neuroscience-based credibility assessment’ from ‘evidence that directly assesses whether a witness is telling the truth’, 158, 159 arguing that only the latter should be understood to invade the province of the jury, relying principally on language from a plurality in Scheffer. 160 In terms of the jury’s role and the law of evidence, this is a distinction without a difference. Credibility assessment is a broader task than detecting active deception or mendacity. Juries are meant to be lie detectors, but they are equally charged with assessing the amount of weight to give each piece of admitted evidence. They do so by assessing four testimonial capacities of a witness (perception, memory, narration, and sincerity) and by considering each piece of evidence in context with all others. Thus, a technology that detects memories impinges just as much on the province of the jury to assess ‘memory’ as an element of credibility as a lie detection technology impinges on the assessment of ‘sincerity’. The point here is simply that memory detection technologies are, from the standpoint of permissible (and thus admissible) evidence, similarly situated to lie detection technologies. The issue of whether credibility assessment is, or should be, exclusively a role for the jury is discussed below in Part V.A.1, as a weaker objection to the evidentiary use of memory detection technology.

V. OBJECTIONS TO COURTROOM USE OF BRAIN-BASED MEMORY DETECTION

John Henry Wigmore was optimistic that the courts would embrace something like a perfect memory detection device and did not limit the utility of such a device to deception or mendacity: ‘But where are these practical psychological tests, which will detect specifically the memory-failure and the lie on the witness-stand? … If there is ever devised a psychological test for the valuation of witnesses, the law will run to meet it…. Whenever the Psychologist is really ready for the Courts, the Courts are ready for him’. 161 A memory detection device, better than a psychologist, seems to be the epitome of what he had in mind.

Given the preference in the rules of evidence for admissibility of relevant evidence, we have mapped the path of legal and factual issues leading toward courtroom admissibility of brain-based memory detection. We turn now to remaining objections as to why this technology should not be used in court. These are organized from weak objections and surmountable challenges, some of which memory detection has in common with other types of scientific or expert evidence; to strong objections inherent to the technology and task; to normative and philosophical objections that encompass familiar concerns about our commitment to trial by jury (and its alternatives), the value of dignity in and outside of the courtroom, and the role of memory in the nature of personhood. Though the normative concerns are familiar, there is more utility in examining them than simply appreciating how new technologies illuminate longstanding tensions in evidence law: new technologies may also be proffered as potential options that could rebalance tradeoffs between values such as accuracy and due process.

V.A. Weak Objections and Surmountable Challenges

V.A.1. That credibility assessment is exclusively a job for the jury

A plurality of the Supreme Court, led by Justice Thomas, stated in Scheffer that in ‘our criminal trial system … “the jury is the lie detector.” Determining the weight and credibility of witness testimony, therefore, has long been held to be the “part of every case [that] belongs to the jury, who are presumed to be fitted for it by their natural intelligence and their practical knowledge of men and the ways of men”’. 162 Other courts have expressed similar sentiments. 163 But why should evidence that ‘bears on’ credibility be so offensive to the system of trial by jury, particularly if juries are not very good at detecting when they are being lied to? (The analysis is slightly different for a ‘perfect’ memory detector, discussed infra Part V.C.) The modern role of the jury as a ‘lie detector’—rather than a broader conception of credibility assessor—was cemented in the seminal article by George Fisher. 164 But Fisher’s focus was on the historical development of the jury as serving an ‘error-erasing’, legitimacy-preserving function. 165 If the jury is not actually well-suited for the task of accurately assessing witness credibility—and if jurors already regularly consider other types of evidence that ‘bear on’ a witness or defendant’s credibility, such as character evidence 166 —a legal system prioritizing accurate results must seriously consider whether even a ‘perfect’ credibility detector trumps ‘error-erasing’ concerns.

John Meixner argues that even if the role of credibility assessment is one exclusively for the jury, perhaps it should not be. 167 Arguing that ‘this determination is inevitably based on the accuracy of jurors’ credibility determinations’, he concludes that because social science indicates that people (both trained experts and lay individuals) are not very good at detecting when another is lying to them, expert testimony on ‘the credibility of witnesses’ should be permitted toward the goal of fact-finding accuracy. 168

The tension between the goals of accuracy and ‘error-erasing’ legitimacy (which rests on the perception of accuracy and the opacity of jury deliberations) is probably not solved by the type of memory detection device under consideration here—one that may still be vulnerable to inherent flaws in human memory. But the ‘credibility assessment is solely a task for the jury’ argument, without more, is weakest in that it ignores the fact that perceptions of accuracy feed directly into perceptions of legitimacy. If a perfect, or near-perfect, memory detection device were available, it is unlikely the public would perceive its exclusion in favor of assessment by a jury of peers as improving the accuracy of a verdict.

V.A.2. That the experts’ methods may have flaws or weaknesses

Brain-based memory detection results might be incorrect or misleading because of human causes of ‘falsehood by design’. 169 In the hands of the wrong person, perhaps motivated to find incriminating information in the brain of a given suspect, a brain-based assessment could be designed ‘in a way they know, or suspect, will lead a machine to report inaccurate or misleading information’—perhaps by choosing incriminating stimuli of which an innocent suspect is already aware, or by failing to include appropriate comparator stimuli. Even in the hands of an honest and well-meaning expert interested only in accurate results, stimulus design could be influenced—maliciously or inadvertently—by an investigator’s incomplete or inaccurate understanding of the events of interest. At present, all memory detection technologies start from a hypothesis about the ground truth, and assess whether that truth is recognized or recollected in the brain of a subject being imaged. Where that truth is unknown, uncertain, or ambiguous—as in real-world settings—the potential for human error in test design is magnified accordingly. 170
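
To make the design-flaw concern concrete, the following minimal sketch (in Python; the signal model, numbers, and function names are invented for illustration and do not correspond to any validated protocol) simulates a recognition-based test in which the chosen ‘probe’ stimulus is already known to an innocent subject—say, from media coverage—so the test returns the same positive result for the innocent as for the guilty:

```python
import random

random.seed(0)

def recognition_signal(item_is_known):
    """Toy model: known items evoke a larger 'recognition' response.
    Units are arbitrary; this is not a model of real EEG/fMRI data."""
    return random.gauss(0.0, 0.1) + (1.0 if item_is_known else 0.0)

def memory_detected(probe_is_known, n_irrelevants=4, threshold=0.5):
    """Report 'memory present' if the probe response exceeds the mean
    response to irrelevant comparator items by more than `threshold`."""
    probe = recognition_signal(probe_is_known)
    irrelevants = [recognition_signal(False) for _ in range(n_irrelevants)]
    return probe - sum(irrelevants) / n_irrelevants > threshold

# Guilty subject, who encoded the crime detail firsthand:
print(memory_detected(probe_is_known=True))   # True: 'memory present'

# Innocent subject who learned the same detail from news coverage --
# the flawed stimulus choice described above yields the same result:
print(memory_detected(probe_is_known=True))   # True: a false positive by design
```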

Researchers also make choices about their degree of tolerance for uncertainty; unless disclosed, those choices risk a mismatch with the tolerance assumed by the fact finder. 171 In the memory detection studies reviewed in Part II, researchers made choices about the tolerances for false positives and false negatives in setting signal detection thresholds. 172 Even if the tolerance for particular types of uncertainty is articulated to a fact finder, it is difficult to translate statistical thresholds in signal detection theory into concepts matching layperson judgments of certainty. 173
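
A toy signal detection sketch illustrates the point (the score distributions below are invented, not parameters from any study reviewed in Part II): wherever the threshold is set, one error rate is purchased at the price of the other, and that choice belongs to the test designer, not the fact finder.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical test scores: the 'memory absent' and 'memory present'
# populations overlap, as real-world distributions typically do.
absent = rng.normal(0.0, 1.0, 10_000)
present = rng.normal(1.5, 1.0, 10_000)

for threshold in (0.25, 0.75, 1.25):
    fpr = np.mean(absent >= threshold)   # subjects without the memory, flagged anyway
    fnr = np.mean(present < threshold)   # true memories the test misses
    print(f"threshold={threshold:.2f}  FPR={fpr:.1%}  FNR={fnr:.1%}")
```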

As for an expert’s chosen analytic methods, consensus in the functional brain imaging community tends to exist only within narrow groups. Different laboratory groups use varying degrees of proprietary code to run their experiments and analyze the data. Even in the broader brain imaging community, data collection and analytic techniques are far from standardized. A recent episode illustrates this plainly: data analysis in the fMRI community received scrutiny in the popular press after an article comparing two different types of analyses stated that their findings ‘question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results’. 174 Though initially overstated and later corrected, the episode highlights that ‘the validity of fMRI data analysis paradigms has not been uniformly established and needs continued in-depth investigation. fMRI is a complex process that involves biophysics, neuroanatomy, neurophysiology, and statistics (experimental design, statistical modeling, and data analysis)…. Linking statistical methodology development and fundamental fMRI research is crucial for developing more accurate analysis methods, attributing accurate scientific interpretations to results, and ensuring the reliability and reproducibility of fMRI studies’. 175 When the reliability of methods is assessed—a necessary but not sufficient step toward credibility assessment—brain imaging software engineers and statisticians should be consulted as part of the relevant scientific community, to determine the reliability not only of the behavioral and analytic methods of data collection, but also of the software implementing the collection and analysis. 176 Although this consultation should be part of the admissibility analysis, credibility assessment requires that it also be addressed in front of the jury.

None of these problems are unique to brain-based memory detection evidence. Juries are presented with complicated scientific evidence all the time. Each of these problems may be addressable through ‘testimonial safeguards’, some of which are provided by rigorous peer review of the general methods and protocols that an expert relies upon, and others of which can be addressed by well-prepared cross-examination and opposing expert evidence.

V.A.3. That juries cannot be trusted to evaluate this complex scientific evidence about credibility

A final, surmountable objection to consider is actually an empirical claim. To this point, the argument against admissibility has been premised on an assumption that a memory detection device would need to be nearly perfect in order to be sufficiently relevant and reliable. But what if the technique simply improves a fact finder’s guesses by providing some ‘useful’ or ‘helpful’ information, subject of course to cross-examination and opposing testimony exposing all the technology’s weaknesses? That analysis would cut in favor of its admissibility.

Two concerns arise: first, that a jury would overweight the memory detection evidence, making it unduly prejudicial; 177 second, that jurors might wholly abdicate the deliberative, explanatory process that goes on inside the jury room because an unexplainable machine has assessed the credibility of a witness for them. But this is a testable hypothesis, and current evidence for the proposition that neuroscience-as-evidence is unduly persuasive to lay decision-makers is mixed at best. 178 (Indeed, jurors may tend to ignore, rather than overweight, complex evidence that they do not understand, particularly when they feel they have their own intuitive assessment of witness credibility, as they do in their everyday lives.) Testing this hypothesis is difficult, but not impossible. 179 In light of the incompleteness of the empirical data, there is little to say definitively other than that reasonable courts and scholars presently differ in their intuitive judgments about whether jurors would be able to properly assess probabilistic evidence coming from impressive and relatively opaque technology accessing the mysterious inner workings of the human brain.

V.B. Stronger Objections

V.B.1. That high reliability in memory detection may be biologically implausible

Here, we essentially apply the import of the scientific findings above. The current scientific consensus is that memory is inherently reconstructive, flexible, and malleable and that there is no form of storage that is a permanent ‘engram’ (memory trace) like a video recording or digital computer file. We now understand that memories undergo a process known as reconsolidation every time they are retrieved. That is, every time a stored memory trace is accessed and ‘reactivated’, it temporarily becomes ‘labile’, and hence prone to some degree of updating. 180 While in this labile state, memory traces can be disrupted entirely by the administration of protein synthesis inhibitors or electroconvulsive shock therapy, or they can be modified by the introduction of new information. 181 This is, in fact, the normal mechanism of learning. But it also means that memory detection cannot be thought of as accessing a file stored somewhere without recognizing that that file has potentially been modified by virtue of being accessed, every single time it is accessed.

Moreover, the most sophisticated brain-based memory detection techniques reviewed above confirm that false but subjectively believed memories may be so biologically similar to ‘true’ (veridical) memories that accuracy in fact-finding is truly limited. Furthermore, countermeasures and false positives may be unavoidable and undetectable. Many types of countermeasures have not yet been investigated, such as deliberate false memory creation, forgetting via deliberate rumination on an alternate narrative, or false memory implantation via questioning or the provision of information. Finally, it is important to remember that here (as is true generally), the absence of evidence is probably not going to be evidence of absence. That is, the absence of a memory in a detection paradigm cannot confidently be said to represent a true absence of memory or actual experience. False negatives may be acceptable in certain diagnostic scenarios, but their presence as a boundary condition on the technology’s utility as forensic evidence should be appreciated.

V.B.2. That even highly reliable memory detection may be so technologically complex as to be impenetrable to machine credibility assessment

The technological and biological complexity of sophisticated brain-based memory detection makes it exceedingly difficult—perhaps impossible—for laypersons to assess whether they should ultimately believe it as fact. This brings us to the second layer of credibility analysis for memory detection evidence. Even if a memory detection device, or a lie detection device, could perform reliably enough—that is, could satisfy repeated validation studies—such that it was normatively preferable to a jury in terms of assessing witness credibility (assuming for the moment that accuracy is the only valued metric), how do we know whether we should believe what the device says, or how much weight to give to the result, or to the expert opinion based on it? Because the ultimate question upon which the test purports to aid the jury is one the jury is squarely charged with, the jury must also be able to assess the credibility of the machine and the test itself.

‘Machine credibility’, a concept elucidated by Andrea Roth, concerns whether the fact finder draws the correct inference from information conveyed by a machine source—in short, ‘the machine’s worthiness of being believed’. 182 Courts and most scholars have not yet recognized that machines themselves can provide evidence that ‘merits treatment as credibility-dependent conveyances of information’. 183 Just as credibility testing of human witnesses encompasses more than simply assessing sincerity or mendacity—of which machines are theoretically incapable—Roth argues that ‘the coherence of “machine credibility” as a legal construct depends on whether the construct promotes decisional accuracy’. 184 Limiting for the moment the task of credibility assessment to one of decisional accuracy, what this translates to in practical terms is how much ‘weight’ a jury should give to a machine conveyance, just as a jury is entitled to give varying weight to witness testimony.

Brain-based memory detection techniques fall squarely within the realm of credibility-dependent machine conveyances. They essentially convert the biological contents of a human skull (and, by extension, the subjective contents of a human mind) into an assertion by a machine, reported by a human expert. In the case of machine learning applied to fMRI or EEG data, the assertion is that the contents of a person’s mind, in response to certain stimuli presented, are a pattern match, within a given degree of confidence, to a subjective experience of recognition or recollection. 185 Such output merits treatment as a credibility-dependent conveyance of information, even though it is ultimately presented to the jury by an expert. ‘Just as human sources potentially suffer the so-called “hearsay dangers” of insincerity, ambiguity, memory loss, and misperception’, machine-learning algorithms interpreting raw brain data potentially suffer ‘black box’ dangers that could lead a fact finder to draw the wrong inference from information conveyed by a machine source…. Just as the ‘hearsay dangers’ are more likely to arise and remain undetected when the human source is not subject to the oath, physical confrontation, and cross-examination, black box dangers are more likely to arise and remain undetected when a machine utterance is the output of an ‘inscrutable black box’. 186

Recall that fMRI-based memory detection methods use machine learning in the form of MVPA to extract more and better information from subtle patterns in brain imaging data. 187 It is the technological sophistication of these methods that is revealing the biological limits of memory detection. But the technology itself brings evidentiary risks that should be the basis of strong objections to courtroom use.

Andrea Roth’s taxonomy of machine credibility warns of falsehood by ‘machine learned design’. 188 In the most advanced forms of brain-based memory detection, testimonial safeguards in the form of front-end protocol design may not be feasible, as ‘[d]ata scientists have developed very different ‘evaluation metrics’ to test the performance of machine-learning models depending on the potential problem being addressed’. 189 Human selection goes into choosing machine-learning algorithms and some of their parameters for training, but the algorithms themselves may present a form of opacity because of the ‘mismatch between mathematical procedures of machine-learning algorithms and human styles of semantic interpretation’ 190 —that is, the mechanisms of machine learning may not map neatly onto humans’ ability to explain them.
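
A small example of why the choice of evaluation metric matters (the labels and the degenerate model below are invented purely for illustration): on an imbalanced set of trials, a model can score well on one metric while being forensically useless.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Invented ground truth: 90 trials without the memory, 10 with it.
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate 'model' that always reports 'no memory detected'.
y_pred = np.zeros(100, dtype=int)

print("accuracy :", accuracy_score(y_true, y_pred))                    # 0.90 -- looks impressive
print("recall   :", recall_score(y_true, y_pred))                      # 0.00 -- finds no memories
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.00 -- no true hits at all
```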

Machine learning is a critical part of the most advanced research into brain-based memory detection. Rissman and colleagues, in the first paper to apply MVPA to fMRI-based memory detection analysis, explored several machine-learning algorithms, ‘including two-layer back-propagation neural networks, linear support vector machines, and regularized logistic regression [RLR]’, electing the latter after they ‘found that RLR generally outperformed the other techniques, if by only a small amount’ in terms of classification accuracy. 191 This is, of course, a human design element—choosing which type of machine-learning algorithm to use. 192 Other types of classifiers are available and used by fMRI researchers interested in the detection of memory and intention. Peth and colleagues chose a linear support vector machine, 193 and a 2014 fMRI study from yet another research group attempting to decode true thoughts independent of an intention to conceal used a Gaussian Naïve Bayesian classifier. 194 Each type of classifier must first be ‘trained’ on data fed to it by a human—another source of potential inferential error, given the number of assumptions about shared features of training data and test data of interest—the result of which is a ‘matrix of weights that will then be used by the classifier to determine the classification for new input data’. 195 The classifier itself can report its degree of certainty in its classification decision, if given no particular threshold by its human programmers. 196
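
As an illustration of these design choices (a sketch using scikit-learn on synthetic data; the ‘voxel patterns’ and the weak class signal embedded in them are fabricated for the example and carry no empirical weight), the three classifier families named above can be trained and compared, and a probabilistic classifier can report a graded ‘certainty’ for each trial:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for multi-voxel patterns: 200 trials x 500 'voxels',
# with a weak signal separating 'old' (remembered) from 'new' items.
n_trials, n_voxels = 200, 500
labels = rng.integers(0, 2, n_trials)                  # 0 = new, 1 = old
effect = rng.normal(0.0, 0.25, n_voxels)               # per-voxel signal
patterns = np.outer(labels, effect) + rng.normal(0.0, 1.0, (n_trials, n_voxels))

classifiers = {
    "regularized logistic regression": LogisticRegression(C=1.0, max_iter=1000),
    "linear support vector machine": LinearSVC(C=1.0, dual=False),
    "Gaussian naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, patterns, labels, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {accuracy:.2f}")

# A probabilistic classifier can also express per-trial 'certainty':
model = LogisticRegression(C=1.0, max_iter=1000).fit(patterns, labels)
print("P(old) for the first trial:", round(model.predict_proba(patterns[:1])[0, 1], 3))
```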

The choice of machine-learning algorithm by a particular set of researchers or forensic designers could have an impact on its interpretability. Although ‘machine learning models that prove useful (specifically, in terms of the “accuracy” of the classification) possess a degree of unavoidable complexity’, different machine-learning algorithms have different levels of opacity. 197 That is, different algorithms may have different degrees of ‘black-boxiness’, in the sense of the extent to which they can be subjected to credibility testing. In a linear classifier, such as RLR or a linear support vector machine, it is possible to know roughly how much each brain voxel matters to the ultimate classification decision. Indeed, researchers using these classifiers can create ‘importance maps’ of voxels, which themselves could serve as a window into the machine’s ‘thinking’. 198 A neural network classifier with multiple hidden layers would be less ‘knowable’ in terms of how the machine is thinking, because the knowledge relevant to the classification decision is both distributed and non-linear. 199 That is, some classifiers may be more ‘interpretable’ than others, which suggests a ‘testimonial safeguard’ for brain-based memory detection: the use of machine-learning classifiers that can reveal, in terms or diagrams laypeople can understand, how the classifier made its decision. 200
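
A rough sketch of the ‘importance map’ idea, under the same kind of synthetic-data assumptions as above: for a linear classifier, the learned per-voxel weights can be read out directly and arranged on the scan’s spatial grid, which is one concrete sense in which linear models are more ‘knowable’ than deep, non-linear ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic 'voxel patterns' again: 200 trials x 64 voxels (a toy 8x8 slice).
n_trials, n_voxels = 200, 64
labels = rng.integers(0, 2, n_trials)
patterns = (np.outer(labels, rng.normal(0.0, 0.5, n_voxels))
            + rng.normal(0.0, 1.0, (n_trials, n_voxels)))

clf = LogisticRegression(max_iter=1000).fit(patterns, labels)

# For a linear model, each voxel's weight says roughly how strongly that
# voxel pushes the decision toward 'old' vs. 'new' -- the raw material
# of an 'importance map'.
weights = clf.coef_.ravel()
for voxel in np.argsort(np.abs(weights))[::-1][:5]:
    print(f"voxel {voxel}: weight {weights[voxel]:+.3f}")

# Reshaped onto the (toy) spatial grid, the weights can be displayed as a
# map of where the classifier 'looks' when it classifies.
importance_map = weights.reshape(8, 8)
print("map shape:", importance_map.shape)
```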

It is an empirical question whether such logic could be sufficiently explained to laypersons via direct or cross-examination (including whether the expert witnesses themselves even fully understand the machine’s classification decisions), or whether certain front-end procedures and protocols, including model jury instructions, could be designed to prepare and prime a jury to engage in sufficient scrutiny of a machine-learning-based expert conclusion. 201 Pretrial disclosure and access to raw data, analysis code, and even the scanner itself could potentially expose flaws leading to inaccurate inferences, 202 but such ‘adversarial design’ also assumes an expensive degree of expertise available to both sides.

But this empirical question is even more difficult to investigate than the objection above that jurors might not understand complex science—especially where it may be impossible for an expert to describe how the algorithm works, other than to say that it does and therefore should be trusted. Does this matter, in the sense that validation studies showing that ‘it works’ should be good enough for technological black boxes of any kind? That may be generally true, and is certainly asserted by some proponents of technology in law, 203 but recall that sufficient validation studies in the memory detection context might be impossible in situations where the ground truth is truly disputed and cannot be known or otherwise corroborated. This is not a type of expert evidence where trial court acceptance of ‘tacit expertise’ should be permitted to suffice for validation studies. 204 Moreover, meaningful validation studies of error rates and diagnostic value will depend critically on the base rates of false or inaccurate memories in analogous situations, which are not currently known and may never be. 205
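
The base-rate dependence can be made concrete with elementary Bayesian arithmetic (the sensitivity and specificity figures below are hypothetical, chosen only to show the shape of the problem): the probability that a positive result is correct collapses as the condition being detected becomes rarer in the tested population, which is why unknown base rates of false-but-believed memories undercut any claimed error rate.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Bayes' rule: probability that a positive test result is correct,
    given the prevalence (base rate) of the condition among those tested."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# Even a test that is right 90% of the time in each direction is only as
# informative as the (unknown) base rate allows:
for base_rate in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.90, 0.90, base_rate)
    print(f"base rate {base_rate:>5.0%} -> chance a positive result is correct: {ppv:.0%}")
```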

V.C. Witness Personhood and Evidentiary Relativism: Philosophical and Normative Considerations

V.C.1. Philosophical

If we had a ‘perfect’ brain-based memory detector, should we use it? There are values besides decisional accuracy that the system of trial by jury with testimonial evidence embodies. 206 While the plurality of values may be familiar, 207 are they uniquely implicated by jury assessment of memory detection evidence?

Our answer aligns with the intuition first articulated by Justice Linde of the Oregon Supreme Court: ‘I doubt that the uneasiness about electrical lie detectors would disappear even if they were refined to place their accuracy beyond question. Indeed, I would not be surprised if such a development would only heighten the sense of unease and the search for plausible legal objections’. 208 Having made the case above that memory detection is, for purposes of credibility assessment, the same as lie detection, this section attempts to probe the intuition that in part motivated this article: that more is at stake in directly assessing witness memories than in most other forms of expert evidence, and that these values may not be addressable by testimonial safeguards, adversarial design, or other tweaks for assessing machine credibility. 209 The strongest direct critique of Justice Linde’s argument is that accurate fact determination should be the dominant value in assessing evidence that can go to the jury, superseding all other considerations in the event that polygraphs (and, by extension, memory detection) became perfect. 210 But this is not a strong enough rebuttal to set the personhood issue (discussed below) entirely aside, particularly in light of the fact that the scientific findings may never be able to reveal perfect truth because of the reconstructive nature of memory.

Perhaps it is the lack of explainability that is intuitively troublesome here—that we do not really know how memory works, and we may not be able to explain how memory detection works, yet we might be willing to rely on it to assess the credibility of witnesses. We have claimed that memory detection devices may be impenetrable to credibility assessment if they are truly black boxes involving certain types of machine learning that cannot be adequately explained. What is lost, with respect to the adversarial trial context, if brain-based memory detection is (at some level) unexplainable?

Here we can look to the expanding literature considering automation in other aspects of legal decision-making. Kiel Brennan-Marquez recently argued in the context of probable cause that plausibility—the ability to ‘explain’ a probable cause decision—has independent value as a check on state power. 211 Yet juries are not only not required to explain their verdicts, but any explanations they do give cannot be used as a basis for appeal, save for a few narrow circumstances. 212 Juries themselves are to act as a ‘black box’, for the sake of finality in conflict resolution, and only the inputs are constrained by the rules of evidence. Yet there is explanation going on, one assumes, inside the jury room during the deliberative process. It is perhaps this value, which Brennan-Marquez identifies as ‘prudence’, that we seek to preserve by guarding against jury abdication of credibility assessment in favor of a machine’s and an expert’s say-so.

Is there more than the preservation of prudence within the jury deliberation room? Andrew Selbst and Solon Barocas recently identified three different values in opening the black box of machine-learning algorithms involved in legally relevant decision-making (though not as trial evidence per se). The first is ‘a fundamental question of autonomy, dignity, and personhood’—explanation as an inherent good. 213 The second is instrumental: explanation as enabling future action of those who are subject to machine decisions. The third is about justification and exposing a basis for post hoc evaluation. Of these, only the first is relevant to our assessment of juror reliance on machine-based expert testimony in their decision-making, which is too many steps removed to inform the future action of witnesses and defendants, and which cannot be exposed to justification and post hoc evaluation, save for a few narrow reasons. 214 Selbst and Barocas link the personhood rationale to the concept of procedural justice, which Tom Tyler has demonstrated is a necessary condition for legitimacy in the legal system. 215 But is the personhood—the fallible, imperfect, reconstructed memories that constitute the person and underpin their ability to narrate their own story—of a ‘witness’ evaluated by a jury really so key to perceptions of procedural legitimacy?

Our tentative answer is yes. What is at stake may be the fundamental value of personhood—as opposed to the reductionist, objectified readout of one’s brain—and its function as a cornerstone of procedural justice. 216 This takes seriously the personhood not only of witnesses, but also of jurors, and jurors’ ability to appreciate the personhood of a witness whose credibility they must assess. 217 The version of the personhood argument we adopt is specially informed by the findings of memory detection research and memory research more generally: the recollection of autobiographical experience that defines our personhood is not machine-like at all, but rather imperfect, dynamic, and reconstructive. Our autobiographical memories, and our subjective recollection and constant re-interpretation of them, are a fundamental part of our identity. Indeed, those who lose their memories can also lose their very sense of a dignified self. 218 Of course, this answer may not justify a blanket prohibition on memory detection evidence on the grounds of ‘personhood’, because the risks depend upon who is offering the evidence (eg a defendant volunteering exculpatory evidence, whose personhood is not threatened by exercising their autonomy) and upon the legal issues and theory of the case, as discussed above.

Personhood is in fact central to witness credibility, as evidenced by the history of witness competency rules and the existing doctrine of impeachment. 219 The dark side of this is that social and behavioral status has long been—and still is in character-based impeachment doctrine 220 —a proxy for who is worthy of belief, with troubling disproportionate effects on persons of color and communities without privilege. 221 Julia Simon-Kerr suggests that this state of affairs means that, if reliable, lie detection science—presumably a more objective way of establishing truth—should replace status-based assessments of credibility in impeachment doctrine. 222 But if the best we can do with science-based techniques is identify the biological correlates of a ‘subjective’ experience—filtered through the impressions and decisions of whoever is probing the witness’s brain—are we really any closer to establishing objective truth, or are we now assessing witness credibility based on the social status of the expert and her sophisticated machine? This situation may in fact be normatively preferable to discriminatory proxies for credibility that rely on racist, sexist, and classist devaluation of persons, but it is arguably no closer to respecting the dignity and personhood of the witness as the narrator of their own memory and experience.

Putting these pieces together: if personhood (of some sort, derived from sincere narration of subjective experience of memory) is critical to witness credibility, and assessments of personhood/credibility are critical to the procedural legitimacy of juries, and this legitimacy comes from a belief (a myth?) that juries engage in a prudent and deliberative, though opaque, process of credibility assessment, then brain-based memory detection directly challenges this edifice of trial by jury. Again, Justice Linde intuited this conclusion:

One of these implicit values surely is to see that parties and the witnesses are treated as persons to be believed or disbelieved by their peers rather than as electrochemical systems to be certified as truthful or mendacious by a machine…. A machine improved to detect uncertainty, faulty memory, or an overactive imagination would leave little need for live testimony and cross-examination before a human tribunal; an affidavit with the certificate of a polygraph operator attached would suffice. There would be no point to solemn oaths under threat of punishment for perjury if belief is placed not in the witness but in the machine. 223

Justice Linde and other commentators go so far as to suggest that a perfect lie detector—and by extension, a memory detection device—would render the jury entirely superfluous. 224 This ‘witness personhood’ concept suggests that to the extent we take the jury system seriously, we should resist introducing even ‘perfect’ black box brain-based memory detection evidence into the jury black box.

V.C.2. Normative

Evidentiary value is not a static quality in terms of probativeness, relevance, or reliability; it is inherently a relative quality. Normative assessment of the utility and permissibility of a particular piece of evidence must be context-driven. Sometimes, that context is one of alternatives much more horrific than premature or even ‘junk’ science.

Recall from the start of this article that about a decade ago, the international press reported on a murder story from India. Aditi Sharma had been accused of murdering her fiancé, Udit Bharati, with poison given to him in sweets. She was convicted largely on the basis of evidence from a brain-based memory detection system (here, the BEOS) that demonstrated her ‘experiential knowledge’ in response to statements read by investigators implicating her in the crime. According to initial news reports, the BEOS technology was neither published nor peer reviewed, but its inventors ‘claim[ed] the system can distinguish between peoples’ memories of events they witnessed and between deeds they committed’. 225 At the time, two Indian states had set up labs where BEOS could be used by investigators, and one in Maharashtra reported that over 75 suspects and witnesses had undergone the test in less than 2 years. 226 Of those, at least 10 resulted in confessions. 227

From what we know of the test, BEOS uses a proprietary set of 11 electrical signals in the brain from 30 EEG probes on the surface of the scalp to make up the ‘signature’ of experiential, first-hand, participatory ‘remembrance’ of an event, as differentiated from ‘knowledge’ or ‘recognition’ gained by learning about an event from another source. 228 Dr Champadi Raman Mukundan, a former university scientist, developed BEOS in conjunction with Axxonet, a private software-as-a-service company, partially at the request of Indian law enforcement authorities and India’s National Institute of Mental Health and Neuro-Sciences, and partially out of Dr Mukundan’s own interest in and frustration with the EEG-based P300-related techniques. 229

The technologies were not deployed without scientific skepticism in India. A month before Sharma’s conviction, a six-member peer review committee headed by India’s National Institute of Mental Health and Neuro-Sciences concluded that both BEOS and what they called ‘brain mapping’ (which refers to the ‘Brain Fingerprinting’ EEG-based technology promoted by Lawrence Farwell) 230 were ‘unscientific and … recommended against their use as evidence in court or as an investigative tool’. 231 This report was delivered to India’s chief forensic scientist, Dr MS Rao, who critiqued it for the committee’s failure to visit the forensic laboratories using BEOS. Dr Rao claimed that the technique’s results were ‘encouraging’, based upon results in actual cases. 232 Upon learning of the legal application of the BEOS technology in the Sharma case, US-based commentators critiqued the ‘shaky’ science for lack of peer review and ‘variously called India’s application of the technology to legal cases “fascinating,” “ridiculous,” “chilling,” and “unconscionable”’. 233

The current status of forensic brain-based memory detection in India is mixed. In May 2010, the Indian Supreme Court held that results of ‘deception detection tests’, including the Brain Electrical Activation Profile test, 234 are not admissible, and neither are the evidentiary fruits of such tests administered in an investigative context without a subject’s voluntary consent. 235 Nevertheless, recent media reports and scholarly articles suggest that brain-based tests, including BEOS, are still being used in investigation and prosecution in lower courts, possibly as a result of coerced consent. 236 Even after the 2010 Supreme Court judgment, Indian forensic labs planned to continue their expansion of brain scanning techniques. 237

Why do brain-based investigation techniques persist in the absence of formal evidentiary admissibility, and in the presence of significant scientific skepticism? The context of Indian policing is the key to the answer: the status quo alternative is too often literal torture. Indian policing is ‘rife with stories of physical torture and custodial deaths’. 238 BEOS proponents have suggested that even scientifically doubtful interrogation techniques may be the lesser of two evils. 239 Indeed, a BEOS technician at the Directorate of Forensic Science in Mumbai said in an interview with Wired that subjects—even those accused of murder—are ‘so, so relieved to be here. They’re so happy to be here with us, because we’re not scary. We talk to them nicely. Just imagine… you can imagine in India the way the police must be dealing with them’. 240 Jinee Lokaneeta argues that in the context of brutal Indian criminal investigations, and a disconnect between the law articulated by courts and routine impunity for police violence, courts have determined that the necessity of scientific evidence trumps concerns about scientific reliability. 241

This context takes the discussion of investigatory use of brain-based memory detection completely beyond the realm of scientific reliability and toward moral relativism. In a situation where the outcome would be the same—conviction and confinement—less intrusive means are obviously preferable. That is, an inaccurate brain-based test leading to a confession is morally preferable to torture leading to the same confession, even if that confession is factually wrong in both cases. In the aggregate, it is an empirical matter; it is not possible to say with any degree of certainty that inaccurate scientific methods of investigation would lead to ‘more’ wrongful convictions than a regime of torture-based interrogation.

Nevertheless, this is not to advocate for routine investigatory or courtroom use in the USA. Indeed, in a system where the vast majority of criminal charges are resolved by plea bargain—with many of those based on inadmissible evidence, as discussed below—there is not an obvious ‘greater evil’ of torture as the status quo to justify deployment of memory detection technology in investigations and, ultimately, plea bargaining. This calculus may be different in situations of urgent national security concerns or other truly exigent circumstances, but scientific limitations may exist there that make the utility of such application doubtful. 242

VI. MEMORY DETECTION IN FREE-PROOF SYSTEMS: INVESTIGATIONS AND ADMINISTRATIVE ADJUDICATIONS

Finally, how should brain-based memory detection be evaluated in free-proof systems, unconstrained by evidentiary rules because no jury is present? Application to investigative or administrative proceedings may be both legally permissible and normatively preferable, given the relaxed procedural requirements and increased specialization of decision-makers using the evidence. These less-formal contexts liberate experts from having to provide perhaps uncollectible data about real-world error rates, and from obtaining ‘general acceptance’ of the use for forensic purposes. Residual uncertainties about technological accuracy may not weigh so heavily, especially where the technology is presented as the only corroboration to thin testimony, or an alternative to methods known to be inaccurate, harmful, or abusive. Moreover, the decision-makers evaluating proffered evidence in investigations or administrative proceedings are not juries to be shielded from unreliable or un-assessable evidence, but experienced practitioners, who are often assumed—but not conclusively shown—to be less susceptible to reliance on inadmissible evidence. 243 Should we worry about brain-based memory detection used in a free-proof system, but with experienced decision-makers? Or should we hope for it?

Our answer is a qualified, and perhaps unsatisfying, maybe. Although it may be the case that investigatory and administrative settings lack the adversarial elements contributing to the ‘central myth’ of trial as a determination of truth, 244 and more contextual flexibility is available, the accuracy and limitations of memory detection technology must still be well characterized to justify deployment in investigatory or administrative contexts. Even though the legal hurdles (or guardrails) are gone, the limitations of memory detection derived from the biological boundary conditions on accurate, veridical memories remain.

Investigative use of memory detection technology is not only easy to foresee, it is being actively pursued 245 and, in some jurisdictions, already in practice. In the USA, inadmissible technologies are routinely used in investigations. 246 Internationally, brain-based (and behavior-based) memory detection technology is currently used in a handful of jurisdictions. 247 Investigatory use of memory detection, untested by evidentiary hearings and cross-examination, may be an avenue to unwarranted plea deals or inaccurate confessions. 248 Moreover, the specter of false confessions is an unexplored area in memory detection research. Are subjectively experienced memories of an event created during the type of interrogations that result in false confessions, where investigators may supply details and suggestions? Based on the current understanding of memory, it is plausible and perhaps even likely. It is at this point impossible to know whether such an implanted memory would be detectable for what it is, or biologically indistinguishable from a true memory of the event. In such a scenario, memory detection technology risks becoming a set of shackles rather than a neutral tool for fact development.

Application to administrative proceedings where credibility is central is also possible to imagine, such as asylum hearings and civil commitment proceedings. As to the former, an application for asylum can hinge upon credibility determinations made by trained officers and judges. 249 Credibility determinations often founder on an applicant’s inconsistencies in the retellings of their experiences, notwithstanding research indicating that discrepancies in autobiographical accounts are common among ordinary people, and more common among those who have suffered trauma. 250 If a memory detection expert could design a set of stimuli to detect whether an applicant had truly experienced events amounting to persecution, brain-based memory detection could potentially be a superior and consistent way to adjudicate such claims, free from the human biases and inconsistencies between asylum officers. But claims that machines are less biased than humans should be viewed with suspicion, mindful of the rapidly expanding literature on algorithmic bias in many areas of automated decision-making. 251 Brain-based memory detection using machine learning has not even begun to be assessed for such forms of bias, nor is it immediately obvious what kinds of human-like biases might infect algorithmic brain-based memory detection.

VII. CONCLUSION

This article has reviewed the current science of memory detection and ultimately argued that courtroom admissibility is a misdirected pursuit for that technology. At present, its admissibility would be precluded under Daubert, Frye, or state equivalents, primarily for lack of known error rates and lack of ‘general acceptance’ in the relevant scientific communities. But we have further argued that even if such error rates and acceptance accumulated, the most sophisticated brain-based memory detection devices may still not be suitable for courtroom use. This is first because the most advanced brain-based memory detection work suggests that only subjective experiences, rather than objective truths, may be accessible, leaving memory detection generally on the same footing as sincerity detection. Second, the method of acquiring that information requires machine-learning algorithms that may be opaque or even unexplainable to a jury, hindering its ability to assess the machine’s (and the expert’s) credibility and to assign appropriate weight. And even if a memory detection device worked perfectly well, would we want to use it? We have argued that if we take the jury system seriously, brain-based memory detection as evidence risks eliding the personhood of witnesses and thus undermining systemic, procedurally based legitimacy.

It is fair to ask whether we are letting the perfect be the enemy of the good-enough, or whether closing (but perhaps not barring) the doors to using brain-based memory detection as courtroom evidence will stagnate the interdisciplinary research enterprise. As to the latter, this concern is misplaced. There is still much to discover about the workings of memory itself, and prudent researchers are using the most sophisticated techniques to understand its contours and boundaries, rather than being driven by forensic application. As to the former, we think that prudence in fact-finding application is nevertheless warranted, and that systemic application of the knowledge of human memory gained from such work would have an overall greater effect than the handful of cases per year that would be litigated involving brain-based memory detection. That is, the advancing knowledge about human memory that is accumulating from memory detection research may be an aid to future evidence rulemaking, if not to individual fact-finding, just as research on eyewitness memory has finally begun to influence courtroom procedures and jury instructions. 252

ACKNOWLEDGEMENTS

Thanks to Holly Herndon, Chuck Marcus, and especially Bob Wu JD’21 for helpful research assistance. Thanks to Jeff Belin, Kate Bloch, Binyamin Blum, Kiel Brennan-Marquez, Dan Burk, Ed Chen, David Faigman, Chris Goodman, Eunice Lee, Alex Nunn, Roger Park, Anya Prince, Amanda Pustilnik, Andrea Roth, Reuel Schiller, Francis X. Shen, Julia Simon-Kerr, Howard Wasserman, participants in Biolawlapalooza 2019, participants in the Evidence Summer Workshop 2019, and participants at the University of Iowa College of Law faculty workshop in Sept. 2019. We also thank two anonymous reviewers for their thoughtful comments to improve the manuscript.

State of Maharashtra v. Sharma (June 12, 2008), Case No. 508/07, Sessions Court, Pune (India) (copy of decision on file with ERDM).

Anand Giridharadas, India’s Novel Use of Brain Scans in Courts is Debated, N.Y. Times, Sept. 14, 2008, at A10; see also Nitasha Natu, This Brain Test Maps the Truth, Times of India, https://timesofindia.indiatimes.com/city/mumbai/This-brain-test-maps-the-truth/articleshow/3257032.cms (accessed Mar. 3, 2020) (reporting on the Sharma case and another murder case in which the accused was convicted of a brutal murder and robbery after a ‘BEOS test [was] positive,’ though a lead prosecutor stated that ‘while BEOS was a useful technique of examination, it could not achieve conviction by itself. “The technique needs to be corroborated with other evidence”’.)

Angela Saini, The Brain Police: Judging Murder with an MRI , Wired (May 27, 2009), https://www.wired.co.uk/article/guilty .

See eg Jennifer S. Bard, “Ah Yes, I Remember It Well”: Why the Inherent Unreliability of Human Memory Makes Brain Imaging Technology A Poor Measure of Truth-Telling in the Courtroom , 94 Or. L. Rev. 295, 351 (2016); Teneille Brown & Emily Murphy, Through a Scanner Darkly: Functional Neuroimaging as Evidence of a Criminal Defendant’s Past Mental States , 62 Stan. L. Rev. 1119 (2010); Brian Farrell, Cannot Get You Out of My Head: The Human Rights Implications of Using Brain Scans as Criminal Evidence , 4 Interdisc. J. Hum. Rts. L. 101 (2010); Lyn M. Gaudet & Gary E. Marchant, Under the Radar: Neuroimaging Evidence in the Criminal Courtroom , 64 Drake L. Rev. 577 (2016); J.R.H. Law, Cherry-Picking Memories: Why Neuroimaging-Based Lie Detection Requires A New Framework for the Admissibility of Scientific Evidence Under FRE 702 and Daubert , 14 Yale J.L. & Tech. 1 (2011); Michael L. Perlin, “His Brain Has Been Mismanaged with Great Skill”: How Will Jurors Respond to Neuroimaging Testimony in Insanity Defense Cases? , 42 Akron L. Rev. 885 (2009); Mark Pettit, Jr., FMRI and BF Meet FRE: Brain Imaging and the Federal Rules of Evidence , 33 Am. J.L. & Med. 319 (2007); Francis X. Shen et al., The Limited Effect of Electroencephalography Memory Recognition Evidence on Assessments of Defendant Credibility , 4 J.L. & Biosciences 330, 331 (2017); William A. Woodruff, Evidence of Lies and Rules of Evidence: The Admissibility of FMRI-Based Expert Opinion of Witness Truthfulness , 16 N.C. J.L. & Tech. 105 (2014); So Yeon Choe, Misdiagnosing the Impact of Neuroimages in the Courtroom , 61 UCLA L. Rev. 1502 (2014); Eric K. Gerard, Waiting in the Wings? The Admissibility of Neuroimagery for Lie Detection , 27 Dev. Mental Health L. 1 (2008); Jennifer Kulynych, Psychiatric Neuroimaging Evidence: A High-Tech Crystal Ball? , 49 Stan. L. Rev. 1249 (1997); Christina T. Liu, Scanning the Evidence: The Evidentiary Admissibility of Expert Witness Testimony on MRI Brain Scans in Civil Cases in the Post-Daubert Era , 70 N.Y.U. Ann. Surv. Am. L. 479 (2015); Adam Teitcher, Weaving Functional Brain Imaging into the Tapestry of Evidence: A Case for Functional Neuroimaging in Federal Criminal Courts , 80 Fordham L. Rev. 355 (2011). For further work on the impact and persuasiveness of brain imaging technology, see note 175.

See John B. Meixner, Jr., Admissibility and Constitutional Issues of the Concealed Information Test in American Courts: An Update , in Detecting Concealed Information and Deception 405, 406 (J. Peter Rosenfeld ed., 2018) (arguing that the evidentiary and constitutional limitations on the courtroom admissibility of versions of the Concealed Information test ‘should drive the research agenda of every CIT researcher interested in the practical use of their work.’); Gerson Ben-Shakar & Mordechai Kremnitzer, The CIT in the Courtroom: Legal Aspects , in Memory Detection: Theory and Application of the Concealed Information Test 276 (Bruno Verschuere et al. eds., 2011) (arguing that the CIT has the potential of meeting Daubert criteria for admissibility); J. Peter Rosenfeld et al., Detection of Concealed Stored Memories with Psychophysiological and Neuroimaging Methods, in Memory and Law 264, 290 (Lynn Nadel & Walter Sinnott-Armstrong eds., 2012) (providing recommendations that ‘would lead to enhanced implementation of the CIT and possibly also to its use as admissible evidence in courts’).

Interview with CR Mukundan, Inventor and Patent Holder of the BEOS Profiling Technology, in Bangalore, India (Aug. 11, 2009); see also John Meixner, Liar Liar: Jury’s the Trier? The Future of Neuroscience-Based Credibility Assessment in the Court, 106 Nw. U. L. Rev. 1451, 1474–75 (2012).

Reviewing P300 complex trial protocol techniques in 2013, Rosenfeld et al. wrote that once the P300 CIT procedure was validated in a ‘field population,’ it would satisfy all Daubert criteria for admissibility. J. Peter Rosenfeld et al., Review of Recent Studies and Issues Regarding the P300-Based Complex Trial Protocol for Detection of Concealed Information , 90 Int’l J. Psychophysiology 118 (2013); see also Ben-Shakar & Kremnitzer, supra note 6.

See text accompanying notes 226–241, infra.

Memory matters to other actors in the legal system, including jurors and judges. See eg Anders Sandberg et al., The Memory of Jurors: Enhancing Trial Performance , in Memory and Law, supra note 6, at 213. But the categories of witnesses and suspects are most apt for evidentiary analysis, yet broad enough to encompass a range of witness types in both civil and criminal proceedings (such as victims, percipient witnesses, character witnesses, defendants, and co-conspirators), as well as inclinations (hostile, friendly, self-serving, or reluctant).

Jesse Rissman & Emily R.D. Murphy, Brain-based Memory Detection and the New Science of Mind Reading, in The Oxford Handbook of Human Memory (Michael J. Kahana & Anthony D. Wagner eds.) (forthcoming 2020).

See infra note 146 and accompanying text.

In 2011, the New Jersey Supreme Court substantially modified the state’s procedures for evaluating eyewitness identifications after appointing a Special Master to evaluate the scientific evidence behind such identifications. State v. Henderson, 27 A.3d 872 (N.J. 2011). After Henderson, New Jersey defendants who show some evidence of ‘suggestiveness’ in a prosecution’s eyewitness identification are entitled to a pretrial hearing to evaluate the variables affecting that identification and decide its admissibility. The Henderson court also requested that committees draft proposed revisions to model jury instructions on eyewitness identification that would address the myriad variables affecting identifications in order to guide how a jury should weigh an identification if it is admitted, ultimately reducing reliance on expensive and time-consuming expert testimony to explain to jurors the scientific consensus on how human memory works and the factors that affect it. Id. at 878. The report and recommended jury instructions, ultimately adopted in 2012, are available at https://njcourts.gov/courts/assets/criminal/outofcourtreport.pdf. In 2019, the New Jersey Supreme Court modified the holding of Henderson to entitle defendants to a pretrial hearing on the admissibility of a witness’s identification even without evidence of suggestiveness on the part of law enforcement, in circumstances where no electronic or contemporaneous, verbatim written recording of the identification procedure was prepared. State v. Anthony, 204 A.3d 229 (N.J. 2019). Other states have taken similar steps. In 2014, Pennsylvania’s Supreme Court held that expert testimony regarding eyewitness identification was no longer per se inadmissible, but rather a matter of trial court discretion. Commonwealth v. Walker, A.3d 766 (Pa. 2014). But efforts to implement model jury instructions similar to New Jersey’s (drafted in 2012) do not appear to have advanced further. See Jeannine Turgeon et al., Crafting Model Jury Instructions for Evaluating Eyewitness Testimony, The Pa. Law., Sept./Oct. 2014, at 49; see also State v. Guilbert, 306 Conn. 218, 234 (Conn. 2012) (disavowing earlier rulings which restricted expert testimony, stating that such previous rulings are ‘out of step with the widespread judicial recognition that eyewitness identifications are potentially unreliable in a variety of ways unknown to the average juror’.).

Daniel V. Meegan, Neuroimaging Techniques for Memory Detection: Scientific, Ethical, and Legal Issues , 8 Am. J. Bioethics 9 (2008) (commentaries follow the piece, on 21–36). In 2008, Daniel Meegan comprehensively reviewed neuroimaging techniques for memory detection, critically evaluating then-existing studies along four attributes essential to an ideal forensic memory detection test: specificity (the degree to which an effect in a test is specific to the stimuli of interest, like the murder weapon versus generic weapons in the same category), retrieval automaticity (the extent to which a memory retrieval is automatic, and thus presumed to be resistant to a countermeasure by attempting to respond to the stimulus as if it were new), encoding flexibility (the extent to which a memory detection test gives robust results in varied encoding conditions, closely related to the sin of ‘absent-mindedness’), and longevity (the length of time that an effect remains measurable after the original encoding of the memory, also known as the sin of ‘transience’).

See generally Henry T. Greely & Judy Illes, Neuroscience-Based Lie Detection: The Urgent Need for Regulation, 33 Am. J.L. & Med. 377 (2007); see also sources infra note 127; U.S. v. Semrau, 693 F.3d 510 (6th Cir. 2012).

A memory ‘trace’ is the physical record in the brain of a past experience. This is sometimes represented by the concept of the ‘engram’. See eg Lucas Kunz, et al., Tracking Human Engrams Using Multivariate Analysis Techniques , in 28 Handbook of Behavioral Neuroscience 481–508 (Denise Manahan-Vaughn, ed., 2018); Sheena A. Josselyn, Stefan Köhler, & Paul W. Frankland, Finding the Engram , 16 Nature Rev. Neurosci. 9, 521–34 (2015).

Meegan, supra note 14, at 9.

The CIT stands in contrast to a protocol called the Comparison Question test. See Rosenfeld et al., supra note 8.

Other research does attempt something closer to mind-reading. See eg Russell A. Poldrack, The New Mind Readers: What Neuroimaging Can and Cannot Reveal about Our Thoughts (2018); P.R. Roelfsema et al., Mind Reading and Writing: The Future of Neurotechnology , 22 Trends in Cognitive Sciences 598 (2018); T. Horikawa et al., Neural Decoding of Visual Imagery During Sleep , 340 Science 639, 639–42 (2013); U.S. Patent 9,451,883 (filed Dec. 21, 2012), https://patents.google.com/patent/US9451883B2/en ; S. Nishimoto et al., Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies ; 21 Current Biology 1641, 1641–66 (2011).

This is the working theory of the inventor of BEOS, see discussion accompanying notes 228–233, infra.

See Alan S. Brown, A Review of the Déjà Vu Experience , 129 Psychol. Bull. 394 (2003).

See eg Ira E. Hyman Jr. & Elizabeth F. Loftus, Errors in Autobiographical Memory , 18 Clinical Psychol. Rev. 933, 933–94 (1998); Elizabeth F. Loftus & Hunter G. Hoffman, Misinformation and Memory: The Creation of New Memories , 118 J. Exp. Psychol. 100 (1989).

Hung-Yu Chen et al., Are There Multiple Kinds of Episodic Memory? An fMRI Investigation Comparing Autobiographical and Recognition Memory Tasks , 37 J. Neurosci. 2764 (2017); see also Tiffany E. Chow et al., Multi-voxel Pattern Classification Differentiates Personally Experienced Event Memories from Secondhand Event Knowledge , 176 NeuroImage 110 (2018).

The 2003 National Research Council Report on the polygraph test, which traditionally uses autonomic nervous system measurements such as heart rate, blood pressure, and skin conductance, noted that ‘[c]ountermeasures pose a serious threat to the performance of polygraph testing because all the physiological indicators measured by the polygraph can be altered by conscious efforts through cognitive or physical means.’ Nat’l Research Council, The Polygraph and Lie Detection 3 (2003), https://www.nap.edu/read/10420/chapter/2#3. The putative inventor of the ‘guilty knowledge test’ suggested that brain waves would be less vulnerable to countermeasures, largely because of their temporal brevity: ‘Because such potentials are derived from brain signals that occur only a few hundred milliseconds after the [guilty knowledge test] alternatives are presented… it is unlikely that countermeasures could be used successfully to defeat a GKT derived from the recording of cerebral signals’. David T. Lykken, A Tremor in the Blood 293 (1998).

See generally Daniel Schacter, The Seven Sins of Memory: How the Mind Forgets and Remembers (2002).

See generally Jamie DeCoster & Heather M. Claypool, A Meta-Analysis of Priming Effects on Impression Formation Supporting a General Model of Informational Biases , 8 Personality Soc. Psychol. Rev. 2 (2004); David Sleeth-Keppler, Taking the High (or Low) Road: A Quantifier Priming Perspective on Basic Anchoring Effects , 153 J. Soc. Psychol. 424 (2013); Nicole M. Long et al., Memory and Attention , in 1 Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience 285 (John T. Wixted et al. eds., 4th ed. 2017).

See discussion accompanying notes 180–181.

See Henry L. Roediger III et al., The Curious Complexity between Confidence and Accuracy in Reports from Memory , in Memory and Law, supra note 6, at 91.

Elizabeth Loftus has written in great detail about the effects of false memories in legal proceedings. See Daniel M. Bernstein & Elizabeth F. Loftus, How to Tell if a Particular Memory is True or False, 4 Perspectives on Psychol. Sci. 370 (2009) (detailing measures lawyers take to determine whether an individual can be believed); Ira E. Hyman & Elizabeth F. Loftus, Errors in Autobiographical Memory, 18 Clinical Psychol. Rev. 933 (1998) (writing how memory is constructive and thus errors are likely to occur); Elizabeth F. Loftus, Planting Misinformation in the Human Mind: A 30-year Investigation of the Malleability of Memory , Learning & Memory 361 (2005) (detailing how misleading information can affect one’s memory in damaging ways); see also Schacter, supra note 25 (describing ‘sins’ of memory—‘transience, absent-mindedness, blocking, misattribution, suggestibility, bias, and persistence’—that lead individuals to confidently claim they have experienced events that never occurred.)

See generally Endel Tulving, Episodic Memory: From Mind to Brain , 53 Ann. Rev. Psychol. 1 (2002).

See Bruno Verschuere & Bennett Kleinberg, Assessing Autobiographical Memory: The Web-based Autobiographical Implicit Association Test , 25 Memory 520, 527–28 (2016); see also J. Peter Rosenfeld et al., Instructions to Suppress Semantic Memory Enhances or Has No Effect on P300 in a Concealed Information Test (CIT) , 113 Int’l J. Psychophysiology 29, 30 (2017).

Louis Renoult et al., Personal Semantics: At the Crossroads of Semantic and Episodic Memory , 16 Trends Cognitive Sci. 550, 550–56 (2012).

Consider, as an illustration of how it can be difficult to make inferences about an underlying cognitive process based on behavior, this example from Renoult et al., id. at 554, about how single episodes can later be retrieved as autobiographical facts: ‘I may know what occurred during my brother’s speech at my wedding, without having a perceptually rich, first-person re-experiencing of it. In other words, this single event could be retrieved with noetic awareness, that is, the type of awareness that is associated with knowing about the world and retrieving semantic knowledge, rather than with autonoetic awareness, that is, the type of awareness associated with subjectively re-experiencing events from [episodic memory.]’ Memory researchers attempt to overcome this difficulty with the ‘remember/know procedure, in which participants report on the type of awareness that they have experienced during retrieval’, along with other methods. Id.

A recent study demonstrated that fMRI with machine-learning algorithm analysis could predict with a high degree of accuracy whether a study subject was in a ‘knowing’ or ‘reckless’ state of mind ‘at the time’ they made a decision whether to carry a suitcase potentially or actually containing contraband through a security checkpoint that could potentially be searched. Iris Vilares, Michael Wesley, Woo-Young Ahn, Richard J. Bonnie, Morris B. Hoffman, Owen D. Jones, Stephen J. Morse, Gideon Yaffe, Terry Lohrenz, & Read Montague, Predicting the Knowledge-Recklessness Distinction in the Human Brain, 114 Proc. Nat’l Acad. Sci. 3222 (2017). See also Owen D. Jones, Gideon Yaffe, & Read Montague, Detecting Mens Rea in the Brain , 169 U. Pennsylvania L.R. (forthcoming 2020), writing ‘by combining fMRI brain-imaging techniques with a machine learning algorithm we were able to distinguish among guilty minds’. Importantly for this analysis, the proof of concept is that the culpable mental states of ‘knowing’ and ‘reckless’ are neurally and psychologically distinguishable, but not that they could be retroactively distinguished. ‘Our team’s neuroscientific techniques can discover brain-based differences between mental states that exist at the time of scanning, not at some prior time … our current experiment has implications for criminal justice policy, but not for forensic evaluation of individual defendants’. Id.

Brown and Murphy, supra note 5, at 1130–31, 1187–88.

Here we use ‘biological’ to encompass psychological phenomena as well, referring to mechanisms intrinsic to the person, rather than to the tools being used to assess the person.

See eg Overview, Farwell Brain Fingerprinting—Dr. Larry Farwell, https://larryfarwell.com/brain-fingerprinting-overview-dr-larry-farwell-dr-lawrence-farwell.html (accessed Mar. 12, 2020) and iCognative Technology , Brainwave Science, http://brainwavescience.com/icognative/ (accessed March 12, 2020). Farwell now complains that Brainwave Science stole his technology.

Courts in India and police services in Singapore have employed BEOS technology. David Cox, ‘Can Your Brain Reveal You are a Liar?’, BBC Future (Jan. 25, 2016).

See generally Meegan, supra note 14 (concluding that EEG-based memory detection techniques have forensic potential but require additional research and considerations of legal admissibility and ethical issues.)

Lawrence Farwell, an American scientist, markets his ‘brain fingerprinting’ technology as ‘admissible in court’ and sells it to international law enforcement departments. See Larry Farwell Brain Fingerprinting Laboratories, https://larryfarwell.com/brain-fingerprinting-laboratories-inc.html (accessed Mar. 20, 2020). Farwell was an early participant in peer reviewed research on memory detection, see Lawrence A. Farwell & Emanuel Donchin, The Truth Will Out: Interrogative Polygraphy (“Lie Detection”) with Event-Related Brain Potentials , 28 Psychophysiology 531 (1991); but his more recent and product-focused work (reporting suspicious 0 per cent error rates) has been critiqued by other researchers. See J. Peter Rosenfeld, ‘Brain Fingerprinting’: A Critical Analysis , 4 Sci. Rev. Mental Health Prac. 20 (2005) for a critique of the vague definition of Farwell’s patented P300-MERMER response. In 2012, Farwell published Brain Fingerprinting: A Comprehensive Tutorial Review of Detection of Concealed Information with Event-Related Brain Potentials, 6 Cognitive Neurodynamics 115 (2012). Other researchers responded forcefully, arguing that Farwell’s ‘tutorial’ is ‘misleading and misrepresents the scientific status of brain fingerprinting technology’, and noting a paucity of peer-reviewed data and selection bias in Farwell’s reporting. See Ewout H. Meijer et al., A Comment on Farwell (2012): Brain Fingerprinting: A Comprehensive Tutorial Review of Detection of Concealed Information with Event-Related Brain Potentials, 7 Cognitive Neurodynamics 155 (2013). In 2013, Farwell and colleagues published a study comparing error rates and accuracy in four separate field studies, once again reporting an unbelievable 100 per cent accuracy, 0 per cent error rate, no false positives or false negatives, and no vulnerability to countermeasures for studies using P300-MERMER. Lawrence A. Farwell et al., Brain Fingerprinting Field Studies Comparing P300-MERMER and P300 Brainwave Responses in the Detection of Concealed Information, 7 Cognitive Neurodynamics 263 (2013).

‘Meaningful’ stimuli may seem a vague term. This is deliberate, because the psychological characteristics of such ‘meaningful’ stimuli that trigger a particular brain response may be ‘meaningful’ for different reasons. For example, it may be inaccurate to say something is ‘remembered’ just because a subject’s brain gives a characteristic response. The same P300 ERP response may be elicited by stimuli that are remembered, recognized, familiar, or even simply highly salient to a subject. For example, a gun encountered amidst a series of innocuous object stimuli would evoke a P300, but a gun encountered amidst an array of different weapons (eg a dagger, switchblade, or crowbar) would not automatically do so; it would evoke a P300 only if it was meaningful to the subject. But this could be because the subject is a gun enthusiast, or because s/he thinks that the gun is the ‘guilty knowledge’ stimulus the investigators are interested in, even if s/he did not commit the crime. A more effective assessment using the P300 would be to flash a series of different guns, with only one being the gun actually found at the crime scene or known to be the murder weapon.

Rosenfeld describes the P300 as ‘a cortical signal of the recognition of meaningful information’. See J. Peter Rosenfeld, P300 in Detecting Concealed Information , in Memory Detection, supra note 6, at 63.

Meegan, supra note 14, at 12; see generally John Polich & Albert Kok, Cognitive and Biological Determinants of P300: An Integrative Review , 41 Biological Psychol. 103 (1995); see also Rosenfeld, supra note 43, at 64. A separate ERP response, an anterior signal labeled the N200 (or sometimes N2), was at one point suggested to respond selectively to probes v. irrelevants and thought to index cognitive control processes when lying about probes. Matthias Gamer & Stefan Berti, Task Relevance and Recognition of Concealed Information have Different Influences on Electrodermal Activity and Event-Related Brain Potentials , 47 Psychophysiology 355 (2010). But a subsequent study found that this effect was likely due to an experiment design flaw, and that the N2 differences were the result of stimulus differences rather than concealed information differences. Giorgio Ganis et al., Is Anterior N2 Enhancement a Reliable Electrophysiological Index of Concealed Information? , 143 Neuroimage 152 (2016).

Rosenfeld, supra note 43, at 64.

Meegan, supra note 14, at 12.

Running a single individual through multiple testing runs to obtain enough data for averages of different trial types is problematic, especially for infrequent trial types such as probes and irrelevants. Repeating a test can create habituation effects that may confound probe effects, and previously irrelevant stimuli could become relevant through repetition. Rosenfeld, supra note 43, at 69. A statistical analytic method called ‘bootstrapping’ is used to assess differences between stimulus types for a single individual subject in one run. Bootstrapped averages are created by repeatedly resampling from the set of traces for a particular probe (or irrelevant) ‘with replacement’, meaning the same trace could be sampled more than once. For each resampled set, an average is created. The process is iterated many times to create a distribution of averages. Then, a t -test on the resampled averages is more sensitive than a test on single ERP traces, which have a large amount of noise. See Id. This technique effectively treats the raw data, which is in reality a sample of the ‘population’ of brain traces for a given individual, as the ‘population’ to be analyzed by resampling. This technique was introduced into the EEG memory detection literature by Farwell & Donchin, supra note 41, in 1991. Rosenfeld’s research group ‘typically require[s] that there must be at least .9 (90 per cent) level of statistical confidence that the average probe P300 is greater than the average of all irrelevant P300s before concluding that a subject recognizes concealed information germane to a crime’, but cautions that ‘until a finalized and definitive P300-based test is developed and… optimized for maximum discrimination efficiency and accuracy, as confirmed in representative populations, one cannot arbitrarily set a bootstrap diagnostic criterion at some level for use in all future studies’. Rosenfeld, supra note 8, at 119.
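To make the resampling logic concrete, the following is a minimal sketch in Python (the cited analyses were not implemented this way; the function name, variable names, and data are hypothetical). It uses the variant of the bootstrap criterion in which the ‘confidence’ is simply the fraction of resampled iterations in which the probe average exceeds the irrelevant average, which can then be compared against the .9 criterion quoted above.

    import numpy as np

    def bootstrap_probe_confidence(probe_amps, irrel_amps, n_iter=10000, seed=0):
        # Fraction of bootstrap iterations in which the resampled mean probe
        # P300 amplitude exceeds the resampled mean irrelevant P300 amplitude.
        # Inputs are 1-D arrays of single-trial amplitudes (hypothetical data).
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_iter):
            # Resample 'with replacement': the same trace may be drawn more than once.
            p_mean = rng.choice(probe_amps, size=len(probe_amps), replace=True).mean()
            i_mean = rng.choice(irrel_amps, size=len(irrel_amps), replace=True).mean()
            hits += p_mean > i_mean
        return hits / n_iter

    # Applying the .9 criterion described above:
    # confidence = bootstrap_probe_confidence(probe_amps, irrel_amps)
    # recognizes_information = confidence >= 0.9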

Matthias Gamer et al., Combining Physiological Measures in the Detection of Concealed Information , 95 Physiology & Behav. 333, 333 (2007). But see recent evidence that single trial EEG data can be used to classify memory retrieval outcomes on individual trials with modest yet above-chance accuracy (56–61 per cent correct, with 50 per cent as chance). Eunho Noh et al., Single-Trial EEG Analysis Predicts Memory Retrieval and Reveals Source-Dependent Differences , 12 Frontiers Hum. Neurosci., Article 258 (July 2018).

For any version of the CIT, multiple choice options are offered. On any given trial, the likelihood of a false positive is determined by the number of options: 25 per cent if there are four choices, 20 per cent if there are five choices. By adding more questions with independent items of information, the likelihood of an overall false positive result decreases, by virtue of multiplying the likelihood of each independent trial. Test designers must choose a threshold for how many ‘hits’ across multiple trials would lead to a ‘guilty’ determination, accounting for the likelihood of false positives and a normatively determined acceptable threshold of error. Rosenfeld et al. tout this as an advantage of the CIT: ‘The greater the number of independent items, the greater the protection against false positive diagnoses.’ Rosenfeld et al., supra note 8, at 119. But requiring more independent items to be tested to lower the false positive probability to an acceptable threshold may prove extremely difficult in practice, where the set of ‘probes’ known only to a guilty subject and investigators may be limited.
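As a rough illustration of this multiplication logic (a sketch only: the function and the ‘k-of-n hits’ decision rule are our hypothetical framing, assuming independence across questions), the overall false positive rate follows from the binomial distribution:

    from math import comb

    def cit_false_positive_rate(n_questions, n_options, hits_required):
        # Chance that an unknowledgeable subject reaches `hits_required` hits
        # across `n_questions` independent questions, each offering `n_options`
        # equally likely choices (a 1/n_options chance of a chance 'hit' each).
        p = 1.0 / n_options
        return sum(
            comb(n_questions, k) * p ** k * (1 - p) ** (n_questions - k)
            for k in range(hits_required, n_questions + 1)
        )

    print(cit_false_positive_rate(1, 5, 1))  # one five-option question: 0.2
    print(cit_false_positive_rate(6, 5, 6))  # all 6 of 6 hits: 0.2**6, about 6.4e-05
    print(cit_false_positive_rate(6, 5, 3))  # a laxer 3-of-6 rule: about 0.099

Lowering the required number of hits makes the test more sensitive but, as the last line shows, rapidly raises the chance of falsely branding an unknowledgeable subject.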

An operating assumption behind brain-based memory detection research is that techniques may access psychological processes of recognition that may be ‘sufficiently automatic as to be relatively invulnerable to countermeasures’. Meegan, supra note 14, at 18. Meegan examined the potential for EEG-based studies to be vulnerable to countermeasures under the rubric of ‘retrieval automaticity’. Id. at 10–13. He described two strategies that could be used by a ‘guilty’ subject attempting to beat a P300-based test: (i) producing an irrelevant-like P300 for probes (creating false negatives), or (ii) producing a probe-like P300 for irrelevants (creating false positives). Meegan identified the former strategy as particularly challenging and not yet supported by existing evidence, given that it is ‘difficult to treat something meaningful as meaningless’. Id. at 13. The latter strategy has been examined, starting in 2004 when Rosenfeld et al. trained ‘guilty’ participants (who had committed a mock crime) to make irrelevants seem like task-relevant stimuli by making a distinct covert physical response (such as pressing a particular finger into their leg, or wiggling a toe) to different categories of irrelevants, resulting in a P300 similar to that of probes. J. Peter Rosenfeld et al., Simple, Effective Countermeasures to P300-based Tests of Detection of Concealed Information , 41 Psychophysiology 205 (2004).

J. Peter Rosenfeld et al., The Complex Trial Protocol (CTP): A New, Countermeasure-Resistant, Accurate, P300-based Method for Detection of Concealed Information , 45 Psychophysiology 906, 906–07 (2008).

Id. See also J. Peter Rosenfeld & Henry T. Greely , Deception, Detection of, P300 Event-Related Potential (ERP), in The Wiley Encyclopedia of Forensic Science (Allan Jamieson & Andre A. Moenssens eds., 2012).

Rosenfeld et al., supra note 50, at 208–09 (describing wiggling various toes or pressing various fingers); Ralf Mertens & John JB Allen, The Role of Psychophysiology in Forensic Assessments: Deception Detection, ERPs, and Virtual Reality Mock Crime Scenarios , 45 Psychophysiology 286 (2008).

Alexander Sokolovsky et al., A Novel Countermeasure Against the Reaction Time Index of Countermeasure Use in the P300-based Complex Trial Protocol for Detection of Concealed Information , 81 Int. J. Psychophysiology 60, 61 (2011).

Rosenfeld et al., supra note 51, at 907–08; Michael R. Winograd & J. Peter Rosenfeld, Mock Crime Application of the Complex Trial Protocol (CTP) P300-Based Concealed Information Test , 48 Psychophysiology 155 (2011).

Sokolovsky et al., supra note 54.

Zara M. Bergstrom et al., Intentional Retrieval Suppression Can Conceal Guilty Knowledge in ERP Memory Detection Tests , 94 Biological Psychol. 1 (2013). Bergstrom et al. used a three-stimulus protocol; ‘guilty’ subjects were assigned to be cooperative and rehearse their response knowledge upon cue presentation, or be uncooperative and try to suppress knowledge coming to mind in response to the cue. Id. at 3–4. The P300 response was smaller in the latter group, but evidence for active suppression is confounded by the fact that the first group ‘actively rehearsed’, as opposed to being passively knowledgeable about their guilty knowledge. Id. at 4–10. In 2015, Hu et al. added a ‘simply knowledgeable’ group in a mock crime scenario tested with the CTP and found differences in the P300 between the ‘simply knowledgeable’ guilty group and the actively suppressing guilty group. Xiaoqing Hu et al., Suppressing Unwanted Autobiographical Memories Reduces Their Automatic Influences: Evidence From Electrophysiology and an Implicit Autobiographical Memory Test , 26 Psychol. Sci. 1098, 1099–1104 (2015). But the differences were based on a particular method of measuring the P300 (baseline to peak), while a different method (peak to peak) is substantially more accurate in discriminating knowledgeable and naive subjects. See also Rosenfeld et al., supra note 31.

Anne C. Ward & J. Peter Rosenfeld, Attempts to Suppress Episodic Memories Fail but Do Produce Demand: Evidence from the P300-Based Complex Trial Protocol and an Implicit Memory Test , 42 Applied Psychophysiology Biofeedback 13, 24 (2014) (reporting that, with a single discrete change to the CTP used by the 2015 paper—changing the ratio of target to non-target items to make the task slightly easier—attempts at suppression did not significantly reduce P300 responses to probes as compared to a ‘simply guilty’ control group).

See Rosenfeld et al., supra note 31, at 30.

Rosenfeld reported on an experiment where subjects pretended to steal 1 of 10 items from a box, and then were shown the names of all 10 items one at a time. The items that subjects pretended to steal evoked target-like P300s, whereas the non-stolen items did not. J. Peter Rosenfeld et al., Late Vertex Positivity in Event-Related Potentials as a Guilty Knowledge Indicator: A New Method of Lie Detection , 34 Int’l J. Neurosci. 125 (1987). Farwell and Donchin reported on two experiments, the first a mock espionage scenario in which, after extensive training, briefcases were transferred to confederates in code name operations. The second experiment tested four subjects who were admittedly guilty of minor transgressions on campus. Farwell & Donchin, supra note 41. Winograd and Rosenfeld conducted a mock crime experiment where some subjects were given instructions to steal a ‘ring’, whereas others were told to steal an ‘item’, finding that having knowledge of the crime created a high rate of false positives. Winograd & Rosenfeld, infra note 63. Hu et al. tested the effect of a delayed test, determining that subjects tested a month after being told to ‘steal’ an exam were detected at a similar rate as those who were tested immediately after the crime. Hu et al., infra note 65.

John B. Meixner & J. Peter Rosenfeld, Detecting Knowledge of Incidentally Acquired, Real-World Memories Using a P300-Based Concealed-Information Test, 25 Psychol. Sci. 1 (2014).

Id. at 2–4.

Michael R. Winograd & J. Peter Rosenfeld, The Impact of Prior Knowledge from Participant Instructions in a Mock Crime P300 Concealed Information Test , 94 Int’l J. Psychophysiology 473, 473 (2014); see also Winograd & Rosenfeld, supra note 55.

Rosenfeld et al., supra note 8, at 122.

Xiaoqing Hu et al., Combating Automatic Autobiographical Associations: The Effect of Instruction and Training in Strategically Concealing Information in the Autobiographical Implicit Association Test , 23 Psychol. Sci. 1079, 1080 (2012).

Pranav Misra et al., Minimal Memory for Details in Real Life Events , 8 Sci. Rep., Article 16,701 (2018).

Rosenfeld, supra note 8, at 125.

Xiaoqing Hu et al., N200 and P300 as Orthogonal and Integrable Indicators of Distinct Awareness and Recognition Processes in Memory Detection , 50 Psychophysiology 454, 462–65 (2013). Researchers have questioned whether supplementing P300-based protocols with feedback about a subject’s responses could improve the test’s detection efficiency. In a recent study using a mock-crime scenario and the CTP, providing non-veridical feedback directing deceptive subjects’ attention to the probe resulted in a larger P300 signal (probe-irrelevant difference) than did providing generic feedback regarding test performance. The study’s authors refer to the probed material as ‘incidentally acquired’, apparently meaning that the identity of the to-be-stolen item (a ring) was only acquired if the subject actually stole the ring, rather than via instructions to steal it (or not to steal it, for ‘innocent’ subjects). In the ‘high awareness’ condition, where feedback directed the subject’s attention to the probe by indicating that their brainwaves had identified something as important, the P300 could effectively differentiate guilty from innocent participants. But in the ‘low awareness’ condition, where subjects received only the generic feedback that ‘based on your brainwaves, it seems you are following task instructions well’, detecting guilty subjects was not significantly different from chance.

Id. They also reported a fronto-centrally distributed N200 ERP response to probes in the ‘high awareness’ group of guilty participants who had stolen the ring and were told that their brainwaves identified something as important, as compared to innocent or ‘low awareness’ groups (who were told their brainwaves indicated they followed directions well). Hu et al. theorize that processes other than recognition may be mechanisms by which memory detection works and that combining them may improve detection efficiency. Id. Further research on the mechanisms that elicit the N200 is needed.

Researchers have attempted to combine P300-based tests with autobiographical versions of the Implicit Association Test (aIAT). See Hu et al., supra note 65.

See generally Richard A. Carlson & Don E. Dulany, Conscious Attention and Abstraction in Concept Learning , 11 J. Exp. Psychol. 45 (1985); Eric Eich, Memory for Unattended Events: Remembering With and Without Awareness , 12 Memory & Cognition 105 (1984); Ronald T. Kellogg & Ruth S. Dare, Explicit Memory for Unattended Information , 27 Bull. Psychonomic Soc’y 409 (1989).

Consider a home break-in scenario, where perpetrators grab items perceived to have value without closely examining them, such as emptying a jewelry box. In such a scenario, a victim’s inventory of missing items may have little use in investigating a suspect, who may have never encoded the Art Deco earrings or mother-of-pearl brooch.

For a recent review, see Stephanie A. Gagnon & Anthony D. Wagner, Acute stress and episodic memory retrieval: neurobiological mechanisms and behavioral consequences , 1369 Ann. N.Y. Acad. Sci. 55–75 (2016). In general, acute stress can enhance encoding and consolidation of episodic memory, particularly for emotional events. Id. at 59. But the research findings are complex and nuanced: the relationship between stress and encoding, consolidation, recognition, and recall is not necessarily linear, and there are conflicting findings as to whether stress differentially affects emotional or neutral stimuli.

Id. at 60. However, there are differential impacts depending on the retrieval task, and it may be the case that probed memory detection paradigms that present information to be recognized would be least affected by stress at the time of retrieval: ‘Stress-related impairments of episodic retrieval tend to be greater on tests requiring free recall relative to cued recall and cued recall relative to recognition’. See also Stephanie A. Gagnon, Michael L. Waskom, Thackery I. Brown & Anthony D. Wagner, Stress impairs episodic retrieval by disrupting hippocampal and cortical mechanisms of remembering , 29 Cerebral Cortex 2947–2964 (2019) (Assessing recollection in a cued recall task under threat of electric shock, using fMRI with multivariate pattern analysis: ‘When stressed, people are less likely to recollect specific details about past events, and, even when expressing high confidence in what was remembered, they are more likely to produce less accurate memories’.).

Brown & Murphy, supra note 5.

Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Working Group, Brain 2025: A Scientific Vision 41–42 (2014), https://braininitiative.nih.gov/strategic-planning/brain-2025-report .

See Nicco Reggente et al., Enhancing the Ecological Validity of fMRI Memory Research Using Virtual Reality , 12 Frontiers Neurosci., Article 408 (June 2018).

For an accessible review of these methodological approaches, see Joey Ka-Yee Essoe & Jesse Rissman, Applications of Functional MRI in Memory Research , in Handbook of Research Methods in Human Memory 397–427 (H. Otani & B.L. Schwartz, eds., 2018).

This literature has been comprehensively reviewed in Nancy A. Dennis et al., Functional Neuroimaging of False Memories , in The Wiley Handbook on The Cognitive Neuroscience of Memory 150 (Donna Rose Addis et al. eds., 1st ed. 2015); see also Daniel L. Schacter et al., Neuroimaging of True, False, and Imaginary Memories , in Memory and Law, supra note 6, at 233.

Dennis et al., supra note 79, at 152.

Id. at 154–55.

Id. at 157–59.

Avi Mendelsohn et al., Signature of Memory: Brain Coactivations During Retrieval Distinguish Correct from Incorrect Recollection , 4 Frontiers Behav. Neurosci., Article 18 (April 2010) at 1.

Jesse Rissman & Anthony D. Wagner, Distributed representations in memory: insights from functional brain imaging , 63 Ann. Rev. Psychol. 101–28 (2012).

See generally James V. Haxby, Multivariate Pattern Analysis of fMRI: The Early Beginnings , 62 Neuroimage 852 (2012).

Jesse Rissman et al., Detecting Individual Memories Through the Neural Decoding of Memory States and Past Experience , 107 Proceedings Nat’l Acad. Sci. 9849, 9852 (2010).

The classifiers were also highly accurate in determining ‘whether participants’ recognition experiences were associated with subjective reports of recollection, a strong sense of familiarity, or only weak familiarity, with the discrimination between recollection and strong familiarity being superior to that between strong v. weak familiarity’. Id. And in a second experiment that probed implicit memory for oldness or newness, participants were required to study faces before scanning and then, in the scanner, make male/female judgments rather than old/new judgments. In this situation, the classification methods were ‘not capable of robustly decoding the OLD/NEW status of faces encountered during the Implicit Recognition Task’, leading the researchers to conclude that ‘a neural signature of past experience could not be reliably decoded during implicit recognition’. Id. at 9852–53. This suggests that encoding environment matters significantly for the detectability of details of life experiences that may only be incidentally encoded, rather than the focus of attention. Such a finding has implications for stimulus selection in any forensic context—perhaps only a murderer would know that the victim was found on a paisley-patterned couch, but if the murderer paid no attention to the couch pattern, such a unique detail may be only incidentally encoded and thus cannot be reliably detected. But see Brice A. Kuhl et al., Dissociable Neural Mechanisms for Goal-Directed Versus Incidental Memory Reactivation , 33 J. Neurosci. 16099, 16099–109 (2013) (finding that mnemonic information could be decoded even when participants are not instructed to attend to that information).

Rissman et al., supra note 86, at 9852–53.

Melina R. Uncapher et al., Goal-Directed Modulation of Neural Memory Patterns: Implications for fMRI-Based Memory Detection , 35 J. Neurosci. 8531 (2015).

Id. at 8537–39.

Id. at 8545.

Rissman et al., supra note 86, at 9851–52; see also Figure S5, at 10, in Supporting Information for Rissman et al. , PNAS, https://www.pnas.org/content/pnas/suppl/2010/04/30/1001028107.DCSupplemental/pnas.201001028SI.pdf (accessed Apr. 16, 2020) (hereinafter Supporting Information).

The authors discuss in the Supporting Information: ‘Pattern classification analyses were implemented in MATLAB using routines from the Princeton MVPA Toolbox and custom code…. A variety of machine learning algorithms have been successfully used to decode cognitive states from fMRI data. Here, we explored several algorithms, including two-layer back-propagation neural networks, linear support vector machines, and regularized logistic regression. Although all three performed well, we found that RLR generally outperformed the other techniques, if only by a small amount…. Thus, we elected to use RLR for all classification analyses reported in the manuscript’. Id. at 2, 4. The differences between these types of machine learning algorithms matter greatly for how extensively their credibility can be tested—or, conversely, how much like a true ‘black box’ they are such that even the experts employing them are unable to truly explain the analytical steps taken to reach a particular inference. See text accompanying notes 182–205, infra . Researchers do select cutoff thresholds, or decision boundaries, to adjust the sensitivity or specificity with which examples are labeled Class A or Class B, and thus the tolerance for false positives and false negatives.
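To illustrate the kind of pipeline this note describes (a sketch only: scikit-learn stands in for the MATLAB toolchain the authors report using, and the data, parameter values, and threshold are invented), a regularized logistic regression classifier can be trained on voxel features and its decision boundary then shifted to trade false positives against false negatives:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 500))    # 200 trials x 500 voxel features (synthetic)
    y_train = rng.integers(0, 2, size=200)   # 0 = 'new' item, 1 = 'old' item
    X_test = rng.normal(size=(50, 500))

    # The L2 penalty supplies the 'regularization' in regularized logistic
    # regression; C is the inverse regularization strength.
    clf = LogisticRegression(penalty="l2", C=0.01, max_iter=1000).fit(X_train, y_train)

    # Rather than the default 0.5 cutoff, an analyst can move the decision
    # boundary to demand higher confidence before labeling a trial 'old'.
    threshold = 0.7
    p_old = clf.predict_proba(X_test)[:, 1]
    labels = (p_old >= threshold).astype(int)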

Rissman et al., supra note 86 at 9853 (‘Although the predictive value of this classification was relatively poor (mean AUC = 0.59), the modest success of this classifier suggests that the neural signatures of true and false recognition are at least sometimes distinguishable’.).
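For context on that figure, an AUC of 0.5 corresponds to chance and 1.0 to perfect discrimination. The short sketch below (synthetic data; scikit-learn assumed, not the authors’ analysis) shows the kind of weak signal that yields an AUC near 0.59:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=1000)         # 1 = true recognition, 0 = false
    scores = 0.3 * labels + rng.normal(size=1000)  # classifier scores with weak signal

    print(roc_auc_score(labels, scores))           # roughly 0.58-0.60, barely above chance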

For a review of recent memory work using camera technology to extend research findings of lab-based memories, see Tiffany Chow & Jesse Rissman, Neurocognitive Mechanisms of Real-World Autobiographical Memory Retrieval: Insights from Studies Using Wearable Camera Technology , 1396 Annals N.Y. Acad. Sci. 202 (2017). See also Peggy L. St. Jacques & Felipe De Brigard, Neural Correlates of Autobiographical Memory: Methodological Considerations , in The Wiley Handbook on The Cognitive Neuroscience of Memory, supra note 79, at 265.

Jesse Rissman et al., Decoding fMRI Signatures of Real-world Autobiographical Memory Retrieval , 28 J. Cognitive Neurosci. 604, 606–07 (2016).

Id. at 610–14. See also Chen et al., supra note 23 (reporting different patterns of network activation for recollection of studied pictures versus autobiographical events).

Rissman et al., supra note 96, at 615–17.

F. Milton, Nils Muhlert, C. R. Butler, A. Smith, A. Benattayallah & Adam Z. Zeman, An fMRI study of long-term everyday memory using SenseCam , 19 Memory 733–744 (2011).

Rissman et al., supra note 96, at 616–17.

Judith Peth et al., Memory Detection using fMRI—Does the Encoding Context Matter? 113 NeuroImage 164, 165–66 (2015).

Id. at 168–72. The authors also performed univariate analyses, reporting that those results ‘further support the assumption that memory and not deception is the key mechanism for successful detection of information with the CIT…. Taken together, univariate fMRI analyses indicate that the CIT can be primarily used to detect the presence of critical information but does not directly allow for determining the source of knowledge’.

Winograd & Rosenfeld, supra note 63.

See Chow et al., supra note 23, at 121 (‘[i]n some ways, the term “laboratory-based” may be a misnomer, since there is nothing intrinsically special about encoding information while participating in a psychology experiment versus encoding information outside of the lab (ie, in “real life”). Thus, the divergent patterns of brain activation observed during the retrieval of these kinds of memories may be driven to a large degree by differences in the mnemonic processes evoked (eg recognition as based on either contextual recollection or item familiarity), methodology (eg perceptual qualities of the stimuli used to probe memories), or even characteristics of the tested memories themselves (eg personal relevance or temporal remoteness)’.)

Id. at 112.

See Brown & Murphy, supra note 5.

See Maxwell A. Bertolero & Danielle S. Bassett, Deep Neural Networks Carve the Brain at Its Joints , pre-print (revised Sep. 9, 2020), https://arxiv.org/abs/2002.08891v2 .

See eg John B. Meixner & J. Peter Rosenfeld, A Mock Terrorism Application of the P300-Based Concealed Information Test , 48 Psychophysiology 149 (2011) (using the CIT in a mock terrorism scenario to effectively detect criminal information).

Probably not, according to classic findings by Christianson and colleagues that memories for events during increased arousal exhibit a concentration on central details with reduced recall of peripheral details. Sven Åke Christianson et al., Eye Fixations and Memory for Emotional Events , 17 J. Exp. Psychol. 693, 695–700 (1991).

This is the attribute of specificity discussed by Meegan, supra note 14.

See eg Bernstein & Loftus, supra note 29, at 373 (concluding that ‘[i]n essence, all memory is false to some degree. Memory is inherently a reconstructive process, whereby we piece together the past to form a coherent narrative that becomes our autobiography’); see also Demis Hassabis & Eleanor A. Maguire, The Construction System of the Brain, 364 Phil. Transactions Royal Soc’y B 1263 (2009).

See eg Memory and Law, supra note 6.

Hearsay rules are predicated, in part, on the fallibility of a human’s memory. See eg Anders Sandberg et al., The Memory of Jurors: Enhancing Trial Performance, in Memory and Law, supra note 6, at 213; see also Daniel L. Schacter & Elizabeth Loftus, Memory and Law: What Can Cognitive Neuroscience Contribute? , 16 Nature Neurosci. 119 (2013); Joyce W. Lacy & Craig E.L. Stark, The Neuroscience of Memory: Implications for the Courtroom, 14 Nature Rev. Neurosci. 649 (2013).

Fed. R. Evid. 701, Fed. R. Evid. 702, Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).

Fed. R. Evid. 401, Fed. R. Evid. 402.

For example, if the witness were to testify that ‘it was sunny at the time I saw the accident’, and a corroborating weather report for that day were available, there would be no practical need to validate or verify the witness’s memory as to the weather.

Fed. R. Evid. 403.

Of course, there may be cases where such a device would be helpful, such as concealed knowledge cases where detecting a memory would suggest the accuracy of the facts remembered because it would be an incredible coincidence to have formed a false, but subjectively believed, memory about the concealed facts. This, however, is an issue of ‘corroboration’—to be certain they are in such a situation, investigators must already know something about the ground truth, and, again, the person concealing the knowledge they possess is essentially deceiving investigators by declining to disclose it, so we are back in the realm of lie detection. Alternatively, Bayesians would point out that if one of three possibilities—true memory, false memory, and lie—can be eliminated (the lie), it is helpful for a decision maker to eliminate that possibility. Epistemologically, this may improve accuracy in decision-making, but raises the same process concerns as lie detection devices.

We thank Jason Rantanen for the example.

Yuhwa Han, Deception Detection Techniques Using Polygraph in Trials: Current Status and Social Scientific Evidence , 8 Contemp. Readings L. & Soc. J. 115 (2016).

See Fed. R. Evid. 702(a).

The ‘statement’ would be the report generated by the putative memory detection device, assuming such existed. The declarant would be the person running the examination, as a report produced exclusively by machine cannot be hearsay, as a machine cannot be a declarant. But, as seen above, any forensic memory detection test would require specific examiner decision-making and input as to stimuli, task design, and any programming or instructions provided to the examinee and/or device. The forensic memory examiner would certainly be the declarant for any resulting report produced. Some jurisdictions have evaded the ‘machine as declarant’ problem for highly standardized tests by making printouts that record the results of certain mechanical tests (eg blood or breath alcohol) admissible by statute. Technically, such laws create an exception to the hearsay rule, though they are often treated as authentication issues. Arguably, whether the person being tested would also be considered a ‘declarant’, and whether such evidence would present double hearsay problems, may depend upon whether the person submitting to the examination does so intending to assert the contents of their memory as verifiable. This would generally be true for a self-interested defendant who solicited a scan to prove an alibi, or a lack of crime-relevant knowledge. But it would arguably not be true for a suspect compelled to submit to a test for which affirmative verbal or behavioral responses were not required, such as in the BEOS paradigm. Such compulsions would raise practical issues, but also legal privacy concerns and potentially constitutional concerns about self-incrimination. See eg Nita A. Farahany, Searching Secrets , 160 Penn. L. Rev. 1239 (2012). See also Bard, supra note 5 (outlining Fourth, Fifth, and Sixth Amendment concerns about neuroimaging-obtained information).

Past statements about a present state of mind or future intention are (perhaps nonsensically) considered an exception to the rule against hearsay precisely because they lack concerns about faulty memory or faulty perception. See eg Fed. R. Evid. 803(3); Mutual Life Insurance Co. v. Hillmon, 145 U.S. 285 (1892); John MacArthur Maguire, The Hillmon Case—Thirty-Three Years After , 38 Harv. L. Rev. 709 (1925).

See supra note 123. Several circuits have held that machine statements are not hearsay. See United States v. Lizarraga-Tirado, 789 F.3d 1107, 1110 (9th Cir. 2015); United States v. Lamons, 532 F.3d 1251, 1263 (11th Cir. 2008); United States v. Moon, 512 F.3d 359, 362 (7th Cir. 2008); United States v. Washington, 498 F.3d 225, 230 (4th Cir. 2007); United States v. Hamilton, 413 F.3d 1138, 1142 (10th Cir. 2005); United States v. Khorozian, 333 F.3d 498, 506 (3d Cir. 2003).

Fed. R. Evid. 705.

See eg Archie Alexander, Functional Magnetic Resonance Imaging Lie Detection: Is a Brainstorm Heading toward the Gatekeeper? 7 Hous. J. Health L. & Pol’y 1 (2007); Benjamin Holley, It’s All in Your Head: Neurotechnological Lie Detection and the Fourth and Fifth Amendments , 28 Dev. Mental Health L. 1 (2009); Leo Kittay, Admissibility of fMRI Lie Detection: The Cultural Bias Against “Mind Reading” Devices , 72 Brook. L. Rev. 1351 (2007); Daniel D. Langleben & Jane Campbell Moriarty, Using Brain Imaging for Lie Detection: Where Science, Law, and Policy Collide, 19 Psychol. Pub. Pol’y & L. 222, 231 (2012) (noting that the ‘most important missing piece in the puzzle is Daubert ’s “known error rate” standard. Determining the error rates for fMRI-based lie detection requires validation of the method in settings convincingly approximating the real-life situations in which legally significant deception takes place, in terms of the risk–benefit ratio, relevant demographics, and the prevalence of the behavior in question’.); Joëlle Anne Moreno, The Future of Neuroimaged Lie Detection and the Law , 42 Akron L. Rev. 717 (2009) (finding that an fMRI-detected increase in blood flow does not necessarily indicate lying; there are other emotional states that may cause the prefrontal cortex to activate); Jane Campbell Moriarty, Visions of Deception: Neuroimages and the Search for Truth , 42 Akron L. Rev. 739 (2009); Sean A. Spence & Catherine J. Kaylor-Hughes, Looking for Truth and Finding Lies: The Prospects for a Nascent Neuroimaging of Deception , 14 Neurocase 68 (2008); Sean A. Spence, Playing Devil’s Advocate: The Case Against fMRI Lie Detection , 13 Legal & Criminological Psychol. 11 (2008) (finding that most fMRI lie detection research lacks construct and external validity); Zachary E. Shapiro, Note, Truth, Deceit, and Neuroimaging: Can Functional Magnetic Resonance Imaging Serve as a Technology-Based Method of Lie Detection? , 29 Harv. J. L. & Tech. 527 (2016). But see Daniel D. Langleben et al., Brain Imaging of Deception, in Neuroimaging in Forensic Psychiatry: From the Clinic to the Courtroom (Joseph R. Simpson ed., 2012); Frederick Schauer, Can Bad Science Be Good Evidence—Neuroscience, Lie Detection, and Beyond , 95 Cornell L. Rev. 1191 (2010); see also David H. Kaye et al., How Good is Good Enough: Expert Evidence Under Daubert and Kumho , 50 Case W. Res. L. Rev. 645 (2000).

United States v. Semrau, 693 F.3d 510 (6th Cir. 2012); Memorandum Opinion and Order at 5–6, Maryland v. Smith, No. 106589C (Montgomery Cty., MD, Oct. 3, 2012); Wilson v. Corestaff Services, L.P., 900 N.Y.S.2d 639 (N.Y. Sup. Ct. 2010).

Meixner, Jr., supra note 6; see also Meixner, supra note 7; Rosenfeld et al., supra note 8.

A slightly different analysis would follow if a testifying expert such as a treating psychiatrist or psychologist claimed to ‘reasonably rely’ upon memory detection technology in developing their overall assessment of a subject’s mental state. Fed. R. Evid. 703. For this route to lead to admissibility, memory detection evidence would likely need to be commonly used by such experts in forming their opinion on the subject. This does not seem to be a ‘back door’ around Daubert , as it is unlikely that expert reliance would be ‘reasonable’ unless the technology were sufficiently reliable for such a purpose. The underlying purpose of FRE 703 is recognition that experts rely on (otherwise inadmissible) hearsay in forming their opinion. The output of the machine is not likely to be considered hearsay. See text accompanying notes 123–124.

Chow et al., supra note 23, at 122 (writing, ‘[w]e strongly caution against the direct translation of our protocol for use as a forensic tool in detecting memories for past events’.)

Schacter & Loftus, supra note 113, at 121 (internal citations omitted).

Stephen J. Anderson et al., Rewriting the Past: Some Factors Affecting the Variability of Personal Memories , 14 Applied Cognitive Psychol. 435 (2000). In both younger and older adults, a second recall of an autobiographical memory produced a version in which fewer than 50 per cent of the facts were identical to the first, with new detail being added. Younger adults, when compared to older adults, showed more variation in content and output order. In addition, more recent memories showed greater variation than older ones. Together, these findings suggest a shift from dynamic reconstruction to a more fixed memory that is reproduced or recalled. Id.

See eg Bernstein & Loftus, supra note 29, at 372–73 (recounting literature on false memories and noting that the challenge in applying laboratory studies to the real world is that in the real world, ‘we do not know who is telling the truth and who is lying or which memories are true and which are false’. Moreover, research focused on groups of memories, an individual reporting a memory, or a single ‘rich false memory’ does not indicate whether a particular memory is true or false. They conclude: ‘In essence, all memory is false to some degree. Memory is inherently a reconstructive process, whereby we piece together the past to form a coherent narrative that becomes our autobiography’.) See also C.J. Brainerd & V.F. Reyna, The Science of False Memory (Oxford Psychology Series eds., 1st ed. 2005); Deborah Davis & Elizabeth F. Loftus, Inconsistencies Between Law and the Limits of Human Cognition: The Case of Eyewitness Identification , in Memory and Law, supra note 6, at 29; Loftus, supra note 29; for a review of memory research (and demeanor evidence) for a legal audience, see Mark W. Bennett, Unspringing the Witness Memory and Demeanor Trap: What Every Judge and Juror Needs to Know About Cognitive Psychology and Witness Credibility , 64 Am. U. L.R. 1331 (2015).

Daubert, 509 U.S. at 592–93 (1993) (adopting the view of scientific validity based on ‘falsifiability’). Andrea Roth summarizes why surviving a Daubert hearing is necessary but not sufficient for providing adequate scrutiny of the basis for the expert’s testimony: ‘[f]or machines offering “expert” evidence on matters beyond the ken of the jury, lawmakers should clarify and modify existing Daubert and Frye reliability requirements for expert methods to ensure that machine processes are based on reliable methods and implemented in a reliable way. Daubert-Frye hearings are a promising means of excluding the most demonstrably unreliable machine sources, but beyond the obvious cases, these hearings do not offer sufficient scrutiny. Judges generally admit such proof so long as validation studies can demonstrate that the machine’s error rate is low and that the principles underlying its methodology are sound. But validation studies are often conducted under idealized conditions, and it is precisely in cases involving less-than-ideal conditions… that expert systems are most often deployed and merit the most scrutiny. Moreover, machine conveyances are often in the form of predictive scores and match statistics, which are harder to falsify through validation against a known baseline’. Andrea Roth, Machine Testimony , 126 Yale L.J. 1972, 1981–82 (2017).

Roth, supra note 135, at 2033–34.

Id. at 2034.

Id. (citing Christopher D. Steele & David J. Balding, Statistical Evaluation of Forensic DNA Profile Evidence , 1 Ann. Rev. Stat. and Its Application 361, 380 (2014)). Roth criticizes the admission of two expert, proprietary DNA systems that have come to different conclusions in a single case, arguing that the basic Daubert and Frye reliability tests ‘unless modified to more robustly scrutinize the software, simply do not—on their own—offer the jury enough context to choose the more credible system’. Id. at 2035.

See text accompanying notes 60–72, supra .

See Roth, supra note 135, at 2035 (writing about competing versions of DNA analysis software: ‘These basic reliability tests, unless modified to more robustly scrutinize the software, simply do not—on their own—offer the jury enough context to choose the more credible system’.)

Rock v. Arkansas, 483 U.S. 44 (1987); Chambers v. Mississippi, 410 U.S. 284 (1973).

See eg United States v. Scheffer, 523 U.S. 303 (1998) (holding that a defendant does not have a right to present polygraph evidence). Justice Thomas, in concurrence, wrote that a ‘fundamental premise of our criminal trial system is that “the jury is the lie detector.” Determining the weight and credibility of witness testimony, therefore, has long been held to be the “part of every case [that] belongs to the jury, who are presumed to be fitted for it by their natural intelligence and their practical knowledge of men and the ways of men”’. Id. at 313 (Thomas, J. concurring) (in this part of the concurrence, Thomas was joined by Rehnquist, Scalia, and Souter); see also Fisher, infra note 154, at 571 (‘it is likely that Thomas’s words do represent a majority sentiment among judges and practitioners’); see also Wilson, 900 N.Y.S.2d at 640 (rejecting expert testimony on fMRI-based lie detection because ‘credibility is matter solely for the jury’ and the expert testimony impinged upon the credibility of the witness); see also State v. Porter, 241 Conn. 57, 118 (1997) (‘[T]he importance of maintaining the role of the jury … justifies the continued exclusion of polygraph evidence … [P]olygraph evidence so directly abrogates the jury’s function that its admission is offensive to our tradition of trial by jury’.); Aetna Life Insurance Co. v. Ward, 140 U.S. 76 (1891); United States v. Barnard, 490 F.2d 790 (9th Cir. 1973); State v. Williams, 388 A.2d 500, 502–03 (Me. 1978) (noting that ‘[l]ie detector evidence directly and pervasively impinges upon that function which is so uniquely the prerogative of the jury as fact-finder: to decide the credibility of witnesses. The admissibility of lie detector evidence therefore poses the serious danger that a mechanical device, rather than the judgment of the jury, will decide the credibility’.)

Scheffer, 523 U.S. at 309–10.

See Daniel D. Langleben et al., Polygraphy and Functional Magnetic Resonance Imaging in Lie Detection: A Controlled Blind Comparison Using the Concealed Information Test , 77 J. Clinical Psychiatry 1372 (2016) (a blind, controlled, within-subjects study comparing the accuracy of fMRI and polygraphy in the detection of intentionally concealed information, wherein subjects were questioned about a number they selected using the CIT paradigm and experts made determinations of the concealed information using polygraphy and fMRI data, reporting that fMRI experts were 24 per cent more likely to detect the concealed number than polygraphy experts).

While in principle no overt response is required, as in the BEOS procedure, more research would be needed to examine the efficacy of a purely passive viewing (or listening) task procedure. Most of the brain-based memory detection studies published to date have asked participants to make a button-press response to each stimulus.

See Rock, 483 U.S. at 61 (vacating an Arkansas rule that categorically excluded hypnotically refreshed testimony because it prevented a criminal defendant from testifying on her own behalf); Chambers, 410 U.S. at 297–98 (rejecting Mississippi’s argument that its ‘voucher rule’ did not violate the Confrontation Clause because the petitioner was unable to cross-examine and establish a defense on a witness who accused him of the crime); Marc Jonathan Blitz, Searching Minds by Scanning Brains: Neuroscience Technology and Constitutional Privacy Protection (2017); Kiel Brennan-Marquez, A Modest Defense of Mind Reading , 15 Yale J. L. & Tech. 214, 218 (2013) (arguing that a perfected mind reading device would not result in ‘testimonial’ evidence under the Fifth Amendment); Nita A. Farahany, Incriminating Thoughts , 64 Stan. L. Rev. 351 (2012) (recommending that society adopts protections to safeguard liberties as neuroscience advances); Farahany, supra note 123 (applying Fourth Amendment concerns to evidence); Matthew B. Holloway, Note, One Image, One Thousand Incriminating Words: Images of Brain Activity and the Privilege Against Self-Incrimination , 27 Temp. J. Sci. Tech. & Envl. L. 141, 144 (2008) (arguing that the Fifth Amendment protects suspects from being compelled to undergo brain scans to produce fMRI data).

In terms of pragmatic analysis, the cost of fMRI scanning is relevant. A typical scanning session lasts 1 hour, and most university imaging centers charge $400–600 per hour. It is unclear whether prosecutors and defense lawyers would contract with universities to buy time on the scanner (and presumably hire an fMRI expert to run the scan session and analyze the data), or whether there would be dedicated state forensic labs or even for-profit companies with MRI scanners to meet the demand. The high cost would mean that public defenders would not likely be able to routinely access fMRI memory detection scans for their clients, but wealthier defendants may have access to such experts and technology.

Credibility, Black’s Law Dictionary (10th ed. 2014) (defined as ‘worthiness of belief’).

Julia Simon-Kerr makes a compelling case that a witness’s credibility, or worthiness of belief, is a social construct based on ‘his or her culturally-recognized moral integrity or honor. In other words, people are worthy of belief because they comply with norms of worthiness’. Julia Simon-Kerr, Credibility by Proxy , 85 Geo. Wash. L.R. 152, 155 (2017). The definition of credibility we employ in assessing ‘machine credibility’ is not this socially-constructed understanding of credibility, which could only attach to witnesses and not to machine output, but rather that something is worthy of being believed because it is what it purports to be—that is, it is close to a veridical fact.

See eg Edmund Morgan, Hearsay Dangers and the Application of the Hearsay Concept , 62 Harv. L. Rev. 177 (1948).

Roth, supra note 135, at 1979 (discussing certain machine conveyances as ‘credibility-dependent proof’).

Id. at 1977–78. See discussion at notes 182–206, infra.

Id. at 1981–82. See discussion at notes 182–206, infra .

George Fisher, The Jury’s Rise as Lie Detector , 107 Yale L.J. 575 (1997).

As an example, consider this recent set of events recounted in the New York Times. See Jim Dwyer, Witness Accounts in Midtown Attack Show the Power of False Memory , N.Y. Times, May 14, 2015 (reporting on two witnesses, O’Grady and Khalsa, who independently witnessed a police officer shoot a man in the middle of Eighth Avenue in Manhattan. Immediately afterwards, O’Grady reported to a Times reporter that the wounded man was in flight from the officers when shot. Nearly contemporaneously, Khalsa reported to the Times newsroom that the wounded man was handcuffed when shot. The police released a surveillance videotape 5 hours after the shooting, showing that both were wrong: the wounded man had been chasing an officer onto Eighth Avenue, and was shot by that officer’s partner from behind. Instead of being handcuffed, he was shot while freely swinging a hammer, then lying on the ground with his arms at his side, before being handcuffed. As the paper reported, ‘There is no evidence that the mistaken accounts of either person were malicious or intentionally false’.)

See text accompanying notes 132–135, supra .

See eg Bernstein & Loftus, supra note 29, at 372–73. See also Davis & Loftus, supra note 133, at 29; Brainerd & Reyna, supra note 134; Loftus, supra note 134; for a review of memory research (and demeanor evidence) for a legal audience, see Bennett, supra note 134.

Meixner, supra note 7, at 1454. Rosenfeld also advocates that researchers in the field make this distinction for purposes of advancing admissibility goals: ‘If courts continue to consider credibility assessment evidence inadmissible based the [sic] “jury as the lie detector” standard, the CIT [concealed information test]’s admissibility rests on being able to show the judge that the CIT does not assess whether the witness is telling the truth, but only what he recognizes. In the past, loaded terms like “lie detection” have been features in the titles of CIT experiments. This type of language is likely to mislead courts in the future, and the field would do well to draw a distinction between lie detection and memory detection, as Verschuere et al. do in their book’. Bruno Verschuere et al., Memory Detection: Theory and Application of the Concealed Information Test , supra note 6.

Meixner is not alone in this elision. For example, Meegan’s 2008 review of neuroimaging techniques for memory detection argued that ‘[s]ome have erroneously implied that memory detection tests are actually lie detection tests. As should be clear from the previous review, priming effects, old/new effects, and P300 effects measure recognition rather than deception. Moreover, they can (and should) be measured without dishonest responding’. Meegan, supra note 14, at 18. See also Bennett, supra note 134.

Meixner, supra note 7, at 1454; see discussion surrounding Scheffer , supra note 142. Five justices disagreed with Thomas. In concurrence, Kennedy wrote that ‘the principal opinion overreaches when it rests its holding on the additional ground that the jury’s role in making credibility determinations is diminished when it hears polygraph evidence. I am in substantial agreement with Justice Stevens’ [dissenting] observation that the argument demeans and mistakes the role and competence of jurors in deciding the factual question of guilt or innocence’. Scheffer, 523 U.S. at 318. Kennedy points out that the rule against permitting the jury to hear ‘a conclusion about the ultimate issue in the trial’ was loosened with Federal Rule of Evidence 704(a). Id. at 319.

John H. Wigmore, Wigmore on Evidence § 875 (2d ed. 1923).

Scheffer, 523 U.S. at 313 (internal citations omitted) (in this part of the opinion, Thomas was joined by Rehnquist, Scalia, and Souter).

See discussion in note 142.

Fisher, supra note 154.

Id. at 578–79 (writing, ‘[A]lthough the jury does not guarantee accurate lie detecting, it does detect lies in a way that appears accurate, or at least in a way that hides the course of any inaccuracy from the public’s gaze. By permitting the jury to resolve credibility conflicts in the black box of the jury room, the criminal justice system can present to the public an “answer”—a single verdict of guilty or not guilty—that resolves all questions of credibility in a way that is largely immune from challenge or review. By making the jury its lie detector, the system protects its own legitimacy’.)

Fed. R. Evid. 404(b); Fed. R. Evid. 413–415.

Meixner, supra note 7, at 1462.

Id. at 1473 (noting ‘[e]ven the most optimistic studies that are remotely applicable to the courtroom suggest that individuals are just over 60 per cent accurate in credibility assessments. Yet courts continue to hold onto the shaky assumption that the jury is capable of being the sole assessor of credibility. This unsophisticated notion should be put to rest… and the only factor concerning the admissibility of expert testimony related to credibility should be its reliability under Daubert or Frye’.); see also Max Minzner, Detecting Lies Using Demeanor, Bias and Context, 29 Cardozo L. Rev. 2557, 2558 (2008) (writing that ‘[j]udges have generally assumed juries make accurate credibility decisions and believe demeanor is the mechanism for deciding whether a witness is telling the truth…. Starting in the early 1990’s, though, legal academics broke from this consensus based on a series of social science studies demonstrating that the test subjects in laboratory experiments correctly determined when a person was lying only slightly more than half the time’.)

Roth, supra note 135, at 1990.

Id. at 1998. (‘Most “expert systems”—programs rendering complex analysis based on information fed to it by humans—require inputters to provide case-specific information, and those types of machines might misanalyze events or conditions if fed the wrong inputs…. The potential for error stemming from expert systems’ reliance on the assertions of human inputters is analogous to the potential for error from human experts’ reliance on the assertions of others’.)

Id. at 1992.

See eg Supporting Information, supra note 92, at 4–5.

See Roth, supra note 135, at 1992 (describing a hypothetical situation where an eyewitness states they are ‘damn sure’ that a particular suspect robbed them; if cross-examined in court, the eyewitness would clarify that ‘damn sure’ to them means a certainty of 80 per cent. Without that testimony, a fact finder might associate the term with a higher level of subjective certainty. The same defect lies in machine conveyances: if a supercomputer said it was ‘most likely’ that a death occurred from a particular condition, such a term can create different inferences on the supercomputer’s certainty.); Laurence H. Tribe, Trial By Mathematics: Precision and Ritual in the Legal Process, 84 Harv. L. Rev. 1329, 1330 (1971) (writing that trying to reconcile statistical technology in fact finding processes may distort or destroy ‘important values which that society means to express or to pursue through the conduct of legal trials’.)

Anders Eklund et al., Cluster Failure: Why fMRI Inferences for Spatial Extent Have Inflated False-Positive Rates, 113 Proceedings Nat’l Acad. Sci. 7900, 7900 (2016) (note that the online version was corrected in an erratum, found in 113 Proceedings Nat’l Acad. Sci. at E4929 (Aug. 16, 2016)). This report led to headlines in the popular press. See eg Bec Crew, A Bug in fMRI Software Could Invalidate 15 Years of Brain Research, ScienceAlert (July 6, 2016); Kate Murphy, Do You Believe in God or is that a Software Glitch?, N.Y. Times (Aug. 27, 2016); Simon Oxenham, Thousands of fMRI Brain Studies in Doubt Due to Software Flaws, New Scientist (Jul. 18, 2016).
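For readers who want a concrete sense of the nonparametric alternative that Eklund et al. found better calibrated, the following is a minimal Python sketch of a label-permutation test. All numbers here (group sizes, effect size, iteration count) are invented for illustration; this is a generic demonstration of the technique, not the paper’s fMRI analysis.

```python
# Minimal sketch of a nonparametric permutation test, the general approach
# Eklund et al. recommend over parametric cluster-extent inference.
# Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
group_a = rng.standard_normal(20)         # hypothetical condition A measurements
group_b = rng.standard_normal(20) + 0.5   # hypothetical condition B, shifted mean

observed = group_a.mean() - group_b.mean()

# Null distribution: shuffle group labels and recompute the statistic
pooled = np.concatenate([group_a, group_b])
null = np.empty(10_000)
for i in range(null.size):
    rng.shuffle(pooled)
    null[i] = pooled[:20].mean() - pooled[20:].mean()

# Two-sided p-value: fraction of shuffles at least as extreme as observed
p_value = np.mean(np.abs(null) >= abs(observed))
print(f"observed difference = {observed:.3f}, permutation p = {p_value:.4f}")
```

Because the null distribution is built from the data itself, this approach avoids the parametric spatial-autocorrelation assumptions that the paper found to be violated in real fMRI noise.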

See Emery N. Brown & Marlene Behrmann, Letter, Controversy in Statistical Analysis of Functional Magnetic Resonance Imaging Data, 114 Proc. Nat’l Acad. Sci. E3368 (2017). Functional brain imaging analysis is inherently multidisciplinary. The recent report of the NIH Brain Initiative, BRAIN 2025, recognizes this and ‘recommends fostering interdisciplinary collaborations among neuroscientists, physicists, engineers, statisticians, and mathematicians to properly collect, analyze, and interpret the data that result from the development of new neuroscience tools’, including fMRI and EEG. Id. at E3369 (citing BRAIN Working Group, supra note 76).

Roth, supra note 135, at 1982.

Fed. R. Evid. 403; see Brown & Murphy, supra note 5.

See eg Darby Aono, Gideon Yaffe & Hedy Kober, Neuroscientific Evidence in the Courtroom: A Review, 4 Cogn. Research: Principles and Implications 40 (2019); Nicholas Scurich, What Do Experimental Simulations Tell Us About the Effect of Neuro/genetic Evidence on Jurors?, 5 J.L. & Biosci. 204 (2018); Shen et al., supra note 5, at 332 (‘Across nearly 30 previous studies, including over 50 unique experiments, the only result researchers can agree upon is that there are “conflicting results”’.); Denise A. Baker et al., Making Sense of Research on the Neuroimage Bias, 26 Pub. Understanding Sci. 251, 251, 258 (2015) (reviewing findings on ‘all sides of the neuroimage bias question’ and concluding that ‘when neuroimages do sway judgments, it is only under specific conditions that are not yet well understood; as such, an overarching theory is still out of reach’.); Cayce J. Hook & Martha J. Farah, Look Again: Effects of Brain Images and Mind-brain Dualism on Lay Evaluations of Research, 25 J. Cog. Neurosci. 1397–1405 (2013); Nicholas Schweitzer et al., Neuroimages As Evidence in a Mens Rea Defense: No Impact, 17 Psychol., Pub. Pol’y, & L. 357 (2011).

See eg Shari Seidman Diamond, How Jurors Deal with Expert Testimony and How Judges Can Help, 16 J.L. & Pol’y 47 (2007); David H. Kaye & Jonathan J. Koehler, Can Jurors Understand Probabilistic Evidence? 154 J. Royal Statistical Soc. A 75 (1991).

See Daniela Schiller & Elizabeth A. Phelps, Does Reconsolidation Occur in Humans?, 5 Frontiers Behav. Neurosci., Article 24 (May 2011), at 1.

See Iona D. Scully et al., Does Reactivation Trigger Episodic Memory Change? A Meta-Analysis, 142 Neurobiology Learning and Memory 99 (2017).

Roth, supra note 135, at 1977, 1983. Credibility testing is the next step after an admissibility decision: ‘The purpose of credibility-testing mechanisms is not primarily to exclude unreliable evidence, but to give jurors the context they need to assess the reliability of evidence and come to the best decision’. Id. at 2023.

Id. at 1977.

Id. at 1988–89.

See text accompanying notes 106–111, supra .

Roth, supra note 135, at 1977–78.

See text accompanying notes 75–79, supra .

Roth, supra note 135, at 1991.

Id. at 2026.

Jenna Burrell, How the Machine ‘Thinks’: Understanding Opacity in Machine Learning Algorithms, Big Data & Society (Jan.-Jun. 2016), at 1–2. The two most relevant types of opacity here are ‘opacity stemming from the current state of affairs where writing (and reading) code is a specialist skill’, and ‘an opacity that stems from the mismatch between mathematical optimization in high-dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation’. Id. at 2.

Supporting Information, supra note 92, at 4. RLR was used in subsequent studies from the Rissman group, supra notes 89, 96, and 104.

Choice of machine learning algorithm for assessment of specific brain functions is an evolving area of study. A recent preprint suggests that deep neural networks outperform linear regression when applied to regional-level connectivity, but linear regression is more accurate when applied to system-level brain connectivity. See Bertolero & Bassett (preprint), supra note 107. The work by Bertolero & Bassett suggests an even more complex way to apply machine-learning algorithms to memory tasks of recall and recognition, with the potential to understand regional and system-level contributions to discrete subfunctions of autobiographical memory.
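For the non-specialist, a minimal sketch of a regularized logistic regression (RLR) decoder of the general kind discussed in these studies may help. Everything below is a synthetic stand-in (random ‘voxel’ features, toy old/new labels, an arbitrary regularization strength); it illustrates the technique, not any published pipeline.

```python
# Minimal sketch of an L2-regularized logistic regression (RLR) decoder,
# the family of linear classifier used in the fMRI memory studies cited
# above. Data are random stand-ins for voxel features; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_trials, n_voxels = 200, 2000                 # hypothetical trials x voxel features
X = rng.standard_normal((n_trials, n_voxels))  # toy activity patterns
y = rng.integers(0, 2, size=n_trials)          # toy labels: 0 = "new", 1 = "old"

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)  # C sets regularization
scores = cross_val_score(clf, X, y, cv=5)      # cross-validated decoding accuracy
print(f"mean accuracy: {scores.mean():.2f} (chance is ~0.50 for random labels)")
```

On random labels the accuracy hovers near chance; with real, structured data the same pipeline yields the above-chance decoding reported in the literature.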

Peth et al., supra note 101, at 167.

Zhi Yang et al., Using fMRI to Decode True Thoughts Independent of Intention to Conceal , 99 NeuroImage 80, 82 (2014).

Burrell, supra note 190, at 5.

See Rissman et al., supra note 86, at 9853 (using the classifier’s ‘most “confident” predictions’.)
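To make the ‘most confident predictions’ idea concrete, here is a hedged sketch of thresholding a classifier’s predicted probabilities. The 0.8/0.2 cutoffs and all data are invented; the cited study’s actual selection procedure may differ.

```python
# Sketch: evaluate only trials where the classifier is most "confident",
# i.e., where its predicted probability is far from 0.5. Thresholds and
# data are arbitrary illustrations, not the cited study's values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X_train, y_train = rng.standard_normal((150, 500)), rng.integers(0, 2, size=150)
X_test, y_test = rng.standard_normal((50, 500)), rng.integers(0, 2, size=50)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]        # P(label == 1) for each test trial

confident = (probs > 0.8) | (probs < 0.2)      # keep only high-confidence trials
preds = (probs > 0.5).astype(int)
if confident.any():
    acc = (preds[confident] == y_test[confident]).mean()
    print(f"{confident.sum()} confident trials, accuracy {acc:.2f}")
```

Restricting evaluation to confident trials typically raises measured accuracy, which is why such figures must be read alongside the fraction of trials discarded.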

Burrell, supra note 190, at 5–8.

See Rissman et al., supra note 86; Uncapher et al., supra note 89; Rissman et al., supra note 96; and Chow et al., supra note 104; though for some kinds of classifications, the importance maps may be uninterpretable to the extent they do not correspond with brain areas known from other work to be characteristically involved in certain cognitive processes. Even for linear classifiers, importance values suffer from interpretability issues. See Pamela K. Douglas & Ariana Anderson, Feature Fallacy: Complications with Interpreting Linear Decoding Weights in fMRI, in Explainable AI: Interpreting, Explaining, and Visualizing Deep Learning 353–78 (Wojciech Samek et al. eds., 2019).
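For a linear decoder, the ‘importance map’ discussed here is essentially the fitted weight vector, one value per voxel feature. A minimal sketch on synthetic data follows; as the Douglas & Anderson chapter cited above explains, raw weights can be misleading, so treat this as illustration only.

```python
# Sketch: a linear classifier's "importance map" is its weight vector,
# one weight per voxel feature. Synthetic data; see Douglas & Anderson
# (cited above) for why raw decoding weights resist easy interpretation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 1000))   # toy trials x voxel features
y = rng.integers(0, 2, size=100)       # toy binary labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
importance = clf.coef_.ravel()                       # one weight per feature
top10 = np.argsort(np.abs(importance))[-10:][::-1]   # most influential features
print("most influential feature indices:", top10)
```

In practice the weight vector is reshaped back into brain space and overlaid on anatomy, which is what makes the map legible, or not, against known functional regions.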

Burrell, supra note 190, at 5–7 (giving an example of opacity in a neural network classifier, which is applied to problems ‘for which encoding an explicit logic of decision-making functions very poorly’. In contrast, a support vector machine is essentially a form of linear regression, and shares features closer to how humans reason.)

Id. at 9 (noting that one ‘approach to building more interpretable classifiers is to implement an end-user facing component to provide not only the classification outcome, but also exposing some of the logic of this classification’.).
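Burrell’s suggestion of an end-user-facing component can be sketched very simply for a linear model: report, for a single case, which features pushed the decision and by how much. The feature indices and data below are hypothetical placeholders.

```python
# Sketch of an end-user-facing explanation for one prediction: report the
# per-feature contributions (weight x feature value) behind the decision.
# Hypothetical data; real systems would map indices to brain regions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 20))
y = rng.integers(0, 2, size=100)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x_new = X[0]                                   # a single case to explain
contrib = clf.coef_.ravel() * x_new            # per-feature logit contributions
order = np.argsort(np.abs(contrib))[::-1][:5]  # five largest contributors
print("predicted class:", int(clf.predict(x_new.reshape(1, -1))[0]))
for i in order:
    print(f"  feature_{i}: {contrib[i]:+.3f}")
```

Exposing contributions in this way addresses the first of Burrell’s opacities (specialist skill) more than the second (the mismatch with human-scale reasoning), which persists however the output is formatted.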

For example, courts could attempt to explain to jurors the core details of machine-learning based conclusions in a similar fashion to how eyewitness testimony is currently explained. See discussion on the New Jersey Supreme Court’s attempt at doing so, supra note 13.

Roth, supra note 135, at 2027–29.

Elizabeth A. Holm, In Defense of the Black Box, 364 Sci. 26 (Apr. 2019).

See Curtis Karnow, The Opinion of Machines, 19 Colum. Sci. & Tech. L.R. 136, 166–70 (2017).

See text accompanying notes 60–72, supra.

Decisional accuracy is the touchstone for Meixner and for the coherence of Roth’s taxonomy of machine credibility. Although it is not the only value in a jury system, it is probably the most important, as inaccurate verdicts, or the widespread perception of tolerance of inaccurate verdicts, risks undermining the legitimacy of the system. But elsewhere Roth writes about the pressure that automation of trial-relevant decision-making puts on the ‘soft’ systemic values of ‘dignity, equity, and mercy’. Andrea Roth, Trial by Machine, 104 Geo. L.J. 1245, 1282–90 (2016).

Fisher, supra note 154, at 705 (writing that juries inherently are a ‘reliable source of systemic legitimacy…. Moreover, whether by tradition or conscious design, the jury’s verdict has been largely impenetrable. There never has been a mechanism by which the defendant or anyone outside the system could command the jury to reveal its decision making processes. The jury’s secrecy is an aid to legitimacy, for the privacy of the jury box shrouds the shortcomings of its methods’.)

State v. Lyon, 744 P.2d 231, 238 (Or. 1987) (Linde, J., concurring).

Kiel Brennan-Marquez, “Plausible Cause”: Explanatory Standards in the Age of Powerful Machines, 70 Vand. L. Rev. 1249 (2017); see also Jennifer L. Mnookin, Repeat Play Evidence: Jack Weinstein, “Pedagogical Devices”, Technology, and Evidence, 64 DePaul L. Rev. 571, 572, 577–78 (2015) (proposing that computer animations/simulations be ‘cross-examined’ prior to admissibility decisions to permit testing of alternative assumptions).

James R. McCall, The Personhood Argument Against Polygraph Evidence, Or “Even If the Polygraph Really Works, Will Courts Admit the Results?”, 49 Hastings L.J. 925, 942 (1998).

Using Fourth Amendment ‘suspicion decisions’ and policing as an object of study, he essentially argues that judges, ‘as supervisors of state power’, require the context-sensitivity that is enabled by explanations to make their decisions. Brennan-Marquez, supra note 209, at 1256. This is the fundamental reason to resist automation (that is, the machine-learning enablement) of suspicion decisions, even if they are statistically more accurate. Brennan-Marquez makes a more general point that ‘[e]xplanations matter—and explanatory standards ought to be preserved in the age of powerful machines—because they enable consideration of two sets of values beyond accuracy’. The first is constitutional constraints, and the second is due process, ‘at some level suffused throughout legal decision making, which separates lawful uses of state power from ultra vires conduct…. Accuracy is not the be-all and end-all of sound decision making. This does not mean that accuracy is irrelevant. It is certainly a value we care about. But it is not the only value that we care about. Other values matter. And explanatory standards allow conflict between divergent values to be managed’. Id. at 1280–81.

Fed. R. Evid. 606(b).

Andrew Selbst & Solon Barocas, The Intuitive Appeal of Explainable Machines, 87 Fordham L. Rev. 1085, 1117–18 (2018).

Selbst & Barocas, supra note 213, at 1118–19 (citing Tom R. Tyler, What Is Procedural Justice? Criteria Used by Citizens to Assess the Fairness of Procedures, 22 Law & Soc’y Rev. 103 (1988)); Tom R. Tyler, Procedural Justice, Legitimacy, and the Effective Rule of Law, 30 Crime & Just. 283 (2003); Tom R. Tyler, Why People Obey the Law (revised ed. 2006).

Tyler, supra note 215.

Cf. McCall, supra note 210, at 943.

For example, dementia affects an individual’s perspective on their own personal dignity. See How Dementia Affects Personal Dignity: A Qualitative Study on the Perspective of Individuals With Mild to Moderate Dementia, 71 J. Gerontology 491 (2016).

Simon-Kerr, supra note 149, at 161–66.

See Id. at 186 (noting that the ‘link between credibility, reputation, and criminality drawn in today’s impeachment rules thus continues to reflect the notion that the indicia of being a bad person, however defined, is also the indicia of a liar’.)

Id. at 189–91.

Id. at 158.

Justice Linde wrote, ‘One of these implicit values surely is to see that parties and the witnesses are treated as persons to be believed or disbelieved by their peers rather than as electrochemical systems to be certified as truthful or mendacious by a machine…. A machine improved to detect uncertainty, faulty memory, or an overactive imagination would leave little need for live testimony and cross-examination before a human tribunal; an affidavit with the certificate of a polygraph operator attached would suffice. There would be no point to solemn oaths under threat of punishment for perjury if belief is placed not in the witness but in the machine’. Lyon, 744 P.2d at 240 (Linde, J. concurring).

Julie Seaman, Black Boxes: fMRI Lie Detection and the Role of the Jury, 42 Akron L. Rev. 931, 936–37 (2009).

Giridharadas, supra note 2.

Documents received from C.R. Mukundan (2009) (on file with ERDM) indicate that the Forensic Sciences Laboratory (FSL) of Gandhinagar (in Gujarat) conducted a total of 174 BEOS tests between Sept. 2009 and Dec. 2009. The FSL of Maharashtra (in Mumbai) conducted a total of 108 BEOS tests between 2007 and May 2009.

Saini, supra note 3.

Interview with C.R. Mukundan, in Bangalore, India (Aug. 11–12, 2009) (transcript and notes on file with ERDM). See C.R. Mukundan, Brain Experience: Neuroexperiential Perspectives of Brain-Mind 190 (2007). Mukundan writes, ‘Autobiographic remembrance is a recall of experiences, which may be composed of awareness of experiences consisting of sensations, proprioceptive sensations[,] actions, emotions, and visual and other forms [of] mental imageries’. Id. at 202. See also D.A. Puranik et al., Brain Signature Profiling in India: It[]s Status as an Aid in Investigation and as Corroborative Evidence—As Seen From Judgments, in Proceedings of All India Forensic Sci. Conference 815 (2009) (paper presented among others during a conference on Nov. 15–17, 2009, in Jaipur, India); see also Saini, supra note 3.

Axxonet has funded the research and development into BEOS. Interviews with C.R. Mukundan and colleagues in Bangalore, India (Aug. 11–12, 2009) (transcript and notes on file with ERDM); see also Dr Mukundan’s work on the underlying psychological theories of BEOS, detailed in C.R. Mukundan, Neural Correlates of Experience, in Health Psychology 46 (S. Agarwala et al. eds., 2009), and his 2007 book, Brain Experience, supra note 228.

With respect to the P300 related techniques, BEOS was and is not the only brain-based forensic memory detection technology in use in India. Variously called ‘brain mapping’, ‘brain fingerprinting’, and the ‘Brain Electrical Activation Profile (BEAP)’, the other version of brain-based tests derives from work by Lawrence Farwell on P300 event-related potentials. Interviews with Dr M.S. Rao, Director and Chief Forensic Scientist, Directorate of Forensic Sciences, in New Delhi, India (July 30, 2009) (notes on file with ERDM). Documents received from C.R. Mukundan (2009) (on file with ERDM) indicate that the Forensic Sciences Laboratory (FSL) of Karnataka (in Bangalore) conducted 1131 ‘brain mapping’ or ‘P300’ tests between 2000 and 2009. According to a May 13, 2008 letter from India’s chief forensic scientist, Dr M.S. Rao, to the director of the National Institute of Mental Health and Neuro-Sciences, the Karnataka lab refused to participate in meetings with other FSL directors from other states, ‘stating that their findings are still being evaluated in Germany and U.S.’ Dr Rao also mentioned in that letter that the scientific review committee was ‘not at all satisfied by the available P300 technique at Karnataka’.

M. Raghava, Directorate of Forensic Sciences Not to Accept Panel’s Findings on Brain Mapping, Hindu, https://www.thehindu.com/todays-paper/tp-national/tp-karnataka/Directorate-of-Forensic-Sciences-not-to-accept-panelrsquos-findings-on-brain-mapping/article15298701.ece (accessed Mar. 7, 2020). See also M. Raghava, Stop Using Brain Mapping for Investigation and as Evidence, Hindu, https://www.thehindu.com/todays-paper/lsquoStop-using-brain-mapping-for-investigation-and-as-evidencersquo/article15297427.ece (accessed Mar. 7, 2020). The panel’s conclusions, as reported in The Hindu, roughly followed a Daubert-like analysis, in that ‘the concept on which it was based did not have the support of the scientific community’, ‘rigorous research needed to be done in the area of cognitive processes’, ‘the relevance of these procedures for an Indian setting (for example, the influence of various languages) needs to be established’, ‘recording procedures [need to] satisfy optimal standards’, ‘experiments needed to be carried out in standardised [sic] laboratories satisfying established guidelines’, and ‘the operational procedures needed to be uniform across various laboratories, and the explicit criteria for interpretation and report need to be established with valid scientific basis’.

Id.; see also Letter from Dr. M.S. Rao to Dr. Nagaraia (May 13, 2008) (on file with ERDM).

Giridharadas, supra note 2. The New York Times article quoted J. Peter Rosenfeld, a psychologist and neuroscientist whose work is extensively reviewed in Part II, as saying ‘Technologies which are neither seriously peer-reviewed nor independently replicated are not, in my opinion, credible. The fact that an advanced and sophisticated democratic society such as India would actually convict persons based on an unproven technology is even more incredible’. Id. Neuroscientist Michael Gazzaniga reviewed a ‘promotional dossier’ about BEOS and said, ‘Well, the experts all agree. This work is shaky at best’. Id.

Discussed supra note 230.

Selvi v. State of Karnataka (2010) AIR 2010 SC 1974 (India). The Court held that such tests could not be involuntarily administered because of the right against self-incrimination in the Indian constitution and concerns about the reliability of the results of compulsorily administered tests. The Court also held that results obtained through the BEAP test, though no overt response is required from subjects, must be treated as testimonial evidence, and thus fall within the scope of constitutional Article 20(3)’s provision against self-incrimination. ‘[T]he compulsory administration of the impugned tests impedes the subject’s right to choose between remaining silent and offering substantive information’. Selvi at paragraphs 161, 221, 223. Involuntarily administered test results are inadmissible, as are the evidentiary fruits of mere investigatory, but compulsory, use. Id. Further, the Court held that test results of voluntary examinations ‘cannot be admitted as evidence because the subject does not exercise conscious control over the responses during the administration of the test’, but evidentiary fruits of such voluntary tests may be admissible. Id. at 223. BEOS was not specifically considered in the Selvi case, but the description and reasoning about the BEAP make it likely that the holding would squarely apply to BEOS.

See Naveen Kumar to Undergo Neuro-Psychological Tests, Hindu, https://www.thehindu.com/todays-paper/tp-national/tp-karnataka/gauri-murder-naveen-kumar-to-undergo-neuropsychological-tests/article23552024.ece (accessed Mar. 7, 2020) (reporting that a defendant arrested for his participation in a murder had retracted his alleged consent to undergo ‘polygraph and brain electrical oscillations signature profiling (brain mapping)’ at the Forensic Sciences Laboratory in Gandhinagar. In his retraction, he alleged that ‘police had coerced him to give his consent to the magistrate with the promise that they would help him get bail’.). See also Teenager gets 20 Years in Jail for Rape, Murder of a Minor, Hindu, https://www.thehindu.com/todays-paper/tp-national/teenager-gets-20-years-in-jail-for-rape-murder-of-minor/article24990618.ece (accessed Mar. 7, 2020) (quoting a state police spokesman as saying ‘It was first-of-its kind case in which [the state of] Haryana Police had got narco analysis test, polygraph and brain mapping test of the accused done from Forensic Sciences Laboratory, Gandhinagar in Gujarat’.) Gujarat is a different Indian state and one of the few that had BEOS laboratories set up, see discussion supra note 230.

Jinee Lokaneeta, Creating a Flawed Art of Government: Legal Discourses on Lie Detectors, Brain Scanning, and Narcoanalysis in India, 14 Law, Culture, and Hum. (2014) at 13.

Jinee Lokaneeta, Why Narco, Brain Scan & Lie Test Should Be Junked, Times of India (New Delhi), https://timesofindia.indiatimes.com/city/delhi/why-narco-brain-scan-lie-test-should-be-junked/articleshow/61211439.cms (accessed Mar. 7, 2020). See also Id. at 2 (reporting on the emergence of brain scanning and ‘narcoanalysis’ in India ‘in a context where physical and mental torture is normalized, and more than a thousand custodial deaths occur each year, despite a strong formal regime of laws and powerful judicial pronouncements’.).

Interview with C.R. Mukundan, in Bangalore, India (Aug. 11, 2009). See also Lokaneeta, supra note 237, at 12 (quoting a former Director of Forensic Science Laboratories in Karnataka: ‘When the public and human rights activists protest that investigating agencies adopt “third degree” methods to extract information from the accused, it is time the agencies took recourse to the scientific methods of investigation described above’.).

See Lokaneeta, supra note 237, at 14–15 (‘[T]he mere invocation of the term science still appears to be adequate to authorize the techniques. This may be the case because faith in scientific techniques as substitutions for torture has been a recurring feature even in the past. As Justice Katju, a Supreme Court justice wrote: “In western countries scientific methods of investigation are used… Hence, in western countries torture is not formally used during investigation and the correct facts can usually be ascertained without resorting to torture”’.)

For example, Rosenfeld’s CTP was attempted in a mock terrorist scenario. See Meixner & Rosenfeld, supra note 108. The challenge for administering a memory detection protocol to prevent a terrorist attack is, of course, the uncertain nature of the information available to the investigators. Where might an attack take place? A list of cities is a starting point, but what about the myriad targets within a given city? Perhaps a target could be narrowed to an airplane, but which flight? How does one choose stimuli to narrow a range of dates or times in stimulus design?

The rules governing the admissibility of evidence are, of course, designed to keep unreliable, un-assessable, and unduly prejudicial evidence out of the fact finder’s purview. Wigmore on Evidence §§ 4b, 9, supra note 161. Whether judges are less susceptible than juries to being unduly influenced by such evidence is an empirical question. Research on whether judges are able to disregard inadmissible evidence has some bearing on this issue. It is a reasonable hypothesis that judges might be better than jurors at ‘compartmentaliz[ing] admissible evidence from inadmissible evidence’ to influence their decision, for several reasons: better education, superior abilities to perform a difficult cognitive task, legal training and understanding of the purpose of exclusionary rules, and substantial experience making legal decisions. See Andrew J. Wistrich, Chris Guthrie & Jeffrey J. Rachlinski, Can Judges Ignore Inadmissible Information? The Difficulty of Deliberately Disregarding, 153 Univ. Penn. L.R. 1251, 1277 (2005). See also Id. at 1256, notes 21–22, collecting courts and commentators who have argued that judges are better able than jurors to ignore inadmissible evidence, and noting that ‘[j]udges themselves often apply evidentiary rules more loosely in bench trials than in jury trials on the theory that “the judge, a professional experienced in evaluating evidence, may more readily be relied upon to sift and to weigh critically evidence which we fear to entrust to a jury”’. (internal citation omitted). But other courts and commentators are skeptical that judges are better than jurors at disregarding inadmissible evidence, Id. at 1257, and ‘[s]till others assert that judges can disregard inadmissible information in some circumstances, but not in others’. Id. at 1258. In psychological experiments simulating different types of judicial decisions, the authors found mixed results: ‘some types of highly relevant, but inadmissible, evidence influenced the judges’ decisions. We also found, however, that the judges were able to resist the influence of such information in at least some cases, namely those directly implicating constitutional rights’. Id. at 1259. Unsurprisingly, the empirical data on how mock juries react to instructions to deliberately disregard inadmissible evidence is also full of divergent outcomes. Id. at 1270–75.

Lyon, 744 P.2d at 240 (Linde, J., concurring).

An interdisciplinary research team based in New Zealand has completed a pilot phase of testing ‘forensic brainwave analysis technology’, attempting to replicate and extend the work of Lawrence Farwell and Peter Rosenfeld. See Robin Palmer, Time to Take Brain-Fingerprinting Seriously? A Consideration of International Developments in Forensic Brain Wave Analysis (FBA), In the Context of the Need for Independent Verification of FBA’s Scientific Validity, and the Potential Legal Implications of its Use in New Zealand, Te Wharenga—N.Z. Crim. L. Rev., 330 (2018). The pilot project was funded by the New Zealand Law Foundation, which announced a terminal funding round of June 2020 for all projects. See The Law Foundation, New Zealand Law Foundation, https://www.lawfoundation.org.nz (last visited Apr. 17, 2020, 10:08 AM). In addition to the pilot study, researchers note that study of the ‘legal, ethical and cultural impacts of FBA testing is a crucial corollary to the attempted scientific validation of the science underpinning forensic brainwave analysis. This is because legal challenges to the admissibility in court of FBA evidence will not be confined to attacks on FBA’s scientific reliability and accuracy: admissibility challenges based on alleged rights violations flowing from the use of FBA technology at both investigation and trial stages are just as likely’. Palmer, supra note 245, at 355.

Two examples will suffice. First, while polygraphs are excluded as evidence in most jurisdictions (absent stipulation), they are routinely used in investigations. See Nat’l Research Council, supra note 24, at 3. Second, roadside drug tests are inadmissible in nearly every jurisdiction, yet are routinely used in the field and accepted as the only evidence in plea deals by prosecutors and judges in numerous jurisdictions. See R. Gabrielson & T. Sanders, How a $2 Roadside Drug Test Sends Innocent People to Jail, N.Y. Times Mag. at 9 (joint investigative report with ProPublica), https://www.nytimes.com/2016/07/10/magazine/how-a-2-roadside-drug-test-sends-innocent-people-to-jail.html (accessed Apr. 17, 2020).

In Japan, a physiological (but not brain-based) version of the CIT is used in both investigations and courtrooms. See Akemi Osugi, Daily Application of the Concealed Information Test, in Memory Detection: Theory and Application of the Concealed Information Test, supra note 6, at 253–75; see also Izumi Matsuda et al., Broadening the Use of the Concealed Information Test in the Field, 10 Frontiers Psychiatry, Article 24 (Feb. 2019). Regarding India, see Lyn M. Gaudet, Brain Fingerprinting, Scientific Evidence, and Daubert: A Cautionary Lesson from India, 51 Jurimetrics, J.L., Sci., and Tech. 293 (2011).

See eg Amanda Pustilnik, Evidence Without Law, work in progress (2020) (writing, ‘As the case of the roadside drug test demonstrates, an inadmissible, unreliable form of evidence, which has resulted in potentially thousands of false and faulty convictions, remains law enforcement’s best friend. Inadmissibility based on substantive unreliability is no barrier to securing guilty pleas and closing cases. Nor does inadmissibility of the drug tests appear to bear heavily on defender behavior: Defenders do not refuse to plead on the grounds that the state cannot prove its case with the inadmissible test. Instead, they counsel their clients to plead. There is no reason to think that the dynamic would be otherwise with newer, apparently more sophisticated and reliable forms of technological evidence’.)

While claiming no expertise in the complexities of U.S. asylum law, given current events it is worth pointing out that asylum hearings centrally depend upon credibility assessments. To qualify for asylum, a claimant must have a ‘well-founded’ fear of persecution that is the primary motivation for seeking refuge, the persecution must be on account of one of the statutorily specified bases of the refugee definition, and the alien must be unwilling or unable to return to their country of origin because of persecution or a well-founded fear of persecution. Richard D. Steel, Steel on Immigration Law § 8.8 (2018–19 ed.) The testimony of the applicant is sufficient to sustain this burden only if the adjudicator is satisfied that the testimony is credible, persuasive, and refers to specific facts sufficient to demonstrate that the applicant is a refugee. Id. In Oct. 2017, Attorney General Jeff Sessions decried the surge in asylum claimants as evidence of ‘rampant abuse and fraud because of unmeritorious claims of fear’. See Jeff Sessions, Attorney General of the USA, Remarks to the Executive Office for Immigration Review (accessed Apr. 17, 2020) (transcript of remarks at https://www.justice.gov/opa/speech/attorney-general-jeff-sessions-delivers-remarks-executive-office-immigration-review ). Sessions’ claim, of course, fails to account for changes in the base rate—that is, changes in world or country conditions that drastically increase the number of persecuted persons legitimately seeking refugee status.

See eg Juliet Cohen, Errors of Recall and Credibility: Can Omissions and Discrepancies in Successive Statements Reasonably Be Said to Undermine Credibility of Testimony? 69 Medico-Legal J. 25 (2001) (reviewing research showing that it is unusual for recall to be accurately reproduced, and that stories change for many reasons that do not necessarily indicate prevarication); Jane Herlihy, Peter Scragg, and Stuart Turner, Discrepancies In Autobiographical Memories—Implications for the Assessment of Asylum Seekers: Repeated Interviews Study, 324 British Med. J. 324 (2002) (reporting that discrepancies in an individual’s accounts were common, more so in individuals with high levels of post-traumatic stress when the length of time between interviews increased, with more discrepancies in peripheral rather than central details).

See eg Safiya U. Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (2018); Tal Zarsky, The Trouble with Algorithmic Decisions: An Analytic Road Map to Examine Efficiency and Fairness in Automated and Opaque Decision Making, 41 Sci., Tech., Hum. Values 118 (2015); Reuben Binns, Algorithmic Accountability and Public Reason, 31 Philo. & Tech. 543 (2017).

See Tribe, supra note 173, at 1378–93 (describing, in Part II of the piece, the role that mathematics can play in the design of procedural trial rules); see also the efforts of a few state court systems to update trial procedures and jury instructions to systematically account for scientific understanding of eyewitness memory, supra note 13.

Contributor Information

Emily R D Murphy, University of California Hastings College of the Law, 200 McAllister Street, San Francisco, CA 94131.

Jesse Rissman, Psychology and Psychiatry & Biobehavioral Sciences, University of California, Los Angeles.


Making memories

Kevin Jiang

HMS Communications

Study sheds light on how neurons form long-term memories

On a late summer day in 1953, a young man who would soon be known as patient H.M. underwent experimental surgery. In an attempt to treat his debilitating seizures, a surgeon removed portions of his brain, including part of a structure called the hippocampus. The seizures stopped.

Unfortunately, for patient H.M., so too did time. When he woke up after surgery, he could no longer form new long-term memories, despite retaining normal cognitive abilities, language and short-term working memory. Patient H.M.’s condition ultimately revealed that the brain’s ability to create long-term memories is a distinct process that depends on the hippocampus.

Scientists had discovered where memories are made. But  how  they are made remained unknown.

Now, neuroscientists at Harvard Medical School (HMS) have taken a decisive step in the quest to understand the biology of long-term memory and find ways to intervene when memory deficits occur with age or disease.

Reporting in  Nature  on Dec. 9, they describe a newly identified mechanism that neurons in the adult mouse hippocampus use to regulate signals they receive from other neurons, in a process that appears critical for memory consolidation and recall.

The study was led by Lynn Yap, HMS graduate student in neurobiology, and Michael Greenberg, chair of neurobiology in the Blavatnik Institute at HMS.


“Memory is essential to all aspects of human existence. The question of how we encode memories that last a lifetime is a fundamental one, and our study gets to the very heart of this phenomenon,” said Greenberg, the HMS Nathan Marsh Pusey Professor of Neurobiology and study corresponding author.

The researchers observed that new experiences activate sparse populations of neurons in the hippocampus that express two genes, Fos and Scg2. These genes allow neurons to fine-tune inputs from so-called inhibitory interneurons, cells that dampen neuronal excitation. In this way, small groups of disparate neurons may form persistent networks with coordinated activity in response to an experience.

“This mechanism likely allows neurons to better talk to each other so that the next time a memory needs to be recalled, the neurons fire more synchronously,” Yap said. “We think coincident activation of this Fos-mediated circuit is potentially a necessary feature for memory consolidation, for example, during sleep, and also memory recall in the brain.”

Circuit orchestration

In order to form memories, the brain must somehow wire an experience into neurons so that when these neurons are reactivated, the initial experience can be recalled. In their study, Greenberg, Yap and team set out to explore this process by looking at the gene Fos.

First described in neuronal cells by Greenberg and colleagues in 1986, Fos is expressed within minutes after a neuron is activated. Scientists have taken advantage of this property, using Fos as a marker of recent neuronal activity to identify brain cells that regulate thirst, torpor, and many other behaviors.

Scientists hypothesized that Fos might play a critical role in learning and memory, but for decades, the precise function of the gene has remained a mystery.

To investigate, the researchers exposed mice to new environments and looked at pyramidal neurons, the principal cells of the hippocampus. They found that relatively sparse populations of neurons expressed Fos after exposure to a new experience. Next, they prevented these neurons from expressing Fos, using a virus-based tool delivered to a specific area of the hippocampus, which left other cells unaffected.

Mice that had Fos blocked in this manner showed significant memory deficits when assessed in a maze that required them to recall spatial details, indicating that the gene plays a critical role in memory formation.

The researchers studied the differences between neurons that expressed Fos and those that did not. Using optogenetics to turn inputs from different nearby neurons on or off, they discovered that the activity of Fos-expressing neurons was most strongly affected by two types of interneurons.

Neurons expressing Fos were found to receive increased activity-dampening, or inhibitory, signals from one distinct type of interneuron and decreased inhibitory signals from another type. These signaling patterns disappeared in neurons with blocked Fos expression.

“What’s critical about these interneurons is that they can regulate when and how much individual Fos-activated neurons fire, and also when they fire relative to other neurons in the circuit,” Yap said. “We think that at long last we have a handle on how Fos may in fact support memory processes, specifically by orchestrating this type of circuit plasticity in the hippocampus.”

Imagine the day

The researchers further probed the function of Fos, which codes for a transcription factor protein that regulates other genes. They used single-cell sequencing and additional genomic screens to identify genes activated by Fos and found that one gene in particular, Scg2, played a critical role in regulating inhibitory signals.

In mice with experimentally silenced Scg2, Fos-activated neurons in the hippocampus displayed a defect in signaling from both types of interneurons. These mice also had defects in theta and gamma rhythms, brain properties thought to be critical features of learning and memory.

Previous studies had shown that Scg2 codes for a neuropeptide protein that can be cleaved into four distinct forms, which are then secreted. In the current study, Yap and colleagues discovered that neurons appear to use these neuropeptides to fine-tune inputs they receive from interneurons.

Together, the team’s experiments suggest that after a new experience, a small group of neurons simultaneously express Fos, activating Scg2 and its derived neuropeptides, in order to establish a coordinated network with its activity regulated by interneurons.

“When neurons are activated in the hippocampus after a new experience, they aren’t necessarily linked together in any particular way in advance,” Greenberg said. “But interneurons have very broad axonal arbors, meaning they can connect with and signal to many cells at once. This may be how a sparse group of neurons can be linked together to ultimately encode a memory.”

The study findings represent a possible molecular- and circuit-level mechanism for long-term memory. They shed new light on the fundamental biology of memory formation and have broad implications for diseases of memory dysfunction.

The researchers note, however, that while the results are an important step in our understanding of the inner workings of memory, numerous unanswered questions about the newly identified mechanisms remain.

“We’re not quite at the answer yet, but we can now see many of the next steps that need to be taken,” Greenberg said. “If we can better understand this process, we will have new handles on memory and how to intervene when things go wrong, whether in age-related memory loss or neurodegenerative disorders such as Alzheimer’s disease.”

The findings also represent the culmination of decades of research, even as they open new avenues of study that will likely take decades more to explore, Greenberg added.

“I arrived at Harvard in 1986, just as my paper describing the discovery that neuronal activity can turn on genes was published,” he said. “Since that time, I’ve been imagining the day when we would figure out how genes like Fos might contribute to long-term memory.”

Additional authors include Noah Pettit, Christopher Davis, M. Aurel Nagy, David Harmin, Emily Golden, Onur Dagliyan, Cindy Lin, Stephanie Rudolph, Nikhil Sharma, Eric Griffith, and Christopher Harvey.

The study was supported by the National Institutes of Health (grants R01NS028829, R01NS115965, R01NS089521, T32NS007473 and F32NS112455), a Stuart H.Q. and Victoria Quan fellowship, a Harvard Department of Neurobiology graduate fellowship, and the Aramont Fund.


ScienceDaily

Memory News

Latest Headlines

  • How Do You Remember How to Ride a Bike?
  • Talking About Parents in Therapy: Emotions
  • Our Lives in the Mirror of Our Data
  • Neuroscientists Reactivate Memory Circuit in ...
  • How Are Pronouns Processed in Our Brains?
  • Cognitive Deficits from Meth and PCP
  • Skilled Movement Control and Hippocampus
  • Math and Dyspraxia
  • Delayed Feedback for Traumatic Brain Injury
  • Variety Is the Spice of Learning

Earlier Headlines

Tuesday, September 10, 2024

  • Games, Puzzles and Reading Can Slow Cognitive Decline in the Elderly -- Even in Those With Mild Cognitive Impairment

Wednesday, September 4, 2024

  • Heavy Metal Cadmium May Be Tied to Memory Issues for Some
  • Music Can Reveal Which Areas of the Brain Are Affected by Aging

Wednesday, August 28, 2024

  • Neuroscientists Explore the Intersection of Music and Memory

Monday, August 26, 2024

  • Why Children Can't Pay Attention to the Task at Hand

Thursday, August 22, 2024

  • Researchers Have Discovered the Brain Circuit That Controls Our Ability to Recall Information and Memories
  • Taking a Trip Down Memory Lane Could Be the Key to Drinking Less Alcohol

Wednesday, August 21, 2024

  • Discovery of 'item Memory' Brain Cells Offers New Alzheimer's Treatment Target

Thursday, August 15, 2024

  • Sleep Resets Neurons for New Memories the Next Day
  • The Brain Creates Three Copies for a Single Memory

Wednesday, August 14, 2024

  • Singing from Memory Unlocks a Surprisingly Common Musical Superpower

Friday, August 9, 2024

  • Memory Problems in Old Age Linked to a Key Enzyme, Study in Mice Finds

Wednesday, August 7, 2024

  • Memory Loss in Aging and Dementia: Dendritic Spine Head Diameter Predicts Memory in Old Age
  • Molecule Restores Cognition, Memory in Alzheimer's Disease Model Mice
  • Processing Traumatic Memories During Sleep Leads to Changes in the Brain Associated With Improvement in PTSD Symptoms

Thursday, July 25, 2024

  • How Epigenetics Influence Memory Formation

Wednesday, July 17, 2024

  • Patients With Alzheimer's Disease Have Higher Frequency of Mental Health Symptoms Which Can Precede Memory Problems, Study Finds
  • Scientists Define New Type of Memory Loss in Older Adults
  • Cuttlefish Can Form False Memories, Too

Wednesday, July 10, 2024

  • Holiday Season Already? Anticipation Might Make Time Seem to Fly
  • Loneliness Increases Risk of Age-Related Memory Loss

Monday, July 8, 2024

  • Erasing 'bad Memories' To Improve Long Term Parkinson's Disease Treatment

Monday, July 1, 2024

  • Individuals Can Tell If Their Memories Are Trustworthy

Thursday, June 27, 2024

  • Vaccination May Reduce Memory Loss from COVID-19 Infections

Wednesday, June 26, 2024

  • How Do Our Memories Last a Lifetime? New Study Offers a Biological Explanation

Thursday, June 20, 2024

  • Can AI Learn Like Us?

Monday, June 17, 2024

  • Maternal Inheritance of Alzheimer's Disease Tied to Increased Risk of Developing Disease

Friday, June 14, 2024

  • Vitamin B6: New Compound Delays Degradation

Thursday, June 13, 2024

  • Studies Uncover the Critical Role of Sleep in the Formation of Memories

Wednesday, June 12, 2024

  • Does Having a Child With Low Birth Weight Increase a Person's Risk of Dementia?
  • Depressive Symptoms in Young Adults Linked to Thinking, Memory Problems in Midlife

Tuesday, June 11, 2024

  • Depressive Symptoms May Hasten Memory Decline in Older People

Friday, June 7, 2024

  • How Does Oxygen Depletion Disrupt Memory Formation in the Brain?

Wednesday, June 5, 2024

  • Higher Blood Pressure Is Associated With Poorer Cognition in Adolescence

Thursday, May 30, 2024

  • In the Brain at Rest, Neurons Rehearse Future Experience
  • Astrocytes Induce Sex-Specific Effects on Memory

Wednesday, May 29, 2024

  • First Hints of Memory Problems Associated With Changes in the Brain

Friday, May 24, 2024

  • Understanding a Broken Heart

Monday, May 20, 2024

  • Exercise Spurs Neuron Growth and Rewires the Brain, Helping Mice Forget Traumatic and Addictive Memories

Wednesday, May 15, 2024

  • Eurasian Jays Can Use 'mental Time Travel' Like Humans, Study Finds
  • The Crystallization of Memory: Study Reveals How Practice Forms New Memory Pathways in the Brain

Tuesday, May 14, 2024

  • Study Links Sleep Apnea Severity During REM Stage to Verbal Memory Decline
  • People Without an Inner Voice Have Poorer Verbal Memory

Monday, May 13, 2024

  • What Makes a Memory? It May Be Related to How Hard Your Brain Had to Work

Thursday, May 9, 2024

  • Study Shows Heightened Sensitivity to PTSD in Autism

Wednesday, May 8, 2024

  • 'Mathematical Microscope' Reveals Novel, Energy-Efficient Mechanism of Working Memory That Works Even During Sleep

Wednesday, May 1, 2024

  • Dynamic DNA Structures and the Formation of Memory
  • Losing Keys and Everyday Items 'not Always Sign of Poor Memory'

Wednesday, April 24, 2024

  • Network Model Unifies Recency and Central Tendency Biases
  • Good Heart Health in Middle Age May Preserve Brain Function Among Black Women as They Age

Monday, April 22, 2024

  • Eye-Opener: Pupils Enlarge When People Focus on Tasks

Wednesday, April 17, 2024

  • Does Using Your Brain More at Work Help Ward Off Thinking, Memory Problems?
  • Workings of Working Memory Detailed

Monday, April 15, 2024

  • Take It from the Rats: A Junk Food Diet Can Cause Long-Term Damage to Adolescent Brains

Thursday, April 11, 2024

  • Brainless Memory Makes the Spinal Cord Smarter Than Previously Thought

Wednesday, April 10, 2024

  • New Origin of Deep Brain Waves Discovered

Tuesday, April 9, 2024

  • New Technique Sheds Light on Memory and Learning

Tuesday, April 2, 2024

  • Blended Antioxidant Supplement Improves Cognition and Memory in Aged Mice

Thursday, March 28, 2024

  • Mechanism Found to Determine Which Memories Last

Wednesday, March 27, 2024

  • Making Long-Term Memories Requires Nerve-Cell Damage
  • A Decade of Aphantasia Research: What We've Learned About People Who Can't Visualize

Monday, March 25, 2024

  • Artificial Nanofluidic Synapses Can Store Computational Memory

Friday, March 22, 2024

  • Early Intervention After the First Seizure May Prevent Long-Term Epilepsy and Associated Cognitive Deficits

Thursday, March 21, 2024

  • The Power of Neighbors: Neighboring Synapses Shape Learning and Memory

Wednesday, March 20, 2024

  • Keto Diet Prevents Early Memory Decline in Mice

Monday, March 18, 2024

  • Researchers Find Unanticipated Complexity in Aging Brain's Memory Decline

Friday, March 15, 2024

  • Fatty Food Before Surgery May Impair Memory in Old, Young Adults

Tuesday, March 12, 2024

  • Study: Best Way to Memorize Stuff? It Depends ...

Monday, March 11, 2024

  • Researchers Identify Gene Involved in Neuronal Vulnerability in Alzheimer's Disease

Friday, March 8, 2024

  • Brain Waves Travel in One Direction When Memories Are Made and the Opposite When Recalled

Wednesday, March 6, 2024

  • A Noninvasive Treatment for 'chemo Brain'

Monday, March 4, 2024

  • Sleep Apnea Symptoms Linked to Memory and Thinking Problems

Tuesday, February 27, 2024

  • Long-Term Memory and Lack of Mental Images
  • Learning and Memory Problems in Down Syndrome Linked to Alterations in Genome's 'dark Matter'

Monday, February 26, 2024

  • Yoga Provides Unique Cognitive Benefits to Older Women at Risk of Alzheimer's Disease

Wednesday, February 21, 2024

  • Sleep Improves Ability to Recall Complex Events

Tuesday, February 20, 2024

  • Blocking Key Protein May Halt Progression of Alzheimer's Disease
  • Can a Single Brain Region Encode Familiarity and Recollection?

Thursday, February 15, 2024

  • The Brain Is 'programmed' For Learning from People We Like

Tuesday, February 13, 2024

  • Oxytocin: The Love Hormone That Holds the Key to Better Memory
  • Neural Prosthetic Device Can Help Humans Restore Memory
  • Are You Depressed? Scents Might Help

Tuesday, February 6, 2024

  • Improving Quality of Life and Sleep in People With Memory Problems Without Using Drugs

Monday, February 5, 2024

  • Fatty Acids Hold Clue to Creating Memories

Thursday, February 1, 2024

  • Scientists Discover a Potential Way to Repair Synapses Damaged in Alzheimer's Disease

Wednesday, January 31, 2024

  • Polycystic Ovary Syndrome Tied to Memory, Thinking Problems
  • Did Dementia Exist in Ancient Greece and Rome?

Monday, January 29, 2024

  • Playing an Instrument Linked to Better Brain Health in Older Adults

Thursday, January 25, 2024

  • Researchers Discover a New Role for a Protein That Helps Form Memories

Tuesday, January 23, 2024

  • Could Bizarre Visual Symptoms Be a Telltale Sign of Alzheimer's?

Monday, January 22, 2024

  • How Aging Alters Brain Cells' Ability to Maintain Memory

Friday, January 19, 2024

  • Research Into the Nature of Memory Reveals How Cells That Store Information Are Stabilized Over Time
  • Generative AI Helps to Explain Human Memory and Imagination

Thursday, January 18, 2024

  • Don't Look Back: The Aftermath of a Distressing Event Is More Memorable Than the Lead-Up
  • Third Major Study Finds Evidence That Daily Multivitamin Supplements Improve Memory and Slow Cognitive Aging in Older Adults
  • Physical Exercise Boosts Motor Learning -- And Remembering What One Has Learned

Tuesday, January 16, 2024

  • Amnesia Caused by Head Injury Reversed in Early Mouse Study

Tuesday, January 2, 2024

  • Researchers Identify New Coding Mechanism That Transfers Information from Perception to Memory

Monday, December 18, 2023

  • AI's Memory-Forming Mechanism Found to Be Strikingly Similar to That of the Brain
  • Memory Research: Breathing in Sleep Impacts Memory Processes

Brief, daily meditation enhances attention, memory, mood, and emotional regulation in non-experienced meditators

Affiliations

  • 1 New York University, Center for Neural Science, 4 Washington Place, Room 809, New York, NY 10003, United States; Virginia Tech Carilion Research Institute, Center for Transformative Research on Health Behaviors, 1 Riverside Circle, Suite 104G, Roanoke, VA 24016, United States. Electronic address: [email protected].
  • 2 New York University, Center for Neural Science, 4 Washington Place, Room 809, New York, NY 10003, United States.
  • 3 New York University, Center for Neural Science, 4 Washington Place, Room 809, New York, NY 10003, United States. Electronic address: [email protected].
  • PMID: 30153464
  • DOI: 10.1016/j.bbr.2018.08.023

Meditation is an ancient practice that cultivates a calm yet focused mind; however, little is known about how short, practical meditation practices affect cognitive functioning in meditation-naïve populations. To address this question, we randomized subjects (ages 18–45) who were non-experienced meditators into either a 13-min daily guided meditation session or a 13-min daily podcast listening session (control group) for a total duration of 8 weeks. We examined the effects of the daily meditation practice relative to podcast listening on mood, prefrontal and hippocampal functioning, baseline cortisol levels, and emotional regulation using the Trier Social Stress Test (TSST). Compared to our control group, we found that 8 but not 4 weeks of brief, daily meditation decreased negative mood state and enhanced attention, working memory, and recognition memory as well as decreased state anxiety scores on the TSST. Furthermore, we report that meditation-induced changes in emotional regulation are more strongly linked to improved affective state than improved cognition. This study not only suggests a lower limit for the duration of brief daily meditation needed to see significant benefits in non-experienced meditators, but also suggests that even relatively short daily meditation practice can have behavioral effects similar to those of longer-duration and higher-intensity meditation practices.

Keywords: Breathing; Cognition; Consciousness; Executive function; Mindfulness; Stress.



Stanford researchers observe memory formation in real time


Why is it that someone who hasn’t ridden a bicycle in decades can likely jump on and ride away without a wobble, but could probably not recall more than a name or two from their 3rd grade class?

This may be because physical skills — dubbed motor memories by neuroscientists — are encoded differently in our brains than our memories for names or facts.

Now, a new study by scientists with the Wu Tsai Neurosciences Institute is revealing exactly how motor memories are formed and why they are so persistent. It may even help illuminate the root causes of movement disorders like Parkinson’s disease.

“We think motor memory is unique,” said Jun Ding , an associate professor of neurosurgery and of neurology. “Some studies on Alzheimer’s disease included participants who were previously musicians and couldn’t remember their own families, but they could still play beautiful music. Clearly, there’s a huge difference in the way that motor memories are formed.”

Memories are thought to be encoded in the brain in the pattern of activity in networks of hundreds or thousands of neurons, sometimes distributed across distant brain regions. The concept of such a memory trace — sometimes called a memory engram — has been around for more than a century, but identifying exactly what an engram is and how it is encoded has proven extremely challenging. Previous studies have shown that some forms of learning activate specific neurons, which reactivate when the learned memory is recalled. However, whether memory engram neurons exist for motor skill learning remains unknown.

Ding and postdoctoral scholars Richard Roth and Fuu-Jiun Hwang wanted to know how these engram-like groups of cells get involved in learning and remembering a new motor skill.

“When you’re first learning to shoot a basketball, you use a very diverse set of neurons each time you throw, but as you get better, you use a more refined set that’s the same every time,” said Roth. “These refined neuron pathways were thought to be the basis of a memory engram, but we wanted to know exactly how these pathways emerge.”

In their new study, published July 8, 2022 in Neuron , the researchers trained mice to use their paws to reach food pellets through a small slot. Using genetic wizardry developed by the lab of Liqun Luo , a Wu Tsai Neurosciences Institute colleague in the Department of Biology, the researchers were able to identify specific neurons in the brain’s motor cortex — an area responsible for controlling movements — that were activated during the learning process. The researchers tagged these potential engram cells with a fluorescent marker so they could see if they also played a role in recalling the memory later on.

When the researchers tested the animals’ memory of this new skill weeks later, they found that those mice that still remembered the skill showed increased activity in the same neurons that were first identified during the learning period, showing that these neurons were responsible for encoding the skill: the researchers had observed the formation of memory engrams.

But how do these particular groups of neurons take on responsibility for learning a new task in the first place? And how do they actually improve the animal’s performance?

To answer these questions, the researchers zoomed in closer. Using two-photon microscopy to observe these living circuits in action, they observed the so-called “engram neurons” reprogram themselves as the mice learned. Motor cortex engram cells took on new synaptic inputs — potentially reflecting information about the reaching movement — and themselves formed powerful new output connections in a distant brain region called the dorsolateral striatum — a key waystation through which the engram neurons can exert refined control over the animal’s movements. It was the first time anyone had observed the creation of new synaptic pathways on the same neuron population — both at the input and the output levels — in these two brain regions.

Graphical abstract summarizing the current study

The ability to trace new memories forming in the mouse brain allowed the research team to weigh in on a long-standing debate about how skills are stored in the brain: are they controlled from one central memory trace, or engram, or is the memory stored redundantly across many different brain areas? Though this study cannot discount the idea of centralized memory, it does lend credibility to the opposing theory. Another fascinating question is whether the activation of these engram neurons is required for the performance of already learned motor tasks. The researchers speculated that even if the activity of the neurons identified as part of the motor cortex memory engram were suppressed, the mice would probably still be able to perform the task.

“Think of memory like a highway. If 101 and 280 are both closed, you could still get to Stanford from San Francisco, it would just take a lot longer,” said Ding.   

These findings suggest that, in addition to being dispersed, motor memories are highly redundant. The researchers say that as we repeat learned skills, we are continually reinforcing the motor engrams by building new connections — refining the skill. It’s what is meant by the term muscle memory — a refined, highly redundant network of motor engrams used so frequently that the associated skill seems automatic.


Ding believes that this constant repetition is one reason for the persistence of motor memory, but it’s not the only reason. Memory persistence may also be affected by a skill being associated with a reward, perhaps through the neurotransmitter dopamine. Though the research team did not directly address it in this study, Ding’s previous work in Parkinson’s disease suggests the connection.

“Current thinking is that Parkinson’s disease is the result of these motor engrams being blocked, but what if they’re actually being lost and people are forgetting these skills?” said Ding. “Remember that even walking is a motor skill that we all learned once, and it can potentially be forgotten.”

It’s a question that the researchers hope to answer in a follow-up study, because it may be the key to developing effective treatments for motor disorders. If Parkinson’s disease is the result of blocked motor memories, then patients should be able to improve their movement abilities by practicing and reinforcing these motor skills. On the other hand, if Parkinson’s destroys motor engrams and inhibits the creation of new ones — by targeting motor engram neurons and their synaptic connection observed in the team’s new study — then a completely different approach must be taken to deliver effective treatments.

“Our next goal is to understand what’s happening in movement disorders like Parkinson’s,” Ding said. “Obviously, we’re still a long way from a cure, but understanding how motor skills form is critical if we want to understand why they’re disrupted by disease.”

The research was published July 8 in Neuron: https://doi.org/10.1016/j.neuron.2022.06.006

Study authors were Fuu-Jiun Hwang, Richard H. Roth, Yu-Wei Wu, Yue Sun, Destany K. Kwon, Yu Liu, and Jun B. Ding.

The research was supported by the National Institutes of Health (NIH) and the National Institute of Neurological Disorders and Stroke (NINDS); the Klingenstein Foundation; the Aligning Science Across Parkinson's initiative; the GG gift fund; the Stanford School of Medicine Dean's Postdoctoral Fellowship; and a Parkinson's Foundation Postdoctoral Fellowship.



Memory: a timeline of discoveries

Memory has fascinated scientists and philosophers for thousands of years, so how did we crack the nature of our information archive?

Helen Thomson

500 BC – Ancient Greek poet Simonides of Ceos develops what is now known as the method of loci: a memory technique that world memory champions can use to remember pi to 70,000 digits.

300 BC – Philosophers Plato and Aristotle put forward the first theories of memory, describing it as something akin to etchings on a wax tablet.

Detail of Plato (left) and Aristotle from The School of Athens by Raphael

Hermann Ebbinghaus (1850-1909) created a series of nonsense words that provided a way of testing different aspects of memory and forgetting, leading to early definitions of sensory, short-term and long-term memory.


Donald Hebb (1904-1985) proposed that brain cells that are active at the same time form new and stronger connections – a theory that is now known to underlie our ability to create long-term memories.


1906 – Physicians Santiago Ramón y Cajal and Camillo Golgi share a Nobel Prize in Physiology or Medicine for their work on staining techniques that provided the first clear images of individual neurons.


Wilder Penfield (1891-1976) used electrical currents to stimulate the brain during surgery while his patients were awake. He discovered that you could evoke a memory merely by stimulating parts of the cortex.


Henry Molaison (1926-2008) – After having both sides of his hippocampus removed in an attempt to cure his epilepsy, Molaison experienced profound amnesia. He became one of neuroscience’s most studied individuals, providing key insights into where memories are stored in the brain.


1990s – Throughout the 1990s, Elizabeth Loftus and her colleagues demonstrate the malleability of memory, specifically how false memories can be implanted in our minds.


2002 – Neuroscientist Eleanor Maguire scanned the world’s best memorisers and found their brains did not differ from anyone else’s. They were better at remembering because they used a mnemonic device called the method of loci.

2017 – Researchers use optogenetics to discover that long-term memories are created in the brain at the same time as short-term memories, overturning a decades-old theory of how long-term memories form.

  • This article first appeared in issue 314 of BBC Focus magazine


Memory Studies – Development, Debates and Directions

Aline Sierp, Maastricht University

Living reference work entry · First online: 30 May 2021

This chapter introduces the reader to the field of Memory Studies. It traces the emergence of memory as a topic of investigation and gives an overview of the development of the field. It highlights the main actors and institutions, describes the most influential debates that have shaped the field and illustrates the institutional structures that sustain it. In doing so, it argues that Memory Studies has started to display all the features characteristic of an established discipline.



Cite this entry: Sierp, A. (2021). Memory Studies – Development, Debates and Directions. In: Berek, M., et al. (eds.) Handbuch Sozialwissenschaftliche Gedächtnisforschung. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-26593-9_42-1

ORIGINAL RESEARCH article

The impact of a mnemonic acronym on learning and performing a procedural task and its resilience toward interruptions

Tara Radović*

  • Institute of Psychology and Ergonomics, Berlin Institute of Technology, Berlin, Germany

The present study examines the potential impact of a mnemonic acronym on the learning, the execution, the resilience toward interruptions, and the mental representation of an eight-step procedural task with sequential constraints. 65 participants were required to learn a sequential task consisting of eight different steps which had to be carried out in a predefined sequence. 33 participants were provided with the acronym “WORTKLAU” as a mnemonic to support the learning and execution of the task, and the other 32 participants had to learn and execute the task without such support. Each letter of the acronym coded one step of the task, involving a binary decision about a certain property of a complex stimulus. In 60 out of 72 trials of the task, participants were interrupted between steps and had to perform a 2-back interruption task for 6 or 30 s, after which they had to resume the procedural task as quickly as possible at the correct step. Learning times, performance in uninterrupted trials, and post-interruption performance measures were analyzed. Results of Experiment 1 suggest that the mnemonic acronym enhanced learning of the task sequence, and provide some evidence for a hierarchical mental representation of the task, resulting in faster resumption times at certain steps of the procedure after an interruption. In Experiment 2, the internal structure of the acronym was further emphasized by a hyphen at the border of the two words included in the acronym (WORT-KLAU). This significantly improved the resilience toward interruptions at the border step of the procedure. Our results provide evidence for beneficial effects of a mnemonic acronym, particularly for the learning of a sequential procedural task. In addition, they suggest that the structure of a mnemonic acronym directly impacts the mental representation of a task. Finally, they show that mnemonic acronyms can improve the resilience toward the detrimental effects of interruptions, at least at certain steps of a procedural task.

Introduction

Accomplishing a complex task in everyday life or professional settings often requires remembering how to conduct a procedure that consists of a sequence of steps, which have to be performed in a predefined order. A simple example from everyday life is the sequence of actions needed to make boiled eggs. To get the eggs right, one needs to follow a sequence of steps, and any sequence error, such as deviating from the right order (e.g., putting the eggs in the water before it boils) or omitting a step (e.g., failing to pierce the egg before putting it in the boiling water), may compromise the result. Admittedly, the procedure to cook eggs is fairly easy and the consequences of sequence errors in this example are only minor. However, there are other (professional) settings where committing sequence errors while performing a procedural task can have much more serious consequences. Examples are the execution of procedures by pilots on the flight deck, by physicians in an emergency department, or by nurses providing medication in a hospital. Here, committing an error may have fatal consequences which can hardly be corrected, and this risk is even elevated in case of interruptions, which have frequently been observed in these settings ( Latorella, 1996 ; Dismukes et al., 1998 ; Drews, 2007 ; Scott-Cawiezell et al., 2007 ; Westbrook et al., 2010 ). As a countermeasure ensuring the correct execution of procedural tasks and making them more resilient toward interruptions, they are often supported by different sorts of checklists, which are meant to reduce memory demands and prevent the commission of sequence errors, primarily the omission of important steps ( Latorella, 1999 ; Loukopoulos et al., 2001 , 2003 ). However, checklists are not always available, and there are a number of instances where even important and safety-critical procedures have to be performed from memory alone (so-called memory items in aviation; Hunt, 1988 ; Au, 2005 ). This poses a number of cognitive challenges similar to order memory and serial recall (e.g., Henson, 1998 ; Hurlstone et al., 2014 ), including initial learning of the sequence, retaining the sequence across time, and, most importantly, retrieving the correct order of steps once the procedure has to be executed. According to some authors, the latter is assumed to involve a so-called placekeeping process, i.e., monitoring the progress within a procedural task by keeping track of completed and to-be-executed steps ( Carlson and Cassenti, 2004 ; Trafton et al., 2011 ; Hambrick and Altmann, 2015 ).

Research on serial learning and recall suggests that these challenges might effectively be supported by the use of mnemonic techniques. It has been shown that support in organizing the to-be-remembered material in the learning phase enhances learning, has a long-term effect on retention of the material, and leads to better performance in the recall phase by providing a hierarchical organization of the learnt material (e.g., Miller, 1956 ; Bower, 1970 ; Bellezza, 1981 ; Malhotra, 1991 ; Higbee, 2001 ). One such technique is the administration of mnemonic acronyms, i.e., pronounceable phrases or words where each letter represents an item that has to be remembered in the order given by the phrase (e.g., first-letter mnemonics; Malhotra, 1991 ; Higbee, 2001 ). The use of acronyms for memorizing items in a serial order is widespread in education ( Cook, 1989 ; Miller and Mercer, 1993 ; Stalder, 2005 ) and in clinical practice ( Bortle, 2010 ). It has also been found that people voluntarily develop acronyms and organize information in chunks ( Cook, 1989 ; Bower, 1970 ; Blick and Waite, 1971 ; Blick et al., 1972 ; Gruneberg, 1973 ; Bortle, 2010 ), which also points to the potentially positive effects of such mnemonics for order learning and recall.

Applied to procedural tasks that need to be performed from memory, the provision of acronyms composed of letters which represent the different steps might have at least three beneficial effects. First, it might enhance learning, retention, and retrieval of the steps in correct order. This is suggested by early studies demonstrating advantages of mnemonic acronyms on the learning and reproduction of verbal material ( Nelson and Archer, 1972 ; Stalder, 2005 ). Positive effects of mnemonic acronyms were shown particularly in situations where the order of items had to be learned and retrieved ( Nelson and Archer, 1972 ; Morris and Cook, 1978 ), whereas usually no effects were found where the identity of individual items needed to be retrieved ( Nelson and Archer, 1972 ; Morris and Cook, 1978 ; Carlson et al., 1981 ). The specific benefit of mnemonic acronyms for memorization of item order, but not item identity, might account for inconsistent findings regarding positive effects of mnemonic acronyms in verbal learning ( Boltwood and Blick, 1970 ; Gruneberg, 1973 ; Cook, 1989 ). For that reason, it seems at least plausible that the availability of acronyms would also support learning the correct order of different steps constituting a procedural task.

Second, the availability of an acronym might also increase the execution speed of the different steps of a procedural task, i.e., serve as a process mnemonic tool ( Higbee, 1987 ; Manalo, 2002 ). This is expected, because the availability of a pronounceable acronym provides a cuing structure whose inherent links between the different letters might strengthen the associations between successive steps ( Malhotra, 1991 ), which in turn could improve the transfer between the steps, leading to an overall increase of speed and accuracy in the execution phase.

Third, it can be assumed that mnemonic acronyms might enhance the resilience of a sequential procedural task toward adverse effects of interruptions. Such adverse effects, that is, additional time needed to resume a primary task after an interruption ( resumption time ) and an elevated risk of committing sequence errors (i.e., skipping or repeating a step), have often been reported. Among other factors, the interruption effects depend on the length and complexity of the interruption task ( Hodgetts and Jones, 2006 ; Cades et al., 2007 ; Monk et al., 2008 ; see for a review Trafton and Monk, 2007 ). These effects are often interpreted within the memory for goals model proposed by Altmann and Trafton (2002) . The model states that task goals need to be activated in working memory in order to perform a cognitive task. Assuming that cognitive goals underlie the same constraints as other items in working memory, active strengthening is required to reach and maintain a sufficient level of activation in order to retrieve the goals successfully. Interruptions of a procedural task cause a decrease in the activation of related task goals, unless the goals are rehearsed while performing the interruption task. Thus, in order to resume the procedural task at the correct position after an interruption, the position within the task needs to be rehearsed during the interruption, and the activation level of the goal related to the correct task step needs to be elevated again, based on internal or external cues. It seems plausible that mnemonic acronyms could provide simple internal cues (e.g., letters instead of words or sentences) for rehearsal and re-activation of task steps, and consequently enhance goal activation in memory. Thus, mnemonic acronyms could be helpful during an interruption, when the goals of the primary task have to be rehearsed in parallel with the execution of the interruption task, as well as after the interruption, when reorienting and re-activation of the primary task goals take place. These effects should be reflected in decreased resumption times and a decreased risk of sequence errors after interruptions, compared to a situation where no acronym is available.
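To make the memory-for-goals logic concrete, the toy sketch below (in Python) simulates how a suspended goal's activation decays during an interruption and how periodic rehearsal, of the kind an acronym letter might cue, keeps it higher. It uses an ACT-R-style base-level activation formula; the decay rate and rehearsal schedule are illustrative assumptions, not parameters from Altmann and Trafton's model.

    import math

    def goal_activation(use_times, now, decay=0.5):
        # ACT-R-style base-level activation: ln of the summed, decaying
        # traces of all past encodings/rehearsals of the goal.
        return math.log(sum((now - t) ** -decay for t in use_times if now > t))

    # Goal encoded when the step is reached at t = 0 s; a 30 s interruption follows.
    encoded = [0.0]

    # Case 1: no rehearsal during the interruption.
    no_rehearsal = goal_activation(encoded, now=30.0)

    # Case 2: the position is rehearsed every 5 s (e.g., cued by an acronym letter).
    rehearsed = encoded + [5.0 * k for k in range(1, 6)]
    with_rehearsal = goal_activation(rehearsed, now=30.0)

    print(f"after 30 s without rehearsal: {no_rehearsal:+.2f}")
    print(f"after 30 s with rehearsal:    {with_rehearsal:+.2f}")
    # The rehearsed goal ends the interruption with higher activation and is
    # therefore more likely to be retrieved above threshold at resumption.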

Given these possible advantages and the available evidence for specific benefits of mnemonic acronyms in terms of order memorization, providing mnemonic acronyms to support learning, retention, and retrieval of procedural tasks with sequential constraints seems promising. Despite examples of the use of acronyms to remember and retrieve the correct sequence of steps in a procedure (e.g., decision-making procedures, Hörmann, 1994 ), the performance consequences of this mnemonic technique on the learning and execution of procedural tasks have, to the best of our knowledge, not yet been examined in a systematic manner.

A new experimental paradigm, the UNRAVEL paradigm, which principally seems suitable to address this question, was recently introduced by Altmann et al. (2014) . UNRAVEL is an acronym where each letter represents a step that needs to be executed in response to a complex stimulus, with the letter sequence cueing the correct order of steps of the sequence. The complex task stimuli in this paradigm are composed of a letter, a number, and a box with different features (e.g., font, color, location). The different steps that have to be performed from memory in correct order include responses to a total of seven questions concerning the features of the given stimulus. Thus far, this paradigm has primarily been used to study consequences of interruptions on serial task performance ( Altmann et al., 2014 , 2017 ; Altmann and Trafton, 2015 ). For this purpose, the UNRAVEL task was repeatedly interrupted between steps by a simple interruption task. In order to investigate the performance consequences of these interruptions, the time needed to resume the task (resumption time), the number of sequence errors (i.e., instances where the task was resumed at the incorrect step), and the number of non-sequence errors (i.e., instances where the task was resumed at the correct step, but with the wrong response) were assessed. The obtained results replicated the standard effects in interruption research, namely that the adverse effects of interruptions, i.e., prolonged resumption times and an elevated risk of committing a sequence error, become worse with increasing duration of the interruption task ( Altmann et al., 2017 ). In addition, two aspects of the results suggest that the mnemonic acronym supporting the task might have made a difference in performing this task. First, even though the UNRAVEL task poses comparatively high memory demands, the observed rates of sequence errors after interruptions were surprisingly low (4–16%), and essentially in the same range as, or only somewhat higher than, the ones usually obtained with much less demanding primary tasks and comparable durations of short interruptions (e.g., Monk et al., 2008 ). This suggests that the availability of the acronym could have compensated for the higher memory demands of the UNRAVEL task, compared to a condition where the acronym would not have been available. Second, an analysis of performance at different steps of the UNRAVEL task revealed an interesting incidental finding. Namely, the risk of sequence errors was relatively low particularly for the first (U) and last (L) steps of the task, whereas more sequence errors were committed at the middle steps, even when no interruption preceded the step directly. The authors suggest that the obtained patterns were due to the mnemonic acronym and its structure, which, they assume, organized the task hierarchically in accordance with the word boundaries of the acronym ( Altmann et al., 2014 ). However, since no control condition (i.e., without an acronym) was included in this interruption research, any conclusions concerning the possible effect of the acronym on interruption performance remain tentative at best, based on the available data of this previous work.

As far as we are aware, there is thus far only one UNRAVEL study which included a no-acronym control group. However, this study did not focus directly on the impact of an acronym as a mnemonic on performance ( Hambrick et al., 2018 ). Instead, it addressed how individual differences in general ability impacted performance in a placekeeping task with vs. without activation of task-relevant knowledge. Despite the different aims of that study, a look at the data of the different conditions at least suggests that the no-acronym condition was somewhat more demanding than the acronym condition, as participants in the no-acronym group consulted the help option more often than those in the acronym group. No differences in overall mean response times (RTs) and rates of sequence errors were found between the conditions, though, which is in contrast to the assumption of a generally beneficial mnemonic effect of an acronym on the execution of a serial task. However, because the specific effects of a mnemonic on performance in serial tasks were not the primary aim of this study, the authors used only very general performance measures, not addressing any specific effects of the mnemonic on, for example, learning times, resilience toward interruptions, or task representation. Thus, the conclusions of this study must be considered very limited with respect to the performance consequences of acronym mnemonics on serial task performance.

The current research aims to provide a first systematic investigation of the performance effects of a mnemonic acronym vs. no acronym on learning and performing a procedural task with sequential constraints. For this purpose, we used a German adaptation of the UNRAVEL task and contrasted conditions with and without the mnemonic regarding three different aspects: the time needed for learning the task, the speed and accuracy of executing the task without an interruption, and the potential of the acronym to structure the task and to enhance the resilience of the task (or at least certain steps) toward detrimental performance effects after an interruption. Our adaptation of the UNRAVEL task used a task stimulus similar to the one used by Altmann et al. (2014) , but included a total of eight instead of seven task steps, which had to be performed in a certain order. In the acronym condition, the sequence of tasks building the procedure was represented by the acronym WORTKLAU, consisting of two one-syllable German words, i.e., “Wort” (engl. word ) and “Klau” (engl. theft ). Enlarging the procedure to eight steps and using the two-word acronym served to make the task even more complex and to have an acronym with a salient semantic structure, including a central position marked by word boundaries.

In the first experiment, participants performed the primary task either with the support of the acronym (from the learning phase on) or without an acronym. In the latter case, they had to learn the eight steps and their order without any sort of mnemonic technique provided. During performance of the task, we further varied whether or not interruptions of two different lengths occurred at different steps. First, we expected shorter learning times in the acronym condition compared to the condition where no acronym was available. Second, we predicted that the support of a mnemonic acronym would lead to faster and more accurate execution of the whole sequence of steps compared to the situation without the acronym. This was expected based on the assumption that the sequential associations between steps would be improved by the availability of the acronym. Third, we assumed that the availability of the mnemonic acronym would improve the resilience toward interruptions, namely, that resumption times would be faster, and sequence errors at the first step after an interruption would be less frequent, compared to the no-acronym condition. Based on the assumption that acronyms indeed facilitate the rehearsal of where the primary task was interrupted and also provide a salient internal cue to re-activate the task goal at the correct step, this effect should occur independently of the length and position of an interruption. Finally, we assumed that the inherent semantic structure of the acronym would also organize the cognitive representation of the task. That is, we assumed that the mnemonic acronym consisting of two words would facilitate a sort of chunking, i.e., dividing the procedural task into two subunits in accordance with the word boundaries within the acronym. In that case, this should be reflected in a faster learning time and an even higher resilience toward interruption effects, particularly for interruptions occurring at the central position, compared to interruptions elsewhere during the task. This is suggested by the observations of position effects in the UNRAVEL paradigm and previous findings that interruptions are less disruptive if they occur after the completion of subtasks compared to ones positioned within subtasks ( Monk et al., 2002 , 2004 ; Botvinick and Bylsma, 2005 ; Bailey and Konstan, 2006 ). Whereas the results of the first experiment allowed for an evaluation of most of these hypotheses, the observed effects were somewhat ambiguous with respect to the effects of the acronym on the mental representation of the task. Thus, a second experiment was run, in which the inherent structure of the acronym was made even more salient by use of a hyphen (“WORT-KLAU”).

Experiment 1

Materials and Methods

Participants

Seventy-four university students, ranging in age from 18 to 30, participated in the study. They were randomly assigned to two groups. 36 participants (23 female, 11 male; M = 24.97, SD = 2.97) performed the task with the support of an acronym and the remaining 38 participants (24 female, 14 male; M = 25.16, SD = 2.85) performed the task without an acronym. A sample size of 32 participants per group was determined based on the G*Power sample size calculator ( Faul et al., 2007 ) for α = 0.05, a power of 0.95, and an effect size of 0.20. Such an effect size is in the range of previously reported effect sizes for main performance effects of interruption presence and length (e.g., Altmann et al., 2014 , 2017 ). However, no predictions regarding the sizes of specific effects of providing an acronym could be drawn from previous studies; they were thus only assumed to be in the same range. Participants were recruited through a web portal of Technische Universität Berlin. For participation in the experiment, course credit or monetary compensation was offered.

The primary task was a German adaptation of the UNRAVEL task introduced by Altmann et al. (2014) . It follows the same general approach and objectives as the original task, but also takes into account experiences of previous research with this task ( Altmann et al., 2014 ; Altmann and Trafton, 2015 ). Like the UNRAVEL task, the German version requires participants to respond to a complex stimulus with a number of sequential responses which have to be performed from memory in a predefined order. The stimuli of the primary task correspond to the original stimuli of the UNRAVEL task, with features adapted to a new and enlarged set of choice rules that have to be applied in a given sequence without any cues. That is, each stimulus consists of a dot, a number (1, 2, 3, or 9), a letter (A, B, U, or X) and a box, which differ according to eight different features: color of the dot (white or black), font style of the number (underlined or not), color of the letter/number (red or blue), position of the letter/number outside of the box (above or below), sound of the letter (consonant or vowel), style of the box (dotted or lined), position of the letter in the alphabet (near the beginning or the end), and parity of the number (odd or even) (see Figure 1 ).

Figure 1. Two examples of task stimuli of the primary WORTKLAU task.
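For readers who want to make the stimulus space concrete, the following minimal Python sketch draws one random stimulus with the eight binary features described above. The feature names and value labels are our paraphrase, not the authors' code; note that three features are derived from the letter and number themselves.

    import random

    def make_stimulus(rng=random):
        # One random stimulus of the adapted task: a dot, a number, a letter,
        # and a box, varying on the eight binary features listed above.
        letter = rng.choice("ABUX")
        number = rng.choice([1, 2, 3, 9])
        return {
            "dot_color":       rng.choice(["white", "black"]),
            "number_font":     rng.choice(["underlined", "plain"]),
            "char_color":      rng.choice(["red", "blue"]),
            "char_position":   rng.choice(["above", "below"]),
            "box_style":       rng.choice(["dotted", "lined"]),
            "letter":          letter,
            "number":          number,
            # Three features are derived from the characters themselves:
            "letter_sound":    "vowel" if letter in "AU" else "consonant",
            "letter_alphabet": "near_beginning" if letter in "AB" else "near_end",
            "number_parity":   "even" if number % 2 == 0 else "odd",
        }

    print(make_stimulus())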

In response to the stimulus, the participant has to go through a sequential list of eight choice rules, corresponding to the different features, and to type the correct responses on a standard keyboard in a prescribed order. As a mnemonic technique to support learning, retention, and correct execution of the sequence, the WORTKLAU acronym was used. It can be considered a sort of first-letter mnemonic, representing the sequence of operations of the primary task by the first letter of one of the response options, corresponding to the logic of the acronym in the UNRAVEL task. The choice rules, the corresponding responses, and their association with the acronym are shown in Table 1 .

Table 1. List of steps, choice rules, and possible answers in the WORTKLAU task, translated from German to English. The answers that form the acronym are provided in both German (direct link to the acronym via the first letter of one of the alternatives) and English.

Compared to the UNRAVEL paradigm, the number of steps to be performed in response to each stimulus has been enlarged by one, to a total of eight steps in the German WORTKLAU adaptation. This difference has two important consequences: the memory demands of the WORTKLAU task are even higher than in the UNRAVEL task, and the acronym is composed of two single words of the same length (i.e., WORT and KLAU, corresponding to the English words word and theft ), which provides a semantic structure to the acronym by dividing it into two parts. The latter makes it possible to study possible effects of the acronym structure on task execution in a controlled way.

A numerical 2-back task ( Moore and Ross, 1963 ) was used as the interruption task. In this task, participants are presented with a series of single numbers and need to respond whenever a presented number equals the one presented two positions before. The task places relatively high demands on working memory by requiring a running memory update with each new number presented. It has been used in previous interruption research to suppress, or at least hinder, active rehearsal of where an interruption occurred in a primary task (e.g., Monk et al., 2004 , 2008 ).
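The response rule of the 2-back task can be stated in one line: respond whenever the current number matches the number shown two positions earlier. A minimal sketch of that rule (illustrative, not the experimental software):

    def two_back_targets(sequence):
        # Indices at which a response is required: the current number
        # equals the number presented two positions before.
        return [i for i in range(2, len(sequence)) if sequence[i] == sequence[i - 2]]

    series = [4, 7, 4, 9, 4, 9, 2]
    print(two_back_targets(series))  # -> [2, 4, 5]: 4==4, 4==4, 9==9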

Participants were tested individually in the Human Performance Laboratory of the Chair of Work, Engineering and Organizational Psychology at Technische Universität Berlin. After signing an informed consent form and filling in a demographic questionnaire addressing basic biographic characteristics (e.g., age) and relevant experience (e.g., typing skills), participants were introduced to the WORTKLAU task. In the no-acronym group, the predefined order of choice rules was presented, but the sequence of response options was mixed so that forming an acronym from them was not obvious. In contrast, the acronym group was introduced to the mnemonic acronym as support for memorizing the different choice rules in the correct order. Afterward, a short practice phase of five trials followed in both groups, which had to be performed with the support of a handout describing the sequence of choices to be made. Immediate feedback on accuracy was provided on the screen after each response. After the practice trials, participants continued by reading the instructions for, and familiarizing themselves with, the interruption task. After the 2-back interruption task was introduced, a short practice trial (1 min) followed.

After this familiarization phase, participants had to pass a knowledge test addressing the procedure and choice rules of the sequential WORTKLAU task. Before taking this test, they could take as much time as they needed to learn the sequence. Participants who passed the knowledge test directly proceeded to the final training without feedback, which consisted of eight WORTKLAU trials, with five trials being interrupted after different steps. All other participants got additional learning time before they repeated the knowledge test and could start with the final training block. All participants passed the knowledge test on the second try.

After the final training block, participants had a short break that was followed by the experimental data collection. This main part of the experiment consisted of three experimental blocks, with 24 WORTKLAU trials per block, i.e., 72 trials in total. In each block, 20 trials were interrupted. Interruptions could occur at five different positions in the WORTKLAU sequence (i.e., before steps R, T, K, L, A), with each interruption lasting either 6 or 30 s. That is, each Position × Length combination of interruptions was presented twice per block. The remaining four trials per block, i.e., 12 trials in total, were not interrupted. These were used for assessing effects of the acronym on uninterrupted performance and also served as a baseline for calculating interruption effects. Interrupted and uninterrupted trials were mixed randomly. Participants were instructed to proceed as quickly and accurately as possible through the different steps of the WORTKLAU task. In case of errors, they were not to correct them, but to continue working through the sequence. The interruption task always appeared immediately upon the response to one of the steps of the WORTKLAU task and fully replaced the WORTKLAU stimulus. During the 2-back interruption task, one number at a time was presented in the center of the screen, as part of a short (4 items) or long (20 items) series with a presentation rate of 1.5 s. Immediately after the last item of the 2-back series, the stimulus of the primary task was presented again, and participants were required to resume the primary task as soon as possible at the correct step, i.e., the step that should have followed the last step performed before the interruption. After the last step of a trial in the primary task was performed and before the new stimulus was shown, a blank screen appeared for 300 ms. On average, a complete experimental session lasted 90 min. Before leaving, each participant took part in a structured interview addressing the strategies they used for learning and executing the task (e.g., “Did you try to divide the task sequence into different parts?,” “Was any position especially easy for you to resume after an interruption?”). In addition, participants subjectively assessed their own performance in the primary and the interruption tasks on simple four-point Likert scales.

To examine the effects of the mnemonic acronym on learning times, on overall performance in the uninterrupted trials of the primary task, and on post-interruption performance, we contrasted the performance of the acronym group, working with support of the WORTKLAU acronym, with the performance of the no-acronym (control) group.

To investigate the effects of the mnemonic acronym on resilience toward interruptions, a 2 (Group) × 2 (Length) × 5 (Position) mixed factorial design was used. The first factor was a between-subjects factor representing the acronym and no-acronym groups. The second factor was a within-subjects factor representing the length of interruption (6 vs. 30 s). The third factor was also a within-subjects factor and included five levels corresponding to the position in the response sequence where an interruption occurred (before steps R, T, K, L, A).
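The block structure that follows from this design can be sketched in a few lines; the counts follow the description above, while the dictionary layout and shuffling are illustrative assumptions:

    import itertools
    import random

    POSITIONS = ["R", "T", "K", "L", "A"]  # step before which an interruption occurs
    LENGTHS = [6, 30]                      # interruption duration in seconds

    def build_block(rng=random):
        # One block: each Position x Length combination twice (20 interrupted
        # trials) plus 4 uninterrupted baseline trials, mixed randomly.
        trials = [{"interrupted": True, "position": p, "length": s}
                  for p, s in itertools.product(POSITIONS, LENGTHS)
                  for _ in range(2)]
        trials += [{"interrupted": False} for _ in range(4)]
        rng.shuffle(trials)
        return trials

    block = build_block()
    print(len(block), sum(t["interrupted"] for t in block))  # 24 20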

Dependent Variables

A set of eight performance measures was used to assess the impact of the acronym on different aspects of performance, including learning, performance during uninterrupted trials, and the consequences of interruptions:

Learning time

This variable was used to assess a possible impact of the acronym on the time needed to learn the correct sequence of choice rules of the primary task. It was defined as the time elapsed between the end of the first familiarization phase and the successful pass of the knowledge test, including the extra time needed if the first attempt at the knowledge test failed. Operationally, it was measured based on the time stamps sampled in the logfile of the experiment, indicating the end of the last 2-back practice trial and the beginning of the final practice block, respectively. In order to control for differences in pure reading speed, the time needed for reading the instructions in the familiarization phase, defined by the time elapsed between the end of the first practice block of the WORTKLAU task and the beginning of the 2-back training trials, was also assessed via time stamps sampled in the logfiles of the experiment.

Completion time

Completion time was defined as the mean of the RTs needed to complete the different steps of the primary WORTKLAU task in trials where no interruptions occurred. For the first step of each trial, the RT was defined as the time (in ms) passed from the occurrence of the new task stimulus until the first response was provided. For all following steps, the RT was assessed by the length of the inter-response interval (IRI, in ms) elapsed since the preceding response. Only steps answered correctly were included in this measure.

Sequence errors

This measure was defined as the overall mean proportion of responses to the different steps within uninterrupted trials where a participant deviated from the prescribed order of the steps, by either omitting a step (e.g., going directly from the W to the R step) or repeating a step.

Non-sequence errors

This measure was defined as the overall mean proportion of responses to the different steps within uninterrupted trials where a participant provided a response to a given step at the correct position of the trial, but the response was false (e.g., the stimulus presented contained a white dot, but the participant pressed the S instead of the W key).

Resumption time

Resumption time was defined as the time needed to return to a certain step of the primary task after an interruption. Based on all interrupted trials, it was calculated individually for each post-interruption step (R, T, K, L, or A), by subtracting the mean inter-response interval for this step in the uninterrupted trials from the time passed between the reappearance of the primary-task stimulus and the response to this step on the keyboard after an interruption. Only correct responses were considered for this measure.
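In code, this baseline correction is a per-step subtraction. The sketch below uses hypothetical data structures (a step label mapped to lists of correct RTs in ms), not the authors' analysis pipeline:

    from statistics import mean

    def resumption_times(post_rts, baseline_iris):
        # Per-step resumption time: mean post-interruption RT at a step minus
        # that step's mean inter-response interval in uninterrupted trials.
        # Both inputs map a step label to a list of correct RTs in ms.
        return {step: mean(post_rts[step]) - mean(baseline_iris[step])
                for step in post_rts}

    # Hypothetical data for two post-interruption steps:
    post = {"K": [4100, 3800], "L": [4600, 5000]}
    base = {"K": [1900, 2100], "L": [2300, 2500]}
    print(resumption_times(post, base))  # K: 1950 ms, L: 2400 ms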

Post-interruption sequence errors

This measure included the proportion of sequence errors (i.e., omitting or repeating a step) occurring at the different steps after an interruption.

Post-interruption non-sequence errors

This measure included the proportion of non-sequence errors (i.e., falsely responding to the correct step) occurring at the different steps after an interruption.

Interruption task performance

Interruption task performance was measured through correct hits, i.e., when participants correctly responded to a target stimulus, and correct rejections, i.e., when participants correctly withheld a response to a non-target stimulus, and was expressed as a percentage.

In addition to these performance measures a number of further, mainly explorative variables were derived from the structured post-experimental interview, including percentages of different chunking strategies deliberately applied by the participants. Finally, a set of control variables used to identify possible basic differences between the experimental groups included subjective ratings of performance in the primary and in the interruption tasks, and selected items of the demographic questionnaire, like age and subjective ratings of typing proficiency.

Results

A total of nine participants were excluded from further analyses, either for systematic non-sequence errors at the K step (consonant-vowel) of the trial, for using non-mnemonic strategies to conduct the task (i.e., pointing fingers at the correct answer keys on the keyboard during the interruption), or because a high number of errors made it impossible to analyze all reaction times per condition. Thus, the results presented in the following are based on the data of 33 participants in the acronym group and 32 participants in the no-acronym group. The two groups differed neither in age [ M = 24.97 in the acronym and M = 25.38 in the no-acronym group, t (63) = 0.55, p = 0.58], nor in their typing proficiency [2.73 vs. 2.94, t (61.62) = 1.18, p = 0.24]. In addition, the two groups did not differ with respect to their subjective ratings of their performance in the primary task [2.94 vs. 3.03, t (64) = 0.64, p = 0.52] and in the interruption task [2.21 vs. 2.34, t (64) = 0.88, p = 0.38].

With regard to RT measures in the uninterrupted trials, all RTs shorter than 500 ms or more than 3 standard deviations (SD) from the mean, calculated for each step and each participant, were excluded, resulting in the exclusion of 0.03% of RTs in the acronym group and 0.05% in the no-acronym group. For post-interruption RTs (between the reappearance of the primary-task stimulus after an interruption and the first response), all times shorter than 500 ms were likewise excluded, resulting in the exclusion of 0.03% in the acronym and 0.04% in the no-acronym group.
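The trimming rule is straightforward to express; this sketch applies it to a flat list of RTs for one participant and step (the floor and cutoff mirror the description above; the sample values are invented):

    from statistics import mean, stdev

    def trim_rts(rts, floor=500, sd_cutoff=3):
        # Drop RTs below the floor or more than `sd_cutoff` standard
        # deviations from the sample mean (mean/SD computed over all RTs).
        m, sd = mean(rts), stdev(rts)
        return [rt for rt in rts if rt >= floor and abs(rt - m) <= sd_cutoff * sd]

    rts = [480] + [2000 + 10 * i for i in range(20)] + [15000]
    trimmed = trim_rts(rts)
    print(len(rts), "->", len(trimmed))  # 22 -> 20: 480 (floor) and 15000 (3 SD) cut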

Learning Time

All learning and reading times that were 3 SD above or below the group means were excluded from the analysis. This resulted in the exclusion of one participant due to a long reading time in the acronym group, and three participants due to long learning and reading times in the no-acronym group. In the acronym group, the mean learning time of the remaining participants was 910.50 s ( SD = 145.42), compared to 1150.52 s ( SD = 320.17) in the no-acronym group. An analysis of covariance (ANCOVA) with group as fixed factor and reading time as covariate revealed a significant difference in learning times between the groups, F (1,58) = 13.53, p = 0.001, η p 2 = 0.19, whereas the effect of reading time was not statistically significant, p = 0.21.
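For readers wanting to reproduce this kind of analysis, the group contrast with reading time as covariate corresponds to a standard one-way ANCOVA. The sketch below uses the statsmodels formula API on simulated data whose means and SDs merely echo the reported values; it is an illustration, not the authors' analysis script:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    n = 30  # per group, illustrative only

    # Simulated learning times (s) whose means/SDs merely echo the reported
    # values; reading times are on an arbitrary scale.
    df = pd.DataFrame({
        "group": ["acronym"] * n + ["no_acronym"] * n,
        "learning_time": np.concatenate([rng.normal(910, 145, n),
                                         rng.normal(1150, 320, n)]),
        "reading_time": rng.normal(300, 60, 2 * n),
    })

    # ANCOVA: group as fixed factor, reading time as covariate.
    model = smf.ols("learning_time ~ C(group) + reading_time", data=df).fit()
    print(anova_lm(model, typ=2))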

Uninterrupted Primary-Task Performance

A complete overview of the performance measures calculated for each step of the primary task in the uninterrupted condition, together with the three derived overall performance scores at the trial level, is shown in Table 2 .

Table 2. Means and standard errors (in brackets) of response times, proportion of sequence errors, and proportion of non-sequence errors at each step of the task in uninterrupted trials. In addition, the resulting mean completion times and the mean overall proportions of sequence and non-sequence errors are shown for both conditions at the bottom of the table.

Completion times for the different steps in the 12 WORTKLAU trials without interruptions were only descriptively faster in the acronym group ( M = 2265 ms; SE = 124) than in the no-acronym group ( M = 2291 ms; SE = 99). This corresponds to an average time needed to complete a whole WORTKLAU trial of 18.12 and 18.33 s, respectively. A t -test contrasting the mean completion times in both conditions did not reveal a significant difference, t (63) = 0.17, p = 0.87.

As expected, the mean proportion of sequence errors committed at the different steps of the WORTKLAU task was about half as large in the acronym group ( M = 0.009, SE = 0.005) as in the no-acronym group ( M = 0.023, SE = 0.005). However, the proportions of sequence errors were very low in both groups, and a t -test contrasting these means just failed to reach the usual level of significance, t (38.74) = 1.84, p = 0.074.

Non-sequence errors followed the same pattern as sequence errors, being lower in the acronym group ( M = 0.012, SE = 0.015) than in the no-acronym group ( M = 0.017, SE = 0.035). However, this difference was also too small to reach statistical significance, t (63) = 0.83, p = 0.41.

Performance in Interrupted Trials

An overview of performance measures calculated for each post-interruption step of the primary task in the conditions with short and long interruptions is shown in Table 3 .


Table 3. Means and standard errors (in brackets) of post-interruption performance measures (response time, resumption time, proportion of sequence, and proportion of non-sequence errors) separately for each position of interruption and both interruption lengths in the acronym and no-acronym group.

Resumption times

The 2 (Group) × 2 (Length) × 5 (Position) ANOVA revealed significant main effects of length of interruption, F(1,63) = 105.77, p < 0.001, ηp² = 0.63, and position of interruption, F(4,252) = 4.52, p = 0.002, ηp² = 0.07, as well as a Group × Position interaction, F(4,61) = 3.05, p = 0.018, ηp² = 0.05. No other effects became significant, all p > 0.10. As expected, long interruptions led to longer resumption times (M = 3651 ms, SE = 256) than short ones (M = 1999 ms, SE = 145). The effects of group and interruption position on resumption times, including their interaction, are shown in Figure 2. Evidently, resumption times depended on the position at which the interruption occurred, with the shortest resumption times in both groups when the interruption occurred at the center position. This effect was somewhat more pronounced in the acronym group than in the no-acronym group. A planned t-test for paired samples contrasting the resumption time for interruptions at the central position (M = 1765 ms, SE = 277) with the mean of all other positions (M = 2710 ms, SE = 265) revealed a significant effect in the acronym group, t(32) = 3.24, p = 0.003, whereas the same comparison failed to reach significance in the no-acronym group (M = 2591 ms, SE = 325 vs. M = 2948 ms, SE = 296), t(31) = 1.41, p = 0.17. However, the marked increase of resumption times for interruptions at position #5 (the "L" step) relative to the center position, which is visible in the acronym group but absent in the no-acronym group, may also have contributed to the interaction effect.


Figure 2. Resumption times and standard errors of the acronym and no-acronym group for different interruption positions.
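The planned contrast reported above (central position vs. the mean of all other positions) can be sketched as follows. The long-format layout, column names, and the label of the central position are assumptions, not the authors' code.

```python
import pandas as pd
from scipy import stats

def center_vs_rest(df: pd.DataFrame, center=3):
    """Paired t-test: resumption time at the central interruption position
    vs. the mean of all other positions, one pair of scores per participant."""
    wide = df.pivot_table(index="participant", columns="position",
                          values="resumption_ms", aggfunc="mean")
    center_rt = wide[center]
    other_rt = wide.drop(columns=center).mean(axis=1)
    return stats.ttest_rel(center_rt, other_rt)
```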

Post-interruption sequence errors

The 2 (Group) × 2 (Length) × 5 (Position) ANOVA revealed significant main effects of interruption length, F(1,63) = 110.73, p < 0.001, ηp² = 0.64, and of position, F(4,252) = 3.72, p = 0.006, ηp² = 0.06, as well as a significant Length × Position interaction, F(4,252) = 2.97, p = 0.02, ηp² = 0.04. No other effects became significant, all p > 0.06. In accordance with the results obtained for resumption times, long interruptions led to higher rates of sequence errors (M = 0.223, SE = 0.017) than short ones (M = 0.068, SE = 0.007), and this effect emerged independently of acronym availability. The mean rates of sequence errors reflecting the Length × Position interaction are shown in Figure 3. Evidently, after short interruptions the rate of post-interruption sequence errors was generally low and did not vary much with the position of the interruption. After long interruptions, however, the rate of sequence errors differed across positions, with the lowest error rates after interruptions at positions #3 and #4 (center). A post hoc t-test for paired samples contrasting the mean error rate at the central position with the mean of all other positions was conducted for the two interruption lengths separately. For short interruptions, no difference was found, t(64) = 1.34, p = 0.18, whereas the analysis for long interruptions revealed fewer sequence errors at the central position (M = 0.170, SE = 0.024) than at the other positions on average (M = 0.240, SE = 0.018), t(64) = 2.72, p = 0.008. However, whether or not an acronym was available to support the execution of the procedure did not make a difference.


Figure 3. Proportion of sequence errors and standard errors of the acronym and no-acronym groups together at different interruption positions.

Previous studies have shown that interruptions not only raise the risk of sequence errors in general, but specifically the risk of repeating a step (a perseveration error in the terms of Altmann et al., 2014) instead of skipping a step (an anticipation error), whereas the latter was found to be more characteristic of uninterrupted trials. Thus, we performed an exploratory post hoc analysis investigating whether the provision of a mnemonic acronym would make a difference in this respect. A 2 (Group) × 2 (Context: with vs. without interruption) × 2 (Error type: repeating vs. skipping) ANOVA revealed only a significant main effect of context, F(1,63) = 135.51, p < 0.001, ηp² = 0.68, and a Context × Error type interaction, F(1,64) = 5.74, p = 0.02, ηp² = 0.08. Overall, participants committed sequence errors more often after an interruption (M = 0.073, SE = 0.005) than in uninterrupted trials (M = 0.010, SE = 0.002). In uninterrupted trials, participants were more likely to skip (M = 0.016, SE = 0.003) than to repeat a step (M = 0.005, SE = 0.001), while the opposite tendency was found in the post-interruption context (M = 0.068, SE = 0.006 vs. M = 0.079, SE = 0.008). Neither the main effect of group nor any interaction of group with the other factors became significant (all p > 0.35).
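For illustration, a minimal classifier for this error taxonomy might look as follows, assuming steps are indexed by their position in the sequence. This is a sketch of the scoring logic, not the authors' code.

```python
def classify_sequence_error(expected_step: int, executed_step: int) -> str:
    """Classify a response relative to the step the sequence called for."""
    if executed_step == expected_step:
        return "correct"
    if executed_step < expected_step:
        return "perseveration"  # moved backward, i.e., a step was repeated
    return "anticipation"       # moved forward, i.e., a step was skipped

# e.g., after an interruption before step 5, responding with step 4 again:
print(classify_sequence_error(expected_step=5, executed_step=4))  # perseveration
```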

The mean rate of post-interruption non-sequence errors was generally low (<0.03) in both groups, with only minor variation across the experimental conditions. We therefore treated the variation in this measure as random and refrained from analyzing non-sequence errors statistically.

Mean accuracy in the interruption task ranged between 82 and 100% (M = 93.95%, SE = 0.77) in the acronym group, and between 68 and 99% (M = 91.78%, SE = 1.02) in the no-acronym group. Despite the trend toward somewhat lower performance in the no-acronym group, a t-test for independent samples showed no significant difference in the 2-back task between the groups, t(64) = 1.71, p = 0.09. In order to examine possible relationships between performance in the interruption task and post-interruption performance, Pearson's product-moment correlation coefficients were computed for each group separately. In the acronym group, accuracy in the interruption task correlated neither with the mean resumption time, r = 0.10, p = 0.60, n = 33, nor with the mean proportion of post-interruption sequence errors, r = 0.03, p = 0.86, n = 33. In the no-acronym group, however, a significant correlation between accuracy in the interruption task and mean resumption time was found, r = 0.62, p < 0.001, n = 32. That is, higher accuracy in the interruption task was related to longer resumption times after the interruption. No correlation between interruption task performance and the mean proportion of post-interruption sequence errors was found, r = −0.04, p = 0.81, n = 32.
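As a sketch, group-wise correlations of this kind can be computed as follows. The data frame and its columns are simulated placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": ["acronym"] * 33 + ["no_acronym"] * 32,
    "nback_accuracy": rng.uniform(0.68, 1.00, 65),
    "mean_resumption_ms": 2500 + rng.normal(0, 400, 65),
})

for name, grp in df.groupby("group"):
    r, p = stats.pearsonr(grp["nback_accuracy"], grp["mean_resumption_ms"])
    print(f"{name}: r = {r:.2f}, p = {p:.3f}, n = {len(grp)}")
```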

Post-experimental Interview

Use of chunking strategy.

Chunking the task into subtasks during the learning and execution phases was the common strategy in the acronym group, employed by 79% of participants: 33% reported splitting the task into two halves corresponding to the two words building the acronym (WORT – KLAU), and 39% split the acronym into three parts (WO – RT – KLAU), also based on the semantic structure of the acronym but treating the word "wo" (English: "where") as a separate part. The remaining 6% reported some other chunking strategy (e.g., 4 × 2 steps). In contrast, only 34% of participants in the no-acronym group employed task chunking as a strategy: 13% split the task into two halves, 10% into 4 × 2 steps, and the rest (9%) chunked the task in some other way. At a descriptive level, within each experimental group, participants who employed some kind of task chunking showed shorter learning times than those who did not (acronym group: 899.73 vs. 990.00 s; no-acronym group: 988.44 vs. 1241.45 s).

Ease of resumption.

Participants also reported whether any interruption position was particularly easy to resume. In the acronym group, 30% of participants named the central position as especially easy to resume, and a further 36% named the central position together with some other position. In contrast, only 22% and 13% of participants in the no-acronym group reported the same benefits of these positions. Of the remaining participants, 22% in the acronym group and 19% in the no-acronym group did not report any position as specifically easy to resume, and the rest named some other position or combination of positions.

Discussion

The aims of the present research were to investigate the effects of the availability of a mnemonic acronym on the learning and execution of a procedural task and on the resilience toward detrimental effects of interruptions, and to examine the impact of the structure of a mnemonic acronym on the mental representation of the task.

Let us first consider the effects of the acronym on learning and execution of the eight-step procedure in uninterrupted trials. Based on knowledge gained from research on memory for order (Nelson and Archer, 1972; Morris and Cook, 1978), beneficial effects of the mnemonic acronym were expected to emerge in the time needed to learn the procedure. In accordance with this hypothesis, the acronym group acquired the procedure significantly faster than the no-acronym group. This finding is in line with previous research on memory for order showing positive effects of mnemonic acronyms on memorization of the order of verbal items (Morris and Cook, 1978). However, no beneficial effects of the acronym were found with respect to completion times and error rates, confirming recent observations in the study of Hambrick et al. (2018). Thus, our additional hypothesis that mnemonic acronyms might also serve as process mnemonics, promoting speed and accuracy of task execution by strengthening associations between the steps of the procedure (Malhotra, 1991), was not supported by the data. Evidently, once the procedure was learnt, the cuing structure of the mnemonic acronym provided no additional benefit during actual execution. This suggests that mnemonic acronyms can serve well as learning mnemonics, supporting the establishment of declarative knowledge concerning the set and sequence of rules of a sequential task. However, the transfer of this knowledge to the actual execution of the task (Kieras and Bovair, 1986) does not seem to be further supported by the availability of a mnemonic acronym.

A second aim of the experiment addressed possible effects of mnemonic acronyms on the resilience of procedure execution toward interruptions. More specifically, we assumed that acronyms would generally improve the rehearsal of where in the sequence the interruption occurred and provide cues for a better re-activation of the correct next step after the interruption. The results did not support this assumption, as no differences between the groups were found in overall resumption times or post-interruption sequence errors. This was also reflected in the subjective ratings of primary task and interruption task performance, where no significant differences emerged between the two groups. However, the exploratory analysis of the relationship between accuracy in the interruption task and resumption time revealed a more subtle effect, suggesting that the availability of a mnemonic might have facilitated the rehearsal of primary task goals during the interruption phase. Specifically, in the no-acronym group, participants who were more accurate in the interruption task needed more time to resume the primary task after the interruption and vice versa, whereas no such mutual dependence was found in the acronym group. This suggests that the mnemonic acronym provided simple rehearsal cues that helped to reduce possible interference between performing the 2-back task and rehearsing the primary task goal during the interruption phase, i.e., it allowed for similar resumption performance independent of how much priority was given to the 2-back task.

Finally, we expected the structure of the acronym to affect the mental representation of the task. More specifically, we assumed that the word boundary within the two-word acronym (WORT, KLAU) would lead to a chunking of the procedure into at least two parts, based on the meaning of the words within the acronym. This should then be reflected in faster and more accurate post-interruption performance at the central step of the procedure compared to the others. No such effects were expected in the no-acronym group. The results are not fully conclusive in this regard. Based on the post-experiment interview, the vast majority of participants (about 80%) in the acronym group deliberately used the semantic structure of the acronym in one way or another to divide the procedure into chunks, whereas only a minority of participants in the no-acronym group reported doing so. Although this is in general accordance with our hypothesis, it was not as clearly reflected in the performance data. A significant Group × Position effect emerged for resumption times, but it was not easy to interpret, since marked differences at positions other than the central one might also have contributed to it. Furthermore, no comparable effect was found for post-interruption sequence errors. Thus, before accepting the hypothesis that mnemonic acronyms can serve as a tool to structure the mental representation of a task, we conducted a second experiment in which we made the word boundary within the acronym WORTKLAU even more salient.

Experiment 2

The results of the first experiment provided some evidence that the semantic structure of the mnemonic acronym affected the mental representation of the task. In order to investigate this possible effect further, we emphasized the word boundary of the acronym WORTKLAU even more by introducing a hyphen between the words WORT and KLAU in the learning phase. We expected that this subtle but salient manipulation would be more effective than the internal semantic structure of the acronym alone in structuring the mental representation of the task into two halves, corresponding to the word boundary at the central position of the acronym. During the execution of the task, this should be reflected in considerably higher resilience toward interruptions, specifically in clearly reduced resumption times and sequence error rates for interruptions at the central position (i.e., before step "K") compared to all other positions. Learning times and performance in uninterrupted trials of the primary task, however, were expected to remain unaffected and to replicate the results obtained for the acronym group in Experiment 1.

Participants

Twenty university students (seven female), ranging in age from 19 to 30 years (M = 25.65, SD = 3.31), participated in the study. Participants were recruited through a web portal of Technische Universität Berlin and received course credit or monetary compensation for their participation.

Task and Procedure

Tasks and procedure were the same as in the acronym group in Experiment 1. The only difference regarded the learning phase of the experiment, where the acronym “WORTKLAU” was replaced with “WORT-KLAU.”

Effects of the mnemonic acronym with a salient central position on resilience toward interruptions were investigated using a 2 (Length) × 5 (Position) within-subjects factorial design. The first factor represented the duration of the interruption (6 vs. 30 s); the second comprised five levels corresponding to the position in the response sequence at which an interruption occurred.

Learning times, uninterrupted primary-task performance measures (completion time, sequence, and non-sequence errors), and post-interruption performance measures (resumption times, post-interruption sequence and non-sequence errors, and interruption task performance) were calculated in the same way as in Experiment 1. In addition, the same explorative and control variables as in the first experiment were included, i.e., percentages of different chunking strategies deliberately applied, subjective ratings of performance in the primary and in the interruption tasks, age, and subjective ratings of typing proficiency.

Results

Due to a high number of errors, leading to missing resumption times in certain conditions, one participant was excluded from further analyses. The mean subjective ratings of typing speed, primary task performance, and interruption task performance of the remaining participants were 2.74 (SE = 0.13), 3.00 (SE = 0.13), and 2.21 (SE = 0.14), thus replicating the mean ratings of the acronym group of the first experiment almost exactly (2.73, 2.94, and 2.21, respectively). Applying the same outlier-correction criteria as in Experiment 1, a total of 0.02% of values were excluded from the RTs in uninterrupted trials, and 0.02% from the post-interruption RTs.

Learning Time and Uninterrupted Primary-Task Performance

Learning time, completion time, sequence, and non-sequence errors.

Because no experimental manipulation in this experiment addressed learning time or primary-task performance in uninterrupted trials, only descriptive statistics are reported. The mean learning time was 999.21 s (SE = 46.28), close to the mean learning time in the acronym group of Experiment 1 (910.50 s). Similarly, the baseline performance scores achieved in the primary task in uninterrupted trials largely replicated those of Experiment 1. The mean completion time per step in uninterrupted trials was 2716 ms (SE = 215), corresponding to an average time of 21.73 s (SE = 1.72) to complete a whole WORT-KLAU trial. The mean error rates were 0.014 (SE = 0.004) for sequence errors and 0.011 (SE = 0.002) for non-sequence errors.

Performance in Interrupted Trials

Resumption times

The effects of interruptions on mean resumption times are shown in Figure 4. A 2 (Length) × 5 (Position) ANOVA for repeated measures revealed only the two main effects as significant, Length: F(1,18) = 32.27, p < 0.001, ηp² = 0.64; and Position: F(4,72) = 3.27, p = 0.016, ηp² = 0.15. As expected, and as is evident from Figure 4, long interruptions led to considerably longer resumption times (M = 4185 ms, SE = 472) than short ones (M = 2089 ms, SE = 250). In addition, resumption times differed across positions, with the quickest resumptions after interruptions at the central position. A t-test for paired samples contrasting the mean resumption time at the central position (M = 2041 ms, SE = 448) with the mean of all other positions (M = 3222 ms, SE = 343) revealed this difference as significant, t(18) = 3.21, p = 0.005.


Figure 4. Resumption times and standard errors for short and long interruption length at different interruption positions.
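Because Experiment 2's design is fully within-subjects, the reported 2 × 5 repeated-measures ANOVA can be sketched with statsmodels' AnovaRM. The simulated data and column names below are placeholders shaped loosely like Figure 4, not the study's data or analysis code.

```python
import itertools
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(3)
rows = [
    {"participant": s, "length": ln, "position": pos,
     # cell means loosely shaped like Figure 4 (placeholder values)
     "resumption_ms": (4185 if ln == "long" else 2089)
                      - (800 if pos == 3 else 0) + rng.normal(0, 300)}
    for s, ln, pos in itertools.product(range(19), ["short", "long"], range(1, 6))
]
df = pd.DataFrame(rows)

# one observation per participant x length x position cell, as AnovaRM requires
print(AnovaRM(df, depvar="resumption_ms", subject="participant",
              within=["length", "position"]).fit())
```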

Post-interruption sequence errors

The effects of length and position of interruptions on post-interruption sequence errors are shown in Figure 5. Evidently, both factors again affected the risk of committing such errors, yet in a somewhat different way across positions depending on the length of the interruption. The ANOVA revealed significant main effects of Length, F(1,18) = 42.12, p < 0.001, ηp² = 0.70, and Position, F(4,72) = 7.67, p < 0.001, ηp² = 0.30, as well as an interaction effect, F(4,15) = 2.80, p = 0.032, ηp² = 0.13. As expected, long interruptions led to more sequence errors (M = 0.225, SE = 0.031) than short ones (M = 0.064, SE = 0.014). Regarding the position effect, Figure 5 shows that, independent of the length of interruption, the risk of sequence errors was lowest at the central position, with actually perfect performance after short interruptions. This was confirmed by post hoc t-tests for paired samples contrasting mean sequence errors after interruptions at the central position with the mean errors after interruptions at the other positions, which became significant for both interruption lengths: short interruptions, t(18) = 4.33, p < 0.001, and long interruptions, t(18) = 3.93, p = 0.001. However, for short interruptions the risk of sequence errors after interruptions at the second position was also almost zero (M = 0.009, SE = 0.009), and this mean also differed significantly from the means at positions #2 and #6, all p < 0.003.


Figure 5. Proportion of sequence errors and standard errors for short and long interruption length at different interruption positions.

Mean error rates of post-interruption non-sequence errors were generally low (<0.06%) and were not further analyzed statistically.

Mean accuracy in the interruption task ranged between 82 and 99% (M = 92.38%, SE = 1.05). Corresponding to the results of the acronym group in Experiment 1, accuracy in the interruption task correlated neither with mean resumption times, r = 0.22, p = 0.37, n = 19, nor with the mean proportion of post-interruption sequence errors, r = −0.18, p = 0.46, n = 19.

Post-experimental Interview

Use of chunking strategy.

Chunking the task into halves during the learning and execution phases, directly corresponding to the emphasized semantic structure of the acronym, was explicitly reported as a deliberately applied strategy by 42% of the participants. Only one participant reported some other chunking pattern (3+5), and 53% of the participants did not report using any chunking strategy. In a post hoc analysis, a t-test for independent samples revealed no significant difference in learning times between the subgroup of participants who used some kind of chunking strategy (1165.11 s) and the participants who did not chunk the task (1266.80 s). However, the observed trend was in accordance with the finding of Experiment 1, where the subgroup who employed chunking learned faster than the subgroup who did not.

Ease of resumption.

Thirty-nine percent of the participants reported the central position as especially easy to resume after an interruption, and 33% named the central position together with some other position. Only 28% of participants reported no position as particularly easy to resume.

Discussion

The main objective of Experiment 2 was to examine the effects of a more salient semantic structuring of the mnemonic acronym on the mental representation of the sequential WORTKLAU task. Compared to the acronym used in the first experiment, the two-word structure was made more salient by simply inserting a hyphen at the boundary between the two words WORT and KLAU. This was expected to lead to a structured mental representation of the WORTKLAU task consisting of two parts, one for each word. This, in turn, was expected to make the task more resilient toward interruptions at the central position, reflected in shorter resumption times and fewer sequence errors when resuming the primary task after interruptions at the central position compared to interruptions at other steps. The obtained results support this hypothesis. Independent of the length of interruptions, and more clearly than in the first experiment, the primary task was resumed faster and more accurately after interruptions at the central position than, on average, after interruptions at all other positions. This effect was most marked for long interruptions, where the mean rate of post-interruption sequence errors at the central position dropped to only 8%. For short interruptions, the rates of post-interruption sequence errors were relatively low anyway, but actually zero for all participants at the central position. Taken together, these results confirm the findings of the acronym group of Experiment 1, which already suggested that the semantic structure of an acronym provided as a mnemonic for a sequential task can also shape the structure of the mental representation of that task. The fact that this effect was reflected more strongly in the performance measures than in participants' reports of deliberately chosen strategies suggests that it can occur without participants becoming subjectively aware of it.

General Discussion

The aim of the present study was to examine the potential of a mnemonic acronym to serve as a learning mnemonic for a sequential procedural task and as a process mnemonic during task execution. Moreover, the goal was to investigate the potential of a mnemonic acronym to improve overall resilience toward interruptions by providing an easily accessible cue for rehearsal, as well as to improve resilience toward interruptions at certain steps by providing a structure for the mental representation of the procedure. To our knowledge, this is the first study to address these questions directly and systematically.

The results of the two experiments provide direct empirical evidence for the beneficial effects of a mnemonic acronym as a support tool for learning. The two groups provided with different versions of the WORTKLAU acronym in the learning phase needed approximately 5 min less, on average, to learn the rules of the sequential procedure than the participants of Experiment 1 who learned the sequence without the help of an acronym. These effects are in line with early studies on mnemonic acronyms that showed positive effects on learning and reproduction of verbal material (e.g., Higbee, 2001; Stalder, 2005), especially when the order of items needs to be memorized (Nelson and Archer, 1972; Morris and Cook, 1978), which was the key property of the task in our experiment. The effects suggest that the knowledge gained from serial verbal learning can be transferred directly to the learning of sequential procedural tasks involving different steps to be performed in a prescribed order. However, the positive effects may also be explained, at least in part, by the structure of the complex WORTKLAU acronym, which could have enhanced chunking of the task during learning and execution. Within each experimental group in both experiments, the subgroup that chunked the task was somewhat faster in learning than the subgroup that did not report such a strategy. Although this difference emerged only descriptively and should not be overemphasized, it at least suggests that learning times in the acronym group benefited not only from the acronym as a cue for coding the order and content of the choice rules, but also from its structure supporting a hierarchical task organization. In addition, it cannot be excluded that indirect benefits of mnemonic acronyms, e.g., increased motivation to work on the task (Stalder, 2005), also contributed to the faster learning times in the acronym group.

A more advanced assumption involved the hypothesis that the mnemonic acronym might also serve as a process mnemonic, improving the speed and accuracy of procedural task execution. That is, we expected the mnemonic acronym to provide a cuing structure that strengthened the associations between successive steps of the task (Malhotra, 1991), which would enable faster and more accurate execution of the task sequence. This expectation was mainly based on observational and field studies reporting benefits of mnemonic acronyms for supporting the learning, teaching, and execution of procedures (Cook, 1989; Stalder, 2005; Bortle, 2010). However, the data of the two experiments do not support this assumption. In neither experiment did the groups performing the primary task with support of the acronym outperform the control group of Experiment 1 in uninterrupted trials, i.e., all groups achieved the same levels of speed and accuracy. This suggests that the mnemonic acronym only supported the establishment of declarative knowledge in long-term memory, but failed to further support the transfer of the memorized sequence of rules into the sequence of response selections and actions required for actual execution (Kieras and Bovair, 1986). Theoretically, the execution of such sequential procedural tasks is proposed to rely on mechanisms involved in order memory and serial recall (single-mechanism theories, e.g., Burgess and Hitch, 1999; Brown et al., 2000; Botvinick and Plaut, 2006) or on a specific placekeeping ability involving two mechanisms, episodic and semantic memory (Trafton et al., 2011; Hambrick and Altmann, 2015). Both groups of theories propose chained associations between the steps, where the execution of one step serves as a prime for the activation and execution of the following steps. The results of the present study suggest that, once a sequential procedural task is learnt, with or without the support of an acronym, the sequential associations between successive steps are already strong enough to serve these assumed cueing and priming mechanisms, rendering any additional effect of an acronym negligible.

A third set of assumptions concerned the possible effects of a mnemonic acronym on the resilience of a procedural task toward interruptions and on the structure of the mental representation of the task. A generally higher resilience toward interruptions was expected based on the assumption that rehearsal of a pending goal during the interruption task would be enhanced by a simple internal cue provided by the acronym, leading to elevated activation of the primary task goal during the interruption (Altmann and Trafton, 2002). In addition, we assumed that the acronym might provide effective cues for reorienting to, and re-activating, the correct step of the primary task after an interruption. However, no differences in post-interruption measures, neither resumption times nor sequence errors, were found between the acronym and the no-acronym group of Experiment 1. This suggests that a mnemonic acronym does not contribute to a better prevention of goal decay during the interruption phase, nor does it seem to be especially helpful as a cue for reactivating the primary task goal after the interruption. However, the observation that performance in the interruption task and the time needed to resume the primary task were positively correlated across participants in the no-acronym group (Experiment 1), but not across participants of the two acronym groups (Experiments 1 and 2), suggests that the mnemonic acronym nevertheless affected rehearsal processes in the interruption phase in some way. More specifically, it seems to have enabled participants to better uncouple the processes involved in the 2-back task from rehearsing the relevant primary task goals. Why such an effect would not lead to better resumption performance is difficult to explain, though, and this interpretation should be treated with some caution, given that the correlations were based on relatively small numbers of participants in the different groups and a restricted variance of interruption task performance, especially in the two acronym groups.

Even more specific effects on the resilience toward interruptions were expected due to the potential impact of the mnemonic acronym on the organization of the mental representation of the procedural task established during learning. Specifically, it was assumed that the semantic structure of an acronym would induce a sort of hierarchical mental task representation, which in turn would make the procedural task more resilient toward interruptions at task steps representing a boundary in the semantic structure of the acronym. This assumption was supported by the data of both experiments. In the first experiment, post-interruption performance in terms of resumption times was better when the interruption was placed between the two words building the acronym than when it occurred at other steps. When the boundary between the acronym words was made even more salient (Experiment 2), the effect was replicated in both resumption times and post-interruption sequence errors. This finding is in line with previous studies examining the relationship between the hierarchical structure of a task and interruption effects (Monk et al., 2002, 2004; Botvinick and Bylsma, 2005; Bailey and Konstan, 2006). These studies usually found interruptions to be less detrimental to performance if they occurred between rather than within subtasks, an effect explained by reduced mental workload at subtask boundaries, where the previous subtask has been completed and the next one is not yet fully being processed (Miyata and Norman, 1986; Wickens, 2002). Based on our research, it appears that mnemonic acronyms can induce a hierarchical mental organization of a complex procedural task with sequential constraints into different subtasks even if the task per se does not have such a structure. However, when considering the effect of this hierarchical organization on the resilience toward interruptions, one should keep in mind that these effects were considerably smaller than the impact of interruption length. Whereas the observed effect sizes for position of interruption ranged between 0.06 and 0.30 across the two experiments, the different lengths of interruption produced considerably larger effects (0.63–0.70), which could not be fully attenuated with the help of the mnemonic in either experiment.
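As a side note for readers comparing these effect sizes: partial eta squared can be recovered directly from a reported F value and its degrees of freedom. The sketch below shows that generic relationship; it is not code from the study, and the check uses the Length effect of Experiment 1 reported above.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# cross-check against the Length effect of Experiment 1 reported above
print(round(partial_eta_squared(105.77, 1, 63), 2))  # 0.63
```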

The interruptions in our experiments were primarily used to assess the possible effects of mnemonic acronyms on the resilience of a procedural task toward interruptions. Beyond this, our results also contribute to interruption research in general. Independent of whether or not the mnemonic acronym was available, most of the performance consequences of interruptions previously described in research with the UNRAVEL task (Altmann et al., 2014, 2017) were confirmed in our experiments. That is, resumption times and proportions of post-interruption sequence errors increased with the length of interruptions, with mean rates of sequence errors after short and long interruptions closely resembling those reported by Altmann et al. (2014, 2017). In addition, the somewhat higher prevalence of erroneously repeated steps (perseveration errors) versus skipped steps (anticipation errors) after interruptions, compared to uninterrupted trials, previously reported by Altmann et al. (2014), was replicated in our research. This provides converging evidence for these phenomena and cross-validates the previous findings obtained with the UNRAVEL task, using our modified German adaptation combined with a different interruption task.

Altogether, to our knowledge, the current study is one of the first attempts to examine extensively the effects of a mnemonic acronym on the learning and execution of a procedural task with sequential constraints, as well as on resilience toward interruptions, in an experimental setting. The results support implementing mnemonic acronyms in the learning phase of a procedural task, as they can promote faster learning. However, once the task is learnt, no additional benefit of the acronym for plain execution of the task is to be expected. Furthermore, a mnemonic acronym can also affect the mental representation of a serial task by dividing it into subtasks, which in turn may lead to higher resilience toward interruptions at subtask borders. Overall, then, the results provide evidence of limited and specific advantages of mnemonic acronyms in the context of procedural tasks, which should be consolidated in future research. Moreover, the finding that a hierarchically organized mental representation of a procedural task can make the task more resilient toward interruptions at certain positions raises the question of other ways to achieve such an organization. Besides providing mnemonic acronyms, segmented learning of a procedure by organizing the steps in pairs or subgroups, or temporal grouping similar to that applied in previous research on memory for serial order (e.g., Parmentier and Maybery, 2008), might also yield a hierarchical representation of a task and might be considered in future research.

Limitations of the current study include the typical limitations of laboratory studies. Our participants were university students, who may represent a highly selected population with respect to their cognitive capabilities. In the context of the current study, however, this should have made it harder rather than easier to find beneficial effects of a mnemonic acronym. In addition, the WORTKLAU task used in our research to simulate a procedural task with sequential constraints is certainly an abstract laboratory task. We assume that its cognitive demands closely resemble those of many procedural tasks in everyday environments and applied settings; nevertheless, the consequences of committing errors during task execution were not comparable to those of typical tasks outside the laboratory. Thus, further research should show whether the effects found here can be replicated with more representative samples and more realistic tasks in relevant field settings.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics Committee of the Institute for Psychology and Ergonomics, Technische Universtät Berlin [Die Ethik-Kommission des Instituts für Psychologie und Arbeitswissenschaft (IPA) der TU Berlin]. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

TR and DM conceived the idea, planned the experiments, and contributed to the analysis of the results and writing of the manuscript.

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (Grant Number: MA 3759/5-1).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

  • ^ We are grateful to one of the reviewers for making us aware of this study.

References

Altmann, E. M., and Trafton, J. G. (2002). Memory for goals: an activation-based model. Cogn. Sci. 26, 39–83. doi: 10.1016/s0364-0213(01)00058-1

Altmann, E. M., and Trafton, J. G. (2015). Brief lags in interrupted sequential performance: evaluating a model and model evaluation method. Int. J. Hum. Comput. Stud. 79, 51–65. doi: 10.1016/j.ijhcs.2014.12.007

Altmann, E. M., Trafton, J. G., and Hambrick, D. Z. (2014). Momentary interruptions can derail the train of thought. J. Exp. Psychol. Gen. 143:215. doi: 10.1037/a0030986

Altmann, E. M., Trafton, J. G., and Hambrick, D. Z. (2017). Effects of interruption length on procedural errors. J. Exp. Psychol. Appl. 23:216. doi: 10.1037/xap0000117

Au, H. (2005). “Line pilot performance of memory items,” in Proceedings of the International Symposium on Aviation Psychology, Dayton, OH.

Bailey, B. P., and Konstan, J. A. (2006). On the need for attention-aware systems: measuring effects of interruption on task performance, error rate, and affective state. Comput. Hum. Behav. 22, 685–708. doi: 10.1016/j.chb.2005.12.009

Bellezza, F. S. (1981). Mnemonic devices: classification, characteristics, and criteria. Rev. Educ. Res. 51, 247–275. doi: 10.3102/00346543051002247

Blick, K. A., Buonassissi, J. V., and Boltwood, C. E. (1972). Mnemonic techniques used by college students in serial learning. Psychol. Rep. 31, 983–986. doi: 10.2466/pr0.1972.31.3.983

Blick, K. A., and Waite, C. J. (1971). A survey of mnemonic techniques used by college students in free-recall learning. Psychol. Rep. 29, 76–78. doi: 10.2466/pr0.1971.29.1.76

Boltwood, C. E., and Blick, K. A. (1970). The delineation and application of three mnemonic techniques. Psychonom. Sci. 20, 339–341. doi: 10.3758/bf03335678

Bortle, C. D. (2010). The Role of Mnemonic Acronyms in Clinical Emergency Medicine: a Grounded Theory Study. Doctoral thesis, University of Phoenix, Phoenix, AZ.

Botvinick, M. M., and Bylsma, L. M. (2005). Distraction and action slips in an everyday task: evidence for a dynamic representation of task context. Psychonom. Bull. Rev. 12, 1011–1017. doi: 10.3758/bf03206436

Botvinick, M. M., and Plaut, D. C. (2006). Short-term memory for serial order: a recurrent neural network model. Psychol. Rev. 113:201. doi: 10.1037/0033-295x.113.2.201

Bower, G. H. (1970). Organizational factors in memory. Cogn. Psychol. 1, 18–46. doi: 10.1016/0010-0285(70)90003-4

Brown, G. D., Preece, T., and Hulme, C. (2000). Oscillator-based memory for serial order. Psychol. Rev. 107:127. doi: 10.1037//0033-295x.107.1.127

Burgess, N., and Hitch, G. J. (1999). Memory for serial order: a network model of the phonological loop and its timing. Psychol. Rev. 106:551. doi: 10.1037//0033-295x.106.3.551

Cades, D. M., Boehm-Davis, D. A., Trafton, J. G., and Monk, C. A. (2007). “Does the difficulty of an interruption affect our ability to resume?,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Los Angeles, CA.

Carlson, L., Zimmer, J. W., and Glover, J. A. (1981). First-letter mnemonics: DAM (don’t aid memory). J. Gen. Psychol. 104, 287–292. doi: 10.1080/00221309.1981.9921047

Carlson, R. A., and Cassenti, D. N. (2004). Intentional control of event counting. J. Exp. Psychol. Learn. Mem. Cogn. 30:1235. doi: 10.1037/0278-7393.30.6.1235

Cook, N. M. (1989). The applicability of verbal mnemonics for different populations: a review. Appl. Cogn. Psychol. 3, 3–22. doi: 10.1002/acp.2350030103

Dismukes, R. K., Young, G. E., Sumwalt, R. L. III, and Null, C. H. (1998). Cockpit interruptions and distractions: effective management requires a careful balancing act. Nat. Acad. Sci. Eng. Med. 68, 18–21.

Drews, F. A. (2007). “The frequency and impact of task interruptions in the ICU,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Los Angeles, CA.

Faul, F., Erdfelder, E., Lang, A.-G., and Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. doi: 10.3758/bf03193146

Gruneberg, M. M. (1973). The role of memorization techniques in finals examination preparation–a study of psychology students. Educ. Res. 15, 134–139. doi: 10.1080/0013188730150209

Hambrick, D. Z., and Altmann, E. M. (2015). The role of placekeeping ability in fluid intelligence. Psychonom. Bull. Rev. 22, 1104–1110. doi: 10.3758/s13423-014-0764-5

Hambrick, D. Z., Altmann, E. M., and Burgoyne, A. P. (2018). A knowledge activation approach to testing the circumvention-of-limits hypothesis. Am. J. Psychol. 131, 307–321.

Henson, R. N. (1998). Short-term memory for serial order: the start-end model. Cognit. Psychol. 36, 73–137. doi: 10.1006/cogp.1998.0685

Higbee, K. L. (1987). “Process mnemonics: principles, prospects, and problems,” in Imagery and Related Mnemonic Processes, eds M. A. McDaniel and M. Pressley (New York, NY: Springer), 407–427. doi: 10.1007/978-1-4612-4676-3_19

Higbee, K. L. (2001). Your Memory: How it Works and How to Improve it. Cambridge, MA: Da Capo Press.

Hodgetts, H. M., and Jones, D. M. (2006). Interruption of the Tower of London task: support for a goal-activation approach. J. Exp. Psychol. Gen. 135:103. doi: 10.1037/0096-3445.135.1.103

Hörmann, H. J. (1994). “FOR-DEC: a prescriptive model for aeronautical decision-making,” in Proceedings of the 21st WEAAP Conference, Dublin.

Hunt, P. (1988). Safety in aviation. Perfusion 3, 83–96.

Hurlstone, M. J., Hitch, G. J., and Baddeley, A. D. (2014). Memory for serial order across domains: an overview of the literature and directions for future research. Psychol. Bull. 140:339. doi: 10.1037/a0034221

Kieras, D. E., and Bovair, S. (1986). The acquisition of procedures from text: a production-system analysis of transfer of training. J. Mem. Lang. 25, 507–524. doi: 10.1016/0749-596x(86)90008-2

Latorella, K. A. (1996). “Investigating interruptions: an example from the flightdeck,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Los Angeles, CA.

Latorella, K. A. (1999). Investigating Interruptions: Implications for Flightdeck Performance. (NASA/TM-1999-209707). Hampton, VA: NASA Langley Research Center.

Loukopoulos, L. D., Dismukes, R. K., and Barshi, I. (2001). “Cockpit interruptions and distractions: a line observation study,” in Proceedings of the 11th International Symposium on Aviation Psychology, Columbus, OH.

Loukopoulos, L. D., Dismukes, R. K., and Barshi, I. (2003). “Concurrent task demands in the cockpit: challenges and vulnerabilities in routine flight operations,” in Proceedings of the 12th International Symposium on Aviation Psychology, Dayton, OH.

Malhotra, N. K. (1991). Mnemonics in marketing: a pedagogical tool. J. Acad. Market. Sci. 19, 141–149. doi: 10.1007/bf02726006

Manalo, E. (2002). Uses of mnemonics in educational settings: a brief review of selected research. Psychologia 45, 69–79. doi: 10.2117/psysoc.2002.69

Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81–97. doi: 10.1037/h0043158

Miller, S. P., and Mercer, C. D. (1993). Mnemonics: enhancing the math performance of students with learning difficulties. Interv. Sch. Clinic 29, 78–82. doi: 10.1177/105345129302900204

Miyata, Y., and Norman, D. A. (1986). Psychological issues in support of multiple activities. Boca Raton, FL: CRC Press.

Monk, C. A., Boehm-Davis, D. A., and Trafton, J. G. (2002). “The attentional costs of interrupting task performance at various stages,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Los Angeles, CA.

Monk, C. A., Boehm-Davis, D. A., Mason, G., and Trafton, J. G. (2004). Recovering from interruptions: implications for driver distraction research. Hum. Factors 46, 650–663. doi: 10.1518/hfes.46.4.650.56816

Monk, C. A., Trafton, J. G., and Boehm-Davis, D. A. (2008). The effect of interruption duration and demand on resuming suspended goals. J. Exp. Psychol. Appl. 14:299. doi: 10.1037/a0014402

Moore, M. E., and Ross, B. M. (1963). Context effects in running memory. Psychol. Rep. 12, 451–465. doi: 10.2466/pr0.1963.12.2.451

Morris, P. E., and Cook, N. (1978). When do first letter mnemonics aid recall? Br. J. Educ. Psychol. 48, 22–28. doi: 10.1111/j.2044-8279.1978.tb02366.x

Nelson, D. L., and Archer, C. S. (1972). The first letter mnemonic. J. Educ. Psychol. 63:482. doi: 10.1037/h0033131

Parmentier, F. B., and Maybery, M. T. (2008). Equivalent effects of grouping by time, voice, and location on response timing in verbal serial memory. J. Exp. Psychol. 34:1349. doi: 10.1037/a0013258

Scott-Cawiezell, J., Pepper, G. A., Madsen, R. W., Petroski, G., Vogelsmeier, A., and Zellmer, D. (2007). Nursing home error and level of staff credentials. Clin. Nurs. Res. 16, 72–78. doi: 10.1177/1054773806295241

Stalder, D. R. (2005). Learning and motivational benefits of acronym use in introductory psychology. Teach. Psychol. 32, 222–228. doi: 10.1207/s15328023top3204_3

Trafton, J. G., Altmann, E. M., and Ratwani, R. M. (2011). A memory for goals model of sequence errors. Cogn. Syst. Res. 12, 134–143. doi: 10.1016/j.cogsys.2010.07.010

Trafton, J. G., and Monk, C. A. (2007). Task interruptions. Rev. Hum. Factors Ergonom. 3, 111–126.

Westbrook, J., Woods, A., Rob, M., Dunsmuir, W. T., and Day, R. O. (2010). Association of interruptions with an increased risk and severity of medication administration errors. Arch. Intern. Med. 170, 683–690.

Wickens, C. D. (2002). Multiple resources and performance prediction. Theor. Issues Ergonom. Sci. 3, 159–177. doi: 10.1080/14639220210123806

Keywords : interruptions, sequential task, resumption time, goal activation, sequential error, mnemonic technique, acronym, procedure learning

Citation: Radović T and Manzey D (2019) The Impact of a Mnemonic Acronym on Learning and Performing a Procedural Task and Its Resilience Toward Interruptions. Front. Psychol. 10:2522. doi: 10.3389/fpsyg.2019.02522

Received: 22 August 2019; Accepted: 24 October 2019; Published: 06 November 2019.


Copyright © 2019 Radović and Manzey. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tara Radović, [email protected] ; [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Published: 20 December 2023

Statins and cognitive decline in patients with Alzheimer’s and mixed dementia: a longitudinal registry-based cohort study

  • Bojana Petek 1 , 2 , 3 ,
  • Henrike Häbel 4 ,
  • Hong Xu 5 ,
  • Marta Villa-Lopez 5 , 6 , 7 ,
  • Irena Kalar 1 , 3 , 8 ,
  • Minh Tuan Hoang 5 , 9 ,
  • Silvia Maioli 1 , 10 ,
  • Joana B. Pereira 11 ,
  • Shayan Mostafaei 5 , 9 ,
  • Bengt Winblad 1 , 12 ,
  • Milica Gregoric Kramberger 1 , 3 , 8 ,
  • Maria Eriksdotter 5 , 12 &
  • Sara Garcia-Ptacek 5 , 12  

Alzheimer’s Research & Therapy, volume 15, Article number: 220 (2023)


Abstract

Background

Disturbances in brain cholesterol homeostasis may be involved in the pathogenesis of Alzheimer’s disease (AD). Lipid-lowering medications could interfere with neurodegenerative processes in AD through cholesterol metabolism or other mechanisms.

The objective was to explore the association between the use of lipid-lowering medications and cognitive decline over time in a cohort of patients with AD or mixed dementia and an indication for lipid-lowering treatment.

Methods

This was a longitudinal cohort study using the Swedish Registry for Cognitive/Dementia Disorders, linked with other Swedish national registries. Cognitive trajectories evaluated with the mini-mental state examination (MMSE) were compared between statin users and non-users, between users of individual statins, between groups of statins, and between non-statin lipid-lowering medications and statins, using mixed-effects regression models with inverse probability of dropout weighting. A dose-response analysis compared statin users to non-users.

Results

Our cohort consisted of 15,586 patients with a mean age of 79.5 years at diagnosis and a majority of women (59.2%). A dose-response effect was demonstrated: taking on average one defined daily dose of statins was associated with 0.63 more MMSE points after 3 years compared to no use of statins (95% CI 0.33; 0.94). Simvastatin users showed 1.01 more MMSE points (95% CI 0.06; 1.97) after 3 years compared to atorvastatin users. Younger (<79.5 years at index date) simvastatin users had 0.80 more MMSE points compared to younger atorvastatin users (95% CI 0.05; 1.55) after 3 years. Simvastatin users had 1.03 more MMSE points (95% CI 0.26; 1.80) compared to rosuvastatin users after 3 years. No differences regarding statin lipophilicity were observed. The results of a sensitivity analysis restricted to incident users were not consistent.

Conclusions

Some patients with AD or mixed dementia with indication for lipid-lowering medication may benefit cognitively from statin treatment; however, further research is needed to clarify the findings of sensitivity analyses.

Background

The brain houses about a quarter of the cholesterol present in the body, making it the organ richest in cholesterol [ 1 ]. The essential role of brain cholesterol is reflected in its involvement in numerous physiological processes such as maintaining membrane integrity, neurotransmission, and synaptogenesis [ 2 ]. A dysregulation of brain cholesterol homeostasis may be involved in the pathogenesis of Alzheimer’s disease [ 2 ] through interference with the amyloidogenic Aβ pathway [ 3 ], impairment of cerebral blood flow [ 4 ], and other mechanisms [ 5 ]. The association between peripheral hypercholesterolemia and cognition, on the other hand, is complex. Peripheral hypercholesterolemia in midlife has been linked to cognitive decline and AD in late life [ 6 , 7 ] through different mechanisms [ 7 , 8 , 9 , 10 ]. Moreover, genetic polymorphism of the brain cholesterol transporter ApoE4 and several additional genetic factors implicated in lipid metabolism could be relevant to AD pathogenesis [ 11 , 12 ]. In contrast, peripheral hyperlipidaemia in late life is a marker of better general health and cognition [ 13 , 14 ].

The possible cognitive effects of HMG-CoA reductase inhibitors, or statins, which are used in cardiovascular disease prevention, have sparked extensive research in the last few decades. Based on their pharmacokinetic characteristics, statins can be classified according to their structure (fungus-derived or synthetic), lipophilicity, metabolism, bioavailability, potency, and binding to different proteins and transporters [ 15 ]. The multi-layered effects of statins on cognition act through numerous neurodegenerative processes, in a cholesterol-dependent as well as a cholesterol-independent (“pleiotropic”) manner [ 15 , 16 ]. Statins seem to interfere with the amyloidogenic cascade [ 17 ] and the phosphorylation of tau [ 18 ], provide beneficial vascular effects through endothelial function and clearance of neurotoxic substances [ 19 ], decrease neuroinflammation and oxidative stress, and promote neuronal survival and plasticity, synaptogenesis, and neurotransmission [ 16 ].

The overall cognitive effects of statins are likely connected to a complex interaction of factors related to the patient’s characteristics, the integrity of blood–brain barrier permeability [ 20 ], the characteristics of the statins [ 18 ], the time and duration of treatment, dosages, and critical time windows in the pathogenesis of dementia [ 21 , 22 ] (Fig. 1 ).

Figure 1. Interaction between the patient’s and the medication’s characteristics potentially influences the cognitive effects of statins. Two separate cholesterol pools in the body, central and peripheral, are thought to be connected to the risk of Alzheimer’s disease (AD). The brain penetration of statins has been attributed to different factors linked to BBB crossing (lipophilicity of a statin, chemical structure, molecular weight and size of the molecule, and different transporters and their genetic polymorphisms). The structure of the barrier itself additionally influences the permeability to statins and is affected by aging, neurodegenerative processes and, possibly, peripheral hypercholesterolemia. The overall cognitive effects of statins are likely a result of their central and peripheral actions and are connected to the time of intervention in life and in the pathogenesis of AD. Moreover, an interaction of comorbidities and comedication, a sufficient duration of treatment, and dosages are important. In midlife, a protective effect of statins against AD could be achieved through lowering the metabolic risk of hyperlipidaemia. BBB, blood–brain barrier; AD, Alzheimer’s disease

Despite the extensive number of observational cohort studies and some clinical trials on statins, their ability to prevent dementia or ameliorate cognitive decline after disease onset is still unclear. Reports of mild and reversible short-term cognitive adverse effects [ 23 , 24 ] prompted the US Food and Drug Administration to add a warning to statin labelling. However, numerous large systematic reviews and meta-analyses have not confirmed these adverse cognitive effects [ 25 , 26 , 27 , 28 , 29 ], and some suggested that the use of statins may lower the risk of AD [ 25 , 30 , 31 , 32 , 33 ]. Clinical trials generally reported a null effect [ 34 , 35 , 36 ] but were commonly underpowered or used less robust cognitive evaluation tools. Comparably little information is available regarding the effect of statins on cognitive decline in patients with established AD [ 37 , 38 , 39 , 40 ]. Epidemiological biases inherent to the observational design, or the heterogeneous design of studies, partly explain these discrepancies [ 41 ].

The aim of our study was to evaluate the association between statin use and cognitive decline over time in a large cohort of patients diagnosed with AD or mixed AD dementia. We hypothesized that statins that cross the BBB would be associated with less cognitive decline, evaluated with the Mini-Mental State Examination (MMSE), in these patients.

Study design and registries

We performed a longitudinal cohort study of patients with AD or mixed dementia and an indication for lipid-lowering treatment, registered in the Swedish registry for dementia (SveDem). SveDem is a nationwide quality-of-care registry, established in 2007 [42]. All memory clinics and 78% of primary care centres in Sweden report to SveDem [43]. From this registry, we obtained demographic information (age, sex, living arrangements), date and care unit of registration, type of dementia diagnosis and cognitive status of the patients (MMSE scores) at baseline and follow-ups. The date of the dementia diagnosis in SveDem was set as the index date; 61% of patients had only one entry, 26% had two, 8% had three and 5% had more than three. In total, 80,004 individual patients with dementia were registered in SveDem between 2007 and 2018. All patients were followed until death, emigration or end of follow-up (16 October 2018).

All patients with a missing MMSE score at the index date were excluded from the analyses. Only patients diagnosed with hyperlipidaemia (ICD-10 codes E78.0 to E78.6, obtained from the Swedish National Patient Registry (NPR), see below) in the 10 years preceding the index date, or those with a prescription of statins (ATC code C10, obtained from the Swedish Prescribed Drug Registry (PDR), see below) in the 6 months preceding the index date, were included in the analyses. Furthermore, the top 1% of statin users sorted by averaged defined daily doses (DDD) were excluded, on the assumption that their consumption data were falsely high and that these individuals bought medication that they did not consume. Figure 2 shows the patient selection flowchart: 15,586 individuals were included in the main analysis.
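As an illustration only, the selection logic above can be expressed as a short filter over registry data. This is a minimal sketch assuming a hypothetical pandas DataFrame with one row per patient and pre-computed columns (mmse_at_index, hyperlipidaemia_10y, statin_rx_6m, avg_ddd); it is not the authors' code.

```python
import pandas as pd

def select_cohort(patients: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cohort selection; column names are assumptions."""
    # 1. Exclude patients with a missing MMSE score at the index date.
    cohort = patients.dropna(subset=["mmse_at_index"])

    # 2. Keep patients with a hyperlipidaemia diagnosis (ICD-10 E78.0-E78.6)
    #    in the preceding 10 years, or a statin prescription (ATC C10)
    #    in the preceding 6 months.
    cohort = cohort[cohort["hyperlipidaemia_10y"] | cohort["statin_rx_6m"]]

    # 3. Drop the top 1% of statin users by averaged DDD, assumed to
    #    reflect dispensed but unconsumed medication.
    cutoff = cohort.loc[cohort["statin_rx_6m"], "avg_ddd"].quantile(0.99)
    keep = ~cohort["statin_rx_6m"] | (cohort["avg_ddd"] <= cutoff)
    return cohort[keep]
```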

Figure 2

Flowchart of study participant selection. Patients with hyperlipidaemia and AD or mixed dementia, registered in SveDem from 2007 to 2018, were included in the study. Among these, we compared cognitive trajectories over time, evaluated with MMSE, in different comparison groups: (1) statin users vs non-users of statins, (2) simvastatin users vs atorvastatin users, (3) simvastatin users vs rosuvastatin users, (4) lipophilic statin users vs hydrophilic statin users, (5) fungal statin users vs synthetic statin users and (6) non-statin lipid-lowering medication users vs statin users

The exposure, drug use, was extracted for every SveDem entry. Drug use was defined from the PDR either as the average DDD during the 6-month period preceding each SveDem entry date or simply as a categorical variable (yes/no) during the same period (time-updated exposure). Time-updated exposure means that the presence/absence and dose of statins were examined in each 6-month period leading up to each measurement of MMSE (baseline or follow-up). Patients starting their statin treatment after the index date were excluded from the analyses because cognitive decline could have affected prescription. All other patients were included in the analyses as non-users. DDD is defined by the World Health Organization as the assumed average maintenance daily dose of a medication for its primary indication in adults [44]. One DDD is equivalent to 30 mg of simvastatin or 20 mg of atorvastatin.
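To make the time-updated exposure concrete, the following sketch computes, for each SveDem entry, the average DDD dispensed over the preceding 6 months. The DataFrame layout and column names (patient_id, entry_date, dispense_date, ddd) are assumptions for illustration, not the registry's actual schema.

```python
import pandas as pd

def exposure_at_entries(entries: pd.DataFrame,
                        dispensations: pd.DataFrame) -> pd.DataFrame:
    """Sketch: average DDD in the 6 months before each SveDem entry."""
    merged = entries.merge(dispensations, on="patient_id")
    in_window = merged[
        (merged["dispense_date"] <= merged["entry_date"])
        & (merged["dispense_date"] > merged["entry_date"] - pd.DateOffset(months=6))
    ]
    # Total dispensed DDDs divided by the ~183-day window length.
    avg_ddd = in_window.groupby(["patient_id", "entry_date"])["ddd"].sum() / 183
    out = entries.set_index(["patient_id", "entry_date"])
    out["avg_ddd"] = avg_ddd
    out["avg_ddd"] = out["avg_ddd"].fillna(0.0)
    out["statin_use"] = out["avg_ddd"] > 0  # categorical yes/no exposure
    return out.reset_index()
```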

Medication use

Medication use, with corresponding ATC codes, was collected from the PDR, which was established in 2005 and includes all prescription medications dispensed at Swedish pharmacies [45]. Lipid-lowering medications included simvastatin, atorvastatin, pravastatin, fluvastatin, rosuvastatin, pitavastatin, fibrates, bile acid sequestrants, nicotinic acid and derivatives, and other non-statin lipid-lowering medications. Comedications were calculated as time-updated exposures (yes/no) during the 6-month period preceding each SveDem entry date. Comedications were selected based on their known relevance for patients with dementia and included cardiac drugs, vasoprotectives, platelet aggregation inhibitors, anticoagulants, antipsychotics, anxiolytics, hypnotics, antidepressants, cholinesterase inhibitors, memantine and vitamin D (Appendix). Adherence was assumed based on collection of the medication at the pharmacy.

Comorbidities

Comorbidities were obtained with their corresponding ICD-10 codes from the NPR and were coded dichotomously up to 10 years before the index date. The NPR covers all diagnoses from in-hospital and specialist clinics. Comorbidities were selected based on their known relevance for cognition in patients with dementia and included diabetes mellitus, arrhythmia, heart failure, atrial fibrillation, alcohol-related disease, chronic kidney disease, cardiovascular disease, ischemic heart disease, respiratory disease, stroke, anaemia, liver disease, malignancy and obesity (Appendix).

Covariates that were considered included age at baseline, sex (male/female), residency (living with another adult/alone/nursing home/missing), type of dementia diagnostic unit (special memory clinic/primary care centre) and calendar year of dementia diagnosis, all at index date. We selected the covariates that are likely associated with cognitive functions or the probability of receiving statins, based on previous research and/or our clinical knowledge.

The linkage of data from the aforementioned registries (SveDem, the Swedish National Patient Registry and the Swedish Prescribed Drug Registry) was enabled by the personal identification number of each Swedish citizen. Patient identities were pseudonymized and blinded to the researchers.

The main outcome was cognitive decline, measured in MMSE points.

Statistical analysis

The data were described in terms of mean and standard deviation (SD) for continuous variables and as positive counts (percentages) for categorical variables.

Linear mixed-effects regression models with random intercept and slope were used to investigate the change in MMSE scores over time and to detect differences between statin users and non-users. The model included statin use and time from the index date as continuous variables and an interaction between drug use and time. Following our previous work in SveDem [46], a linear trend over time was assumed and the model allowed for a random intercept and random slope for each patient. This model is referred to as the crude model. In an adjusted model, comedications, comorbidities and other covariates were included, mostly as categorical variables; only age, MMSE score and calendar year at the index date were treated as continuous variables. Fully adjusted models included clinical and demographic characteristics (MMSE score at baseline, age at diagnosis, year of diagnosis, sex, residency, comorbidities and comedications). Inverse probability weighting was used to account for attrition due to dropout. For this purpose, a logistic regression model was fitted to the data to estimate the probability of dropping out within the subsequent year, where dropout was defined as the last observed MMSE score without death or study end occurring in the subsequent year. For more details, see Handels et al. [47].
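A minimal sketch of the crude model and the dropout weights, using Python and statsmodels, is given below. Variable names (mmse, avg_ddd, years_since_index, patient_id, dropout_next_year) are assumptions, and the authors fitted their weighted models in Stata, so this approximates the approach rather than reproducing the analysis code; note that statsmodels' MixedLM does not take observation weights, so the weighted fit itself is not shown.

```python
import statsmodels.formula.api as smf

# Crude model: MMSE as a function of statin dose, time since the index
# date and their interaction, with a random intercept and a random slope
# for time within each patient. `df` is long format, one row per MMSE
# measurement.
crude = smf.mixedlm(
    "mmse ~ avg_ddd * years_since_index",
    data=df,
    groups=df["patient_id"],
    re_formula="~years_since_index",
).fit()

# Inverse probability weights: model the probability of dropping out
# within the subsequent year, then weight each observation by the inverse
# probability of remaining under observation.
dropout = smf.logit(
    "dropout_next_year ~ age + sex + mmse + years_since_index", data=df
).fit()
df["ipw"] = 1.0 / (1.0 - dropout.predict(df))
```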

The analyses were repeated for selected drug groups, where drug use was defined as yes/no during the 6-month period before each SveDem entry date, and in subgroups defined by gender and age. We split the cohort at the mean age at the index date (79.5 years) to create subgroups of younger and older patients. We also compared users of individual statins with one another. We considered two approaches to dividing the statins by their functional properties: first, into lipophilic (simvastatin, atorvastatin, fluvastatin, lovastatin and pitavastatin) and hydrophilic (rosuvastatin, pravastatin) groups [48]; second, into fungus-derived (simvastatin, pravastatin, lovastatin) and synthetic (atorvastatin, cerivastatin, fluvastatin, rosuvastatin, pitavastatin) statins [15]. Finally, we compared statin users to non-statin lipid-lowering medication users. To estimate the difference between the comparison groups after 3 years, we calculated a theoretical linear extrapolation based on the mixed-effects models.
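Because the fitted model is linear in time, the theoretical 3-year contrast is simply three times the per-year interaction coefficient. Continuing the sketch above (the coefficient name follows the assumed formula):

```python
# Theoretical 3-year extrapolation from the fitted crude model: the
# between-group MMSE difference grows linearly with time.
per_year = crude.params["avg_ddd:years_since_index"]
diff_3y = 3 * per_year
# With the reported estimate of 0.21 MMSE points per year per DDD, this
# gives 3 * 0.21 = 0.63 points after 3 years, matching the Results.
```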

As sensitivity analyses, we fitted two additional models to check the robustness of our results: one with multiple imputation of MMSE scores [47] and one restricted to incident users. The first model addresses bias arising from missing MMSE scores at follow-ups, and the second addresses confounding by length of treatment. Incident users were defined as drug users who had not filled any statin prescription during the 12 months before the 6-month period preceding each SveDem entry date. Table 1 shows the design of the study.
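The incident-user definition amounts to a 12-month washout period before the 6-month exposure window. A hypothetical helper, with dates and column names assumed for illustration:

```python
import pandas as pd

def is_incident_user(entry_date, dispense_dates: pd.Series) -> bool:
    """Sketch: no statin dispensation in months 6-18 before the entry."""
    washout_start = entry_date - pd.DateOffset(months=18)
    washout_end = entry_date - pd.DateOffset(months=6)
    in_washout = dispense_dates.between(washout_start, washout_end,
                                        inclusive="left")
    return not in_washout.any()
```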

The cluster-robust sandwich estimator was used to estimate standard errors, and two-sided p-values were reported. All analyses were conducted using Stata version 16.1 (StataCorp, College Station, TX).

Characteristics of study population

As shown in Table 2, our cohort consisted of 15,586 patients with AD or mixed dementia with a mean age of 79.5 years (SD = 6.8) at dementia diagnosis. Most patients were women (59.2%). At baseline, patients scored on average 21 points on the MMSE (SD = 5); 10,869 patients (69.7%) used statins during the observation period. The most prescribed statin in the whole cohort was simvastatin (8235, 52.8%), followed by atorvastatin (2210, 14.2%). There were 296 (1.9%) users of a non-statin lipid-lowering medication. Most patients resided at home (53.9% with another adult and 40.7% alone), and 5% lived in nursing homes. The most common comorbidities in the cohort were hypertension (40.1%), diabetes mellitus (24.4%) and cardiovascular disease (24%).

The average follow-up time was 0.86 years (SD = 1.40), and the average number of MMSE follow-ups per patient with MMSE measurements was 1.61 (SD = 0.97). The average decline of the cohort was −1.20 MMSE points per year (95% CI: −1.32; −1.09). The average cognitive decline observed among the 6113 patients with at least two MMSE measurements was −2.61 points (SD = 4.57).

Statin users were more commonly male (44% vs 33.5%, p < 0.001), younger (78.7 vs 80.7 years at baseline, p < 0.001) and had a better cognitive status at baseline (21.3 vs 20.8 MMSE points, p < 0.001) compared to non-users of statins. As expected, there was a higher prevalence of comorbidities such as hypertension, cardiovascular disease, liver disease and diabetes mellitus in the former group. Statin users were also more likely to be prescribed comedication, such as antithrombotics, antihypertensives, antidiabetics or psycholeptics. More detailed information is presented in Tables 2 and 3.

The sensitivity analysis revealed that a majority of our cohort had used statins at some point in time. Overall, 844 patients were incident users of statins. Compared to incident simvastatin users (n = 557), incident atorvastatin users (n = 267) had a lower baseline MMSE (20.5 vs 21.4 points, p = 0.01) and less commonly received their dementia diagnosis at a specialist memory clinic (60.3% vs 78.8%, p < 0.001) (Supplementary Table 7).

Cognitive decline in different treatment groups

Statin users compared to non-users of statins

Statin use was associated with a slower cognitive decline over time compared to no use of statins. After taking an average of 1 DDD of statins for a year, statin users had 0.21 more MMSE points (95% CI: 0.12; 0.32) compared to non-users, and there was a dose-response effect. After 3 years of taking an average of 1 DDD of statins, statin-treated patients had 0.63 more MMSE points (95% CI: 0.33; 0.94) (Table 4, Fig. 3). These results were consistent when considering imputed missing MMSE values and in the analysis restricted to incident users (Supplementary Table 6), as well as in subgroup analyses (Supplementary Table 1). We conducted post hoc analyses stratifying by dementia type (Alzheimer or mixed dementia; results not shown), with similar results to those presented for the whole group.

Figure 3

Cognitive decline, evaluated with change in MMSE score over time, in statin users compared to non-users of statins. The graph shows the association between increasing doses of statin treatment and MMSE over time, as predicted from the model. Linear mixed-effects regression model, adjusted for demographic characteristics, comorbidities and comedication, with inverse probability weighting. DDD defined daily dose, i.e., the assumed average maintenance daily dose of a medication for its primary indication in adults, as defined by the World Health Organization. Yearly visit 1 represents the first MMSE measurement (baseline)

Simvastatin users compared to atorvastatin users

Simvastatin users exhibited a slower cognitive decline over time compared to atorvastatin users (0.35 more MMSE points per year of follow-up, 95% CI: 0.03; 0.67, and 1.01 more MMSE points after 3 years, 95% CI: 0.06; 1.97) (Table 4, Fig. 4).

Figure 4

Cognitive decline, evaluated with change in MMSE score over time, in simvastatin compared to atorvastatin users. Linear mixed-effects regression model, adjusted for demographic characteristics, comorbidities and comedication, with inverse probability weighting. Yearly visit 1 represents the first MMSE measurement (baseline)

When stratifying analyses by gender and age, the protective association of simvastatin with cognition, compared to atorvastatin, was only statistically significant in participants aged <79.5 years (the mean age of the total sample). Younger users of simvastatin had a slower decline of MMSE (0.28 more points per year, 95% CI: 0.03; 0.54, and 0.80 more points after 3 years, 95% CI: 0.05; 1.55) compared to younger atorvastatin users (Table 5).

This protective association was statistically significant in the multiple-imputation model but not in incident users (0.56 fewer MMSE points per year, 95% CI: −1.13; 0.01) (Supplementary Table 6). We conducted post hoc analyses stratifying by dementia type (Alzheimer or mixed dementia): in mixed dementia, simvastatin use was associated with 1.57 more MMSE points at 3 years (95% CI: 0.79; 2.34), while results were not significant for the Alzheimer group (0.49 more MMSE points at 3 years; 95% CI: −0.76; 1.76).

Simvastatin users compared to rosuvastatin users

Simvastatin users had a slower MMSE decline compared to rosuvastatin users (0.35 more MMSE points per year of follow-up, 95% CI: 0.09; 0.61, and 1.03 more MMSE points after 3 years, 95% CI: 0.26; 1.80) (Table 4, Fig. 5).

Figure 5

Cognitive decline, evaluated with change in MMSE scores over time, in simvastatin compared to rosuvastatin users. Linear mixed-effects regression model, adjusted for demographic characteristics, comorbidities and comedication, with inverse probability weighting. Yearly visit 1 represents the first MMSE measurement (baseline)

In subgroup analysis, these results remained statistically significant in women and younger patients (Supplementary Table 2).

The associations remained protective when imputing missing MMSE values. However, incident simvastatin users had a faster decline of MMSE compared to incident rosuvastatin users (1.63 fewer MMSE points per year of follow-up, 95% CI: −3.18; −0.07, and 4.77 fewer MMSE points after 3 years, 95% CI: −9.46; −0.07) (Supplementary Table 6).

Lipophilic statin users compared to hydrophilic statin users

We did not find significant differences in MMSE decline between lipophilic statin users (simvastatin, atorvastatin, fluvastatin) and hydrophilic statin users (rosuvastatin, pravastatin) (Table 4), including when considering imputed values of missing MMSE. However, decline was faster in incident users of lipophilic statins (1.32 fewer MMSE points per year, 95% CI: −2.46; −0.18, and 3.84 fewer points after 3 years, 95% CI: −7.28; −0.41) compared to hydrophilic statins (Supplementary Table 6). These analyses were not statistically significant in sub-analyses by age group and sex (Supplementary Table 3).

Fungal statin users compared to synthetic statin users

Use of fungal statins (simvastatin, pravastatin) was not associated with a difference in MMSE decline compared to use of synthetic statins (atorvastatin, rosuvastatin, fluvastatin) (Table 4). In a subgroup analysis, the MMSE decline was slower in younger fungal statin users (0.26 more points per year, 95% CI: 0.03; 0.49, and 0.73 more points after 3 years, 95% CI: 0.05; 1.42) (Supplementary Table 4). However, the decline was faster when analysing only incident fungal users compared to incident synthetic users (0.61 fewer points per year, 95% CI: −1.19; −0.04, and 1.83 fewer points after 3 years, 95% CI: −3.55; −0.12) (Supplementary Table 6).

Non-statin lipid-lowering medication users compared to statin users

We did not observe a significant difference in MMSE decline between non-statin lipid-lowering medication users and statin users (Table 4).

In some subgroup analyses, users of non-statin lipid-lowering medication had a slower MMSE decline (Supplementary Table 5).

In this longitudinal, Swedish registry-based observational study of patients with AD or mixed AD dementia, we observed a dose-dependent cognitive benefit over time in statin users compared to non-users of statins. Additionally, we observed a slower MMSE decline over time in patients taking simvastatin compared to either atorvastatin or rosuvastatin users. Younger users of simvastatin had a slower MMSE decline compared to younger atorvastatin users. We did not observe a difference in MMSE decline depending on lipophilicity. Analyses of incident users revealed inconsistent findings, which could potentially be explained by a time-dependent, non-linear association between statins and cognitive processes, or by differences in the characteristics and selection of these users.

Different statins

Simvastatin was the most commonly used statin in Sweden when our data were collected. Accordingly, comparisons among other statins or statin groups are difficult and often lack sufficient power. A beneficial role of simvastatin in early dementia, when levels of neuroinflammation are high, is biologically plausible, as this lipophilic statin readily crosses the BBB and could exert various neuroprotective effects, such as protection against tau hyperphosphorylation and mediation of brain cholesterol homeostasis [18]. Research on animal models of AD further supports beneficial effects of simvastatin on cognition through different mechanisms [49, 50]. Clinical trials in patients with mild to moderate AD reported a neutral [35] or a beneficial [51] effect of simvastatin, using MMSE as a cognitive outcome; these trials were limited by relatively short durations and low numbers of participants and were therefore possibly underpowered. To the best of our knowledge, our study is the first observational study to compare cognitive decline between different statins in patients with established AD and mixed dementia. Careful adjustment for comedication in general, and cholinesterase inhibitors in particular, is important, as our group previously reported a small long-term beneficial effect on cognition in AD and mixed dementia patients treated with cholinesterase inhibitors [46].

In our study, the analysis including only incident users showed an opposite association. A possible explanation for the inconsistent results of the incident-user design may be a temporally dependent, biphasic effect of statin therapy on cholesterol metabolites, as shown in a study of asymptomatic patients at risk of AD [52]: statins initially reduced cholesterol metabolites in the cerebrospinal fluid, reaching a nadir at 6–7 months, followed by a return to baseline and an overshoot at two years. Moreover, several differences between incident users and all users exist that could partly contribute to the discrepancies. The individual characteristics of incident users, such as baseline MMSE differences, or a possible selection of these smaller groups of patients through individual physicians' preferences, might have influenced their cognitive trajectories in ways that could not be accounted for in adjusted models.

Lipophilicity and chemical characteristics (fungal or synthetic statins)

Several biochemical characteristics of statins probably influence their functional effects on cholesterol metabolism and cognition. Statins with a higher lipophilicity (e.g., simvastatin, atorvastatin, fluvastatin) may cross the BBB more easily than more hydrophilic statins (rosuvastatin, pravastatin) [18]. Additionally, the size and orientation of a statin molecule may influence its BBB permeability; this has been proposed to explain the low ability of the lipophilic atorvastatin to cross the BBB, given its large size [18]. In our study, we did not observe a difference in cognitive decline when comparing users of lipophilic to hydrophilic statins in most models and subgroup analyses; however, MMSE decline was faster in incident users of lipophilic statins. Due to the aforementioned Swedish prescription patterns, the comparisons in our study were driven by simvastatin and atorvastatin users compared to rosuvastatin users. We are not aware of another observational study comparing cognitive decline between lipophilic and hydrophilic statin users in AD patients.

Fungal statins (simvastatin and pravastatin) differ from synthetic statins (atorvastatin, rosuvastatin, fluvastatin) in several functional characteristics. Synthetic statins have been shown to form more binding interactions with HMG-CoA reductase, leading to stronger inhibition and higher potency [53]. Moreover, fungus-derived statins have been observed to have high permeability through the blood–brain barrier and to reduce cholesterol levels as well as the burden of neurofibrillary tangles in animal models [54]. In our study, the comparison between fungal and synthetic statins was driven by simvastatin and atorvastatin users, so this classification did not add further information. To the best of our knowledge, no previous studies compared cognitive decline between fungal and synthetic statins in AD or mixed dementia patients. A recent cohort study comparing different incident statin users found a higher risk of AD with fungal statins compared to synthetic statins, as well as with lipophilic compared to hydrophilic statins; however, these risks were attenuated in sensitivity analyses [55].

Non-statin lipid-lowering medications

The confidence intervals for this comparison were broad and did not reach statistical significance. However, MMSE decline was slower in some subgroups of non-statin lipid-lowering medication users (men and younger users). Non-statin hypolipemics, such as gemfibrozil, represent an interesting comparison group for statins, since both are prescribed for hyperlipidaemia, thereby diminishing indication bias, while potentially exerting cognitive effects through different metabolic pathways. In a recent study, gemfibrozil attenuated amyloid burden and neuroinflammation and improved memory in AD mouse models through activation of PPAR-alpha [56], but our study was probably underpowered for this comparison.

Statin dose, potency, treatment length and time window for intervention

Dose, potency and duration of treatment have been recognized as important factors when evaluating the effects of statins on cognition. Most work has evaluated these factors in the prevention of dementia or AD [32, 33, 57], and a dose-response was observed in a large cohort study which included only AD patients [58]. In our study, a dose-effect was observed when comparing statin users and non-users. The prediction model showed a benefit after 3 years, which corresponds to an estimated brain cholesterol turnover time in adults, but most of the data in our cohort cluster at earlier follow-ups. A time window for intervention with statins, with respect to the neuropathogenesis of dementia or the life course of a patient, might exist, as the neurodegenerative pathological changes of AD begin decades before clinical symptoms [59]. The protective effect of statins could be achieved through long-term amelioration of brain vascular burden, restoration of disturbed central cholesterol homeostasis and neuroprotective effects, possibly in preclinical [60] or early stages of AD [61, 62].

Statin use compared to no use of statins

An extensive evaluation of the possible role of statins in preventing dementia has been performed in the last two decades [25, 26, 32, 33, 41, 63, 64, 65], but comparably fewer studies have included patients with already established AD [25, 31, 38, 39, 40, 41, 64]. Clinical trials of statin use in AD patients did not report a clear benefit in ameliorating cognitive decline [38, 40, 64]. Observational cohort studies of patients with AD, with follow-up ranging from 10 months to over 10 years, reported a slower [61, 66, 67, 68] or similar [69] cognitive decline in statin users compared to non-users. The findings from our analysis comparing statin use to no use, which imply a possible dose-dependent beneficial role of statins in patients with AD, are in accord with many of these previous studies. However, comparing statin users and non-users introduces several important biases.

Importantly, hyperlipidaemia is an indication for statin use in midlife and represents a risk factor for dementia and AD. On the other hand, a low cholesterol level in late life has been recognized as a marker of frailty or a prodromal stage of dementia, particularly AD [70]. These facts can lead to indication bias when comparing users to non-users. Secondly, clinicians might be less likely to prescribe statins to older patients, especially those with pre-existing cognitive decline, frailty or comorbidities, since the risk of possible side effects and diminished life expectancy outweighs the benefit of the medication. Furthermore, cognitive impairment could lead to discontinuation of statins or drop-out from the study, which would produce a falsely beneficial association [41, 71]. Older patients who receive statins for their hypercholesterolemia could naturally possess a lower risk of dementia or reflect a better cognitive trajectory [72], leading to reverse causation.

Strengths and limitations

Our study has several strengths and limitations. We can report associations but are not able to draw conclusions on causality; this study was meant as an exploratory analysis that requires confirmation. We considered a variety of comorbidities and comedications in our models; however, a few important covariates were not available for our analysis, such as cholesterol levels, ApoE status and possible genetic polymorphisms specific to different populations [73]. We addressed this issue by selecting a population with an indication for treatment, using multivariable adjustment to balance differences between the groups and performing several sensitivity analyses. Patient adherence to medication was indirectly assumed based on the dispensation of medication at pharmacies. There was a considerable number of participants lost to follow-up and of missing MMSE values. MMSE was the only measure of cognitive decline in our study and is less sensitive to changes in specific cognitive domains. We chose to include only observed MMSE scores in the main analyses to limit the risk of creating false data with imputation, although this carries a risk of selection bias. We restricted the study population to individuals with drug use at the index date to ensure that we could follow a statin user's cognitive decline from the beginning. The results from the sensitivity analyses considering multiple imputation of MMSE are in line with our main findings and support these choices; however, the results of the analysis of incident users were not consistent. Another strength of our study lies in the carefully selected statistical methods: linear mixed modelling with multiple imputation is currently regarded as a superior method to account for attrition bias [72]. We considered a reasonably long follow-up and performed a large, population-based study. We examined the use of statins before dementia to address reverse causality (cognition influencing statin adherence or use) but cannot completely exclude this problem with our current methods.

Our population-based exploratory cohort study of patients with AD or mixed dementia adds to a growing body of evidence that statins are not detrimental to cognition. Moreover, statins might exhibit a long-term cognitive benefit in these patients, who have an indication for lipid-lowering treatment; however, our findings warrant a confirmatory study. We believe that our findings should further encourage clinicians to select eligible patients with dementia to benefit from prevention of cardiovascular and cerebrovascular disease with statins. Further research into the pathogenesis of dementia is warranted. Acknowledging dementia as a complex, multifactorial syndrome in which different pathogenic processes and risk factors are at play at different stages, it would be plausible to examine the combined effects of several medications affecting different metabolic pathways in well-defined subtypes of dementia. Moreover, the role of lipid metabolism dysregulation in the pathogenesis of AD should be further explored, taking genetic factors into consideration. Further research is also needed to decipher the inconsistent results in incident statin users, where time since prescription may be an important factor.

Availability of data and materials

In accordance with Swedish and EU legislation, the data are not available for public access. To obtain data from the Swedish registries, researchers must apply to the steering committees of the registries as well as the relevant government authorities, after obtaining ethical approval.

Abbreviations

AD: Alzheimer's disease

MMSE: Mini-mental state examination

SveDem: Swedish Registry for Cognitive/Dementia Disorders

Aβ: Amyloid beta

ApoE4: Apolipoprotein E4

HMG-CoA reductase: 3-Hydroxy-3-methylglutaryl coenzyme A reductase

BBB: Blood–brain barrier

NPR: National Patient Registry

PDR: Swedish Prescribed Drug Registry

DDD: Defined daily dose

ICD-10: International Classification of Diseases, 10th Revision

SD: Standard deviation

CI: Confidence interval

Björkhem I, Meaney S, Fogelman AM. Brain cholesterol: long secret life behind a barrier. Arterioscler Thromb Vasc Biol. 2004;24(5):806–15.

Chew H, Solomon VA, Fonteh AN. Involvement of lipids in Alzheimer’s disease pathology and potential therapies. Front Physiol. 2020;9:11.

Simons M, Keller P, De Strooper B, Beyreuther K, Dotti CG, Simons K. Cholesterol depletion inhibits the generation of β-amyloid in hippocampal neurons. Proc Natl Acad Sci U S A. 1998;95(11):6460–4.

Czuba E, Steliga A, Lietzau G, Kowiański P. Cholesterol as a modifying agent of the neurovascular unit structure and function under physiological and pathological conditions. Metab Brain Dis. 2017;32(4):935–48.

Ismail MAM, Mateos L, Maioli S, Merino-Serrais P, Ali Z, Lodeiro M, et al. 27-Hydroxycholesterol impairs neuronal glucose uptake through an IRAP/GLUT4 system dysregulation. J Exp Med. 2017;214(3):699–717.

Kivipelto M, Solomon A. Cholesterol as a risk factor for Alzheimer’s disease—epidemiological evidence. Acta Neurol Scand. 2006;114(SUPPL. 185):50–7.

Anstey KJ, Ashby-Mitchell K, Peters R. Updating the evidence on the association between serum cholesterol and risk of late-life dementia: review and meta-analysis. J Alzheimer’s Disease. 2017;56:215–28.

Loera-Valencia R, Goikolea J, Parrado-Fernandez C, Merino-Serrais P, Maioli S. Alterations in cholesterol metabolism as a risk factor for developing Alzheimer’s disease: potential novel targets for treatment. J Steroid Biochem Mol Biol. 2019;1(190):104–14.

Gamba P, Staurenghi E, Testa G, Giannelli S, Sottero B, Leonarduzzi G. A crosstalk between brain cholesterol oxidation and glucose metabolism in Alzheimer’s disease. Front Neurosci. 2019;13:556.

Petek B, Villa-Lopez M, Loera-Valencia R, Gerenu G, Winblad B, Kramberger MG, et al. Connecting the brain cholesterol and renin–angiotensin systems: potential role of statins and RAS-modifying medications in dementia. J Intern Med. 2018;284(6):620–42.

Bellenguez C, Küçükali F, Jansen IE, Kleineidam L, Moreno-Grau S, Amin N, et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet. 2022;54(4):412–36.

de Oliveira FF, Bertolucci PHF, Chen ES, Smith MC. Pharmacogenetic analyses of therapeutic effects of lipophilic statins on cognitive and functional changes in Alzheimer’s disease. J Alzheimers Dis. 2022;87(1):359–72.

Benito-León J, Vega-Quiroga S, Villarejo-Galende A, Bermejo-Pareja F. Hypercholesterolemia in elders is associated with slower cognitive decline: a prospective, population-based study (NEDICES). J Neurol Sci. 2015;350(1–2):69–74.

Reitz C, Tang MX, Luchsinger J, Mayeux R. Relation of plasma lipids to Alzheimer disease and vascular dementia. Arch Neurol. 2004;61(5):705–14.

McFarland AJ, Anoopkumar-Dukie S, Arora DS, Grant GD, McDermott CM, Perkins AV, et al. Molecular mechanisms underlying the effects of statins in the central nervous system. Int J Mol Sci. 2014;15(11):20607–37.

Mendoza-Oliva A, Zepeda A, Arias C. The complex actions of statins in brain and their relevance for Alzheimer’s disease treatment: an analytical review. Curr Alzheimer Res. 2014;11(999):1–1.

Simons M, Keller P, Dichgans J, Schulz JB. Cholesterol and Alzheimer’s disease: is there a link? Neurology. 2001;57(6):1089–93.

Sierra S, Ramos MC, Molina P, Esteo C, Vázquez JA, Burgos JS. Statins as neuroprotectants: a comparative in vitro study of lipophilicity, blood-brain-barrier penetration, lowering of brain cholesterol, and decrease of neuron cell death. J Alzheimer’s Dis. 2011;23(2):307–18.

Zhou Q, Liao JK. Pleiotropic effects of statins—basic research and clinical perspectives. Circ J. 2010;74(5):818–26.

Zlokovic BV. Neurovascular pathways to neurodegeneration in Alzheimer’s disease and other disorders. Nat Rev Neurosci. 2011;12(12):723–38.

Schultz BG, Patten DK, Berlau DJ. The role of statins in both cognitive impairment and protection against dementia: a tale of two mechanisms. Transl Neurodegener. 2018;7(1):5.

Jamshidnejad-Tosaramandani T, Kashanian S, Al-Sabri MH, Kročianová D, Clemensson LE, Gentreau M, et al. Statins and cognition: modifying factors and possible underlying mechanisms. Front Aging Neurosci. 2022;15:14.

Wagstaff LR, Mitton MW, Arvik BML, Doraiswamy PM. Statin-associated memory loss: analysis of 60 case reports and review of the literature. Pharmacotherapy. 2003;23:871–80.

Evans MA, Golomb BA. Statin-associated adverse cognitive effects: survey results from 171 patients. Pharmacotherapy. 2009;29(7):800–11.

Richardson K, Schoen M, French B, Umscheid CA, Mitchell MD, Arnold SE, et al. Statins and cognitive function: a systematic review. Ann Intern Med. 2013;159(10):688–97.

Samaras K, Brodaty H, Sachdev PS. Does statin use cause memory decline in the elderly? Trends Cardiovasc Med. 2016;26(6):550–65.

Swiger KJ, Manalac RJ, Blumenthal RS, Blaha MJ, Martin SS. Statins and cognition: a systematic review and meta-analysis of short- and long-term cognitive effects. Mayo Clin Proc. 2013;88(11):1213–21.

Mcguinness B, Craig D, Bullock R, Passmore P. Statins for the prevention of dementia. Cochrane Database Syst Rev. 2016;2016(1):CD003160.

Adhikari A, Tripathy S, Chuzi S, Peterson J, Stone NJ. Association between statin use and cognitive function: a systematic review of randomized clinical trials and observational studies. J Clin Lipidol. 2021;15(1):22-32.e12.

Chu CS, Tseng PT, Stubbs B, Chen TY, Tang CH, Li DJ, et al. Use of statins and the risk of dementia and mild cognitive impairment: a systematic review and meta-analysis. Sci Rep. 2018;8(1):5804.

Zhu X-C, Dai W-Z, Ma T. Overview the effect of statin therapy on dementia risk, cognitive changes and its pathologic change: a systematic review and meta-analysis. Ann Transl Med. 2018;6(22):435–435.

Poly TN, Islam MM, Walther BA, Yang HC, Wu CC, Lin MC, et al. Association between use of statin and risk of dementia: a meta-analysis of observational studies. Neuroepidemiology. 2020;54(3):214–26.

Olmastroni E, Molari G, De Beni N, Colpani O, Galimberti F, Gazzotti M, et al. Statin use and risk of dementia or Alzheimer’s disease: a systematic review and meta-analysis of observational studies. Eur J Prev Cardiol. 2021.

Collins R, Armitage J, Parish S, Sleight P, Peto R. MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20 536 high-risk individuals: a randomised placebo-controlled trial. Lancet. 2002;360(9326):7–22.

Sano M, Bell KL, Galasko D, Galvin JE, Thomas RG, Van Dyck CH, et al. A randomized, double-blind, placebo-controlled trial of simvastatin to treat Alzheimer disease. Neurology. 2011;77(6):556–63.

Feldman HH, Doody RS, Kivipelto M, Sparks DL, Waters DD, Jones RW, et al. Randomized controlled trial of atorvastatin in mild to moderate Alzheimer disease: LEADe. Neurology. 2010;74(12):956–64.

Murphy C, Dyer AH, Lawlor B, Kennelly SP. What is the impact of ongoing statin use on cognitive decline and dementia progression in older adults with mild-moderate Alzheimer disease? PLoS One. 2023;18(5):e0285529.

Liang T, Li R, Cheng O. Statins for treating Alzheimer’s disease: truly ineffective? Eur Neurol. 2015;73(5–6):360–6.

Davis KAS, Bishara D, Perera G, Molokhia M, Rajendran L, Stewart RJ. Benefits and harms of statins in people with dementia: a systematic review and meta-analysis. J Am Geriatr Soc. 2020;68(3):650–8.

Xuan K, Zhao T, Qu G, Liu H, Chen X, Sun Y. The efficacy of statins in the treatment of Alzheimer’s disease: a meta-analysis of randomized controlled trial. Neurol Sci. 2020;41(6):1391–404.

Power MC, Weuve J, Sharrett AR, Blacker D, Gottesman RF. Statins, cognition, and dementia-systematic review and methodological commentary. Nat Rev Neurol. 2015;11(4):220–9.

Religa D, Fereshtehnejad SM, Cermakova P, Edlund AK, Garcia-Ptacek S, Granqvist N, et al. SveDem, the Swedish Dementia Registry - a tool for improving the quality of diagnostics, treatment and care of dementia patients in clinical practice. PLoS One. 2015;10(2):e0116538.

The Swedish Registry for Cognitive/Dementia Disorders (SveDem) Annual Report 2021. https://www.ucr.uu.se/svedem/om-svedem/arsrapporter/svedem-arsrapport-2021/viewdocument/1063 . Accessed 06 Aug 2023.

Defined Daily Dose. https://www.whocc.no/ddd/definition_and_general_considera/ . Accessed 06 Aug 2023.

Wettermark B, Hammar N, Fored CM, Leimanis A, Olausson PO, Bergman U, et al. The new Swedish Prescribed Drug Register Opportunities for pharmacoepidemiological research and experience from the first six months. Pharmacoepidemiol Drug Saf. 2007;16(7):726–35.

Xu H, Garcia-Ptacek S, Jönsson L, Wimo A, Nordström P, Eriksdotter M. Long-term effects of cholinesterase inhibitors on cognitive decline and mortality. Neurology. 2021;96(17):e2220-30.

Handels R, Jönsson L, Garcia-Ptacek S, Eriksdotter M, Wimo A. Controlling for selective dropout in longitudinal dementia data: application to the SveDem registry. Alzheimer’s Dement. 2020;16(5):789–96.

Schachter M. Chemical, pharmacokinetic and pharmacodynamic properties of statins: an update. Fundam Clin Pharmacol. 2005;19(1):117–25.

Tong XK, Royea J, Hamel E. Simvastatin rescues memory and granule cell maturation through the Wnt/β-catenin signaling pathway in a mouse model of Alzheimer’s disease. Cell Death Dis. 2022;13(4):325.

Hu X, Song C, Fang M, Li C. Simvastatin inhibits the apoptosis of hippocampal cells in a mouse model of Alzheimer’s disease. Exp Ther Med. 2018;15(2):1795–802.

Simons M, Schwärzler F, Lütjohann D, Von Bergmann K, Beyreuther K, Dichgans J, et al. Treatment with simvastatin in normocholesterolemic patients with Alzheimer’s disease: a 26-week randomized, placebo-controlled, double-blind trial. Ann Neurol. 2002;52(3):346–50.

Evans BA, Evans JE, Baker SP, Kane K, Swearer J, Hinerfeld D, et al. Long-term statin therapy and CSF cholesterol levels: implications for Alzheimer’s disease. Dement Geriatr Cogn Disord. 2009;27(6):519–24.

Gazzerro P, Proto MC, Gangemi G, Malfitano AM, Ciaglia E, Pisanti S, et al. Pharmacological actions of statins: a critical appraisal in the management of cancer. Pharmacol Rev. 2012;64(1):102–46.

Tramontina AC, Wartchow KM, Rodrigues L, Biasibetti R, Quincozes-Santos A, Bobermin L, et al. The neuroprotective effect of two statins: simvastatin and pravastatin on a streptozotocin-induced model of Alzheimer’s disease in rats. J Neural Transm. 2011;118(11):1641–9.

Sinyavskaya L, Gauthier S, Renoux C, Dell’Aniello S, Suissa S, Brassard P. Comparative effect of statins on the risk of incident Alzheimer disease. Neurology. 2018;90(3):e179-87.

Chandra S, Pahan K. Gemfibrozil, a lipid-lowering drug, lowers amyloid plaque pathology and enhances memory in a mouse model of Alzheimer’s disease via peroxisome proliferator-activated receptor α. J Alzheimer’s Dis Reports. 2019;3(1):149–68.

Zhang X, Wen J, Zhang Z. Statins use and risk of dementia: a dose–response meta analysis. Med (United States). 2018;97(30).

Jeong SM, Shin DW, Yoo TG, Cho MH, Jang W, Lee J, et al. Association between statin use and Alzheimer’s disease with dose response relationship. Sci Rep. 2021;11(1):15280.

Tarawneh R, Holtzman DM. The clinical problem of symptomatic Alzheimer disease and mild cognitive impairment. Cold Spring Harb Perspect Med. 2012;2(5):a006148.

Carlsson CM, Gleason CE, Hess TM, Moreland KA, Blazel HM, Koscik RL, et al. Effects of simvastatin on cerebrospinal fluid biomarkers and cognition in middle-aged adults at risk for Alzheimer’s disease. J Alzheimer’s Dis. 2008;13(2):187–97.

Lin FC, Chuang YS, Hsieh HM, Lee TC, Chiu KF, Liu CK, et al. Early statin use and the progression of Alzheimer disease: a total population-based case-control study. Med (United States). 2015;94(47):e2143.

Sparks DL, Sabbagh M, Connor D, Soares H, Lopez J, Stankovic G, et al. Statin therapy in Alzheimer’s disease. Acta Neurol Scand. 2006;114(SUPPL. 185):78–86.

Wong WB, Lin VW, Boudreau D, Devine EB. Statins in the prevention of dementia and Alzheimer’s disease: a meta-analysis of observational studies and an assessment of confounding. Pharmacoepidemiol Drug Saf. 2013;22(4):345–58.

Ott BR, Daiello LA, Dahabreh IJ, Springate BA, Bixby K, Murali M, et al. Do statins impair cognition? A systematic review and meta-analysis of randomized controlled trials. J Gen Intern Med. 2015;30(3):348–58.

Fink HA, Jutkowitz E, McCarten JR, Hemmy LS, Butler M, Davila H, et al. Pharmacologic interventions to prevent cognitive decline, mild cognitive impairment, and clinical Alzheimer-type dementia: a systematic review. Ann Intern Med. 2018;168(1):39–51.

Deschaintre Y, Richard F, Leys D, Pasquier F. Treatment of vascular risk factors is associated with slower decline in Alzheimer disease. Neurology. 2009;73(9):674–80.

Masse I, Bordet R, Deplanque D, Al Khedr A, Richard F, Libersa C, et al. Lipid lowering agents are associated with a slower cognitive decline in Alzheimer’s disease. J Neurol Neurosurg Psychiatry. 2005;76(12):1624–9.

Rosenberg PB, Mielke MM, Tschanz J, Cook L, Corcoran C, Hayden KM, et al. Effects of cardiovascular medications on rate of functional decline in Alzheimer disease. Am J Geriatr Psychiatry. 2008;16(11):883–92.

Kemp EC, Ebner MK, Ramanan S, Godek TA, Pugh EA, Bartlett HH, et al. Statin use and risk of cognitive decline in the ADNI cohort. Am J Geriatr Psychiatry. 2020;28(5):507–17.

Mielke MM, Zandi PP, Sjögren M, Gustafson D, Östling S, Steen B, et al. High total cholesterol levels in late life associated with a reduced risk of dementia. Neurology. 2005;64(10):1689–95.

Alsehli AM, Olivo G, Clemensson LE, Williams MJ, Schiöth HB. The Cognitive effects of statins are modified by age. Sci Rep. 2020;10(1).

Samaras K, Makkar SR, Crawford JD, Kochan NA, Slavin MJ, Wen W, et al. Effects of statins on memory, cognition, and brain volume in the elderly. J Am Coll Cardiol. 2019;74(21):2554–68.

Zissimopoulos JM, Barthold D, Brinton RD, Joyce G. Sex and race differences in the association between statin use and the incidence of Alzheimer disease. JAMA Neurol. 2017;74(2):225–32.

Acknowledgements

The authors are grateful to SveDem, www.svedem.se , for providing data for this study. We thank all patients, caregivers, reporting units and coordinators in SveDem as well as SveDem steering committee. SveDem is supported financially by the Swedish Associations of Local Authorities and Regions. Most of the statistical analyses were conducted at the Division of Biostatistics, Institute of Environmental Medicine, Karolinska Institute, 171 77 Solna, Sweden.

Open access funding provided by Karolinska Institute. This study was supported by the regional agreement on medical training and clinical research between the Stockholm Region and the Karolinska Institutet (ALF); the Health, Medicine and Technique grants from the Stockholm Region and Kungliga Tekniska Högskolan-KTH; Swedish medical research council grant (VR) # 2022-01425 (Garcia-Ptacek) and 2020-02014 (Eriksdotter); Stiftelsen Dementia; Margaretha af Ugglas Foundation; Karolinska Institutet Research Foundation; Karolinska Institutet Foundation for Diseases of Aging; Johanniterorden i Sverige/Swedish Order of St John; the Erling Persson foundation; and by the private initiative "Innovative ways to fight Alzheimer´s disease - Leif Lundblad Family and others".

Author information

Authors and affiliations

Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden

Bojana Petek, Irena Kalar, Silvia Maioli, Bengt Winblad & Milica Gregoric Kramberger

Clinical Institute of Genomic Medicine, University Medical Centre Ljubljana, Ljubljana, Slovenia

Bojana Petek

Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia

Bojana Petek, Irena Kalar & Milica Gregoric Kramberger

Medical Statistics Unit, Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden

Henrike Häbel

Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden

Hong Xu, Marta Villa-Lopez, Minh Tuan Hoang, Shayan Mostafaei, Maria Eriksdotter & Sara Garcia-Ptacek

Faculty of Medicine, University Complutense of Madrid, Madrid, Spain

Marta Villa-Lopez

Department of Neurology, University of Alberta Hospital, Edmonton, Canada

Department of Neurology, University Medical Centre Ljubljana, Ljubljana, Slovenia

Irena Kalar & Milica Gregoric Kramberger

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

Minh Tuan Hoang & Shayan Mostafaei

Center for Alzheimer Research, Department of Neurobiology, Care Sciences and Society, Division of Neurogeriatrics, Karolinska Institutet, Stockholm, Sweden

Silvia Maioli

Division of Neuro, Department of Clinical Neurosciences, Karolinska Institutet, Stockholm, Sweden

Joana B. Pereira

Aging and Inflammation Theme, Karolinska University Hospital, Stockholm, Sweden

Bengt Winblad, Maria Eriksdotter & Sara Garcia-Ptacek

Contributions

BP contributed to the study design and interpretation of data, and drafted and revised the manuscript. SGP designed the study, contributed to the acquisition and analysis of data, drafted and revised the manuscript, and approved the final version. HH contributed to data preparation, performed the analysis and interpretation of data, and revised and approved the final version of the manuscript. HX contributed to the study design and the acquisition and interpretation of data, and revised and approved the final version of the manuscript. ME contributed to data acquisition and study design, critically revised the manuscript and approved the final version. MVL, IK, MTH, SM, JBP, SM, BW and MGK critically revised the manuscript and approved the final version of the manuscript.

Corresponding authors

Correspondence to Bojana Petek or Sara Garcia-Ptacek .

Ethics declarations

Ethics approval and consent to participate

This study complies with the Declaration of Helsinki and was approved by the regional human ethics committee in Stockholm (numbers: 2015/743-31/4 and 2017/942-32). All patients and their relatives were informed of inclusion in SveDem at the time of diagnosis and had the right to decline the participation or withdraw the data from the registry at any point. The data were de-identified before analysis.

Consent for publication

Not applicable.

Competing interests

SGP holds stock in AstraZeneca, Bioarctic, Calliditas, Camurus, Dynavax, Moderna, Novo Nordisk, Pfizer, Sanofi, and Vicore. The other authors report no conflicts of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Supplementary Table 1. Cognitive decline in subgroups of statin users vs non-users of statins. Supplementary Table 2. Cognitive decline in subgroups of simvastatin users vs rosuvastatin users. Supplementary Table 3. Cognitive decline in subgroups of lipophilic statin users vs hydrophilic statin users. Supplementary Table 4. Cognitive decline in subgroups of fungal statin users vs synthetic statin users. Supplementary Table 5. Cognitive decline in subgroups of non-statin lipid-lowering medication users vs statin users. Supplementary Table 6. Cognitive decline in different treatment groups, sensitivity analysis. Supplementary Table 7. Characteristics of incident users.

ICD-10 codes for comorbidities

Diabetes mellitus/insulin/other antidiabetics (E1[0-4] A10A A10B), arrhythmia (I49), alcohol-related disease (E244 F10 G312 G621 G721 I426 K292 K70 K860 O354 P043 Q860 T51 Y90 Y91 Z502 Z714), chronic kidney disease (N18 N19 I120 I131 N032 N033 N034 N035 N036 N037 N052 N053 N054 N055 N056 N057 N18 N19 N250 T856 T857 Z490 Z491 Z492 Z940 Z992), heart failure (I500, I501, I509), cardiovascular disease (I21[0-4] I219 I220 I221 I228 I229 I25 I63 I739 I70), myocardial infarction (I21 I22 I252), respiratory disease (J4[0-7] J6[0-7] J684 J701 J703), haemorrhagic stroke (I60 I61 I62), other stroke (I63 I64 I69), anaemia (D50 D62), liver failure (K7), ischemic heart disease (I20 I21 I22 I23 I24 I25), malignancy (C[0-9][0-9] D[1-3]D4[0-8]) and obesity (E66).
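The bracketed patterns above read as regular expressions over ICD-10 code prefixes; for example, E1[0-4] matches E10 through E14. Below is a hypothetical helper for flagging a comorbidity from a patient's diagnosis codes (a sketch for illustration, not the authors' code).

```python
import pandas as pd

def has_comorbidity(codes: pd.Series, patterns: list[str]) -> bool:
    """True if any diagnosis code matches any of the given patterns."""
    regex = "|".join(patterns)  # str.match anchors at the start of the code
    return codes.astype(str).str.match(regex).any()

# Example: diabetes mellitus, ICD-10 E10-E14.
# has_comorbidity(patient_codes, ["E1[0-4]"])
```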

ATC codes for lipid-lowering medication

Simvastatin (C10AA01 C10BA02 C10BX01 C10BX04), atorvastatin (C10AA05 C10BA05 C10BX03, C10BX06 C10BX08 C10BX11 C10BX12 C10BX15), pravastatin (C10AA03), fluvastatin (C10AA04), rosuvastatin (C10AA07 C10BA06 C10BX05 C10BX07 C10BX09 C10BX10 C10BX13 C10BX14), pitavastatin (C10AA08), fibrates (C10AB), bile acid sequestrants (C10AC), nicotinic acid and derivatives (C10AD) and other lipid modifying agents (C10AX).

ATC codes for comedication

Cardiac drugs (C01), vasoprotectives (C05), platelet aggregation inhibitors (B01AC), anticoagulants (B01AA B01AB B01AF B01AE07), antipsychotics (N05A), anxiolytics (N05B), hypnotics (N05C), antidepressants (N06A), anticholinesterases (N06DA), memantine (N06DX01) and vitamin D (A11CC).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Petek, B., Häbel, H., Xu, H. et al. Statins and cognitive decline in patients with Alzheimer’s and mixed dementia: a longitudinal registry-based cohort study. Alz Res Therapy 15 , 220 (2023). https://doi.org/10.1186/s13195-023-01360-0

Received: 06 August 2023

Accepted: 28 November 2023

Published: 20 December 2023

DOI: https://doi.org/10.1186/s13195-023-01360-0


  • Cohort study
  • HMG-CoA reductase inhibitors
  • Alzheimer's disease
  • Cognitive impairment
  • Cholesterol metabolism

Alzheimer's Research & Therapy

ISSN: 1758-9193

Three Receive Nobel in Economics for Research on Global Inequality

Daron Acemoglu, Simon Johnson and James Robinson shared the award for their work on explaining the gaps in prosperity between nations.

Nobel Economics Prize Shared Among Three

Daron Acemoglu, Simon Johnson and James Robinson received the prize for their work on explaining inequality between countries.

The Royal Swedish Academy of Sciences has decided to award the Sveriges Riksbank Prize in economic sciences in memory of Alfred Nobel for 2024 to Daron Acemoglu, MIT, Cambridge, USA; Simon Johnson, MIT, Cambridge, USA; and James Robinson, University of Chicago, USA, for studies of how institutions are formed and affect prosperity.

By Jeanna Smialek

Jeanna Smialek is an economics journalist reporting from Washington.

The Nobel Memorial Prize in Economic Sciences was awarded on Monday to Daron Acemoglu and Simon Johnson, both of the Massachusetts Institute of Technology, and to James Robinson of the University of Chicago.

They received the prize for their research into how institutions shape which countries become wealthy and prosperous — and how those structures came to exist in the first place.

The laureates delved into the world’s colonial past to trace how gaps emerged between nations, arguing that countries that started out with more inclusive institutions during the colonial period tended to become more prosperous. Their pioneering use of theory and data has helped to better explain the reasons for persistent inequality between nations, according to the Nobel committee.

“Reducing the huge differences in income between countries is one of our times’ greatest challenges,” Jakob Svensson, chairman of the economics prize committee, said while announcing the award. Thanks to the economists’ “groundbreaking research,” he said, “we have a much deeper understanding of the root causes of why countries fail or succeed.”

According to the researchers, prosperity today is partly a legacy of how a nation’s institutions evolved over time — which they studied by looking at what happened to countries during European colonization.

Countries with “inclusive” institutions that protected personal property rights and allowed for widespread economic participation tended to end up on a pathway to longer-term prosperity. Those that had what the researchers called “extractive” institutions — ones that helped elites to maintain control, but which gave workers little hope of sharing in the wealth — merely provided short-term gains for the people in power.

“Rather than asking whether colonialism is good or bad, we note that different colonial strategies have led to different institutional patterns that have persisted over time,” Dr. Acemoglu said during a news conference after the prize was announced.

“Broadly speaking, the work that we have done favors democracy,” he said.

In fact, the laureates found that different types of colonization brought about big shifts in fortunes. European nations used more authoritarian systems to control places that were densely populated at the time of colonization, while those that were sparsely populated often saw more settlers and established a more inclusive form of governance — if not quite a democratic one. Over time, the countries in question saw their economic fates flip: While the Aztec empire was more populous and rich than North America at the time of early European exploration, for instance, today the United States and Canada have outstripped Mexico in economic prosperity.

“This reversal of relative prosperity is historically unique,” the Nobel release explained. Places that were not colonized did not see a similar shift, the committee said.

The legacy is still visible today, the researchers said. As an example, Dr. Acemoglu and Dr. Robinson point to the city of Nogales, which straddles the border between Mexico and Arizona.

Northern Nogales is more affluent than its southern portion, despite a shared culture and location. The driver of that difference, the economists argue, is the institutions governing the two halves of the city.

The economists wrote books based on their work, including “Why Nations Fail,” by Dr. Acemoglu and Dr. Robinson, and “Power and Progress,” by Dr. Acemoglu and Dr. Johnson, published last year.

The economists’ arguments have at times been debated, including by academics who think that culture matters more to development than the laureates allow. A few economists posted criticism on social media after the announcement of the award, with some arguing that the research was too centered on European ideals.

In an interview, Dr. Acemoglu said he thought that the European experience in democracy had important lessons for the world, and also that he didn’t think it “means it’s a one size fits all.”

And while their research tends to favor democracy, Dr. Acemoglu acknowledged during Monday’s news conference that it “is not a panacea.”

Representative government can be hard to introduce and volatile, for one thing. And there are pathways to growth for countries that are not democracies, he said, including rapidly tapping a nation’s resources to ramp up economic progress. But, Dr. Acemoglu said, “more authoritarian growth” is often more unstable and less innovative.

He said that there were many challenges confronting modern democracies, from polarization and social media to climate change and new technological developments like artificial intelligence.

“Democracy needs to work harder, too,” he said, explaining that the many people left behind economically in places like the United States show that the system is not working perfectly. “Many people are discontent, and many people feel like they don’t have a voice — and that’s not what democracy promises.”

Dani Rodrik, an economist at the Harvard Kennedy School who studies globalization and development, said that the three laureates had contributed a clearer understanding that democracy could be important to successful development — something that had not always been widely accepted within the profession.

“They’ve elevated the important and positive impact of democracy on long-term economic performance,” he said.

The researchers have also helped to make studying history and institutions, formerly out of vogue, “cool again,” Dr. Rodrik said.

Many economists noted that this award was a long time coming. Dr. Acemoglu has for years topped lists of who might win a Nobel.

He was in Athens when he got the call.

“You dream of having a good career, but this is over and on top of that,” Dr. Acemoglu said during the news conference.

Dr. Johnson, a former chief economist at the International Monetary Fund, woke up to a bevy of congratulatory text messages in the United States. He said in an interview that he was “surprised and delighted.”

Who was awarded the economics prize in 2023?

Last year, Claudia Goldin won the prize for her research uncovering the reasons for gender gaps in labor force participation and earnings.

Who else received a prize this year?

The Nobel in physiology or medicine went to Victor Ambros and Gary Ruvkun for their discovery of microRNA, which helps determine how cells develop and function.

The physics prize went to John J. Hopfield and Geoffrey E. Hinton for discoveries that helped computers learn in a way closer to how the human brain does, providing the building blocks for developments in artificial intelligence.

The prize for chemistry was shared by Demis Hassabis and John M. Jumper of Google, who used A.I. to predict the structure of millions of proteins, and by David Baker, who used computer software to invent a new protein.

The literature prize went to Han Kang, the first writer from South Korea to receive the award, for “her intense poetic prose that confronts historical traumas.”

The Nobel Peace Prize was awarded to the Japanese grass-roots movement Nihon Hidankyo , which for decades has represented hundreds of thousands of survivors of the U.S. atomic bombings of Hiroshima and Nagasaki in 1945.

