
The Power of Playful Learning in the Early Childhood Setting


Play versus learning represents a false dichotomy in education (e.g., Hirsh-Pasek & Golinkoff 2008). In part, the persistent belief that learning must be rigid and teacher directed—the opposite of play—is motivated by the lack of a clear definition of what constitutes playful learning (Zosh et al. 2018). And, in part, it is motivated by older perceptions of play and learning. Newer research, however, allows us to reframe the debate as learning via play—as playful learning.

This piece, which is an excerpt from Chapter 5 in Developmentally Appropriate Practice in Early Childhood Programs Serving Children from Birth Through Age 8, Fourth Edition (NAEYC 2022), suggests that defining play on a spectrum (an idea first introduced by Bergen 1988 and developed by Zosh et al. 2018) helps to resolve old divisions and provides a powerful framework that puts playful learning—rich curriculum coupled with a playful pedagogy—front and center as a model for all early childhood educators. (See below for a discussion of play on a spectrum.)

This excerpt also illustrates the ways in which play and learning mutually support one another and how teachers connect learning goals to children’s play. Whether solitary, dramatic, parallel, social, cooperative, onlooker, object, fantasy, physical, constructive, or games with rules, play, in all of its forms, is a teaching practice that optimally facilitates young children’s development and learning. By maximizing children’s choice, promoting wonder and enthusiasm for learning, and leveraging joy, playful learning pedagogies support development across domains and content areas and increase learning relative to more didactic methods (Alfieri et al. 2011; Bonawitz et al. 2011; Sim & Xu 2015).

Playful Learning: A Powerful Teaching Tool


Narrowed curricula and high-stakes assessment practices (such as paper-and-pencil tests for kindergartners) have increased stress on educators, children, and families but failed to deliver on the promise of narrowing—let alone closing—the achievement gap. All children need well-thought-out curricula, including reading and STEM experiences and an emphasis on executive function skills such as attention, impulse control, and memory (Duncan et al. 2007). But to promote happy, successful, lifelong learners, children must be immersed in developmentally appropriate practice and rich curricular learning that is culturally relevant (NAEYC 2020). Playful learning is a vehicle for achieving this. Schools must also address children's inequitable access to play (see "Both/And: Early Childhood Education Needs Both Play and Equity," by Ijumaa Jordan). All children should be afforded opportunities to play, regardless of race, socioeconomic class, or disability. We second the call of Mariana Souto-Manning (2017): "Although play has traditionally been positioned as a privilege, it must be (re)positioned as a right, as outlined by the United Nations Convention on the Rights of the Child, Article 31" (785).

What Is Playful Learning?

Playful learning describes a learning context in which children learn content while playing freely (free play or self-directed play), with teacher guidance (guided play), or in a structured game. By harnessing children's natural curiosity and their proclivities to experiment, explore, problem solve, and stay engaged in meaningful activities—especially when doing so with others—teachers maximize learning while individualizing learning goals. Central to this concept is the idea that teachers act more as the Socratic "guide on the side" than a "sage on the stage" (e.g., King 1993, 30; Smith 1993, 35). Rather than view children as empty vessels receiving information, teachers see children as active explorers and discoverers who bring their prior knowledge into the learning experience and construct an understanding of, for example, words such as forecast and low pressure as they explore weather patterns and the science behind them. In other words, teachers support children as active learners.

Importantly, playful learning pedagogies naturally align with the characteristics that research in the science of learning suggests help humans learn. Playful learning leverages the power of active (minds-on), engaging (not distracting), meaningful, socially interactive, and iterative thinking and learning (Zosh et al. 2018) in powerful ways that lead to increased learning.

Free play lets children explore and express themselves—to be the captains of their own ship. While free play is important, if a teacher has a learning goal, guided play and games are the road to successful outcomes for children (see Weisberg, Hirsh-Pasek, & Golinkoff 2013 for a review). Playful learning in the form of guided play, in which the teacher builds in the learning as part of a fun context such as a weather report, keeps the child’s agency but adds an intentional component to the play that helps children learn more from the experience. In fact, when researchers compared children’s skill development during free play in comparison to guided play, they found that children learned more vocabulary (Toub et al. 2018) and spatial skills (Fisher et al. 2013) in guided play than in free play.

Self-Directed Play, Free Play

NAEYC’s 2020 position statement on developmentally appropriate practice uses the term self-directed play to refer to play that is initiated and directed by children. Such play is termed free play in the larger works of the authors of this excerpt; therefore, free play is the primary term used in this article, with occasional references to self-directed play, the term used in the rest of the DAP book.

Imagine an everyday block corner. The children are immersed in play with each other—some trying to build high towers and others creating a tunnel for the small toy cars on the nearby shelves. But what if there were a few model pictures on the wall of what children could strive to make as they collaborated in that block corner? Might they rotate certain pieces purposely? Might they communicate with one another that the rectangle needs to go on top of the square? Again, a simple insertion of a design that children can try to copy turns a play situation into one ripe with spatial learning. Play is a particularly effective way to engage children with specific content learning when there is a learning goal.

Why Playful Learning Is Critical

Teachers play a crucial role in creating places and spaces where they can introduce playful learning to help all children master not only content but also the skills they will need for future success. The science of learning literature (e.g., Fisher et al. 2013; Weisberg, Hirsh-Pasek, & Golinkoff 2013; Zosh et al. 2018) suggests that playful learning can change the “old equation” for learning, which posited that direct, teacher-led instruction, such as lectures and worksheets, was the way to achieve rich content learning. This “new equation” moves beyond a sole focus on content and instead views playful learning as a way to support a breadth of skills while embracing developmentally appropriate practice guidelines (see Hirsh-Pasek et al. 2020).

Using a playful learning pedagogical approach leverages the skill sets of today’s educators and enhances their ability to help children attain curricular goals. It engages what has been termed active learning that is also developmentally appropriate and offers a more equitable way of engaging children by increasing access to participation. When topics are important and culturally relevant to children, they can better identify with the subject and the learning becomes more seamless.

While educators of younger children are already well versed in creating playful and joyful experiences to support social goals (e.g., taking turns and resolving conflicts), they can use this same skill set to support more content-focused curricular goals (e.g., mathematics and literacy). Similarly, while teachers of older children have plenty of experience determining concrete content-based learning goals (e.g., attaining Common Core Standards), they can build upon this set of skills and use playful learning as a pedagogy to meet those goals.

Learning Through Play: A Play Spectrum

As noted previously, play can be thought of as lying on a spectrum that includes free play (or self-directed play), guided play, games, playful instruction, and direct instruction (Bergen 1988; Zosh et al. 2018). For the purposes of this piece, we use a spectrum that includes the first three of these aspects of playful learning, as illustrated in “Play Spectrum Showing Three Types of Playful Learning Situations” below.

The following variables determine the degree to which an activity can be considered playful learning:

  • level of adult involvement
  • extent to which the child is directing the learning
  • presence of a learning goal

Toward the left end of the spectrum are activities with more child agency, less adult involvement, and loosely defined or no particular learning goals. Further to the right, adults are more involved, but children still direct the activity or interaction.

Developmentally appropriate practice does not mean that children primarily play without a planned learning environment, nor that they learn mostly through direct instruction (NAEYC 2020). Educators in high-quality early childhood programs offer a range of learning experiences that fall all along this spectrum. Thinking of play as a spectrum helps educators assess where their learning activities and lessons fall by considering the components and intentions of each lesson. Using their professional knowledge of how children develop and learn, their knowledge of individual children, and their understanding of social and cultural contexts, educators can then begin to think strategically about how to target playful learning (especially guided play and games) to leverage how children naturally learn. This more nuanced view of play and playful learning can be used to both meet age-appropriate learning objectives and support engaged, meaningful learning.


In the kindergarten classroom in the following vignette, children have ample time for play and exploration in centers, where they decide what to play with and what they want to create. These play centers are the focus of the room and the main tool for developing social and emotional as well as academic skills; they reflect and support what the children are learning through whole-group discussions, lessons, and skills-focused stations. In the vignette, the teacher embeds guided play opportunities within the children’s free play.

Studying Bears: Self-Directed Play that Extends What Kindergartners Are Learning

While studying the habits of animals in winter, the class is taking a deeper dive into the lives of American black bears, animals that make their homes in their region. In the block center, one small group of children uses short lengths and cross-sections of real tree branches as blocks along with construction paper to create a forest habitat for black bear figurines. They enlist their friends in the art center to assist in making trees and bushes. Two children are in the writing center. Hearing that their friends are looking for help to create a habitat, they look around and decide a hole punch and blue paper are the perfect tools for making blueberries—a snack black bears love to eat! Now multiple centers and groups of children are involved in making the block center become a black bear habitat.

In the dramatic play center, some of the children pretend to be bear biologists, using stethoscopes, scales, and magnifying glasses to study the health of a couple of plush black bears. When these checkups are complete, the teacher suggests the children could describe the bears’ health in a written “report,” thus embedding guided play within their free play. A few children at the easels in the art center are painting pictures of black bears.

Contributed by Amy Blessing

Free play, or self-directed play, is often heralded as the gold standard of play. It encourages children’s initiative, independence, and problem solving and has been linked to benefits in social and emotional development (e.g., Singer & Singer 1990; Pagani et al. 2010; Romano et al. 2010; Gray 2013) and language and literacy (e.g., Neuman & Roskos 1992). Through play, children explore and make sense of their world, develop imaginative and symbolic thinking, and develop physical competence. The kindergarten children in the example above were developing their fine motor and collaboration skills, displaying their understanding of science concepts (such as the needs of animals and living things), and exercising their literacy and writing skills. Such benefits are precisely why free play has an important role in developmentally appropriate practice. To maximize learning, teachers also provide guided play experiences.

Guided Play

While free play has great value for children, empirical evidence suggests that it is not always sufficient when there is a pedagogical goal at stake (Smith & Pellegrini 2008; Alfieri et al. 2011; Fisher et al. 2013; Lillard 2013; Weisberg, Hirsh-Pasek, & Golinkoff 2013; Toub et al. 2018). This is where guided play comes in.

Guided play allows teachers to focus children’s play around specific learning goals (e.g., standards-based goals), which can be applied to a variety of topics, from learning place value in math to identifying rhyming words in literacy activities. Note, however, that the teacher does not take over the play activity or even direct it. Instead, she asks probing questions that guide the next level of child-directed exploration. This is a perfect example of how a teacher can initiate a context for learning while still leaving the child in charge. In the previous kindergarten vignette, the teacher guided the children in developing their literacy skills as she embedded writing activities within the free play at the centers.

Facilitating Guided Play

Skilled teachers set up environments and facilitate development and learning throughout the early childhood years, such as in the following:

  • Ms. Taglieri notices what 4-month-old Anthony looks at and shows interest in. Following his interest and attention, she plays Peekaboo, adjusting her actions (where she places the blanket and peeks out at him) to maintain engagement.
  • Ms. Eberhard notices that 22-month-old Abe knows the color yellow. She prepares her environment based on this observation, placing a few yellow objects along with a few red ones on a small table. Abe immediately goes to the table, picking up each yellow item and verbally labeling them (“Lellow!”).
  • Mr. Gorga creates intrigue and participation by inviting his preschool class to “be shape detectives” and to “discover the secret of shapes.” As the children explore the shapes, Mr. Gorga offers questions and prompts to guide children to answer the question “What makes them the same kind of shapes?”

An analogy for facilitating guided play is bumper bowling. With the bumpers in place, most children will knock down at least some pins when they roll the ball down the lane. That is different from teaching children exactly how to throw it (although some children, such as those who have disabilities or who become frustrated if they feel a challenge is too great, may require that level of support or instruction). Guided play is not a one-size-fits-all prescriptive pedagogical technique. Instead, teachers match the level of support they give in guided play to the children in front of them.

Critically, many teachers already implement these kinds of playful activities. When the children are excited by the birds they have seen outside of their window for the past couple of days, the teachers may capitalize on this interest and provide children with materials for a set of playful activities about bird names, diets, habitats, and songs. Asking children to use their hands to mimic an elephant’s trunk when learning vocabulary can promote learning through playful instruction that involves movement. Similarly, embedding vocabulary in stories that are culturally relevant promotes language and early literacy development (García-Alvarado, Arreguín, & Ruiz-Escalante 2020). For example, a teacher who has several children in his class with Mexican heritage decides to read aloud Too Many Tamales (by Gary Soto, illus. Ed Martinez) and have the children reenact scenes from it, learning about different literary themes and concepts through play. The children learn more vocabulary, have a better comprehension of the text, and see themselves and their experiences reflected. The teacher also adds some of the ingredients and props for making tamales into the sociodramatic play center (Salinas-González, Arreguín-Anderson, & Alanís 2018) and invites families to share stories about family tamaladas (tamale-making parties).

Evidence Supporting Guided Play as a Powerful Pedagogical Tool

Evidence from the science of learning suggests that discovery-based guided play actually results in increased learning for all children relative to both free play and direct instruction (see Alfieri et al. 2011). These effects hold across content areas including spatial learning (Fisher et al. 2013), literacy (Han et al. 2010; Nicolopoulou et al. 2015; Hassinger-Das et al. 2016; Cavanaugh et al. 2017; Toub et al. 2018; Moedt & Holmes 2020), and mathematics (Zosh et al. 2016).

There are several possible reasons for guided play’s effectiveness. First, it harnesses the joy that is critical to creativity and learning (e.g., Isen, Daubman, & Nowicki 1987; Resnick 2007). Second, during guided play, the adults help “set the stage for thought and action” by essentially limiting the number of possible outcomes for the children so that the learning goal is discoverable, but children still direct the activity (Weisberg et al. 2014, 276). Teachers work to provide high-quality materials, eliminate distractions, and prepare the space, but then, critically, they let the child play the active role of construction. Third, in guided play, the teacher points the way toward a positive outcome and hence lessens the ambiguity (the degrees of freedom) without directing children to an answer or limiting children to a single discovery (e.g., Bonawitz et al. 2011). And finally, guided play provides the opportunity for new information to be integrated with existing knowledge and updated as children explore.

Reinforcing Numeracy with a Game

The children in Mr. Cohen’s preschool class are at varying levels of understanding in early numeracy skills (e.g., cardinality, one-to-one correspondence, order irrelevance). He knows that his children need some practice with these skills but wants to make the experience joyful while also building these foundational skills. One day, he brings out a new game for them to play—The Great Race. Carla and Michael look up expectantly, and their faces light up when they realize they will be playing a game instead of completing a worksheet. The two quickly pull out the box, setting up the board and choosing their game pieces. Michael begins by flicking the spinner with his finger, landing on 2. “Nice!” Carla exclaims, as Michael moves his game piece, counting “One, two.” Carla takes a turn next, spinning a 1 and promptly counting “one” as she moves her piece one space ahead. “My turn!” Michael says, eager to win the race. As he spins a 2, he pauses. “One . . . two,” he says, hesitating, as he moves his piece to space 4 on the board. Carla corrects him, “I think you mean ‘three, four,’ right? You have to count up from where you are on the board.” Michael nods, remembering the rules Mr. Cohen taught him earlier that day. “Right,” he says, “three, four.”

Similar to guided play, games can be designed in ways that help support learning goals (Hassinger-Das et al. 2017). In this case, instead of adults playing the role of curating the activity, the games themselves provide this type of external scaffolding. The example with Michael and Carla shows how children can learn through games, which is supported by research. In one well-known study, playing a board game (i.e., The Great Race) in which children navigated through a linear, numerical-based game board (i.e., the game board had equally spaced game spaces that go from left to right) resulted in increased numerical development as compared to playing the same game where the numbers were replaced by colors (Siegler & Ramani 2008) or with numbers organized in a circular fashion (Siegler & Ramani 2009). Structuring experiences so that the learning goal is intertwined naturally with children’s play supports their learning. A critical point with both guided play and games is that children are provided with support but still lead their own learning.

Digital educational games have become enormously popular, with tens of thousands of apps marketed as “educational,” although there is no independent review of these apps. Apps and digital games may have educational value when they inspire active, engaged, meaningful, and socially interactive experiences (Hirsh-Pasek et al. 2015), but recent research suggests that many of the most downloaded educational apps do not actually align with these characteristics that lead to learning (Meyer et al. 2021). Teachers should exercise caution and evaluate any activity—digital or not—to see how well it harnesses the power of playful learning.

Next Steps for Educators

Educators are uniquely positioned to prepare today’s children for achievement today and success tomorrow. Further, the evidence is mounting that playful pedagogies appear to be an accessible, powerful tool that harnesses the pillars of learning. This approach can be used across ages and is effective in learning across domains.

By leveraging children’s own interests and mindfully creating activities that let children play their way to new understanding and skills, educators can start using this powerful approach today. By harnessing the children’s interests at different ages and engaging them in playful learning activities, educators can help children learn while having fun. And, importantly, educators will have more fun too when they see children happy and engaged.

As the tide begins to change in individual classrooms, educators need to acknowledge that vast inequalities (e.g., socioeconomic achievement gaps) continue to exist (Kearney & Levine 2016). The larger challenge remains in propelling a cultural shift so that administrators, families, and policymakers understand the way in which educators can support the success of all children through high-quality, playful learning experiences.

Consider the following questions as you reflect on how to support equitable playful learning experiences for each and every child:

  • One of the best places to start is by thinking about your teaching strengths. Perhaps you are great at sparking joy and engagement. Or maybe you are able to frequently leverage children’s home lives in your lessons. How can you expand practices you already use as an educator or are learning about in your courses to incorporate the playful learning described in this article?
  • How can you share the information in this chapter with families, administrators, and other educators? How can you help them understand how play can engage children in deep, joyful learning?

This piece is excerpted from NAEYC’s recently published book Developmentally Appropriate Practice in Early Childhood Programs Serving Children from Birth Through Age 8, Fourth Edition. For more information about the book, visit NAEYC.org/resources/pubs/books/dap-fourth-edition.

Teaching Play Skills

Pamela Brillante

While many young children with autism spectrum disorder enjoy playing, they can have difficulty engaging in traditional play activities. They may engage in activities that do not look like ordinary play, including playing with only a few specific toys or playing in a specific, repetitive way.

Even though most children learn play skills naturally, sometimes families and teachers have to teach children how to play. Learning how to play will help develop many other skills young children need for the future, including

  • social skills:  taking turns, sharing, and working cooperatively
  • cognitive skills:  problem-solving skills, early academic skills
  • communication skills:  responding to others, asking questions
  • physical skills:  body awareness, fine and gross motor coordination

Several evidence-based therapeutic approaches to teaching young children with autism focus on teaching play skills, including

  • The Play Project:  https://playproject.org
  • The Greenspan Floortime approach: https://stanleygreenspan.com
  • Integrated Play Group (IPG) Model: www.wolfberg.com

While many children with autism have professionals and therapists working with them, teachers and families should work collaboratively and provide multiple opportunities for children to practice new skills and engage in play at their own level. For example, focus on simple activities that promote engagement between the adult and the child as well as the child and their peers without disabilities, including playing with things such as bubbles, cause-and-effect toys, and interactive books. You can also use the child’s preferred toy in the play, like having the Spider-Man figure be the one popping the bubbles.

Pamela Brillante, EdD, has spent 30 years working as a special education teacher, administrator, consultant, and professor. In addition to her full-time faculty position in the Department of Special Education, Professional Counseling and Disability Studies at William Paterson University of New Jersey, Dr. Brillante continues to consult with school districts and present to teachers and families on the topic of high-quality, inclusive early childhood practices.

Photographs: © Getty Images. Copyright © 2022 by the National Association for the Education of Young Children. See Permissions and Reprints online at NAEYC.org/resources/permissions.

Alfieri, L., P.J. Brooks, N.J. Aldrich, & H.R. Tenenbaum. 2011. “Does Discovery-Based Instruction Enhance Learning?” Journal of Educational Psychology 103 (1): 1–18.  

Bassok, D., S. Latham, & A. Rorem. 2016. “Is Kindergarten the New First Grade?” AERA Open 2 (1): 1–31. https://doi.org/10.1177/2332858415616358.

Bergen, D., ed. 1988. Play as a Medium for Learning and Development: A Handbook of Theory and Practice . Portsmouth, NH: Heinemann Educational Books. 

Bonawitz, E.B., P. Shafto, H. Gweon, N.D. Goodman, E.S. Spelke, & L. Schulz. 2011. “The Double-Edged Sword of Pedagogy: Instruction Limits Spontaneous Exploration and Discovery.” Cognition 120 (3): 322–30. 

Cavanaugh, D.M., K.J. Clemence, M.M. Teale, A.C. Rule, & S.E. Montgomery. 2017. “Kindergarten Scores, Storytelling, Executive Function, and Motivation Improved Through Literacy-Rich Guided Play.” Journal of Early Childhood Education 45 (6): 1–13. 

Christakis, E. 2016. The Importance of Being Little: What Preschoolers Really Need from Grownups . New York: Penguin Books.  

Duncan, G.J., A. Claessens, A.C. Huston, L.S. Pagani, M. Engel, H. Sexton, C.J. Dowsett, K. Magnuson, P. Klebanov, L. Feinstein, J. Brooks-Gunn, K. Duckworth, & C. Japel. 2007. “School Readiness and Later Achievement.” Developmental Psychology 43 (6): 1428–46. https://doi.org/10.1037/0012-1649.43.6.1428.

Fisher, K.R., K. Hirsh-Pasek, N. Newcombe, & R.M. Golinkoff. 2013. “Taking Shape: Supporting Preschoolers’ Acquisition of Geometric Knowledge Through Guided Play.” Child Development 84 (6): 1872–78. 

García-Alvarado, S., M.G. Arreguín, & J.A. Ruiz-Escalante. 2020. “Mexican-American Preschoolers as Co-Creators of Zones of Proximal Development During Retellings of Culturally Relevant Stories: A Participatory Study.” Journal of Early Childhood Literacy: 1–22. https://doi.org/10.1177/1468798420930339.

Gray, P. 2013. Free to Learn: Why Unleashing the Instinct to Play Will Make Our Children Happier, More Self-Reliant, and Better Students for Life . New York: Basic Books.  

Han, M., N. Moore, C. Vukelich, & M. Buell. 2010. “Does Play Make a Difference? How Play Intervention Affects the Vocabulary Learning of At-Risk Preschoolers.” American Journal of Play 3 (1): 82–105. 

Hannaway, J., & L. Hamilton. 2008. Accountability Policies: Implications for School and Classroom Practices . Washington, DC: Urban Institute. http://webarchive.urban.org/publications/411779.html . 

Hassinger-Das, B., K. Ridge, A. Parker, R.M. Golinkoff, K. Hirsh-Pasek, & D.K. Dickinson. 2016. “Building Vocabulary Knowledge in Preschoolers Through Shared Book Reading and Gameplay.” Mind, Brain, and Education 10 (2): 71–80. https://doi.org/10.1111/mbe.12103 . 

Hassinger-Das, B., T.S. Toub, J.M. Zosh, J. Michnick, R. Golinkoff, & K. Hirsh-Pasek. 2017. “More Than Just Fun: A Place for Games in Playful Learning.” Infancia y aprendizaje: Journal for the Study of Education and Development 40 (2): 191–218. https://doi.org/10.1080/02103702.2017.1292684 . 

Hirsh-Pasek, K., & R.M. Golinkoff. 2008. “Why Play = Learning.” In Encyclopedia on Early Childhood Development [online], eds. R.E. Tremblay, M. Boivin, & R.D. Peters, topic ed. P.K. Smith, 1–6. Centre of Excellence for Early Childhood Development and Strategic Knowledge Cluster on Early Child Development. www.child-encyclopedia.com/play/according-experts/why-play-learning . 

Hirsh-Pasek, K., H.S. Hadani, E. Blinkoff, & R.M. Golinkoff. 2020. A New Path to Education Reform: Playful Learning Promotes 21st-Century Skills in Schools and Beyond. Big Ideas Policy Report. Washington, DC: Brookings Institution. www.brookings.edu/policy2020/bigideas/a-new-path-to-education-reform-playful-learning-promotes-21st-century-skills-in-schools-and-beyond.

Hirsh-Pasek, K., J.M. Zosh, R.M. Golinkoff, J.H. Gray, M.B. Robb, & J. Kaufman. 2015. “Putting Education in ‘Educational’ Apps: Lessons from the Science of Learning.” Psychological Science in the Public Interest 16 (1): 3–34. 

Isen, A.M., K.A. Daubman, & G.P. Nowicki. 1987. “Positive Affect Facilitates Creative Problem Solving.” Journal of Personality and Social Psychology 52 (6): 1122–31. 

Kearney, M.S., & P.B. Levine. 2016. Income, Inequality, Social Mobility, and the Decision to Drop Out of High School. Washington, DC: Brookings. www.brookings.edu/bpea-articles/income-inequality-social-mobility-and-the-decision-to-drop-out-of-high-school.

King, A. 1993. “From Sage on the Stage to Guide on the Side.” College Teaching 41 (1): 30–35.  

Lillard, A.S. 2013. “Playful Learning and Montessori Education.” American Journal of Play 5 (2): 157–86. 

Meyer, M., J.M. Zosh, C. McLaren, M. Robb, R.M. Golinkoff, K. Hirsh-Pasek, & J. Radesky. 2021. “How Educational Are ‘Educational’ Apps for Young Children? App Store Content Analysis Using the Four Pillars of Learning Framework.” Journal of Children and Media . Published online February 23. 

Miller, E., & J. Almon. 2009. Crisis in the Kindergarten: Why Children Need to Play in School. College Park, MD: Alliance for Childhood. https://files.eric.ed.gov/fulltext/ED504839.pdf.

Moedt, K., & R.M. Holmes. 2020. “The Effects of Purposeful Play After Shared Storybook Readings on Kindergarten Children’s Reading Comprehension, Creativity, and Language Skills and Abilities.” Early Child Development and Care 190 (6): 839–54. 

NAEYC. 2020. “Developmentally Appropriate Practice.” Position statement. Washington, DC: NAEYC. www.naeyc.org/resources/position-statements/dap .  

Neuman, S.B., & K. Roskos. 1992. “Literacy Objects as Cultural Tools: Effects on Children’s Literacy Behaviors in Play.” Reading Research Quarterly 27 (3): 202–25.  

Nicolopoulou, A., K.S. Cortina, H. Ilgaz, C.B. Cates, & A.B. de Sá. 2015. “Using a Narrative- and Play-Based Activity to Promote Low-Income Preschoolers’ Oral Language, Emergent Literacy, and Social Competence.” Early Childhood Research Quarterly 31 (2): 147–62. 

Pagani, L.S., C. Fitzpatrick, I. Archambault, & M. Janosz. 2010. “School Readiness and Later Achievement: A French Canadian Replication and Extension.” Developmental Psychology 46 (5): 984–94.  

Pedulla, J.J., L.M. Abrams, G.F. Madaus, M.K. Russell, M.A. Ramos, & J. Miao. 2003. “Perceived Effect of State-Mandated Testing Programs on Teaching and Learning: Findings from a National Survey of Teachers” (ED481836). ERIC. https://eric.ed.gov/?id=ED481836 . 

Ravitch, D. 2010. “Why Public Schools Need Democratic Governance.” Phi Delta Kappan 91 (6): 24–27. 

Resnick, M. 2007. “All I Really Need to Know (About Creative Thinking) I Learned (by Studying How Children Learn) in Kindergarten.” In Proceedings of the 6th ACM SIGCHI Conference on Creativity & Cognition, 1–6. New York: Association for Computing Machinery.

Romano, E., L. Babchishin, L.S. Pagani, & D. Kohen. 2010. “School Readiness and Later Achievement: Replication and Extension Using a Nationwide Canadian Survey.” Developmental Psychology 46 (5): 995–1007.  

Salinas-González, I., M.G. Arreguín-Anderson, & I. Alanís. 2018. “Supporting Language: Culturally Rich Dramatic Play.” Teaching Young Children 11 (2): 4–6. 

Siegler, R.S., & G.B. Ramani. 2008. “Playing Linear Numerical Board Games Promotes Low-Income Children’s Numerical Development.” Developmental Science 11 (5): 655–61. 

Siegler, R.S., & G.B. Ramani. 2009. “Playing Linear Number Board Games—but Not Circular Ones—Improves Low-Income Preschoolers’ Numerical Understanding.” Journal of Educational Psychology 101 (3): 545–60.

Sim, Z., & F. Xu. 2015. “Toddlers Learn from Facilitated Play, Not Free Play.” In Proceedings of the 34th Annual Conference of the Cognitive Science Society, Berkeley, CA. https://cognitivesciencesociety.org/past-conferences.

Singer, D.G., & J.L. Singer. 1990. The House of Make-Believe: Children’s Play and the Developing Imagination . Cambridge, MA: Harvard University Press.  

Smith, K. 1993. “Becoming the ‘Guide on the Side.’” Educational Leadership 51 (2): 35–37.  

Smith, P.K., & A. Pellegrini. 2008. “Learning Through Play.” In Encyclopedia on Early Childhood Development [online], eds. R.E. Tremblay, M. Boivin, & R.D. Peters, 1–6. Centre of Excellence for Early Childhood Development and Strategic Knowledge Cluster on Early Child Development. https://www.child-encyclopedia.com/pdf/expert/play/according-experts/learning-through-play.

Souto-Manning, M. 2017. “Is Play a Privilege or a Right? And What’s Our Responsibility? On the Role of Play for Equity in Early Childhood Education.” Foreword. Early Child Development and Care 187 (5–6): 785–87. www.tandfonline.com/doi/full/10.1080/03004430.2016.1266588 . 

Toub, T.S., B. Hassinger-Das, K.T. Nesbitt, H. Ilgaz, D.S. Weisberg, K. Hirsh-Pasek, R.M. Golinkoff, A. Nicolopoulou, & D.K. Dickinson. 2018. “The Language of Play: Developing Preschool Vocabulary Through Play Following Shared Book-Reading.” Early Childhood Research Quarterly 45 (4): 1–17.  

Weisberg, D.S., K. Hirsh-Pasek, & R.M. Golinkoff. 2013. “Guided Play: Where Curricular Goals Meet a Playful Pedagogy.” Mind, Brain, and Education 7 (2): 104–12. 

Weisberg, D.S., K. Hirsh-Pasek, R.M. Golinkoff, & B.D. McCandliss. 2014. “Mise en Place: Setting the Stage for Thought and Action.” Trends in Cognitive Sciences 18 (6): 276–78.

Zosh, J.M., B. Hassinger-Das, T.S. Toub, K. Hirsh-Pasek, & R. Golinkoff. 2016. “Playing with Mathematics: How Play Supports Learning and the Common Core State Standards.” Journal of Mathematics Education at Teachers College 7 (1): 45–49. https://doi.org/10.7916/jmetc.v7i1.787 . 

Zosh, J.M., K. Hirsh-Pasek, E.J. Hopkins, H. Jensen, C. Liu, D. Neale, S.L. Solis, & D. Whitebread. 2018. “Accessing the Inaccessible: Redefining Play as a Spectrum.” Frontiers in Psychology 9: 1–12. https://doi.org/10.3389/fpsyg.2018.01124 . 

Jennifer M. Zosh, PhD, is professor of human development and family studies at Penn State Brandywine. Most recently, her work has focused on technology and its impact on children as well as playful learning as a powerful pedagogy. She publishes journal articles, book chapters, blogs, and white papers and focuses on the dissemination of developmental research.

Caroline Gaudreau, PhD, is a research professional at the TMW Center for Early Learning + Public Health at the University of Chicago. She received her PhD from the University of Delaware, where she studied how children learn to ask questions and interact with screen media. She is passionate about disseminating research and interventions to families across the country.

Roberta Michnick Golinkoff, PhD, conducts research on language development, the benefits of play, spatial learning, and the effects of media on children. A member of the National Academy of Education, she is a cofounder of Playful Learning Landscapes, Learning Science Exchange, and the Ultimate Playbook for Reimagining Education. Her most recent book, Becoming Brilliant: What Science Tells Us About Raising Successful Children (American Psychological Association, 2016), reached the New York Times bestseller list.

Kathy Hirsh-Pasek, PhD, is the Lefkowitz Faculty Fellow in the Psychology and Neuroscience department at Temple University in Philadelphia, Pennsylvania. She is also a senior fellow at the Brookings Institution. Her research examines the development of early language and literacy, the role of play in learning, and learning and technology. [email protected]

Vol. 77, No. 2

  • Zhao, Z.N.; Yun, S.N.; Jia, L.Y.; Guo, J.X.; Meng, Y.; He, N.; Li, X.J.; Shi, J.R.; Yang, L. Hybrid VMD-CNN-GRU-based model for short-term forecasting of wind power considering spatio-temporal features. Eng. Appl. Artif. Intell. 2023 , 121 , 105982. [ Google Scholar ] [ CrossRef ]
  • Pan, N.; Ding, Y.; Fu, J.; Wang, J.; Zheng, H. Research on Ship Arrival Law Based on Route Matching and Deep Learning. J. Phys. Conf. Ser. 2021 , 1952 , 022023. [ Google Scholar ] [ CrossRef ]
  • Ma, J.; Li, W.K.; Jia, C.F.; Zhang, C.W.; Zhang, Y. Risk Prediction for Ship Encounter Situation Awareness Using Long Short-Term Memory Based Deep Learning on Intership Behaviors. J. Adv. Transp. 2020 , 2020 , 8897700. [ Google Scholar ] [ CrossRef ]
  • Suo, Y.F.; Chen, W.K.; Claramunt, C.; Yang, S.H. A Ship Trajectory Prediction Framework Based on a Recurrent Neural Network. Sensors 2020 , 20 , 5133. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014 , arXiv:1409.0473. [ Google Scholar ] [ CrossRef ]
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017 , 30 , 03762. [ Google Scholar ]
  • Nascimento, E.G.S.; de Melo, T.A.C.; Moreira, D.M. A transformer-based deep neural network with wavelet transform for forecasting wind speed and wind energy. Energy 2023 , 278 , 127678. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Zhang, J.; Niu, J.; Wu, Q.M.J.; Li, G. Track Prediction for HF Radar Vessels Submerged in Strong Clutter Based on MSCNN Fusion with GRU-AM and AR Model. Remote Sens. 2021 , 13 , 2164. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Fu, X.; Xiao, Z.; Xu, H.; Zhang, W.; Koh, J.; Qin, Z. A Dynamic Context-Aware Approach for Vessel Trajectory Prediction Based on Multi-Stage Deep Learning. IEEE Trans. Intell. Veh. 2024 , 1–16. [ Google Scholar ] [ CrossRef ]
  • Jiang, D.; Shi, G.; Li, N.; Ma, L.; Li, W.; Shi, J. TRFM-LS: Transformer-Based Deep Learning Method for Vessel Trajectory Prediction. J. Mar. Sci. Eng. 2023 , 11 , 880. [ Google Scholar ] [ CrossRef ]
  • Violos, J.; Tsanakas, S.; Androutsopoulou, M.; Palaiokrassas, G.; Varvarigou, T. Next Position Prediction Using LSTM Neural Networks. In Proceedings of the 11th Hellenic Conference on Artificial Intelligence, Athens, Greece, 2–4 September 2020; pp. 232–240. [ Google Scholar ] [ CrossRef ]
  • Hoque, X.; Sharma, S.K. Ensembled Deep Learning Approach for Maritime Anomaly Detection System. In Proceedings of the 1st International Conference on Emerging Trends in Information Technology (ICETIT), Inst Informat Technol & Management, New Delhi, India, 21–22 June 2020; In Lecture Notes in Electrical Engineering. Volume 605, pp. 862–869. [ Google Scholar ]
  • Wang, Y.; Zhang, M.; Fu, H.; Wang, Q. Research on Prediction Method of Ship Rolling Motion Based on Deep Learning. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7182–7187. [ Google Scholar ] [ CrossRef ]
  • Choi, J. Predicting the Frequency of Marine Accidents by Navigators’ Watch Duty Time in South Korea Using LSTM. Appl. Sci. 2022 , 12 , 11724. [ Google Scholar ] [ CrossRef ]
  • Li, T.; Li, Y.B. Prediction of ship trajectory based on deep learning. J. Phys. Conf. Ser. 2023 , 2613 , 012023. [ Google Scholar ] [ CrossRef ]
  • Chondrodima, E.; Pelekis, N.; Pikrakis, A.; Theodoridis, Y. An Efficient LSTM Neural Network-Based Framework for Vessel Location Forecasting. IEEE Trans. Intell. Transp. Syst. 2023 , 24 , 4872–4888. [ Google Scholar ] [ CrossRef ]
  • Long, Z.; Suyuan, W.; Zhongma, C.; Jiaqi, F.; Xiaoting, Y.; Wei, D. Lira-YOLO: A lightweight model for ship detection in radar images. J. Syst. Eng. Electron. 2020 , 31 , 950–956. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Li, G.; Skulstad, R.; Zhang, H.; Chen, S. SpectralSeaNet: Spectrogram and Convolutional Network-based Sea State Estimation. In Proceedings of the IECON 2020 the 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 5069–5074. [ Google Scholar ] [ CrossRef ]
  • Wang, K.; Cheng, X.; Shi, F. Learning Dynamic Graph Structures for Sea State Estimation with Deep Neural Networks. In Proceedings of the 2023 6th International Conference on Intelligent Autonomous Systems (ICoIAS), Qinhuangdao, China, 22–24 September 2023; pp. 161–166. [ Google Scholar ]
  • Yu, J.; Huang, D.; Shi, X.; Li, W.; Wang, X. Real-Time Moving Ship Detection from Low-Resolution Large-Scale Remote Sensing Image Sequence. Appl. Sci. 2023 , 13 , 2584. [ Google Scholar ] [ CrossRef ]
  • Ilias, L.; Kapsalis, P.; Mouzakitis, S.; Askounis, D. A Multitask Learning Framework for Predicting Ship Fuel Oil Consumption. IEEE Access 2023 , 11 , 132576–132589. [ Google Scholar ] [ CrossRef ]
  • Selimovic, D.; Hrzic, F.; Prpic-Orsic, J.; Lerga, J. Estimation of sea state parameters from ship motion responses using attention-based neural networks. Ocean Eng. 2023 , 281 , 114915. [ Google Scholar ] [ CrossRef ]
  • Ma, J.; Jia, C.; Yang, X.; Cheng, X.; Li, W.; Zhang, C. A Data-Driven Approach for Collision Risk Early Warning in Vessel Encounter Situations Using Attention-BiLSTM. IEEE Access 2020 , 8 , 188771–188783. [ Google Scholar ] [ CrossRef ]
  • Ji, Z.; Gan, H.; Liu, B. A Deep Learning-Based Fault Warning Model for Exhaust Temperature Prediction and Fault Warning of Marine Diesel Engine. J. Mar. Sci. Eng. 2023 , 11 , 1509. [ Google Scholar ] [ CrossRef ]
  • Liu, Y.; Gan, H.; Cong, Y.; Hu, G. Research on fault prediction of marine diesel engine based on attention-LSTM. Proc. Inst. Mech. Eng. Part M J. Eng. Marit. Environ. 2023 , 237 , 508–519. [ Google Scholar ] [ CrossRef ]
  • Li, M.W.; Xu, D.Y.; Geng, J.; Hong, W.C. A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm. Nonlinear Dyn. 2022 , 107 , 2447–2467. [ Google Scholar ] [ CrossRef ]
  • Yang, C.H.; Chang, P.Y. Forecasting the Demand for Container Throughput Using a Mixed-Precision Neural Architecture Based on CNN–LSTM. Mathematics 2020 , 8 , 1784. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Wu, P.; Peng, Y.; Liu, D. Roll Motion Prediction of Unmanned Surface Vehicle Based on Coupled CNN and LSTM. Future Internet 2019 , 11 , 243. [ Google Scholar ] [ CrossRef ]
  • Kamal, I.M.; Bae, H.; Sunghyun, S.; Yun, H. DERN: Deep Ensemble Learning Model for Short- and Long-Term Prediction of Baltic Dry Index. Appl. Sci. 2020 , 10 , 1504. [ Google Scholar ] [ CrossRef ]
  • Li, M.Z.; Li, B.; Qi, Z.G.; Li, J.S.; Wu, J.W. Enhancing Maritime Navigational Safety: Ship Trajectory Prediction Using ACoAtt–LSTM and AIS Data. ISPRS Int. J. Geo-Inform. 2024 , 13 , 85. [ Google Scholar ] [ CrossRef ]
  • Yu, T.; Zhang, Y.; Zhao, S.; Yang, J.; Li, W.; Guo, W. Vessel trajectory prediction based on modified LSTM with attention mechanism. In Proceedings of the 2024 4th International Conference on Neural Networks, Information and Communication Engineering, NNICE, Guangzhou, China, 19–21 January 2024; pp. 912–918. [ Google Scholar ] [ CrossRef ]
  • Xia, C.; Peng, Y.; Qu, D. A pre-trained model specialized for ship trajectory prediction. In Proceedings of the IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 15–17 March 2024; pp. 1857–1860. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Li, G.; Skulstad, R.; Chen, S.; Hildre, H.P.; Zhang, H. Modeling and Analysis of Motion Data from Dynamically Positioned Vessels for Sea State Estimation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6644–6650. [ Google Scholar ] [ CrossRef ]
  • Xia, C.; Qu, D.; Zheng, Y. TATBformer: A Divide-and-Conquer Approach to Ship Trajectory Prediction Modeling. In Proceedings of the 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 8–10 December 2023; pp. 335–339. [ Google Scholar ] [ CrossRef ]
  • Ran, Y.; Shi, G.; Li, W. Ship Track Prediction Model based on Automatic Identification System Data and Bidirectional Cyclic Neural Network. In Proceedings of the 2021 4th International Symposium on Traffic Transportation and Civil Architecture, ISTTCA, Suzhou, China, 12–14 November 2021; pp. 297–301. [ Google Scholar ] [ CrossRef ]
  • Yang, C.H.; Wu, C.H.; Shao, J.C.; Wang, Y.C.; Hsieh, C.M. AIS-Based Intelligent Vessel Trajectory Prediction Using Bi-LSTM. IEEE Access 2022 , 10 , 24302–24315. [ Google Scholar ] [ CrossRef ]
  • Sadeghi, Z.; Matwin, S. Anomaly detection for maritime navigation based on probability density function of error of reconstruction. J. Intell. Syst. 2023 , 32 , 20220270. [ Google Scholar ] [ CrossRef ]
  • Perumal, V.; Murugaiyan, S.; Ravichandran, P.; Venkatesan, R.; Sundar, R. Real time identification of anomalous events in coastal regions using deep learning techniques. Concurr. Comput. Pract. Exp. 2021 , 33 , e6421. [ Google Scholar ] [ CrossRef ]
  • Xie, J.L.; Shi, W.F.; Shi, Y.Q. Research on Fault Diagnosis of Six-Phase Propulsion Motor Drive Inverter for Marine Electric Propulsion System Based on Res-BiLSTM. Machines 2022 , 10 , 736. [ Google Scholar ] [ CrossRef ]
  • Han, P.; Li, G.; Skulstad, R.; Skjong, S.; Zhang, H. A Deep Learning Approach to Detect and Isolate Thruster Failures for Dynamically Positioned Vessels Using Motion Data. IEEE Trans. Instrum. Meas. 2021 , 70 , 1–11. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Wang, K.; Liu, X.; Yu, Q.; Shi, F.; Ren, Z.; Chen, S. A Novel Class-Imbalanced Ship Motion Data-Based Cross-Scale Model for Sea State Estimation. IEEE Trans. Intell. Transp. Syst. 2023 , 24 , 15907–15919. [ Google Scholar ] [ CrossRef ]
  • Lei, L.; Wen, Z.; Peng, Z. Prediction of Main Engine Speed and Fuel Consumption of Inland Ships Based on Deep Learning. J. Phys. Conf. Ser. 2021 , 2025 , 012012. [ Google Scholar ]
  • Ljunggren, H. Using Deep Learning for Classifying Ship Trajectories. In Proceedings of the 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2158–2164. [ Google Scholar ]
  • Kulshrestha, A.; Yadav, A.; Sharma, H.; Suman, S. A deep learning-based multivariate decomposition and ensemble framework for container throughput forecasting. J. Forecast. 2024 , in press . [ Google Scholar ] [ CrossRef ]
  • Shankar, S.; Ilavarasan, P.V.; Punia, S.; Singh, S.P. Forecasting container throughput with long short-term memory networks. Ind. Manag. Data Syst. 2020 , 120 , 425–441. [ Google Scholar ] [ CrossRef ]
  • Lee, E.; Kim, D.; Bae, H. Container Volume Prediction Using Time-Series Decomposition with a Long Short-Term Memory Models. Appl. Sci. 2021 , 11 , 8995. [ Google Scholar ] [ CrossRef ]
  • Cuong, T.N.; You, S.-S.; Long, L.N.B.; Kim, H.-S. Seaport Resilience Analysis and Throughput Forecast Using a Deep Learning Approach: A Case Study of Busan Port. Sustainability 2022 , 14 , 13985. [ Google Scholar ] [ CrossRef ]
  • Song, X.; Chen, Z.S. Shipping market time series forecasting via an Ensemble Deep Dual-Projection Echo State Network. Comput. Electr. Eng. 2024 , 117 , 109218. [ Google Scholar ] [ CrossRef ]
  • Li, X.; Hu, Y.; Bai, Y.; Gao, X.; Chen, G. DeepDLP: Deep Reinforcement Learning based Framework for Dynamic Liner Trade Pricing. In Proceedings of the Proceedings of the 2023 17th International Conference on Ubiquitous Information Management and Communication, IMCOM, Seoul, Republic of Korea, 3–5 January 2023; pp. 1–8. [ Google Scholar ] [ CrossRef ]
  • Alqatawna, A.; Abu-Salih, B.; Obeid, N.; Almiani, M. Incorporating Time-Series Forecasting Techniques to Predict Logistics Companies’ Staffing Needs and Order Volume. Computation 2023 , 11 , 141. [ Google Scholar ] [ CrossRef ]
  • Lim, S.; Kim, S.J.; Park, Y.; Kwon, N. A deep learning-based time series model with missing value handling techniques to predict various types of liquid cargo traffic. Expert Syst. Appl. 2021 , 184 , 115532. [ Google Scholar ] [ CrossRef ]
  • Cheng, R.; Gao, R.; Yuen, K.F. Ship order book forecasting by an ensemble deep parsimonious random vector functional link network. Eng. Appl. Artif. Intell. 2024 , 133 , 108139. [ Google Scholar ] [ CrossRef ]
  • Xiao, Z.; Fu, X.J.; Zhang, L.Y.; Goh, R.S.M. Traffic Pattern Mining and Forecasting Technologies in Maritime Traffic Service Networks: A Comprehensive Survey. IEEE Trans. Intell. Transp. Syst. 2020 , 21 , 1796–1825. [ Google Scholar ] [ CrossRef ]
  • Yan, R.; Wang, S.A.; Psaraftis, H.N. Data analytics for fuel consumption management in maritime transportation: Status and perspectives. Transp. Res. Part E Logist. Transp. Rev. 2021 , 155 , 102489. [ Google Scholar ] [ CrossRef ]
  • Filom, S.; Amiri, A.M.; Razavi, S. Applications of machine learning methods in port operations—A systematic literature review. Transp. Res. Part E-Logist. Transp. Rev. 2022 , 161 , 102722. [ Google Scholar ] [ CrossRef ]
  • Ksciuk, J.; Kuhlemann, S.; Tierney, K.; Koberstein, A. Uncertainty in maritime ship routing and scheduling: A Literature review. Eur. J. Oper. Res. 2023 , 308 , 499–524. [ Google Scholar ] [ CrossRef ]
  • Jia, H.; Prakash, V.; Smith, T. Estimating vessel payloads in bulk shipping using AIS data. Int. J. Shipp. Transp. Logist. 2019 , 11 , 25–40. [ Google Scholar ] [ CrossRef ]
  • Yang, D.; Wu, L.X.; Wang, S.A.; Jia, H.Y.; Li, K.X. How big data enriches maritime research—A critical review of Automatic Identification System (AIS) data applications. Transp. Rev. 2019 , 39 , 755–773. [ Google Scholar ] [ CrossRef ]
  • Liu, M.; Zhao, Y.; Wang, J.; Liu, C.; Li, G. A Deep Learning Framework for Baltic Dry Index Forecasting. Procedia Comput. Sci. 2022 , 199 , 821–828. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.C.; Wang, H.; Zou, D.X.; Fu, H.X. Ship roll prediction algorithm based on Bi-LSTM-TPA combined model. J. Mar. Sci. Eng. 2021 , 9 , 387. [ Google Scholar ] [ CrossRef ]
  • Xie, H.T.; Jiang, X.Q.; Hu, X.; Wu, Z.T.; Wang, G.Q.; Xie, K. High-efficiency and low-energy ship recognition strategy based on spiking neural network in SAR images. Front. Neurorobotics 2022 , 16 , 970832. [ Google Scholar ] [ CrossRef ]
  • Muñoz, D.U.; Ruiz-Aguilar, J.J.; González-Enrique, J.; Domínguez, I.J.T. A Deep Ensemble Neural Network Approach to Improve Predictions of Container Inspection Volume. In Proceedings of the 15th International Work-Conference on Artificial Neural Networks (IWANN), Gran Canaria, Spain, 12–14 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11506, pp. 806–817. [ Google Scholar ] [ CrossRef ]
  • Velasco-Gallego, C.; Lazakis, I. Mar-RUL: A remaining useful life prediction approach for fault prognostics of marine machinery. Appl. Ocean Res. 2023 , 140 , 103735. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Zheng, K.; Wang, C.; Chen, J.; Qi, H. A novel deep reinforcement learning for POMDP-based autonomous ship collision decision-making. Neural Comput. Appl. 2023 , 1–15. [ Google Scholar ] [ CrossRef ]
  • Guo, X.X.; Zhang, X.T.; Lu, W.Y.; Tian, X.L.; Li, X. Real-time prediction of 6-DOF motions of a turret-moored FPSO in harsh sea state. Ocean Eng. 2022 , 265 , 112500. [ Google Scholar ] [ CrossRef ]
  • Kim, D.; Kim, T.; An, M.; Cho, Y.; Baek, Y.; IEEE. Edge AI-based early anomaly detection of LNG Carrier Main Engine systems. In Proceedings of the OCEANS Conference, Limerick, Ireland, 5–8 June 2023. [ Google Scholar ] [ CrossRef ]
  • Theodoropoulos, P.; Spandonidis, C.C.; Giannopoulos, F.; Fassois, S. A Deep Learning-Based Fault Detection Model for Optimization of Shipping Operations and Enhancement of Maritime Safety. Sensors 2021 , 21 , 5658. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int. J. Geogr. Inf. Sci. 2010 , 24 , 383–401. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Xu, Y.; Streets, D.G.; Wang, C. How does decarbonization of the central heating industry affect employment? A spatiotemporal analysis from the perspective of urbanization. Energy Build. 2024 , 306 , 113912. [ Google Scholar ] [ CrossRef ]
  • Zhang, D.; Li, X.; Wan, C.; Man, J. A novel hybrid deep-learning framework for medium-term container throughput forecasting: An application to China’s Guangzhou, Qingdao and Shanghai hub ports. Marit. Econ. Logist. 2024 , 26 , 44–73. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.; Wang, H.; Zhou, B.; Fu, H. Multi-dimensional prediction method based on Bi-LSTMC for ship roll. Ocean Eng. 2021 , 242 , 110106. [ Google Scholar ] [ CrossRef ]


| Ref. | Architecture | Dataset | Advantage |
| --- | --- | --- | --- |
| [ ] | MSCNN-GRU-AM | HF radar | Applicable to high-frequency radar ship track prediction in environments with significant clutter and interference |
| [ ] | CNN-BiLSTM-Attention | 6L34DF dual-fuel diesel engine | High prediction accuracy and timely early warning, with interpretable fault prediction results |
| [ ] | LSTM | Two LNG carriers | Enables early anomaly detection in new ships and new equipment |
| [ ] | LSTM | Sensor data | Better, high-precision results |
| [ ] | Self-Attention-BiLSTM | A real military ship | Captures complex ship attitude changes and shows greater accuracy and stability in long-term forecasting tasks |
| [ ] | CNN–GRU–AM | A C11 containership | Better forecasting accuracy |
| [ ] | GRU | A scaled model test | Good prediction accuracy |
| [ ] | CNN | A bulk carrier | Good prediction accuracy |

Share and Cite

Wang, M.; Guo, X.; She, Y.; Zhou, Y.; Liang, M.; Chen, Z.S. Advancements in Deep Learning Techniques for Time Series Forecasting in Maritime Applications: A Comprehensive Review. Information 2024 , 15 , 507. https://doi.org/10.3390/info15080507





Master's Theses & Capstone Projects

The value of play-based learning in early childhood classrooms.

Brandice TeGrootenhuis , Northwestern College - Orange City

Publication Date

Spring 2021

The purpose of this literature review is to demonstrate the value of play-based learning in early childhood classrooms and to encourage educators and policy-makers to focus more on this developmentally appropriate practice of learning through play. The findings of this literature review are that play is an essential component in all areas of child development. The reviewed research suggests that play is closely linked to academic and social development and indicates benefits in the areas of 21st century skills, literacy and language, and STEM (science, technology, engineering, and mathematics) education. The review concludes with recommendations for implementing play-based learning and describes how it can work alongside intentional teaching practices to provide a well-rounded education to students in early childhood classrooms.



  • Open access
  • Published: 21 August 2024

Evaluating the effectiveness of large language models in abstract screening: a comparative analysis

  • Michael Li, Jianping Sun & Xianming Tan

Systematic Reviews, volume 13, article number 219 (2024)


This study aimed to evaluate the performance of large language models (LLMs) in the task of abstract screening in systematic review and meta-analysis studies, exploring their effectiveness, efficiency, and potential integration into existing human expert-based workflows.

We developed automation scripts in Python to interact with the APIs of several LLM tools, including ChatGPT v4.0, ChatGPT v3.5, Google PaLM 2, and Meta Llama 2, as well as newer tools including ChatGPT v4.0 turbo, ChatGPT v3.5 turbo, Google Gemini 1.0 pro, Meta Llama 3, and Claude 3. This study focused on three databases of abstracts and used them as benchmarks to evaluate the performance of these LLM tools in terms of sensitivity, specificity, and overall accuracy. The results of the LLM tools were compared to human-curated inclusion decisions, the gold standard for systematic review and meta-analysis studies.

Different LLM tools had varying abilities in abstract screening. ChatGPT v4.0 demonstrated remarkable performance, with balanced sensitivity and specificity and overall accuracy consistently reaching or exceeding 90%, indicating a high potential for LLMs in abstract screening tasks. The study found that LLMs could provide reliable results with minimal human effort and thus serve as a cost-effective and efficient alternative to traditional abstract screening methods.
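The three metrics reported here can be computed directly from paired human and LLM include/exclude decisions. The sketch below is illustrative only; the decision vectors are toy values, not data from the study:

```python
def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for include/exclude decisions.

    y_true: human-curated labels (1 = include, 0 = exclude).
    y_pred: LLM screening decisions, same coding.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of true includes recovered
    specificity = tn / (tn + fp)  # fraction of true excludes recovered
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

In abstract screening, sensitivity is usually the metric of greatest concern, since a missed relevant study cannot be recovered later in the review.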

While LLM tools are not yet ready to completely replace human experts in abstract screening, they show great promise in revolutionizing the process. They can serve as autonomous AI reviewers, contribute to collaborative workflows with human experts, and integrate with hybrid approaches to develop custom tools for increased efficiency. As technology continues to advance, LLMs are poised to play an increasingly important role in abstract screening, reshaping the workflow of systematic review and meta-analysis studies.


Introduction

A systematic review is a rigorous form of research that collates and synthesizes all existing evidence on a specific research question [ 1 ]. It stands as a cornerstone not just in medical research but across diverse academic disciplines. Unlike traditional literature reviews, systematic reviews follow a comprehensive and standardized process, such as the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guideline [ 2 ], designed to minimize bias and ensure reproducibility. Therefore, these reviews are recognized as one of the zenith levels of evidence in evidence-based research [ 3 ] and play a pivotal role in shaping clinical guidelines, healthcare policies, and informing medical decisions [ 4 ].

Commencing with a well-articulated research question, a typical systematic review launches an exhaustive search strategy that subsequently sweeps through databases, such as PubMed and Embase, supplemented by additional sources such as clinical trial registries and pertinent article reference lists, with aims to capture all relevant studies and mitigate bias. Predetermined inclusion and exclusion criteria guide the subsequent screening and selection of studies, encompassing facts like study design, patient demographics, and intervention types. Reviewers, working independently, appraise each study’s eligibility, reconciling disagreements through discussions or third-party reviews. Data extraction and synthesis follow, either through meta-analysis or narrative synthesis, depending on the heterogeneity of the selected studies.

The practice of conducting systematic reviews has gained substantial popularity, with considerable demand within the academic community. A notable reference [ 5 ] to this trend estimated that approximately 75 trials and potentially 11 systematic reviews were disseminated daily, based on data from around 2010. Moreover, a query for “systematic review” in the Google Scholar database yields approximately 17,000 entries published within the year 2023 alone, which translates into about 51 systematic reviews per day, as observed on November 26, 2023. This expanding volume of literature underscores the critical role that systematic reviews play in the consolidation of research findings across various fields of study.

Despite their pivotal role, executing systematic reviews remains a formidable task, largely because of abstract screening, a key phase that can be overwhelmingly time-consuming given the sheer volume of literature involved. For example, Polanin et al. [ 6 ] reported that their research staff screened 29,846 abstracts independently (14,923 unique citations were double screened) over the course of 189 days. In addition, variability in reporting standards, use of jargon, and varied study designs can further complicate the abstract screening process [ 7 ]. Moreover, the repetitive nature of the task, combined with cognitive fatigue, can lead to human errors [ 8 , 9 ]. Recent advancements in machine learning (ML) and deep learning propose possible solutions to these challenges. However, traditional ML models, while promising, require domain-specific training, a time-consuming process that often demands manual labeling of datasets [ 10 ].

Advancements in natural language processing (NLP) and artificial intelligence (AI) are opening doors to address challenges in systematic reviews. Large language models (LLMs) like ChatGPT [ 11 ], PaLM [ 12 ], Llama [ 13 ], and Claude [ 14 ] are capturing the research community’s attention. Their collective potential, especially their capability to operate without exhaustive domain-specific training, makes them prime candidates for revolutionizing the systematic review process.

While each of the aforementioned AI tools brings its unique capabilities to the table, the fundamental question remains: How do they stack up, both individually and collectively, in the realm of abstract screening, against the human expert-based process? In pursuit of answers, this research seeks to thoroughly investigate the potential of ChatGPT, Google PaLM, Llama, and Claude in automating the crucial abstract screening phase integral to systematic reviews. Our goal is to rigorously compare the performance of these advanced AI-driven methods with existing machine learning (ML)-based approaches. In doing so, we aim to develop AI strategies that masterfully balance efficiency and accuracy with minimal human intervention, ultimately transforming systematic review practice across disciplines.

The use of NLP for abstract screening is established [ 10 , 15 ]. However, the application of LLMs specifically for this task is a nascent field [ 16 , 17 ]. This emerging area offers significant potential to improve efficiency and accuracy. Our study aims to fill this gap by providing a comprehensive analysis of LLM capabilities in abstract screening, laying the groundwork for future research and application. This is particularly relevant considering the rapid evolution of this technology, highlighting its potential to streamline systematic reviews now and in the future.

The remainder of this paper is structured to provide a comprehensive exploration of our topic. We begin with an in-depth examination of existing methods for abstract screening, including both manual and NLP-based approaches, laying the groundwork for understanding the current state of the field. We then introduce the use of large language model (LLM) tools for abstract screening, detailing our experimental design to meticulously evaluate their performance in this context. Subsequent sections present our empirical findings and results, shedding light on the capabilities and limitations of the AI tools in question. Finally, we engage in a thoughtful discussion, reflecting on the implications of our findings and considering the future trajectory of abstract screening in systematic reviews.

Existing approaches to abstract screening in systematic reviews

In the vast realm of systematic reviews, the critical task of abstract screening serves as a foundational step in curating the highest quality of evidence [ 2 ]. However, this process often presents significant challenges because it involves sifting through large volumes of literature to identify those studies that align with predefined criteria. Over time, various methodologies, ranging from manual evaluations to sophisticated AI-driven techniques, have been proposed to address the complexities of this task. In this section, we describe the existing approaches, their operational mechanisms, and their associated advantages and disadvantages.

Manual approach

Historically, the process of abstract screening was firmly rooted in manual evaluations. In this conventional approach, individual reviewers would scrutinize each abstract against predefined criteria [ 1 ]. The meticulous nature of this method required that multiple experts independently evaluate the literature to ensure both reliability and reduced biases [ 8 ]. While the depth of human expertise brought about nuanced understanding, the manual nature of this method made it both time-consuming and, at times, prone to human error [ 6 , 9 ].

NLP-based approach

As technology evolved, the field witnessed the incorporation of natural language processing (NLP) to automate abstract screening [ 10 ]. In this framework, abstract text undergoes preprocessing and vectorization. Supervised machine learning models, notably the support vector machine (SVM) and the random forest (RF), are then trained on this vectorized data to classify literature based on specific criteria [ 15 ]. The strength of this approach lies in its potential for efficiency. However, its efficacy and accuracy hinge heavily on the availability of a well-curated, labeled training set.
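The vectorize-then-classify pipeline described above can be sketched in a few lines. This is a minimal illustration assuming scikit-learn, with invented toy abstracts and labels; a real screening task would use thousands of labeled abstracts and tuned hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled training set: 1 = include, 0 = exclude (invented examples).
train_abstracts = [
    "Rats exposed to chronic stress showed depressive-like behavior in the forced swim test.",
    "A mouse model of depression treated with fluoxetine versus saline controls.",
    "Depressive-like phenotype induced by social defeat in mice with sham controls.",
    "Enzyme kinetics of ornithine decarboxylase in rat neocortex after stimulation.",
    "A review of polyamine metabolism in the mammalian brain.",
    "In vitro assay of kinase activity in cultured neurons.",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Preprocess + vectorize (TF-IDF) and train a linear SVM on the labeled data.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_abstracts, train_labels)

# Screen unseen abstracts: predict include (1) / exclude (0).
new_abstracts = ["Chronic stress induced depressive-like behavior in rats versus controls."]
decisions = model.predict(new_abstracts)
```

The same pipeline accepts a random forest (`RandomForestClassifier`) in place of the SVM; the point is only that the classifier sees vectorized text, not raw abstracts.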

Zero-shot classification

A more recent and innovative approach is zero-shot classification, notably highlighted by Xian et al. [ 18 ]. Eschewing the need for an extensive labeled training dataset, zero-shot classification offers the allure of categorizing abstracts without prior domain-specific training. By calculating a probability score for each abstract, researchers obtain a dynamic measure of its alignment with predefined categories. While it requires no training of a machine learning model, the tradeoff is a loss of sensitivity and potential omissions of pertinent studies [ 19 ]. In this study, for a given set of abstracts, we first obtain embeddings (using OpenAI’s text embedding API, the babbage model) for the abstracts and for a pre-specified description of an ideal study to include. We use the inclusion/exclusion criteria (see the “Examples of abstract screening by using LLMs,” “Automated workflow for streamlining abstract screening via ChatGPT and other tools,” and “Study design” sections) as the description of an ideal study to include. We then compute the cosine similarity score between each abstract (its embedding) and the pre-specified description (its embedding) and claim the top 10% of abstracts (in terms of similarity) as positive (i.e., studies that should be included).
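The cosine-similarity ranking described above reduces to a few lines of NumPy. This sketch uses made-up 3-dimensional vectors in place of real API embeddings; the `top_frac` of 0.25 is chosen only so the toy example flags one of four abstracts (the study uses 10%).

```python
import numpy as np

def zero_shot_screen(abstract_embeddings, criteria_embedding, top_frac=0.10):
    """Rank abstracts by cosine similarity to the eligibility-criteria text
    and flag the top fraction as positive (include)."""
    A = np.asarray(abstract_embeddings, dtype=float)
    c = np.asarray(criteria_embedding, dtype=float)
    sims = (A @ c) / (np.linalg.norm(A, axis=1) * np.linalg.norm(c))
    cutoff = np.quantile(sims, 1.0 - top_frac)
    return sims, sims >= cutoff

# Illustration with invented 3-D "embeddings" (real ones come from an embedding API).
abstracts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.2, 0.9]]
criteria = [1.0, 0.0, 0.0]
sims, include = zero_shot_screen(abstracts, criteria, top_frac=0.25)
# include → only the abstract most aligned with the criteria embedding
```

Note that the 10% threshold is a design choice, not a property of the method: it trades sensitivity against reviewer workload.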

Hybrid approach

To reconcile the strengths and limitations of these models, a hybrid methodology has emerged [ 18 ]. It marries the rapid categorization capabilities of zero-shot classification with the precision of traditional machine learning models such as SVM and RF. Here, zero-shot classification provides an initial curation of all abstracts in the training set. The abstracts with a zero-shot classification score above a pre-specified threshold then undergo manual review to rectify the zero-shot labels. The rectified classification serves as a foundation for training traditional models, which are then employed for screening the broader dataset (e.g., the testing set). The hybrid approach balances the speed of zero-shot classification with the precision of traditional machine learning, potentially offering enhanced accuracy with reduced human effort. However, it involves multiple methodologies and still relies on well-curated, labeled training data (in this case, a subset of the whole training set). In this study, for a given set of abstracts, we rectify the “positive” abstracts (i.e., the top 10% of abstracts) identified by zero-shot classification, and then fit an SVM classification model for the rectified labels on the text embeddings of the corresponding abstracts. This learned classification model is then used to predict the class of the other abstracts. The R code for the zero-shot and hybrid approaches is available in the GitHub repository https://github.com/mikeli380/LLMAbstractScreening .
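The three hybrid steps above (zero-shot curation, manual rectification, supervised re-training) can be sketched as follows. All data here are simulated: random vectors stand in for text embeddings, random scores for zero-shot similarities, and a hard-coded array for the human reviewer's rectified labels.

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: zero-shot similarity scores for the training pool (simulated).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 8))        # stand-in for text embeddings
scores = rng.uniform(size=50)                # stand-in zero-shot similarities

# Step 2: flag the top 10% and have a human rectify those labels (simulated).
flagged = np.argsort(scores)[-5:]
rectified_labels = np.array([1, 0, 1, 1, 0])  # simulated human decisions

# Step 3: fit an SVM on the embeddings of the rectified subset...
clf = SVC(kernel="linear").fit(embeddings[flagged], rectified_labels)

# ...and use it to screen the remaining abstracts in the pool.
rest = np.setdiff1d(np.arange(50), flagged)
predictions = clf.predict(embeddings[rest])
```

Only the flagged subset ever needs human attention, which is where the reduction in labeling effort comes from.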

Active learning approach

Active learning [ 20 , 21 , 22 ] is an innovative approach to machine learning that optimizes the training process by allowing the model to selectively query a human annotator for labels on the most informative data points. This method is particularly advantageous in scenarios where labeled data is scarce or expensive to obtain. Active learning models can start with minimal datasets, often requiring only one relevant and one irrelevant abstract, making them particularly suitable for tasks such as abstract screening, where the acquisition of large labeled datasets can be prohibitive.

The main advantage of active learning is its efficiency. By focusing on the most informative samples, it reduces the amount of data that needs to be labeled while still effectively training the model. This can significantly reduce the time and resources required for the annotation process. However, the effectiveness of active learning depends heavily on the initial selection of samples and the criteria used to determine the informativeness of subsequent data points. If not well calibrated, the model may request labels for data that do not significantly improve its performance, resulting in an inefficient use of resources. In addition, the iterative nature of querying and updating the model based on new labels can introduce complexity into the training process, potentially requiring more sophisticated infrastructure and oversight than traditional supervised learning methods.
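A minimal uncertainty-sampling loop makes the query-and-update cycle concrete. This is an illustrative sketch, not the method of [ 23 ]: the embeddings and "gold standard" labels are synthetic, logistic regression stands in for whatever classifier a real tool would use, and the human annotator is simulated by looking up the hidden labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))               # stand-in abstract embeddings
true_labels = (X[:, 0] > 0).astype(int)     # hidden "gold standard" labels

# Start with one relevant and one irrelevant abstract, as noted above.
labeled = [int(np.argmax(X[:, 0])), int(np.argmin(X[:, 0]))]
unlabeled = [i for i in range(200) if i not in labeled]

for _ in range(10):  # each round, query the single most informative abstract
    clf = LogisticRegression().fit(X[labeled], true_labels[labeled])
    proba = clf.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain
    labeled.append(query)        # the human annotator supplies this label
    unlabeled.remove(query)

accuracy = clf.score(X, true_labels)
```

The "most informative" criterion here is distance from a 0.5 predicted probability; real systems use this and other measures (entropy, committee disagreement), which is exactly the calibration issue raised above.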

While this study did not test active learning approaches for abstract screening, readers interested in exploring this methodology further are referred to [ 23 ] for detailed information on the application of active learning in abstract screening contexts.

Large language models

Amidst the evolving methodologies described in the “Existing approaches to abstract screening in systematic reviews” section, modern AI tools based on large language models (LLMs), such as ChatGPT, PaLM, Llama, and Claude, are emerging as potential game-changers. Grounded in advanced language processing capabilities, these tools can be tailored to evaluate abstracts against nuanced criteria and offer detailed assessments and classifications. Their prowess signals transformative potential for abstract screening. In this section, we first present two examples to illustrate the potential of LLMs in the context of abstract screening, and then proceed with an in-depth discussion of the study’s workflow and design, which aims to critically investigate the performance of LLMs in this domain.

Examples of abstract screening by using LLMs

In this subsection, we demonstrate LLM-based abstract screening using ChatGPT on two abstracts: (1) Millard et al. [ 24 ] and (2) Zawia et al. [ 25 ]. The process with other LLM tools such as Llama, Google PaLM, or Claude is similar and is omitted for brevity.

In constructing the prompts, we adopted a standardized approach to mimic a typical interaction between a senior researcher and a research assistant. Each prompt was designed to include three critical elements: a brief statement of the topic under review, the exact eligibility criteria as specified in the study guidelines, and the text of the abstract to be reviewed. This approach was designed to simulate a realistic and practical scenario that reflects common academic practices. In addition, to ensure consistency and objectivity in the responses, we required that the results of the LLM be in a binary “yes” or “no” format, directly indicating whether each abstract met the inclusion criteria. This format was chosen to eliminate any ambiguity in the model’s decisions and to facilitate straightforward analysis of its accuracy.
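The three-element prompt design and the binary output requirement can be captured in two small helper functions. These are illustrative sketches (the function names are ours, and the template paraphrases the prompts shown below); the parsing rule assumes the model obeys the instruction to lead with yes/no.

```python
def build_prompt(topic, criteria, abstract):
    """Assemble the three elements described above: a brief topic statement,
    the exact eligibility criteria, and the abstract under review."""
    return (
        f"I would like you to help me with conducting a systematic review on {topic}. "
        "I will provide the title and abstract for one journal article and would like "
        "you to screen the paper for inclusion.\n"
        f"Here are the inclusion criteria: {criteria}\n"
        "Answer with only one word, yes or no.\n"
        f"Here is the abstract: {abstract}"
    )

def parse_decision(response_text):
    """Reduce a model response to a binary include/exclude decision."""
    return response_text.strip().lower().startswith("yes")

prompt = build_prompt(
    "animal depression",
    "Does the study provide primary data of an animal model of depression "
    "or depressive-like phenotype with an appropriate control group?",
    "Wistar-Kyoto (WKY) rats show endogenous depressive behavior ...",
)
```

Forcing a leading yes/no keeps the downstream analysis mechanical: any response that does not start with "yes" is treated as an exclusion.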

Abstract (1)

ChatGPT prompt: “I would like you to help me with conducting a systematic review on animal depression. I will provide the title and abstract for one journal article and would like you to screen the paper for inclusion. Here are the inclusion criteria: Does the study provide primary data of an animal model of depression or depressive-like phenotype with an appropriate control group? Here is the abstract:

Wistar-Kyoto (WKY) rats show endogenous depressive behavior that can be reversed by antidepressants. Given that WKYs exhibit decreased sensitivity to some antidepressants and treatment-resistant depressed patients often show hypothalamic-pituitary-thyroid (HPT) dysregulation, we examined the behavioral and HPT hormonal responses of WKYs to altered thyroid status. “Euthyroid” WKYs had elevated basal plasma TSH and T(3) levels as compared to Wistars. Hypothyroidism increased TSH levels more in WKYs than in Wistars and increased response latency in the open field test (OFT) of WKYs only. Administration of T(4) and T(3) suppressed plasma TSH equally in both strains. Wistars responded to increased T(3) levels with decreased response latency and increased activity in the OFT, but increased immobility in the forced swim test. In contrast, WKYs responded only to the high T(3) levels with decreased response latency in the OFT. These results suggest the existence of a decreased central nervous system sensitivity to thyroid hormones in WKYs that could be related to their depressive behavior.”

ChatGPT v4 response (accessed on 10/15/2023):

Given the abstract and the inclusion criteria, this study appears to meet the inclusion criteria for your systematic review on animal models of depression.

Consideration for inclusion:

 1. Animal model and depressive-like phenotype:

  • The study utilizes Wistar-Kyoto (WKY) rats which are mentioned to show endogenous depressive behavior.

  • This study investigates behaviors and hormonal responses associated with depression in this particular animal model, with a focus on the role of thyroid hormones.

 2. Primary data and appropriate control group:

  • The study provides primary data on behavioral and hormonal responses of WKY rats, comparing them to Wistars (likely serving as the control group), under altered thyroid status, allowing for comparisons between the depressive-like phenotype and controls.

  • The utilization of both “Euthyroid” WKYs and Wistars as controls helps in effectively investigating the behavioral changes and responses in WKY rats due to alterations in thyroid status.

 3. Behavioral test indicating depression:

  • Depressive behavior and responses are assessed using standardized tests like the open field test (OFT) and the forced swim test, which are conventional methods to investigate depressive-like behaviors in animal models.

  • The mention of antidepressant’s impact on WKY rats indicates a focus on depressive-like states.

 4. Relevance to human condition:

  • Insights gained from this study on decreased central nervous system sensitivity to thyroid hormones and its possible relation to depressive behavior in WKYs can have implications in understanding treatment-resistant depression in humans, making it valuable for the review.

Conclusion:

Given the provided abstract, this study does seem to fit the inclusion criteria for a systematic review on animal depression, as it provides primary data on an animal model demonstrating depressive-like behavior, includes an appropriate control group, and employs standardized tests to study the depressive-like states and their relation to thyroid hormones. The detailed exploration of depressive behaviors and hormonal imbalances in relation to altered thyroid status can offer valuable insights for understanding and treating depression.

Abstract (2)

The induction of the activity of cerebral ornithine decarboxylase (EC 4.1.1.17) and mRNA by electrical stimulation exhibits regional differences. The effects of the enzyme inhibitor difluoromethylornithine on these regional variations was examined. Administration of this inhibitor resulted in pronounced depression of both basal and induced activity of ornithine decarboxylase in the hippocampus. Basal activity of the enzyme in the neocortex and the cerebellum appeared to be resistant to difluoromethylornithine but the induced enzyme activity was sensitive to the effects of this inhibitor. Susceptibility to difluoromethylornithine may be directly correlated with a slower turnover rate for ornithine decarboxylase. These results suggest that ornithine decarboxylase in the hippocampus may possess a longer half-life than its counterparts in other regions of the brain. Pretreatment with difluoromethylornithine had no effect on the induced ornithine decarboxylase mRNA in the neocortex. Thus, elevated activity of ornithine decarboxylase enzyme, due to electrical stimulation, appears to not have any effect on either the transcription or the decay rate of the induced ornithine decarboxylase mRNA. These findings support the concept of region-specific regulation of cerebral ornithine decarboxylase.

Given the provided abstract and the inclusion criteria, this study does not appear to meet the criteria for inclusion in a systematic review on animal models of depression or depressive-like phenotypes with an appropriate control group.

Consideration for inclusion:

 1. Depression or depressive-like phenotype:

  • The abstract does not provide any information or data related to depression or depressive-like behavior or phenotypes in animals. It is mainly focused on the activity of cerebral ornithine decarboxylase and its regional differences in response to electrical stimulation and inhibitor administration.

 2. Animal model:

  • There is no mention of the use of any specific animal model or any behavioral assays to assess depressive-like symptoms or states in the provided abstract.

 3. Behavioral assessment and primary data:

  • The abstract does not contain any details about behavioral assessments related to depression or depressive-like states. It primarily provides data on the effects of enzyme inhibitor difluoromethylornithine on cerebral ornithine decarboxylase activity and mRNA, but does not relate these findings to depression or depressive-like behaviors.

 4. Relevance to depression:

  • While the study explores biochemical aspects in brain regions like the hippocampus, which is implicated in mood disorders like depression, the connection between the findings and depressive symptoms or states is not established in the provided abstract.

Conclusion:

This article does not seem to fit the inclusion criteria for a systematic review on animal models of depression or depressive-like phenotypes as it lacks primary data on depressive-like behaviors or phenotypes in animals, and there is no mention of the use of animal models representative of depression or an appropriate control group for such a comparison.

The two examples presented above show that ChatGPT adeptly comprehends the posed queries and efficiently extracts the inclusion and exclusion criteria. It not only makes clear decisions regarding study inclusion but also articulates its rationale by cross-referencing each criterion.

It is worth noting that the same ChatGPT prompt might result in slightly different responses between different users due to the variability in its response generation, but the conclusion for the eligibility of an abstract is typically consistent. In addition, enhancing the specificity of the criteria within the prompts can further refine the screening process. To illustrate, consider the following enhanced prompts:

Enhanced ChatGPT prompt for abstract (1): “I seek assistance with a systematic review focused on animal depression. I’ll provide the abstract of a particular journal article and would appreciate an assessment for its inclusion based on the following criteria:

The study employs an animal model reflecting depression or a depressive-like phenotype.

It incorporates an appropriate control group.

Primary data is presented within the study.

and here are the exclusion criteria:

The study bypasses the use of an animal model.

The utilized animal model doesn’t characterize depression or a depressive-like phenotype.

Absence of a control group within the study.

The study is primarily a review, devoid of primary data.

Here is the abstract: …”.

Let’s assess the provided abstract based on the inclusion and exclusion criteria:

Inclusion criteria:

 1. Animal model reflecting depression: The study uses Wistar-Kyoto (WKY) rats, which are described as showing “endogenous depressive behavior.” Thus, this criterion is met.

 2. Appropriate control group: The study compares WKY rats to Wistars (another strain), which serves as a comparison or control group. This criterion is met.

 3. Primary data: The abstract mentions specific results such as the elevated basal plasma TSH and T(3) levels in “Euthyroid” WKYs compared to Wistars and the different responses in the open field test (OFT). This suggests the presence of primary data. Thus, this criterion is met.

Exclusion criteria:

 1. Bypasses the use of an animal model: The study uses WKY rats as an animal model. Thus, this criterion is not met, which is good for inclusion.

 2. Animal model doesn’t characterize depression: The WKY rats show “endogenous depressive behavior.” Thus, this criterion is not met, which again supports inclusion.

 3. Absence of a control group: The study does use a control group (Wistars). Thus, this criterion is not met, supporting inclusion.

 4. Study is primarily a review: The abstract presents experimental data and does not indicate that the study is a review. Thus, this criterion is not met, which is favorable for inclusion.

Conclusion:

Given that the abstract meets all the inclusion criteria and none of the exclusion criteria, it should be considered in your systematic review on animal depression.

With these revised prompts applied to the previously discussed abstracts, the conclusions remain consistent. Nonetheless, we contend that this enhanced structure not only provides a more rigorous framework but also aligns more closely with widely accepted practices in systematic review criteria definition.

Automated workflow for streamlining abstract screening via ChatGPT and other tools

For systematic reviews, particularly those involving voluminous data, efficient workflows are paramount. The ChatGPT API (application programming interface) offers a dynamic solution, enabling automation of abstract screening at scale and circumventing the labor-intensive process of manually entering abstracts into a chatbot interface. In this subsection, we present an automated workflow for streamlining abstract screening via ChatGPT. Note that although this workflow uses ChatGPT as the platform, analogous workflows apply to other AI platforms such as PaLM, Llama, and Claude.

Automated workflow:

Data collection: The preliminary step entails accumulating a list of titles and abstracts. By utilizing carefully crafted keywords, we retrieve these from PubMed and other pertinent databases. This comprehensive approach ensures the potential inclusion of all relevant studies for a subsequent detailed screening. It is worth noting that while this list is expansive, most of these studies may not find their way into the final meta-analysis post-screening.

Automation through Python: We have devised a Python script to harness the capabilities of ChatGPT for evaluating the amassed abstracts.

This script interacts with the ChatGPT API (specifically, the GPT-4 version) and, when furnished with tailored prompts, extracts structured responses from ChatGPT.

Typically, the AI’s response commences with a succinct summary, delves into explanations aligned with each criterion, and concludes with a decisive judgment, as exemplified by the examples in the “ Examples of abstract screening by using LLMs ” section.

This automated process efficiently saves ChatGPT’s verdicts on each abstract for ensuing analyses. For instance, it extracts the final decisions regarding the inclusion or exclusion of studies and determines the stance on each pre-specified criterion for every abstract, as exemplified by the last example in the “ Examples of abstract screening by using LLMs ” section.

Additionally, to ascertain the efficiency and cost-effectiveness of this methodology, the script also monitors the time, token usage, and the financial implications of querying the OpenAI API.

In essence, we envision this procedure as delegating the meticulous task of poring over scientific summaries to an AI assistant. This virtual entity meticulously sifts through each summary, determining its alignment with stipulated criteria.
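The core of such a script is a per-abstract API call plus the time/token/cost bookkeeping mentioned above. The sketch below assumes the OpenAI Python client (v1-style `chat.completions` interface); the model name and per-token price are placeholders, not quoted rates, and the function names are ours.

```python
import time

def screen_abstract(client, prompt, model="gpt-4"):
    """Send one screening prompt to the ChatGPT API and return the verdict
    text plus token usage. `client` is an openai.OpenAI() instance; the
    model name is a placeholder -- substitute the version under evaluation."""
    t0 = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducible screening
    )
    return {
        "decision": resp.choices[0].message.content,
        "tokens": resp.usage.total_tokens,
        "seconds": time.time() - t0,
    }

def summarize_run(results, usd_per_1k_tokens=0.03):
    """Aggregate the time/token/cost bookkeeping across a batch of abstracts.
    The per-1k-token price is an illustrative placeholder."""
    tokens = sum(r["tokens"] for r in results)
    return {
        "abstracts": len(results),
        "total_tokens": tokens,
        "approx_cost_usd": tokens / 1000 * usd_per_1k_tokens,
        "total_seconds": sum(r["seconds"] for r in results),
    }

# Offline illustration of the bookkeeping (no API call is made here):
stats = summarize_run([
    {"decision": "yes", "tokens": 850, "seconds": 2.1},
    {"decision": "no", "tokens": 910, "seconds": 1.9},
])
```

In a full run, `screen_abstract` is called in a loop over the collected abstracts and the per-abstract results are written out for the downstream analyses.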

Tuning parameters in the ChatGPT API: The effectiveness of the ChatGPT API is not only dependent on the input data; it is also significantly influenced by adjustable parameters that can refine the responses. Parameters such as temperature, top k, and top p critically affect model performance by modulating the randomness and focus of the output. While fine-tuning these parameters can improve results, it requires significant technical expertise and resources. Defaults, which are rigorously tested by developers, strike a balance between output quality and ease of use, making LLMs more accessible to a wider range of users without the need for complex parameter optimization. While customization holds promise for custom applications, the default settings provide an efficient and practical solution that facilitates wider adoption of LLM technologies.

Given the complexity of fine-tuning these parameters to optimize performance specifically for abstract screening, our study primarily used the recommended default settings for these parameters as provided by the respective platforms (detailed in Table 1 ). This approach was chosen to maintain the feasibility of our experiments and to ensure that our findings are applicable to typical deployment scenarios.

The automated workflow described applies to other LLM tools or different versions of the same tools. Throughout our study, we have tested several popular LLM tools available to us. The specific LLM models and their versions used at the time of our first submission are detailed in Table  1 , along with the tuning parameters.

The field of large language models (LLMs) has evolved rapidly since we started this study in 2023. New models are frequently released, and existing versions are constantly updated. To account for these advances, we have expanded our analysis to include results from the latest versions of previously studied models, as well as a few entirely new ones. As a best practice, we set the temperature parameter to 0 for all latest models. Table 1 now includes these latest models and their versions.

A brief explanation of these parameters is as follows:

Temperature: The temperature controls the randomness of the outputs, with a range from 0 to 2. A value of 0 is deterministic, while values above 1 produce increasingly random output; the maximum of 2 gives the most creative and variable responses.

Max length: The max length is the maximum number of tokens the model may generate in a response. A token is roughly three-quarters of a typical English word, so a common word is generally one or two tokens.

Stop sequences: This parameter controls which tokens or phrases will stop the LLM from generating more text.

Top p: When generating text, the model samples only from the smallest set of most likely tokens whose cumulative probability reaches p (the “nucleus”). Lower top p values mean sampling from a smaller, more top-weighted nucleus.

Top k: When generating text, the model samples from the k most likely tokens; lower values of top k restrict generation to fewer, more probable tokens.

Frequency penalty: This parameter controls how the LLM penalizes tokens in proportion to how often they have already appeared in the input and output text. A frequency penalty of 0 means that how often a token has appeared does not affect the LLM, which generates tokens based on their probability alone.

Presence penalty: This parameter controls how the LLM penalizes tokens that have already appeared in the text, regardless of how often, thereby encouraging the model to introduce new tokens. A presence penalty of 0 means that a token’s prior appearance does not affect its probability of being generated.
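As a concrete reference, the settings above map onto request fields of the OpenAI chat-completion API as shown below. The values are illustrative defaults in the spirit of Table 1, with temperature fixed at 0 for determinism; top k is exposed by some other platforms (e.g., PaLM, Claude) rather than by OpenAI.

```python
# How the parameters above appear as OpenAI chat-completion request fields.
generation_params = {
    "temperature": 0,          # deterministic output for screening
    "max_tokens": 512,         # max length of the generated response
    "stop": None,              # no custom stop sequences
    "top_p": 1.0,              # full nucleus: no cumulative-probability cutoff
    "frequency_penalty": 0.0,  # no penalty proportional to token frequency
    "presence_penalty": 0.0,   # no penalty for previously seen tokens
}
```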

Study design

To assess ChatGPT’s proficiency in abstract screening, we selected benchmark databases with existing performance data from other methodologies, which aids in a comparative analysis of performance.

In selecting the systematic reviews for our study, we used a systematic approach guided by specific criteria to ensure relevance and reliability. These studies were selected from the publicly available SYNERGY [ 23 ] dataset, which contains 26 systematic reviews from different disciplines. Key selection criteria included:

Clarity and conciseness of eligibility criteria: The selected studies had well-defined and explicit eligibility criteria. This clarity is essential for accurate replication of the study selection process, which is critical for assessing the performance of LLM tools in an analogous real-world application.

Completeness and cleanliness of data: We ensured that the selected reviews had complete datasets, with all necessary information on included and excluded studies clearly documented, minimizing the risk of ambiguities affecting our analysis.

In addition, to comply with the AMSTAR-2 [ 26 ] guidelines, in particular point 5, we reviewed the methodologies of these reviews to confirm the selection of studies was performed in duplicate and disagreements were resolved by consensus. While our analysis assumes that these systematic reviews adhere to high standards, we recognize the inherent limitations of using pre-existing datasets as a proxy for gold standards in the discussion section.

We picked the following 3 databases from the publicly available SYNERGY dataset [ 23 ]:

Bannach-Brown 2016 [ 27 ]—topic: use of animal models to study depressive behavior

Human-curated (gold standard) results: 1258 excluded abstracts and 230 included abstracts.

We randomly selected 100 excluded abstracts and 100 included abstracts for screening by LLM tools.

Meijboom 2021 [ 28 ]—topic: retransitioning of etanercept in patients with a rheumatic disease

Human-curated (gold standard) results: 599 excluded abstracts and 32 included abstracts.

We randomly selected 100 excluded abstracts and used all 32 included abstracts for screening by LLM tools.

Menon 2022 [ 29 ]—topic: methodological rigor of systematic reviews in environmental health

Human-curated (gold standard) results: 896 excluded abstracts and 73 included abstracts.

We randomly selected 100 excluded abstracts and all 73 included abstracts for screening by LLM tools.

For each chosen database, abstracts were categorized as either “cases” (those included based on the gold standard) or “controls” (those excluded per the gold standard). From each category, we randomly selected 100 abstracts (or used all abstracts if a category contained fewer than 100). These abstracts underwent evaluation by ChatGPT (v4.0) as per our established workflow. Subsequently, ChatGPT’s decisions were compared against the gold standard to determine sensitivity, specificity, and overall accuracy. The same abstracts were also processed using the other LLM tools listed in Table  1 to record their respective verdicts.

Statistical analysis

To quantify the efficacy of ChatGPT and other AI tools for each database, we calculated the following metrics: (1) sensitivity, (2) specificity, and (3) overall accuracy, where sensitivity is defined as the number of true positives divided by the sum of true positives and false negatives, specificity as the number of true negatives divided by the sum of true negatives and false positives, and accuracy as the sum of true positives and true negatives divided by the total number of abstracts. For each metric, the associated 95% confidence intervals were also determined. Although it is common in the field to report the F1 score, recall rate, and precision, we believe it is more appropriate to report sensitivity and specificity given this study design; moreover, the F1 score, recall rate, and precision can be derived from sensitivity and specificity.
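The metric definitions above reduce to simple proportions with normal-approximation confidence intervals. This is an illustrative sketch (the study's analysis was done in R, and the exact CI method used there may differ); the example counts are invented to match a 100-case/100-control design.

```python
import math

def metric_with_ci(successes, total, z=1.96):
    """A proportion with a normal-approximation 95% confidence interval."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)

def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy as defined above,
    each returned as (estimate, CI lower, CI upper)."""
    return {
        "sensitivity": metric_with_ci(tp, tp + fn),
        "specificity": metric_with_ci(tn, tn + fp),
        "accuracy": metric_with_ci(tp + tn, tp + fp + tn + fn),
    }

# Example: 100 cases and 100 controls with 93 true positives and 96 true negatives.
m = screening_metrics(tp=93, fp=4, tn=96, fn=7)
```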

Furthermore, to explore the potential of a unified decision-making process, we combined the decisions from all AI tools using a voting mechanism. The majority decision across the tools was taken as the final verdict for each abstract. For this consolidated approach, we again computed sensitivity, specificity, overall accuracy, and the associated 95% CIs for each database. We also explore the use of latent class analysis (LCA), a model-based clustering approach, to derive consolidated decisions. More details on this LCA approach are provided in the “ Beyond majority voting ” section.
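The majority-vote consolidation is straightforward to express in code. A minimal sketch (binary verdicts only; the LCA-based consolidation described later is a separate, model-based procedure):

```python
def majority_vote(decisions):
    """Combine binary include/exclude verdicts from several LLM tools:
    include the abstract if more than half of the tools vote include."""
    return sum(decisions) > len(decisions) / 2

# e.g., verdicts from five tools for one abstract (1 = include, 0 = exclude)
combined = majority_vote([1, 1, 0, 1, 0])
```

With an even number of tools, ties need an explicit rule (e.g., default to exclude or include); the sketch above treats an exact tie as exclude.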

For a given database, 100 cases and 100 controls yield a two-sided 95% confidence interval with a half-width equal to 0.048 when the underlying sensitivity (specificity) is approximately 95%.

All statistical analyses were conducted using the R statistical software (version 4.3.1). All tests were two-sided with an alpha level set at 0.05 unless otherwise mentioned.

To improve the transparency and reproducibility of studies using AI tools, we have included the TRIPOD + AI checklist [ 30 ] in our report. This checklist has been adapted to reflect the specifics of our research, which focuses on the evaluation of large language models for abstract screening rather than diagnostic or prognostic modeling. The completed checklist is presented in Table S 1 , to provide readers with a comprehensive overview of our study’s adherence to established reporting standards.

We present the results for each of the 3 databases. For each, we first present the prompts we used when we called LLM tools to screen an abstract, then present the performance data (accuracy, sensitivity, and specificity for each method or LLM tool), followed by a summary of the performance, and a comparison of the performance of the different methods to that based on ChatGPT v4.0.

Results on the Bannach-Brown 2016 database (see Table  2 )

The prompts we used for screening abstracts in this database are as follows:

Conduct a systematic review on animal depression. I provide the title and abstract for one journal article. Provide an overall assessment based on eligibility criteria with only one word answer yes or no with no explanation. Then, for each inclusion or exclusion criterion, answer with only one word, yes if it is included by the inclusion criterion or excluded by the exclusion criterion, and answer no if it does not meet the inclusion criterion or not excluded by the exclusion criterion. After answering all the criteria with yes or no, then provide an overall explanation.

Here is the eligibility criteria: Inclusion Criteria: 1. Any article providing primary data of an animal model of depression or depressive-like phenotype with an appropriate control group (specified above). 2. Animals of all ages, sexes and species, where depression-like phenotype intended to mimic the human condition have been induced. Including animal models where depressive-like phenotypes are induced in the presence of a comorbidity (e.g. obesity or cancer). 3. All studies that claim to model depression or depressive-like phenotypes in animals. Studies that induce depressive behavior or model depression and that also test a treatment or intervention (prior or subsequent to model induction), with no exclusion criteria based on dosage, timing or frequency. 4. Studies measuring behavioral, anatomical and structural, electrophysiological, histological and/or neurochemical outcomes and where genomic, proteomic or metabolomic outcomes are measured in addition to behavioral, anatomical, electrophysiological, histological or neurochemical outcomes. Exclusion Criteria: 1. Review article, editorials, case reports, letters or comments, conference or seminar abstracts, studies providing primary data but not appropriate control group. 2. Human studies and ex vivo, in vitro or in silico studies. Studies will be excluded if authors state an intention to induce or investigate only anxiety or anxious behavior. Studies will be excluded if there is no experimental intervention on the animals (e.g. purely observational studies). 3. Studies that investigate treatments or interventions, but no depressive behavior or model of depression is induced (e.g. toxicity and side-effect studies). 4. Where metabolic outcome measures are the primary outcome measure of a study. Where genomic, proteomic, metabolic or metabolomic outcomes are the sole outcome measures in a study, they will be excluded.

Here is the abstract:

Among all the LLM tools we tested, ChatGPT v4.0 stood out with the highest accuracy (0.945) and specificity (0.960), and satisfactory sensitivity (0.930). The combined decision using majority voting improved sensitivity (0.970) considerably but did not improve specificity (0.870) much. Comparatively, the zero-shot method was less effective across these metrics, while the hybrid method attained superior accuracy (0.961) and specificity (0.982), albeit with reduced sensitivity (0.843). We acknowledge that our findings regarding the zero-shot and hybrid methods differ from those reported in [18] for the same set of abstracts. This discrepancy could be attributed to the use of distinct sets of embeddings in our analysis compared to the previous study.

For this database, the newer versions of ChatGPT (3.5-Turbo and 4-Turbo) did not improve performance over ChatGPT (v4.0). Gemini-1.0-pro (vs. PaLM 2) and Llama 3 (vs. Llama 2) improved over their older versions, but did not surpass the performance of ChatGPT (v4.0). Claude 3 performed well, but still did not surpass the performance of ChatGPT (v4.0).

Comparison between LLM tools. We compared the performance (sensitivity and specificity) between ChatGPTv4.0 and other LLM tools using the McNemar test and found that ChatGPTv4.0 performed significantly better ( p value = 0.002) than Google PaLM 2 in terms of sensitivity; ChatGPTv4.0 performed significantly better than ChatGPTv3.5 ( p value = 0.008) and better than Llama-2 ( p value < 0.001) in terms of specificity. Combining the decisions of different LLM tools using majority voting did not improve the overall accuracy compared to ChatGPTv4.0. Specifically, there was no statistically significant difference ( p value = 0.134) in sensitivity between the combined decision (majority voting) and ChatGPTv4.0, and the combined decision was significantly worse ( p value = 0.008) than ChatGPTv4.0 in terms of specificity.

Comparison between ChatGPT v4.0 and zero-shot and hybrid methods: We assessed the performance (sensitivity and specificity) of ChatGPT v4.0 against both the zero-shot and hybrid approaches using the McNemar test. Specifically, we aligned the screening results from 100 cases and 100 controls as per the ChatGPT v4.0 method and similarly for the zero-shot and hybrid methods, testing for inconsistencies between these approaches as previously done. Our analysis revealed that ChatGPT v4.0 significantly outperformed the zero-shot method in sensitivity ( p value < 0.001) but showed comparable effectiveness in specificity ( p value = 0.37). Additionally, ChatGPT v4.0 demonstrated marginally superior sensitivity compared to the hybrid method ( p value = 0.07), while its performance in specificity was similar ( p value = 1.00).
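The McNemar comparisons above operate only on discordant pairs, i.e., abstracts on which the two screening methods disagree. A minimal sketch of the exact (binomial) form of the test follows, assuming that variant rather than the chi-squared approximation (the paper does not specify which form was used):

```python
from math import comb

def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact (binomial) McNemar test on paired screening decisions.

    only_a_correct: abstracts tool A classified correctly but tool B did not.
    only_b_correct: the reverse. Concordant pairs do not enter the test.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    b = min(only_a_correct, only_b_correct)
    # P(X <= b) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(b + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, with 1 discordant pair favoring one tool and 10 favoring the other, the test gives p ≈ 0.012, a significant difference at the usual 0.05 level.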

Results on the Meijboom 2021 database (see Table 3)

Conduct a systematic review on transitioning patients from an originator to a corresponding biosimilar.

I provide the title and abstract for one journal article. Provide an overall assessment based on eligibility criteria with only one word answer yes or no, with no explanation. Then, for each inclusion or exclusion criterion, answer with only one word, yes if it is included by the inclusion criterion or excluded by the exclusion criterion, and answer no if it does not meet the inclusion criterion or not excluded by the exclusion criterion.

After answering all the criteria with yes or no, then provide an overall explanation.

Here is the eligibility criteria:

Articles were included if they met the following criteria:

Study involved transitioning from a TNFα inhibitor (including etanercept, infliximab, and adalimumab) originator to a biosimilar

The number of patients who retransitioned was reported or could be calculated

The article was an original research article published in a peer-reviewed journal

The article included baseline characteristics of the patients who transitioned

The article was written in English

The full-text version of the article could be obtained.

Transitioning was defined as patients in whom the biosimilar was introduced after the originator, without treatment with other drugs in between. Retransitioning was defined as restarting the originator directly after discontinuing a biosimilar, without treatment with other drugs in between. In summary, transitioning was defined as switching from the originator to a biosimilar; retransitioning was defined as switching from the originator to a biosimilar and back to the originator. Both transitioning and retransitioning involved changes with the same active biological substance.

Among all the LLM tools we tested, ChatGPT v4.0 achieved the highest accuracy (0.840), but not the highest specificity (0.860), and its sensitivity (0.812) was not satisfactory. Compared to ChatGPT v4.0, the combined decision using majority voting did not improve overall accuracy (0.720), but improved sensitivity (1.000) at the sacrifice of specificity (0.630).

Comparison between LLM tools. We compared the performance (sensitivity and specificity) between ChatGPT v4.0 and other LLM tools using the McNemar test and found that ChatGPT v4.0 performed significantly better (p value < 0.001) than Google PaLM 2, but significantly worse than ChatGPT v3.5 (p value = 0.001) and Llama 2 in terms of sensitivity; ChatGPT v4.0 performed significantly better than ChatGPT v3.5 (p value < 0.001) and better than Llama 2 (p value < 0.001), but worse than Google PaLM 2 (p value = 0.002), in terms of specificity. Combining the decisions of different LLM tools using majority voting did not improve the overall accuracy compared to ChatGPT v4.0. Specifically, there was a statistically significant difference (p value = 0.008) in sensitivity between the combined decision (majority voting) and ChatGPT v4.0, and the combined decision was not significantly worse (p value > 0.50) than ChatGPT v4.0 in terms of specificity.

For this database, the newer versions of ChatGPT (3.5-Turbo and 4-Turbo) did not improve performance over ChatGPT (v4.0), and Gemini-1.0-pro (vs. PaLM 2) did not improve performance either. However, Llama 3 (vs. Llama 2) improved over its older version and surpassed the performance of ChatGPT (v4.0). Claude 3 also slightly surpassed the performance of ChatGPT (v4.0).

Comparison between ChatGPT v4.0 and zero-shot and hybrid methods: We evaluated the performance of ChatGPT v4.0, focusing on sensitivity and specificity, in comparison with the zero-shot and hybrid approaches, employing the McNemar test as described above. In this analysis, we aligned the screening results from 32 cases and 100 controls for the tests. Our findings indicated that ChatGPT v4.0 significantly surpassed the zero-shot method in sensitivity ( p value = 0.0002) and exhibited marginally improved specificity ( p value = 0.099). Furthermore, ChatGPT v4.0 showed notably higher sensitivity than the hybrid method ( p value < 0.001), although its specificity was comparatively lower.

Results on the Menon 2022 database (see Table 4)

“Conduct a systematic review on the methodological rigour of systematic reviews in environmental health.

I provide the title and abstract for one journal article.

Provide an overall assessment based on eligibility criteria with only one word answer yes or no, with no explanation.

Then, for each inclusion or exclusion criterion, answer with only one word, yes if it is included by the inclusion criterion or excluded by the exclusion criterion and answer no if it does not meet the inclusion criterion or not excluded by the exclusion criterion.

Here are the eligibility criteria:

To be eligible for inclusion in the SR sample, documents had to fulfill the following criteria:

Identify explicitly as a “systematic review” in their title

Assess the effect of a non-acute, non-communicable, environmental exposure on a health outcome. Environmental exposures can include air and water pollutants, radiation, noise, occupational hazards, lifestyle factors (like diet or physical activity) and lifestyle choices influenced by family and peers (like substance use), social and economic factors (like stress from work or living conditions).

Include studies in people or mammalian models

Be available in HTML format

Abstract X”

Among all the LLM tools we tested, ChatGPT v4.0 stood out with the highest accuracy (0.913) and sensitivity (0.932), but not specificity (0.900). Compared to ChatGPT v4.0, the combined decision using majority voting did not improve overall accuracy (0.884) or sensitivity (0.808), but improved specificity (0.940).

Comparison between LLM tools. We compared the performance (sensitivity and specificity) between ChatGPT v4.0 and other LLM tools using the McNemar test and found that ChatGPT v4.0 performed significantly better than ChatGPT v3.5 (p value < 0.001), Google PaLM 2, and Llama 2 (p value = 0.02) in terms of sensitivity; ChatGPT v4.0 performed worse than ChatGPT v3.5 and Google PaLM 2 in terms of specificity. Combining the decisions of different LLM tools using majority voting did not improve the overall accuracy compared to ChatGPT v4.0. Specifically, there was a statistically significant difference (p value = 0.008) in sensitivity between the combined decision (majority voting) and ChatGPT v4.0, and the combined decision was not significantly different (p value = 0.134) from ChatGPT v4.0 in terms of specificity.

For this database, the newer versions of ChatGPT (3.5-Turbo and 4-Turbo) did not improve performance over ChatGPT (v4.0). However, both Gemini-1.0-pro (vs. PaLM 2) and Llama 3 (vs. Llama 2) improved over their older versions and surpassed the performance of ChatGPT (v4.0). Claude 3 also performed well but did not surpass the performance of ChatGPT (v4.0).

Comparison between ChatGPT v4.0 and zero-shot and hybrid methods: We aligned the screening results from 73 cases and 100 controls based on the ChatGPT v4.0 method, and similarly for the zero-shot and hybrid methods, to test for inconsistencies between these approaches, using the McNemar test as done in previous assessments. Our analysis showed that ChatGPT v4.0 significantly outperformed the zero-shot method in both sensitivity ( p value < 0.001) and specificity ( p value = 0.016). In comparison with the hybrid method, ChatGPT v4.0 also demonstrated superior sensitivity ( p value < 0.001) and better specificity ( p value = 0.04).

Monetary cost and time cost

To use the ChatGPT API or other LLM tools, the owners of these platforms charge a predetermined rate for access to the corresponding APIs. These fees are calculated in USD per thousand tokens, where tokens are the basic units used by these LLM platforms to quantify text length. In this context, a token can represent a word, a punctuation mark, or a character. The financial cost of screening 200 abstracts was approximately $6 for ChatGPT v4.0, $0.2 for ChatGPT v3.5, $10 for Llama 2 (using Replicate), while Google PaLM 2 offered its services for free to invited developers. Thus, the cumulative cost of evaluating 200 abstracts across all platforms was approximately $16.2. The cumulative cost of evaluating 200 abstracts across all latest models ($3 for GPT-4-Turbo, $0.05 for GPT-3.5-Turbo, free for Gemini-1.0-pro, $0.05 for Llama-3, $4 for Claude) was less, approximately $7.1. In terms of time efficiency, processing 200 abstracts with each of these LLM tools took approximately 10–20 min using a single thread. However, it is imperative to recognize that abstract screening lends itself well to parallelization. Consequently, one could significantly speed up the process by setting up multiple threads to simultaneously screen different subsets of abstracts, thereby reducing the overall time required for completion. This parallel approach not only increases efficiency, but also ensures that large amounts of data can be processed in a timely manner, making LLM tools even more attractive for large-scale abstract screening tasks. In summary, the monetary and time costs of using LLM tools for abstract screening are negligible compared to manual labeling.
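The per-abstract arithmetic behind these figures is straightforward: total tokens divided by 1,000, times the per-1k-token rate. A sketch, where the token count and rate are illustrative assumptions (roughly 1,000 tokens per screening prompt at a $0.03 per 1,000-token rate reproduces the ~$6 reported for 200 abstracts with ChatGPT v4.0):

```python
def screening_cost_usd(n_abstracts, avg_tokens_per_abstract, rate_per_1k_tokens_usd):
    """Back-of-the-envelope API cost for screening a batch of abstracts."""
    total_tokens = n_abstracts * avg_tokens_per_abstract
    return total_tokens / 1000 * rate_per_1k_tokens_usd
```

With the illustrative numbers above, `screening_cost_usd(200, 1000, 0.03)` comes to about $6; actual costs depend on the platform's current pricing and on prompt length, which includes the full eligibility criteria for every abstract.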

Beyond majority voting

We have expanded our analysis to include a variety of approaches for synthesizing decisions across different LLM tools. Our methodology is inspired by the concept of combining multiple diagnostic tests in the absence of a gold standard, akin to situations where human expert consensus is not available. Several publications discuss such scenarios [31, 32], among which we proposed to use latent class analysis (LCA) models.

Latent class analysis (LCA) is a statistical method used to identify subgroups within a population, which are not directly observed (hence “latent”) [ 33 , 34 , 35 ]. It is particularly useful when the research interest lies in categorizing individuals into mutually exclusive groups based on their responses to multiple observed variables. In the context of abstract screening, LCA can offer a sophisticated means of integrating decisions from different LLM tools without relying on a gold standard, typically provided by human expert consensus. This approach assumes that the unobserved subgroups (or “latent classes”) explain the dependence between the observed decisions made by each of the LLM tools.

Utilizing the LCA model, we treat the decisions from all LLM tools as dichotomous variables, corresponding to the adherence to each inclusion or exclusion criterion, as well as the overall decision. For instance, within the Bannach-Brown 2016 database (BB2016), there are eight criteria in total with four criteria each for inclusion and exclusion and one overall decision for inclusion, resulting in a total of nine binary items per LLM tool. In our analysis, we incorporated decisions from GPT v4.0, v3.5, and Llama 2. Decisions from Google PaLM 2 were excluded due to a high frequency (10% or more) of incomplete responses. Consequently, for the Bannach-Brown 2016 database, we worked with 27 binary items. For other databases such as Meijboom 2021 (Mj2021) and Menon 2022 (Me2022), the binary items totaled 21 and 15, respectively. It is important to note that LCA models were fitted to the binary data of each database independently.

The LCA model fitting process enables us to calculate the posterior probabilities of each abstract belonging to specific latent classes or subgroups. Abstracts are then categorized based on these probabilities, with assignment to the class for which an abstract has the highest posterior membership probability. The determination of the number of latent classes is a critical step in the LCA model fitting, which requires a priori specification. In our evaluation, we explored models with class numbers ranging from 2 to 6 and utilized the Bayesian information criterion (BIC) to identify the most “optimal” LCA model for our datasets.
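A minimal sketch of this fitting procedure, assuming independent Bernoulli items within each latent class (the standard LCA assumption); this is illustrative only, not the authors' implementation, and a production analysis would typically use an established LCA package rather than hand-rolled EM:

```python
import math
import random

def fit_lca(data, k, n_iter=200, seed=0):
    """EM fit of a latent class model with independent Bernoulli items.

    data: list of equal-length 0/1 lists (one row per abstract, one column
    per binary LLM decision). Returns (class weights, item probabilities,
    log likelihood, hard class assignments by highest posterior).
    """
    rng = random.Random(seed)
    n, j = len(data), len(data[0])
    probs = [[rng.uniform(0.25, 0.75) for _ in range(j)] for _ in range(k)]
    weights = [1.0 / k] * k
    resp, loglik = [], 0.0
    for _ in range(n_iter):
        # E-step: posterior class membership for every row
        resp, loglik = [], 0.0
        for x in data:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for v, p in zip(x, probs[c]):
                    p = min(max(p, 1e-9), 1 - 1e-9)  # guard the log
                    lp += math.log(p if v else 1 - p)
                logp.append(lp)
            m = max(logp)
            den = sum(math.exp(l - m) for l in logp)
            loglik += m + math.log(den)
            resp.append([math.exp(l - m) / den for l in logp])
        # M-step: re-estimate class weights and item probabilities
        for c in range(k):
            rc = sum(r[c] for r in resp)
            weights[c] = max(rc / n, 1e-9)
            probs[c] = [
                sum(r[c] * x[idx] for r, x in zip(resp, data)) / max(rc, 1e-12)
                for idx in range(j)
            ]
    assignments = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, probs, loglik, assignments

def bic(loglik, k, n, j):
    # free parameters: (k - 1) mixing weights plus k * j item probabilities
    return -2.0 * loglik + ((k - 1) + k * j) * math.log(n)
```

Fitting models for k = 2 to 6 and keeping the one with the lowest BIC mirrors the model selection step described above.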

Table 5 shows that after applying the BIC to determine the most appropriate model for our data, we identified a 3-class model as the best fit for the binary decisions derived from the BB2016 database. Similarly, a 4-class model was optimal for the Mj2021 database, while a 3-class model was again best for the Me2022 database. The confusion matrices generated by the selected LCA models for each database provided a clear juxtaposition between the LLM-assigned classes and the actual labels of the abstracts (see Table 6).

The performance metrics derived from these models are noteworthy. For the BB2016 database, if we categorize abstracts in class 2 as “included” and those in classes 1 and 3 as “excluded,” the LCA model achieved a sensitivity of 93% and a specificity of 96%, indicating a high degree of accuracy in classifying relevant and irrelevant abstracts. For the Mj2021 database, if we view class 1 as the “included” group and classes 2 and 3 as the “excluded” group, the model achieved a perfect sensitivity of 100%, meaning that it correctly identified all relevant abstracts, although the specificity was lower at 79%, indicating a higher rate of false positives. Similarly, for the Me2022 database, treating class 1 as the “excluded” category and classes 2 and 3 as the “included” group, the model showed a sensitivity of 94.5% and a specificity of 83%, a good balance between identifying relevant abstracts and avoiding false inclusions.
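The class-to-decision mapping and the resulting metrics can be sketched as follows; the class labels are illustrative, mirroring the BB2016 convention above where class 2 maps to “included”:

```python
def classes_to_decisions(assignments, included_classes):
    """Map LCA class labels to include (True) / exclude (False) decisions."""
    return [c in included_classes for c in assignments]

def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity for binary include/exclude label lists."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)
```

Comparing these derived decisions against the human-curated labels yields exactly the confusion-matrix summaries reported for each database.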

These results highlight the robustness of the latent class analysis approach to the abstract screening task, providing an effective method for classifying abstracts when a gold standard is not available. The varying levels of sensitivity and specificity across databases underscore the need to tailor the LCA model to the specific characteristics of each dataset and point to directions for future research.

We have incorporated results from the latest LLM models to increase the robustness and relevance of our findings. However, to maintain consistency and comparability with our original analyses, we have not incorporated these new results into the previously established majority voting or latent class analysis (LCA) approaches. Instead, we have chosen to make all raw data, including results from these newer models, freely available in our GitHub repository. This approach allows interested researchers and practitioners to conduct further investigations or apply alternative methods of analysis. By providing access to this additional data, we aim to promote transparency and enable the broader community to engage with and potentially build upon our work.

This study began with a rigorous exploration of the capabilities of large language models (LLMs) in abstract screening. We used automation scripts developed in Python to interact with the APIs of several LLM tools, including ChatGPT v4.0, Google PaLM 2, and Meta Llama 2, as well as the latest versions of these tools. Our central goal was to evaluate the efficiency and accuracy of these tools across three different databases of abstracts, leading us to a nuanced understanding of their potential in this context.
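The scripting pattern is simple: assemble the structured yes/no prompt, send it to the model's API, and parse the leading yes/no from the reply. The helpers below are an illustrative reconstruction of that pattern, not the authors' actual code; the prompt text paraphrases the prompts quoted earlier, and the API call itself is omitted since it varies by platform:

```python
def build_screening_prompt(review_topic, criteria, abstract):
    """Assemble the structured yes/no screening prompt sent to a chat-style API."""
    return (
        f"Conduct a systematic review on {review_topic}. "
        "I provide the title and abstract for one journal article. "
        "Provide an overall assessment based on eligibility criteria with only "
        "one word answer yes or no with no explanation. Then, for each inclusion "
        "or exclusion criterion, answer with only one word, yes or no. After "
        "answering all the criteria, provide an overall explanation.\n\n"
        f"Here are the eligibility criteria: {criteria}\n\n"
        f"Here is the abstract: {abstract}"
    )

def parse_overall_decision(response_text):
    """Take the first yes/no token in the model's reply as the overall decision.

    Returns True (include), False (exclude), or None if no decision is found,
    e.g. for the incomplete responses noted with Google PaLM 2.
    """
    cleaned = response_text.lower().replace(".", " ").replace(",", " ")
    for token in cleaned.split():
        if token in ("yes", "no"):
            return token == "yes"
    return None
```

Flagging `None` responses for human review matches how incomplete model replies were handled in the analysis.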

Large language models (LLMs), particularly ChatGPT, have garnered global attention since their inception. Employing LLMs for abstract screening in systematic reviews is an innovative concept [32, 33] and remains underexplored. This study presents the first comprehensive evaluation of LLMs applied to systematic review processes. The findings are encouraging, suggesting that LLMs could revolutionize abstract screening. Specifically, ChatGPT v4.0 exhibited stellar performance across three test scenarios, achieving an accuracy of at least 85%. Furthermore, it attained sensitivity and specificity rates ranging from 80% to an impressive 95%. These exceptional outcomes highlight the substantial promise of LLMs in abstract screening, offering an efficient and capable alternative to the conventional, laborious approaches that typically necessitate extensive human annotation.

However, it is important to acknowledge that we are still in the early stages of integrating LLM tools into the abstract screening field, and they are not without their imperfections; for example, even the best-performing LLM tool, ChatGPT v4.0, occasionally excluded a substantial share (20%) of relevant studies. These tools are not a universal solution to all the challenges associated with abstract screening, and they are not ready to completely replace human expertise in this area. Instead, they should be embraced as invaluable assistants in the abstract screening process.

In discussing the limitations of our study, it is important to recognize the constraints associated with our dataset selection and model configuration. We used only three databases from the SYNERGY set, limiting the generalizability of our findings across disciplines and datasets. In addition, the reliance on human-curated labels as a gold standard, while necessary, introduces potential biases due to the retrospective nature of our analysis. These labels may contain errors, and the methodology used in the original reviews to resolve discrepancies may affect the validity of our conclusions.

The parameters chosen for our LLMs—temperature, top k, top p, and prompts—were set to defaults or based on natural conversations to balance output quality with ease of use. While this approach minimizes the need for technical expertise, it may not be optimal for all screening scenarios. In addition, the reliance of LLMs on abstracts alone, as opposed to full-text evaluations, presents a fundamental challenge; critical data influencing inclusion or exclusion may not be present in the abstracts, potentially compromising screening accuracy.

In addition, the rapid development of LLMs and their “black box” nature pose challenges to the transparency that is essential in scientific settings. The environmental impact of using these computationally intensive models is also significant [ 36 ], requiring sustainability considerations. Future research should focus on refining these tools to increase transparency and efficiency, and ensure their responsible development and use in systematic reviews.

Our research suggests that LLM tools are ready to take on a role in abstract screening and are poised to have an immediate and positive impact on the process. Their integration into abstract screening can manifest in a few ways. They can serve as autonomous AI reviewers, adding an extra layer of scrutiny and ensuring thoroughness. Our findings suggest that a collective decision, such as one derived from majority voting, can sometimes improve sensitivity, underscoring the potential of LLM tools as a reliable safeguard against oversight, ensuring both comprehensiveness and accuracy.
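A collective decision of this kind is easy to state precisely. The sketch below combines per-tool include/exclude votes by simple majority; the tie-breaking rule (defaulting to include) is our assumption rather than the paper's, chosen because a false inclusion only costs a full-text read while a false exclusion loses a study:

```python
def majority_vote(decisions):
    """Combine include (True) / exclude (False) votes from several LLM tools.

    Ties default to include; this conservative tie rule is an assumption,
    not specified in the study.
    """
    include_votes = sum(bool(d) for d in decisions)
    return include_votes * 2 >= len(decisions)
```

With three tools voting, for example, two "include" votes are enough to keep the abstract for full-text review.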

In addition, LLM tools can facilitate a synergistic partnership with human experts. They are adept at identifying “high-risk” abstracts where different LLM tools have reached different judgments, flagging them for further human evaluation, and promoting a careful and error-free screening process while minimizing human effort.

Another exciting prospect is the integration of LLM tools with hybrid and active learning approaches. In this scenario, LLM tools could autonomously annotate abstracts in the training set, minimizing the need for human labeling. Subsequently, these labeled abstracts could be used to train custom NLP models, paving the way for a streamlined and efficient abstract screening process and yielding significant time and resource savings. Further research is needed to understand how the “uncertainty” in LLM-based decisions, when human-curated labels are not used, affects the performance of the hybrid approach. We also note from the test examples that the performance of hybrid approaches varies widely from example to example and with the text embedding tools used. Extensive research is needed to perfect these hybrid approaches.
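The hybrid idea can be illustrated end to end: LLM-assigned labels stand in for human annotation when training a lightweight custom classifier. The bag-of-words Naive Bayes below is a deliberately simple stand-in for the custom NLP model (real pipelines would use text embeddings and stronger classifiers):

```python
import math
from collections import Counter

def train_bow_nb(texts, llm_labels):
    """Train a tiny multinomial Naive Bayes text classifier on LLM-assigned
    include (True) / exclude (False) labels, with add-one smoothing.
    Assumes both labels occur in the training data. Illustrative only.
    Returns a predict(text) -> bool function.
    """
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter(llm_labels)
    vocab = set()
    for text, label in zip(texts, llm_labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)

    def predict(text):
        scores = {}
        for label in (True, False):
            # class prior plus smoothed per-word log likelihoods
            total = sum(word_counts[label].values()) + len(vocab)
            score = math.log(class_counts[label] / len(texts))
            for w in text.lower().split():
                score += math.log((word_counts[label][w] + 1) / total)
            scores[label] = score
        return scores[True] > scores[False]

    return predict
```

In an active learning loop, the abstracts on which such a model is least confident would be the natural candidates to route back to the LLM, or to a human, for labeling.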

Another future research direction is to explore how to fine-tune different versions of LLM tools and how to derive collective decisions. One idea is that by using different tuning parameters, one could propose different versions of LLM tools, and if they perform similarly in terms of accuracy but give different decisions (e.g., not highly correlated), we expect that better collective decisions would be obtained, as observed in many ensemble learning experiences [ 37 ]. However, this could be costly and require a lot of exploration.

More importantly, it would be particularly valuable to prospectively test these LLM tools, or their integration with other approaches such as zero-shot learning and active learning, in real-world meta-analysis and systematic review projects. This would provide deeper insights into their practical utility and effectiveness and a more comprehensive understanding of their impact in live research environments.

In summary, while LLM tools may not be able to fully replace human experts in abstract screening, their ability to transform the screening process is both undeniable and imminent. With continued advances in technology and ongoing refinement, these tools will play a critical role in the future of abstract screening, ushering in a new era of efficiency and effectiveness.

Availability of data and materials

The datasets analyzed during the current study are available in the SYNERGY repository, https://github.com/asreview/synergy-dataset . We have created a GitHub repository https://github.com/mikeli380/LLMAbstractScreening to host our code (R/Python), abstracts, and intermediate results so that interested readers can replicate our findings.

Higgins JPT, et al. Cochrane handbook for systematic reviews of interventions. 2019.


Moher D, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7). https://doi.org/10.1371/journal.pmed.1000097 .

Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71. https://doi.org/10.1136/bmj.312.7023.71 .

Lavis JN, Posada FB, Haines PA, Osei E. Use of research to inform public policymaking. Lancet. 2004;364(9445). https://doi.org/10.1016/S0140-6736(04)17317-0 .

Bastian H, Glasziou P,  Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up?. PLoS Med. 2010;7(9).   https://doi.org/10.1371/journal.pmed.1000326 .

Polanin JR, Pigott TD, Espelage DL, Grotpeter JK. Best practice guidelines for abstract screening large-evidence systematic reviews and meta-analyses. Res Synth Methods. 2019;10(3). https://doi.org/10.1002/jrsm.1354 .

Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3). https://doi.org/10.1056/nejmsa065779 .

Edwards P, Clarke M, DiGuiseppi C, Pratap S, Roberts I, Wentz R. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Stat Med. 2002;21(11). https://doi.org/10.1002/sim.1190 .

Wang Z, Nayfeh T, Tetzlaff J, O’Blenis P, Murad MH. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE. 2020;15(1). https://doi.org/10.1371/journal.pone.0227742 .

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1). https://doi.org/10.1186/2046-4053-4-5 .

OpenAI. Introducing ChatGPT Plus. OpenAI Blog; 2023. https://openai.com/blog/chatgpt-plus . Accessed 8 Nov 2023.

Google DeepMind. Introducing PaLM 2. Google Blog; 2023.  https://blog.google/technology/ai/google-palm-2-ai-large-language-model/ . Accessed 8 Nov 2023.

Meta AI Research. Meta and Microsoft introduce the next generation of Llama. Facebook AI Blog; 2023. https://ai.meta.com/blog/llama-2/ . Accessed 8 Nov 2023.

Claude.ai. Claude AI API reference. 2024. https://docs.anthropic.com/en/api/getting-started . Accessed 4 Jun 2024.

Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. 2012. https://doi.org/10.1145/2110363.2110464 .

Alshami A, Elsayed M, Ali E, Eltoukhy AEE, Zayed T. Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions. Systems. 2023;11(7). https://doi.org/10.3390/systems11070351 .

Qureshi R, Shaughnessy D, Gill KAR, Robinson KA, Li T, Agai E. Are ChatGPT and large language models ‘the answer’ to bringing us closer to systematic review automation?  Syst Rev. 2023;12(1).   https://doi.org/10.1186/s13643-023-02243-z .

Xian Y, Lampert CH, Schiele B, Akata Z. Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. 2019;41(9). https://doi.org/10.1109/TPAMI.2018.2857768 .

Moreno-Garcia CF, Jayne C, Elyan E, Aceves-Martins M. A novel application of machine learning and zero-shot classification methods for automated abstract screening in systematic reviews. Decis Anal J. 2023;6.  https://doi.org/10.1016/j.dajour.2023.100162 .

Ferdinands G, et al. Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the Average Time to Discover relevant records. Syst Rev. 2023;12(1). https://doi.org/10.1186/s13643-023-02257-7 .

Boetje J, van de Schoot R. The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses. Syst Rev. 2024;13(1). https://doi.org/10.1186/s13643-024-02502-7 .

Yang Y, Loog M. A benchmark and comparison of active learning for logistic regression. Pattern Recognit. 2018;83. https://doi.org/10.1016/j.patcog.2018.06.004 .

De Bruin RJ, Ma Y, Ferdinands G, Teijema J, van de Schoot R. SYNERGY - open machine learning dataset on study selection in systematic reviews. DataverseNL; 2023. https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/HE6NAQ .

Millard SJ, Weston-Green K, Newell KA. The Wistar-Kyoto rat model of endogenous depression: a tool for exploring treatment resistance with an urgent need to focus on sex differences. Prog Neuropsychopharmacol Biol Psychiatry. 2020;101. https://doi.org/10.1016/j.pnpbp.2020.109908 .

Zawia NH, Mattia CJ, Bondy SC. Differential effects of difluoromethylornithine on basal and induced activity of cerebral ornithine decarboxylase and mRNA. Neuropharmacology. 1991;30(4). https://doi.org/10.1016/0028-3908(91)90058-J .

Shea BJ, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358. https://doi.org/10.1136/bmj.j4008 .

Bannach-Brown A, Liao J, Wegener G, Macleod M. Understanding in vivo modelling of depression in non-human animals: a systematic review protocol. Evid Based Preclin Med. 2016;3(2). https://doi.org/10.1002/ebm2.24 .

Meijboom RW, Gardarsdottir H, Becker ML, ten Wolde S, Egberts TCG, Giezen TJ. Incidence of and reasons and determinants associated with retransitioning from biosimilar etanercept to originator etanercept. BioDrugs. 2021;35(6). https://doi.org/10.1007/s40259-021-00501-x .

Menon JML, Struijs F, Whaley P. The methodological rigour of systematic reviews in environmental health. Crit Rev Toxicol. 2022;52(3). https://doi.org/10.1080/10408444.2022.2082917 .

Collins GS, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11(7). https://doi.org/10.1136/bmjopen-2020-048008 .

Joseph L, Gyorkos TW, Coupal L. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol. 1995;141(3). https://doi.org/10.1093/oxfordjournals.aje.a117428 .

Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics. 1980;36(1). https://doi.org/10.2307/2530508 .

Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001;57(1). https://doi.org/10.1111/j.0006-341X.2001.00158.x .

Collins LM, Lanza ST. Latent class and latent transition analysis: with applications in the social, behavioral, and health sciences. 2010.


McCutcheon AL. Latent class analysis. Newbury Park: Sage; 1987.

Rillig MC, Ågerstrand M, Bi M, Gould KA, Sauerland U. Risks and benefits of large language models for the environment. Environm Sci Technol. 2023;57(9). https://doi.org/10.1021/acs.est.3c01106 .

Sagi O, Rokach L. Ensemble learning: a survey.  Wiley Interdiscip Rev Data Min Knowl Discov. 2018;8(4). https://doi.org/10.1002/widm.1249 .


Acknowledgements

The authors thank the reviewers and the Editor for their insightful comments and suggestions, which have greatly improved the quality of this work.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA

Michael Li & Xianming Tan

Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC, 27402, USA

Jianping Sun

Department of Biostatistics and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA

Xianming Tan


Contributions

XT conceived the paper. ML and XT devised the study. ML, JS, and XT analyzed the datasets. ML, JS, and XT wrote the manuscript. All authors edited and approved the final manuscript.

Corresponding author

Correspondence to Xianming Tan .

Ethics declarations

Ethics approval and consent to participate

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1: Table S1. TRIPOD+AI checklist.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Li, M., Sun, J. & Tan, X. Evaluating the effectiveness of large language models in abstract screening: a comparative analysis. Syst Rev 13, 219 (2024). https://doi.org/10.1186/s13643-024-02609-x


Received: 20 December 2023

Accepted: 11 July 2024

Published: 21 August 2024

DOI: https://doi.org/10.1186/s13643-024-02609-x


Keywords

  • Abstract screening
  • Automated text analysis
  • Google PaLM

Systematic Reviews

ISSN: 2046-4053


