Are We There Yet? - A Systematic Literature Review on Chatbots in Education


  • 1 Information Center for Education, DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany
  • 2 Educational Science Faculty, Open University of the Netherlands, Heerlen, Netherlands
  • 3 Computer Science Faculty, Goethe University, Frankfurt am Main, Germany

Chatbots are a promising technology with the potential to enhance workplaces and everyday life. In terms of scalability and accessibility, they also offer unique possibilities as communication and information tools for digital learning. In this paper, we present a systematic literature review investigating the areas of education where chatbots have already been applied, the pedagogical roles of chatbots, the use of chatbots for mentoring purposes, and their potential to personalize education. We conducted a preliminary analysis of 2,678 publications to perform this literature review, which allowed us to identify 74 relevant publications on chatbot applications in education. Through this, we address five research questions that, together, allow us to explore the current state-of-the-art of this educational technology. We conclude our systematic review by pointing to three main research challenges: 1) Aligning chatbot evaluations with implementation objectives, 2) Exploring the potential of chatbots for mentoring students, and 3) Exploring and leveraging adaptation capabilities of chatbots. For all three challenges, we discuss opportunities for future research.

Introduction

Educational Technologies enable distance learning models and provide students with the opportunity to learn at their own pace. They have found their way into schools and higher education institutions through Learning Management Systems and Massive Open Online Courses, enabling teachers to scale up good teaching practices ( Ferguson and Sharples, 2014 ) and allowing students to access learning material ubiquitously ( Virtanen et al., 2018 ).

Despite the innovative power of educational technologies, most commonly used technologies do not substantially change teachers’ role. Typical teaching activities like providing students with feedback, motivating them, or adapting course content to specific student groups are still entrusted exclusively to teachers, even in digital learning environments. This can lead to the teacher-bandwidth problem ( Wiley and Edwards, 2002 ), the result of a shortage of teaching staff to provide highly informative and competence-oriented feedback at large scale. Nowadays, however, computers and other digital devices open up far-reaching possibilities that have not yet been fully exploited. For example, incorporating process data can provide students with insights into their learning progress and bring new possibilities for formative feedback, self-reflection, and competence development ( Quincey et al., 2019 ). According to ( Hattie, 2009 ), feedback in terms of learning success has a mean effect size of d = 0.75, while ( Wisniewski et al., 2019 ) even report a mean effect of d = 0.99 for highly informative feedback. Such feedback provides suitable conditions for self-directed learning ( Winne and Hadwin, 2008 ) and effective metacognitive control of the learning process ( Nelson and Narens, 1994 ).

One of the educational technologies designed to provide actionable feedback in this regard is Learning Analytics. Learning Analytics is defined as the research area that focuses on collecting traces that learners leave behind and using those traces to improve learning ( Duval and Verbert, 2012 ; Greller and Drachsler, 2012 ). Learning Analytics can be used both by students to reflect on their own learning progress and by teachers to continuously assess the students’ efforts and provide actionable feedback. Another relevant educational technology is Intelligent Tutoring Systems. Intelligent Tutoring Systems are defined as computerized learning environments that incorporate computational models ( Graesser et al., 2001 ) and provide feedback based on learning progress. Educational technologies specifically focused on feedback for help-seekers, comparable to raising hands in the classroom, are Dialogue Systems and Pedagogical Conversational Agents ( Lester et al., 1997 ). These technologies can simulate conversational partners and provide feedback through natural language ( McLoughlin and Oliver, 1998 ).

Research in this area has recently focused on chatbot technology, a subtype of dialog systems, as several technological platforms have matured and led to applications in various domains. Chatbots incorporate generic language models extracted from large parts of the Internet and enable feedback by limiting themselves to text or voice interfaces. For this reason, they have also been proposed and researched for a variety of applications in education ( Winkler and Soellner, 2018 ). Recent literature reviews on chatbots in education ( Winkler and Soellner, 2018 ; Hobert, 2019a ; Hobert and Meyer von Wolff, 2019 ; Jung et al., 2020 ; Pérez et al., 2020 ; Smutny and Schreiberova, 2020 ; Pérez-Marín, 2021 ) have reported on such applications as well as design guidelines, evaluation possibilities, and effects of chatbots in education.

In this paper, we contribute to the state-of-the-art of chatbots in education by presenting a systematic literature review, where we examine so-far unexplored areas such as implementation objectives, pedagogical roles, mentoring scenarios, the adaptations of chatbots to learners, and application domains. This paper is structured as follows: First, we review related work (section 2), derive research questions from it, then explain the applied method for searching related studies (section 3), followed by the results (section 4), and finally, we discuss the findings (section 5) and point to future research directions in the field (section 5).

Related Work

In order to accurately cover the field of research and deal with the plethora of terms for chatbots in the literature (e.g., chatbot, dialogue system, or pedagogical conversational agent), we propose the following definition:

Chatbots are digital systems that can be interacted with entirely through natural language via text or voice interfaces. They are intended to automate conversations by simulating a human conversation partner and can be integrated into software, such as online platforms, digital assistants, or be interfaced through messaging services.

Outside of education, typical applications of chatbots are in customer service ( Xu et al., 2017 ), counseling of hospital patients ( Vaidyam et al., 2019 ), or information services in smart speakers ( Ram et al., 2018 ). One central element of chatbots is intent classification, also named the Natural Language Understanding (NLU) component, which is responsible for making sense of human input data. Looking at the current advances in chatbot software development, it seems that this technology’s goal is to pass the Turing Test ( Saygin et al., 2000 ) one day, which could make chatbots effective educational tools. Therefore, we ask ourselves: “Are we there yet? - Will we soon have an autonomous chatbot for every learner?”
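To make the role of the intent classification (NLU) component more concrete, the following minimal sketch shows a purely keyword-based classifier. The intents and example phrases are hypothetical, and the approach is deliberately simpler than the trained language models used by the chatbot platforms discussed in this review.

```python
# Minimal, hypothetical sketch of an intent classification (NLU) step: it maps
# free-text user input to a predefined intent by counting overlapping example
# phrases. Real chatbot platforms typically use trained classifiers instead.

EXAMPLE_INTENTS = {
    "exam_date": ["when is the exam", "exam date", "date of the test"],
    "office_hours": ["office hours", "when can i meet the lecturer"],
    "vocabulary_help": ["what does", "meaning of", "translate"],
}

def classify_intent(utterance: str) -> str:
    """Return the intent whose example phrases best match the input."""
    utterance = utterance.lower()
    best_intent, best_score = "fallback", 0
    for intent, phrases in EXAMPLE_INTENTS.items():
        score = sum(1 for phrase in phrases if phrase in utterance)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(classify_intent("When is the exam for this course?"))  # -> exam_date
```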

To understand and underline the current need for research in the use of chatbots in education, we first examined the existing literature, focusing on comprehensive literature reviews. By looking at research questions in these literature reviews, we identified 21 different research topics and extracted findings accordingly. To structure research topics and findings in a comprehensible way, a three-stage clustering process was applied. While the first stage consisted of coding research topics by keywords, the second stage was applied to form overarching research categories ( Table 1 ). In the final stage, the findings within each research category were clustered to identify and structure commonalities within the literature reviews. The result is a concept map consisting of four major categories: CAT1. Applications of Chatbots, CAT2. Chatbot Designs, CAT3. Evaluation of Chatbots, and CAT4. Educational Effects of Chatbots. To standardize the terminology and concepts applied, we present the findings of each category in a separate sub-section ( see Figure 1 , Figure 2 , Figure 3 , and Figure 4 ) and extend them with the outcomes of our own literature study, reported in the remaining parts of this article. Due to the size of the concept map, a full version can be found in Appendix A .


TABLE 1 . Assignment of coded research topics identified in related literature reviews to research categories.


FIGURE 1 . Applications of chatbots in related literature reviews (CAT1).


FIGURE 2 . Chatbot designs in related literature reviews (CAT2).


FIGURE 3 . Evaluation of chatbots in related literature reviews (CAT3).


FIGURE 4 . Educational Effects of chatbots in related literature reviews (CAT4).

Regarding the applications of chatbots (CAT1), application clusters (AC) and application statistics (AS) have been described in the literature, which we visualized in Figure 1 . The study of ( Pérez et al., 2020 ) identifies two application clusters, defined through chatbot activities: “service-oriented chatbots” and “teaching-oriented chatbots.” ( Winkler and Soellner, 2018 ) identify application clusters by naming the domains “health and well-being interventions,” “language learning,” “feedback and metacognitive thinking,” as well as “motivation and self-efficacy.” Concerning application statistics (AS), ( Smutny and Schreiberova, 2020 ), who analyzed chatbots integrated into the social media platform Facebook, found that nearly 47% of the analyzed chatbots incorporate informing actions and 18% support language learning. Moreover, the chatbots studied had a strong tendency to use English, at 89%. This high number aligns with results from ( Pérez-Marín, 2021 ), where 75% of observed agents, as a related technology, were designed to interact in the English language. ( Pérez-Marín, 2021 ) also shows that 42% of the analyzed chatbots had mixed interaction modalities. Finally, ( Hobert and Meyer von Wolff, 2019 ) observed that only 25% of examined chatbots were incorporated in formal learning settings, that the majority of published material focuses on student-chatbot interaction only and does not enable student-student communication, and that nearly two-thirds of the analyzed chatbots center on a single domain. Overall, we can summarize that so far there are six application clusters for chatbots in education, categorized by chatbot activities or domains. The provided statistics allow for a clearer understanding of the prevalence of chatbot applications in education ( see Figure 1 ).

Regarding chatbot designs (CAT2), most of the research questions concerned with chatbots in education can be assigned to this category. We found three aspects in this category, visualized in Figure 2 : Personality (PS), Process Pipeline (PP), and Design Classifications (DC). Within these, most research questions can be assigned to Design Classifications (DC), which are separated into Classification Aspects (DC2) and Classification Frameworks (DC1). One classification framework is defined through “flow chatbots,” “artificially intelligent chatbots,” “chatbots with integrated speech recognition,” as well as “chatbots with integrated context-data” by ( Winkler and Soellner, 2018 ). A second classification framework by ( Pérez-Marín, 2021 ) covers pedagogy, social, and HCI features of chatbots and agents, which themselves can be further subdivided into more detailed aspects. Other Classification Aspects (DC2), derived from several publications, provide another classification schema, which distinguishes between “retrieval vs. generative” based technology, the “ability to incorporate context data,” and “speech or text interface” ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ). Text interfaces can be further subdivided into “Button-Based” and “Keyword Recognition-Based” interfaces ( Smutny and Schreiberova, 2020 ). Furthermore, a comparison of speech and text interfaces ( Jung et al., 2020 ) shows that text interfaces have advantages for conveying information, while speech interfaces have advantages for affective support. The second aspect of CAT2 concerns the chatbot processing pipeline (PP), highlighting the importance of the user interface and the back-end ( Pérez et al., 2020 ). Finally, ( Jung et al., 2020 ) focus on the third aspect, the personality of chatbots (PS). Here, the study derives four guidelines helpful in education: positive or neutral emotional expressions, a limited amount of animated or visual graphics, a well-considered gender of the chatbot, and human-like interactions. In summary, we found three main design aspects for the development of chatbots in CAT2. CAT2 is much more diverse than CAT1, with various sub-categories for the design of chatbots. This indicates the huge flexibility to design chatbots in various ways to support education.

Regarding the evaluation of chatbots (CAT3), we found three aspects assigned to this category, visualized in Figure 3 : Evaluation Criteria (EC), Evaluation Methods (EM), and Evaluation Instruments (EI). Concerning Evaluation Criteria, seven criteria can be identified in the literature. The first and most important in the educational field, according to ( Smutny and Schreiberova, 2020 ), is the evaluation of learning success ( Hobert, 2019a ), which can have subcategories such as how chatbots are embedded in learning scenarios ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ) and teaching efficiency ( Pérez et al., 2020 ). The second is acceptance, which ( Hobert, 2019a ) names “acceptance and adoption” and ( Pérez et al., 2020 ) “students’ perception.” Further evaluation criteria are motivation, usability, technical correctness, psychological, and further beneficial factors ( Hobert, 2019a ). These Evaluation Criteria show broad possibilities for the evaluation of chatbots in education. However, ( Hobert, 2019a ) found that most evaluations are limited to single evaluation criteria or narrower aspects of them. Moreover, ( Hobert, 2019a ) introduces a classification matrix for chatbot evaluations, which consists of the following Evaluation Methods (EM): Wizard-of-Oz approach, laboratory studies, field studies, and technical validations. In addition to this, ( Winkler and Soellner, 2018 ) recommend evaluating chatbots by their embeddedness into a learning scenario, a comparison of human-human and human-chatbot interactions, and a comparison of spoken and written communication. Instruments to measure these evaluation criteria were identified by ( Hobert, 2019a ): quantitative surveys, qualitative interviews, transcripts of dialogues, and technical log files. Regarding CAT3, we thus found three main aspects for the evaluation of chatbots. We can conclude that this is a more balanced and structured distribution in comparison to CAT2, providing researchers with guidance for evaluating chatbots in education.

Regarding educational effects of chatbots (CAT4), we found two aspects visualized in Figure 4 : Effect Size (ES) and Beneficial Chatbot Features for Learning Success (BF). Concerning the effect size, ( Pérez et al., 2020 ) identified a strong dependency between learning and the related curriculum, while ( Winkler and Soellner, 2018 ) elaborate on general student characteristics that influence how students interact with chatbots. They state that students’ attitudes towards technology, learning characteristics, educational background, self-efficacy, and self-regulation skills affect these interactions. Moreover, the study emphasizes chatbot features that can be regarded as beneficial in terms of learning outcomes (BF): “Context-Awareness,” “Proactive guidance by students,” “Integration in existing learning and instant messaging tools,” “Accessibility,” and “Response Time.” Overall, for CAT4, we found two main distinguishing aspects for chatbots; however, the reported studies vary widely in their research design, making high-level results hardly comparable.

Looking at the related work, many research questions for the application of chatbots in education remain open. Therefore, we selected five goals to be further investigated in our literature review. Firstly, we were interested in the objectives for implementing chatbots in education (Goal 1), as the relevance of chatbots for applications within education seems not to be clearly delineated. Secondly, we aim to explore the pedagogical roles of chatbots in the existing literature (Goal 2) to understand how chatbots can take over tasks from teachers. ( Winkler and Soellner, 2018 ) and ( Pérez-Marín, 2021 ) identified research gaps for supporting meta-cognitive skills, such as self-regulation, with chatbots. This requires a chatbot application that takes a mentoring role, as the development of these meta-cognitive skills cannot be achieved solely by information delivery. Within our review, we incorporate this by reviewing the mentoring role of chatbots (Goal 3). Another key element for a mentoring chatbot is adaptation to the learner’s needs. Therefore, Goal 4 of our review lies in the investigation of the adaptation approaches used by chatbots in education. For Goal 5, we want to extend the work of ( Winkler and Soellner, 2018 ) and ( Pérez et al., 2020 ) regarding Application Clusters (AC) and map applications by further investigating specific learning domains in which chatbots have been studied.

Methods

To delineate and map the field of chatbots in education, initial findings were collected by a preliminary literature search. One of the takeaways is that the emerging field around educational chatbots has seen much activity in the last two years. Based on the experience of this preliminary search, search terms, queries, and filters were constructed for the actual structured literature review. This structured literature review follows the PRISMA framework ( Liberati et al., 2009 ), a guideline for reporting systematic reviews and meta-analyses. The framework consists of an elaborated structure for systematic literature reviews and sets requirements for reporting information about the review process ( see sections 3.2 to 3.4).

Research Questions

Contributing to the state-of-the-art, we investigate five aspects of chatbot applications published in the literature. We therefore guided our research with the following research questions:

RQ1: Which objectives for implementing chatbots in education can be identified in the existing literature?

RQ2: Which pedagogical roles of chatbots can be identified in the existing literature?

RQ3: Which application scenarios have been used to mentor students?

RQ4: To what extent are chatbots adaptable to personal students’ needs?

RQ5: What are the domains in which chatbots have been applied so far?

Sources of Information

As data sources, Scopus, Web of Science, Google Scholar, Microsoft Academic, and the educational research database “Fachportal Pädagogik” (including ERIC) were selected, which together cover all major publishers and journals. ( Martín-Martín et al., 2018 ) showed that only 29.8% of relevant literature in the social sciences, and 46.8% in engineering and computer science, is included in all of the first three databases. For the topic of chatbots in education, a value between these two numbers can be assumed, which is why an approach integrating several publisher-independent databases was employed here.

Search Criteria

Based on the findings from the initial related work search, we derived the following search query:

( Education OR Educational OR Learning OR Learner OR Student OR Teaching OR School OR University OR Pedagogical ) AND Chatbot.

It combines education-related keywords with the “chatbot” keyword. Since chatbots are related to other technologies, the initial literature search also considered keywords such as “pedagogical agents,” “dialogue systems,” or “bots” when composing the search query. However, these increased the number of irrelevant results significantly and were therefore excluded from the query in later searches.

Inclusion and Exclusion Criteria

The queries were executed on 23.12.2020 and applied twice to each database, first as a title search and second as a keyword-based search. This resulted in a total of 3,619 hits, which were checked for duplicates, resulting in 2,678 candidate publications. The overall search and filtering process is shown in Figure 5 .


FIGURE 5 . PRISMA flow chart.

In the case of Google Scholar, the number of results sorted by relevance per query was limited to 300, as this database also delivers many less relevant works. The value was determined by looking at the search results in detail using several queries to exclude as few relevant works as possible. This approach showed promising results and, at the same time, did not burden the literature list with irrelevant items.

The further screening consisted of a four-stage filtering process: first, eliminating duplicates in the results of title and keyword queries of each database independently, and second, excluding publications based on the title and abstract that:

• were not available in English

• did not describe a chatbot application

• were not mainly focused on learner-centered chatbot applications in schools or higher education institutions, which, according to the preliminary literature search, is the main application area within education.

Third, we applied another duplicate filter, this time for the merged set of publications. Finally, a filter based on the full text, excluding publications that were:

• limited to improving chatbots technically (e.g., publications that compare or develop new algorithms), as the research questions presented in these publications did not seek additional insights on applications in education

• exclusively theoretical in nature (e.g., publications that discuss new research projects, implementation concepts, or potential use cases of chatbots in education), as they either do not contain research questions or hypotheses or do not provide conclusions from studies with learners.

After the first, second, and third filters, we identified 505 candidate publications. We continued our filtering process by reading the candidate publications’ full texts, resulting in 74 publications that were used for our review. Compared to the 3,619 initial database results, the proportion of relevant publications is therefore about 2.0%.
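For illustration, the deduplication and title/abstract screening steps described above could be expressed as a simple pipeline like the following sketch; the record fields and screening predicates are hypothetical stand-ins for the manual screening actually performed.

```python
# Hypothetical sketch of the screening pipeline: merge database exports,
# drop duplicates, then apply a title/abstract exclusion filter. The manual
# screening is represented here by simple placeholder predicates.

from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    abstract: str
    in_english: bool
    describes_chatbot_application: bool
    learner_centered: bool

def deduplicate(records):
    """Remove duplicates, using the normalized title as a simple key."""
    seen, unique = set(), []
    for record in records:
        key = record.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def title_abstract_screen(record: Record) -> bool:
    """Keep only English, learner-centered chatbot applications."""
    return (record.in_english
            and record.describes_chatbot_application
            and record.learner_centered)

def screen(all_hits):
    candidates = deduplicate(all_hits)                      # e.g., 3,619 -> 2,678
    return [r for r in candidates if title_abstract_screen(r)]  # full-text screening follows
```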

The final publication list can be accessed at https://bit.ly/2RRArFT .

To analyze the identified publications and derive results according to the research questions, full texts were coded, considering for each publication the objectives for implementing chatbots (RQ1), pedagogical roles of chatbots (RQ2), their mentoring roles (RQ3), adaptation of chatbots (RQ4), as well as their implementation domains in education (RQ5) as separate sets of codes. To this end, initial codes were identified by open coding and iteratively improved through comparison, group discussion among the authors, and subsequent code expansion. Further, codes were supplemented with detailed descriptions until a saturation point was reached, where all included studies could be successfully mapped to codes, suggesting no need for further refinement. As an example, codes for RQ2 (Pedagogical Roles) were adapted and refined in terms of their level of abstraction from an initial set of only two codes: 1 ) a code for chatbots in the learning role and 2 ) a code for chatbots in a service-oriented role. After coding a larger set of publications, it became clear that the code for service-oriented chatbots needed to be further distinguished, because it lumped together, for example, automation activities with activities related to self-regulated learning and thus could not be separated sharply enough from the learning role. After refining the code set in the next iteration into a learning role, an assistance role, and a mentoring role, it was possible to ensure the separation of the individual codes. In order to avoid defining new codes for a very small number of publications, studies were coded as “other” (RQ1) or “not defined” (RQ2) if their code occurred in fewer than eight publications, representing less than 10% of the publications in the final paper list.
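The rule for collapsing rarely occurring codes into “other” or “not defined” can be stated compactly; the sketch below is a hypothetical illustration of that threshold rule, not the tooling used in the review.

```python
# Hypothetical sketch of the rule described above: codes assigned to fewer
# than eight publications (less than 10% of the 74 reviewed papers) are
# collapsed into a residual category such as "other" or "not defined".

from collections import Counter

MIN_OCCURRENCE = 8  # threshold used in the review

def collapse_rare_codes(assignments, residual="other"):
    """assignments: dict mapping a publication to its code."""
    counts = Counter(assignments.values())
    return {
        publication: (code if counts[code] >= MIN_OCCURRENCE else residual)
        for publication, code in assignments.items()
    }
```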

By grouping the resulting relevant publications according to their date of publication, it is apparent that chatbots in education are currently in a phase of increased attention. The release distribution shows slightly lower publication numbers in the current year than in the previous one ( Figure 6 ), which could be attributed to a time lag between the actual publication of manuscripts and their dissemination in databases.


FIGURE 6 . Identified chatbot publications in education per year.

Applying the curve presented in Figure 6 to Gartner’s Hype Cycle ( Linden and Fenn, 2003 ) suggests that technology around chatbots in education may currently be in the “Innovation Trigger” phase. In this phase, many expectations are placed on the technology, but practical in-depth experience is still largely lacking.

Objectives for Implementing Chatbots in Education

Regarding RQ1, we extracted implementation objectives for chatbots in education. By analyzing the selected publications, we identified that most of the objectives for chatbots in education can be described by one of the following categories: Skill Improvement , Efficiency of Education , Students’ Motivation , and Availability of Education ( see Figure 7 ). The first objective, Skill Improvement , refers to the improvement of a student’s skill that the chatbot is supposed to support or achieve. Here, chatbots are mostly seen as a learning aid that supports students. It is the most commonly cited objective for chatbots. The second objective is to increase the Efficiency of Education in general. It can be pursued, for example, through the automation of recurring tasks or time-saving services for students and is the second most cited objective for chatbots. The third objective is to increase Students’ Motivation . Finally, the last objective is to increase the Availability of Education . This objective is intended to provide learning or counseling with temporal flexibility or without the limitation of physical presence. In addition, there are other, more diverse objectives for chatbots in education that are less easy to categorize. In cases where a publication indicated more than one objective, the publication was distributed evenly across the respective categories.
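Because publications indicating several objectives were distributed evenly across categories, the percentages reported for Figure 7 rest on fractional counts. The following sketch, with hypothetical data, illustrates this weighting.

```python
# Hypothetical sketch of the fractional counting described above: a publication
# naming several objectives contributes an equal fraction of one count to each.

from collections import defaultdict

def objective_distribution(publication_objectives):
    """publication_objectives: dict mapping a publication to its list of objectives."""
    weights = defaultdict(float)
    for objectives in publication_objectives.values():
        for objective in objectives:
            weights[objective] += 1 / len(objectives)
    total = sum(weights.values())
    return {objective: weight / total for objective, weight in weights.items()}

example = {
    "pub_a": ["Skill Improvement"],
    "pub_b": ["Skill Improvement", "Students' Motivation"],
    "pub_c": ["Efficiency of Education"],
}
print(objective_distribution(example))  # fractional shares summing to 1.0
```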


FIGURE 7 . Objectives for implementing chatbots identified in chatbot publications.

Given these results, we can summarize four major implementation objectives for chatbots. Of these, Skill Improvement is the most popular objective, constituting around one-third of publications (32%). Making up a quarter of all publications, Efficiency of Education is the second most popular objective (25%), while addressing Students’ Motivation and Availability of Education are third (13%) and fourth (11%), respectively. Other objectives also make up a substantial amount of these publications (19%), although they were too diverse to categorize in a uniform way. Examples of these are inclusivity ( Heo and Lee, 2019 ) or the promotion of student-teacher interactions ( Mendoza et al., 2020 ).

Pedagogical Roles

Regarding RQ2, it is crucial to consider the use of chatbots in terms of their intended pedagogical role. After analyzing the selected articles, we were able to identify three different pedagogical roles: a supporting learning role, an assisting role, and a mentoring role.

In the supporting learning role ( Learning ), chatbots are used as an educational tool to teach content or skills. This can be achieved through a fixed integration into the curriculum, such as conversation tasks (L. K. Fryer et al., 2020 ). Alternatively, learning can be supported through additional offerings alongside classroom teaching, for example, voice assistants for leisure activities at home ( Bao, 2019 ). Another example is a chatbot simulating a virtual pen pal abroad ( Na-Young, 2019 ). Conversations with this kind of chatbot aim to motivate the students to look up vocabulary, check their grammar, and gain confidence in the foreign language.

In the assisting role ( Assisting ), chatbot actions can be summarized as simplifying the student's everyday life, i.e., taking tasks off the student’s hands in whole or in part. This can be achieved by making information more easily available ( Sugondo and Bahana, 2019 ) or by simplifying processes through the chatbot’s automation ( Suwannatee and Suwanyangyuen, 2019 ). An example of this is the chatbot in ( Sandoval, 2018 ) that answers general questions about a course, such as an exam date or office hours.

In the mentoring role ( Mentoring ), chatbot actions deal with the student’s personal development. In this type of support, the student themselves is the focus of the conversation and should be encouraged to plan, reflect on, or assess their progress on a meta-cognitive level. One example is the chatbot in ( Cabales, 2019 ), which helps students develop lifelong learning skills by prompting in-action reflections.

The distribution of each pedagogical role is shown in Figure 8 . From this, it can be seen that Learning is the most frequent role in the examined publications (49%), followed by Assisting (20%) and Mentoring (15%). It should be noted that pedagogical roles could not be identified for all the publications examined. The absence of a clearly defined pedagogical role (16%) can be attributed to the more general nature of these publications, e.g., publications focused on students’ small talk behaviors ( Hobert, 2019b ) or teachers’ attitudes towards chatbot applications in classroom teaching (P. K. Bii et al., 2018 ).


FIGURE 8 . Pedagogical roles identified in chatbot publications.

Looking at pedagogical roles in the context of objectives for implementing chatbots, relations among publications can be inspected in a relations graph ( Figure 9 ). According to our results, the strongest relation in the examined publications can be observed between the Skill Improvement objective and the Learning role. This strong relation arises partly because both the Skill Improvement objective and the Learning role are the largest in their respective categories. In addition, two other strong relations can be observed: between the Students’ Motivation objective and the Learning role, as well as between the Efficiency of Education objective and the Assisting role.


FIGURE 9 . Relations graph of pedagogical roles and objectives for implementing chatbots.

By looking at other relations in more detail, there is surprisingly no relation between Skill Improvement , the most common implementation objective, and Assisting , the second most common pedagogical role. Furthermore, it can be observed that the Mentoring role has nearly equal relations to all of the objectives for implementing chatbots.

The relations graph ( Figure 9 ) can be explored interactively at bit.ly/32FSKQM.
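One straightforward way to derive such a relations graph is to count, per publication, the co-occurrence of the coded objective and the coded pedagogical role; the sketch below, with hypothetical codings, illustrates the idea.

```python
# Hypothetical sketch of how the relations graph between implementation
# objectives and pedagogical roles can be derived: each coded publication adds
# one unit of weight to the edge between its objective and its role.

from collections import Counter

def relation_weights(codings):
    """codings: iterable of (objective, role) pairs, one per publication."""
    return Counter(codings)

example_codings = [
    ("Skill Improvement", "Learning"),
    ("Skill Improvement", "Learning"),
    ("Efficiency of Education", "Assisting"),
    ("Students' Motivation", "Learning"),
]
for (objective, role), weight in relation_weights(example_codings).items():
    print(f"{objective} -- {role}: {weight}")
```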

Mentoring Role

Regarding RQ3, we identified eleven publications that address chatbots in a mentoring role. The Mentoring role in these publications can be categorized along two dimensions. Starting with the first dimension, the mentoring method, three methods can be observed:

• Scaffolding ( n = 7)

• Recommending ( n = 3)

• Informing ( n = 1)

An example of Scaffolding can be seen in ( Gabrielli et al., 2020 ), where the chatbot coaches students in life skills, while an example of Recommending can be seen in ( Xiao et al., 2019 ), where the chatbot recommends new teammates. Finally, Informing can be seen in ( Kerly et al., 2008 ), where the chatbot informs students about their personal Open Learner Model.

The second dimension is the addressed mentoring topic, where the following topics can be observed:

• Self-Regulated Learning ( n = 5)

• Life Skills ( n = 4)

• Learning Skills ( n = 2)

While Mentoring chatbots supporting Self-Regulated Learning are intended to encourage students to reflect on and plan their learning progress, Mentoring chatbots supporting Life Skills address general student abilities such as self-confidence or managing emotions. Finally, Mentoring chatbots supporting Learning Skills , in contrast to Self-Regulated Learning , address only particular aspects of the learning process, such as new learning strategies or helpful learning partners. An example of a Mentoring chatbot supporting Life Skills is the Logo counseling chatbot, which promotes healthy self-esteem ( Engel et al., 2020 ). CALMsystem is an example of a Self-Regulated Learning chatbot, which informs students about their data in an open learner model ( Kerly et al., 2008 ). Finally, for the Learning Skills topic, the MCQ Bot is an example that is designed to introduce students to transformative learning (W. Huang et al., 2019 ).

Adaptation

Regarding RQ4, we identified six publications in the final publication list that address the topic of adaptation. Within these publications, five adaptation approaches are described:

The first approach (A1) is proposed by ( Kerly and Bull, 2006 ) and ( Kerly et al., 2008 ), dealing with discussions with students based on their success and confidence during a quiz. The improvement of self-assessment is the primary focus of this approach. The second approach (A2) is presented in ( Jia, 2008 ), where the personality of the chatbot is adapted to motivate students to talk to the chatbot and, in this case, learn a foreign language. The third approach (A3), as shown in the work of ( Vijayakumar et al., 2019 ), is characterized by a chatbot that provides personalized formative feedback to learners based on their self-assessment, again in a quiz situation. Here, the focus is on Hattie and Timperley’s three guiding questions: “Where am I going?,” “How am I going?” and “Where to next?” ( Hattie and Timperley, 2007 ). In the fourth approach (A4), exemplified in ( Ruan et al., 2019 ), the chatbot selects questions within a quiz. Here, the chatbot estimates the student’s ability and knowledge level based on the quiz progress and sets the next question accordingly. Finally, a similar approach (A5) is shown in ( Davies et al., 2020 ). In contrast to ( Ruan et al., 2019 ), this chatbot adapts the amount of question variation and takes psychological features into account, which were measured beforehand by psychological tests.
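To illustrate the kind of adaptation described for A4, the following sketch shows a simple ability estimate that is updated after each answer and used to select the next question of matching difficulty. This is only an illustrative approximation, not the implementation used in the cited studies.

```python
# Illustrative sketch (not the cited authors' implementation) of adaptive
# question selection as in approach A4: estimate the student's ability from
# quiz progress and choose the next question whose difficulty is closest to it.

def update_ability(ability: float, difficulty: float, correct: bool,
                   step: float = 0.3) -> float:
    """Move the ability estimate up after a correct answer, down otherwise."""
    return ability + step * (difficulty - ability + (1.0 if correct else -1.0))

def next_question(questions, ability):
    """questions: list of (question_text, difficulty); pick the closest match."""
    return min(questions, key=lambda q: abs(q[1] - ability))

ability = 0.0
questions = [("easy item", -1.0), ("medium item", 0.0), ("hard item", 1.0)]
question, difficulty = next_question(questions, ability)
ability = update_ability(ability, difficulty, correct=True)  # estimate rises
```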

We examined these five approaches by organizing them according to their information sources and extracted learner information. The results can be seen in Table 2 .


TABLE 2 . Adaptation approaches of chatbots in education.

Four out of five adaptation approaches (A1, A3, A4, and A5) are observed in the context of quizzes. These adaptations within quizzes can be divided into two mainstreams: one is concerned with students’ feedback (A1 and A3), while the other is concerned with learning material selection (A4 and A5). The only different adaptation approach is A2, which focuses on the adaptation of the chatbot personality within a language learning application.

Domains for Chatbots in Education

Regarding RQ5, we identified 20 domains of chatbots in education. These can broadly be divided by their pedagogical role into three domain categories (DC): Learning Chatbots , Assisting Chatbots , and Mentoring Chatbots . The remaining publications are grouped in the Other Research domain category. The complete list of identified domains can be seen in Table 3 .


TABLE 3 . Domains of chatbots in education.

The domain category Learning Chatbots , which deals with chatbots incorporating the pedagogical role Learning , can be subdivided into seven domains: 1 ) Language Learning , 2 ) Learn to Program , 3 ) Learn Communication Skills , 4 ) Learn about Educational Technologies , 5 ) Learn about Cultural Heritage , 6 ) Learn about Laws , and 7 ) Mathematics Learning . With more than half of publications (53%), chatbots for Language Learning play a prominent role in this domain category. They are often used as chat partners to train conversations or to test vocabulary. An example of this can be seen in the work of ( Bao, 2019 ), which tries to mitigate foreign language anxiety by chatbot interactions in foreign languages.

The domain category Assisting Chatbots , which deals with chatbots incorporating the pedagogical role Assisting , can be subdivided into four domains: 1 ) Administrative Assistance , 2 ) Campus Assistance , 3 ) Course Assistance , and 4 ) Library Assistance . With one-third of publications (33%), chatbots in the Administrative Assistance domain, which help to overcome bureaucratic hurdles at the institution while providing round-the-clock services, are the largest group in this domain category. An example of this can be seen in ( Galko et al., 2018 ), where the student enrollment process is completely shifted to a conversation with a chatbot.

The domain category Mentoring Chatbots , which deals with chatbots incorporating the pedagogical role Mentoring , can be subdivided into three domains: 1 ) Scaffolding Chatbots , 2 ) Recommending Chatbots , and 3 ) Informing Chatbots . An example of a Scaffolding Chatbot is the CRI(S) chatbot ( Gabrielli et al., 2020 ), which supports life skills such as self-awareness or conflict resolution in discussion with the student by promoting helpful ideas and tricks.

The domain category Other Research , which deals with chatbots not incorporating any of these pedagogical roles, can be subdivided into three domains: 1 ) General Chatbot Research in Education , 2 ) Indian Educational System , and 3 ) Chatbot Interfaces . The most prominent domain, General Chatbot Research , cannot be classified into one of the other categories but aims to explore cross-cutting issues. An example of this can be seen in the publication of ( Hobert, 2020 ), which investigates the importance of small talk abilities of chatbots in educational settings.

Discussions

In this paper, we investigated the state-of-the-art of chatbots in education according to five research questions. By combining our results with previously identified findings from related literature reviews, we proposed a concept map of chatbots in education. The map, reported in Appendix A , displays the current state of research regarding chatbots in education with the aim of supporting future research in the field.

Answer to Research Questions

Concerning RQ1 (implementation objectives), we identified four major objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation , and 4 ) Availability of Education . These four objectives cover over 80% of the analyzed publications ( see Figure 7 ). Based on the findings on CAT3 in section 2, we see a mismatch between the objectives for implementing chatbots and their evaluation. Most researchers only focus on narrow aspects for the evaluation of their chatbots, such as learning success, usability, and technology acceptance. This mismatch of implementation objectives and suitable evaluation approaches is also well known for other educational technologies such as Learning Analytics dashboards ( Jivet et al., 2017 ). A more structured approach of aligning implementation objectives and evaluation procedures is crucial to be able to properly assess the effectiveness of chatbots. ( Hobert, 2019a ) suggested a structured four-stage evaluation procedure beginning with a Wizard-of-Oz experiment, followed by technical validation, a laboratory study, and a field study. This evaluation procedure systematically links hypotheses with outcomes of chatbots, helping to assess chatbots against their implementation objectives. “Aligning chatbot evaluations with implementation objectives” is, therefore, an important challenge to be addressed in the future research agenda.

Concerning RQ2 (pedagogical roles), our results show that chatbots’ pedagogical roles can be summarized as Learning , Assisting , and Mentoring . The Learning role refers to support in learning or teaching activities, such as gaining knowledge. The Assisting role refers to support in simplifying learners’ everyday life, e.g. by providing opening times of the library. The Mentoring role refers to support for students’ personal development, e.g. by supporting Self-Regulated Learning. From a pedagogical standpoint, all three roles are essential for learners and should therefore be incorporated in chatbots. These pedagogical roles are well aligned with the four implementation objectives reported in RQ1. While Skill Improvement and Students’ Motivation are strongly related to Learning , Efficiency of Education is strongly related to Assisting . The Mentoring role, instead, is evenly related to all of the identified objectives for implementing chatbots. In the reviewed publications, chatbots are therefore primarily intended to 1 ) improve skills and motivate students by supporting learning and teaching activities, 2 ) make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) support multiple effects by mentoring students.

Concerning RQ3 (mentoring role), we identified three main mentoring method categories for chatbots: 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . However, comparing the current mentoring of chatbots reported in the literature with the daily mentoring role of teachers, we can summarize that chatbots are not yet at the same level. In order to take over mentoring roles of teachers ( Wildman et al., 1992 ), a chatbot would need to fulfill some of the following activities in its mentoring role. With respect to 1 ) Scaffolding , chatbots should provide direct assistance while learning new skills and especially direct beginners in their activities. Regarding 2 ) Recommending , chatbots should provide supportive information, tools, or other materials for specific learning tasks and life situations. With respect to 3 ) Informing , chatbots should encourage students according to their goals and achievements, and support them in developing meta-cognitive skills like self-regulation. Due to this mismatch between teacher and chatbot mentoring, we see here another research challenge, which we call “Exploring the potential of chatbots for mentoring students.”

Regarding RQ4 (adaptation), only six publications were identified that discuss an adaptation of chatbots, and four out of five adaptation approaches (A1, A3, A4, and A5) show similarities in being applied within quizzes. In the context of educational technologies, providing reasonable adaptations for learners requires a high level of experience. Based on our results, the research on chatbots does not seem to be at this point yet. Looking at adaptation literature like ( Brusilovsky, 2001 ) or ( Benyon and Murray, 1993 ), it becomes clear that a chatbot needs to consider the learners’ personal information to fulfill the requirement of the adaptation definition. Personal information must be retrieved and stored, at least temporarily, in some sort of learner model. For learner information like knowledge and interest, adaptations seem to be barely explored in the reviewed publications, while the model of ( Brusilovsky and Millán, 2007 ) points out further learner information that can be used to make chatbots more adaptive: personal goals, personal tasks, personal background, individual traits, and the learner’s context. We identify research in this area as a third future challenge and call it the “Exploring and leveraging adaptation capabilities of chatbots” challenge.
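Following the learner information listed by ( Brusilovsky and Millán, 2007 ), a learner model for an adaptive chatbot could be sketched as a simple data structure like the one below; the field types and the toy adaptation rule are assumptions for illustration only.

```python
# Illustrative data structure for a learner model along the dimensions named
# by Brusilovsky and Millán (2007): knowledge, interests, goals, tasks,
# background, individual traits, and context. Field types are assumptions.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LearnerModel:
    knowledge: Dict[str, float] = field(default_factory=dict)  # topic -> mastery (0..1)
    interests: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)
    tasks: List[str] = field(default_factory=list)
    background: str = ""
    traits: Dict[str, str] = field(default_factory=dict)       # e.g., preferred modality
    context: Dict[str, str] = field(default_factory=dict)      # e.g., device, location

def choose_response_style(model: LearnerModel) -> str:
    """Toy adaptation rule: adjust verbosity to the estimated prior knowledge."""
    mean_mastery = (sum(model.knowledge.values()) / len(model.knowledge)
                    if model.knowledge else 0.0)
    return "concise hints" if mean_mastery > 0.7 else "step-by-step explanations"
```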

In terms of RQ5 (domains), we identified a detailed map of domains applying chatbots in education and their distribution ( see Table 3 ). By systematically analyzing 74 publications, we identified 20 domains and structured them according to the identified pedagogical roles into four domain categories: Learning Chatbots , Assisting Chatbots , Mentoring Chatbots , and Other Research . These results extend the taxonomy of Application Clusters (AC) for chatbots in education, which previously comprised the work of ( Pérez et al., 2020 ), who took the chatbot activity as the characteristic, and ( Winkler and Soellner, 2018 ), who characterized chatbots by domains. It draws relationships between these two types of Application Clusters (AC) and structures them accordingly. Our structure incorporates Mentoring Chatbots and Other Research in addition to the “service-oriented chatbots” (cf. Assisting Chatbots ) and “teaching-oriented chatbots” (cf. Learning Chatbots ) identified by ( Pérez et al., 2020 ). Furthermore, the strong tendency of informing students already mentioned by ( Smutny and Schreiberova, 2020 ) can also be recognized in our results, especially for Assisting Chatbots . Compared to ( Winkler and Soellner, 2018 ), we can confirm the prominent domains of “language learning” within Learning Chatbots and “metacognitive thinking” within Mentoring Chatbots . Moreover, Table 3 reflects a more detailed picture of chatbot applications in education, which could help researchers to find similar works or unexplored application areas.

Limitations

One important limitation to be mentioned here is the exclusion of alternative keywords from our search queries, as we exclusively used chatbot as a keyword in order to avoid search results that do not fit our research questions. Though we acknowledge that chatbots share properties with pedagogical agents, dialog systems, and bots, we carefully considered this trade-off between missing potentially relevant work and inflating our search procedure with related but not necessarily pertinent work. A second limitation may lie in the formation of categories and the coding processes applied, which, due to the novelty of the findings, could not be built upon theoretical frameworks or already existing code books. Although we have focused on ensuring that the codes used contribute to a strong understanding, the chosen level of abstraction might have affected the level of detail of the resulting data representation.

Conclusion

In this systematic literature review, we explored the current landscape of chatbots in education. We analyzed 74 publications, identified 20 domains of chatbots, and grouped them based on their pedagogical roles into four domain categories. These pedagogical roles are the supporting learning role ( Learning ), the assisting role ( Assisting ), and the mentoring role ( Mentoring ). By focusing on objectives for implementing chatbots, we identified four main objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation , and 4 ) Availability of Education . As discussed in section 5, these objectives do not fully align with the chosen evaluation procedures. We focused on the relations between pedagogical roles and objectives for implementing chatbots and identified three main relations: 1 ) chatbots to improve skills and motivate students by supporting learning and teaching activities, 2 ) chatbots to make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) chatbots to support multiple effects by mentoring students. We focused on chatbots incorporating the Mentoring role and found that these chatbots are mostly concerned with three mentoring topics, 1 ) Self-Regulated Learning , 2 ) Life Skills , and 3 ) Learning Skills , and three mentoring methods, 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . Regarding chatbot adaptations, only six publications with adaptations were identified. Furthermore, the adaptation approaches found were mostly limited to applications within quizzes and thus represent a research gap.

Based on these outcomes we consider three challenges for chatbots in education that offer future research opportunities:

Challenge 1: Aligning chatbot evaluations with implementation objectives . Most chatbot evaluations focus on narrow aspects, such as the tool’s usability, acceptance, or technical correctness. If chatbots are to be considered as learning aids, student mentors, or facilitators, their effects on the cognitive and emotional levels should also be taken into account in the evaluation of chatbots. This finding strengthens our conclusion that chatbot development in education is still driven by technology, rather than having a clear pedagogical focus on improving and supporting learning.

Challenge 2: Exploring the potential of chatbots for mentoring students . In order to better understand the potential of chatbots to mentor students, more empirical studies on the information needs of learners are required. It is obvious that these needs differ between schools and higher education. However, so far there are hardly any studies investigating learners’ information needs with respect to chatbots, nor whether chatbots address these needs sufficiently.

Challenge 3: Exploring and leveraging adaptation capabilities of chatbots . There is a large body of literature on the adaptation capabilities of educational technologies. However, we have seen very few studies on the effects of adapting chatbots for educational purposes. As chatbots are foreseen as systems that should personally support learners, adaptable chatbot interaction is an important research aspect that should receive more attention in the near future.

By addressing these challenges, we believe that chatbots can become effective educational tools capable of supporting learners with informative feedback. Therefore, looking at our results and the challenges presented, we conclude, “No, we are not there yet!” - There is still much to be done in terms of research on chatbots in education. Still, development in this area seems to have just begun to gain momentum and we expect to see new insights in the coming years.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author Contributions

SW, JS†, DM†, JW†, MR, and HD.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbasi, S., Kazi, H., and Hussaini, N. N. (2019). Effect of Chatbot Systems on Student’s Learning Outcomes. Sylwan 163(10).


Abbasi, S., and Kazi, H. (2014). Measuring Effectiveness of Learning Chatbot Systems on Student's Learning Outcome and Memory Retention. Asian J. Appl. Sci. Eng. 3, 57. doi:10.15590/AJASE/2014/V3I7/53576


Almahri, F. A. J., Bell, D., and Merhi, M. (2020). “Understanding Student Acceptance and Use of Chatbots in the United Kingdom Universities: A Structural Equation Modelling Approach,” in 2020 6th IEEE International Conference on Information Management, ICIM 2020 , London, United Kingdom , March 27–29, 2020 , (IEEE), 284–288. doi:10.1109/ICIM49319.2020.244712

Bao, M. (2019). Can Home Use of Speech-Enabled Artificial Intelligence Mitigate Foreign Language Anxiety - Investigation of a Concept. Awej 5, 28–40. doi:10.24093/awej/call5.3

Benyon, D., and Murray, D. (1993). Applying User Modeling to Human-Computer Interaction Design. Artif. Intell. Rev. 7 (3-4), 199–225. doi:10.1007/BF00849555

Bii, P. K., Too, J. K., and Mukwa, C. W. (2018). Teacher Attitude towards Use of Chatbots in Routine Teaching. Univers. J. Educ. Res. 6 (7), 1586–1597. doi:10.13189/ujer.2018.060719

Bii, P., Too, J., and Langat, R. (2013). An Investigation of Student’s Attitude Towards the Use of Chatbot Technology in Instruction: The Case of Knowie in a Selected High School. Education Research 4, 710–716. doi:10.14303/er.2013.231


Bos, A. S., Pizzato, M. C., Vettori, M., Donato, L. G., Soares, P. P., Fagundes, J. G., et al. (2020). Empirical Evidence During the Implementation of an Educational Chatbot with the Electroencephalogram Metric. Creative Education 11, 2337–2345. doi:10.4236/CE.2020.1111171

Brusilovsky, P. (2001). Adaptive Hypermedia. User Model. User-Adapted Interaction 11 (1), 87–110. doi:10.1023/a:1011143116306

Brusilovsky, P., and Millán, E. (2007). “User Models for Adaptive Hypermedia and Adaptive Educational Systems,” in The Adaptive Web: Methods and Strategies of Web Personalization . Editors P. Brusilovsky, A. Kobsa, and W. Nejdl. Berlin: Springer , 3–53. doi:10.1007/978-3-540-72079-9_1

Cabales, V. (2019). “Muse: Scaffolding metacognitive reflection in design-based research,” in CHI EA’19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–6. doi:10.1145/3290607.3308450

Carayannopoulos, S. (2018). Using Chatbots to Aid Transition. Int. J. Info. Learn. Tech. 35, 118–129. doi:10.1108/IJILT-10-2017-0097

Chan, C. H., Lee, H. L., Lo, W. K., and Lui, A. K.-F. (2018). Developing a Chatbot for College Student Programme Advisement. in 2018 International Symposium on Educational Technology, ISET 2018 , Osaka, Japan , July 31–August 2, 2018 . Editors F. L. Wang, C. Iwasaki, T. Konno, O. Au, and C. Li, (IEEE), 52–56. doi:10.1109/ISET.2018.00021

Chang, M.-Y., and Hwang, J.-P. (2019). “Developing Chatbot with Deep Learning Techniques for Negotiation Course,” in 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019 , Toyama, Japan , July 7–11, 2019 , (IEEE), 1047–1048. doi:10.1109/IIAI-AAI.2019.00220

Chen, C.-A., Yang, Y.-T., Wu, S.-M., Chen, H.-C., Chiu, K.-C., Wu, J.-W., et al. (2018). “A Study of Implementing AI Chatbot in Campus Consulting Service”, in TANET 2018-Taiwan Internet Seminar , 1714–1719. doi:10.6861/TANET.201810.0317

Chen, H.-L., Widarso, G. V., and Sutrisno, H. (2020). A ChatBot for Learning Chinese: Learning Achievement and Technology Acceptance. J. Educ. Comput. Res. 58 (6), 1161–1189. doi:10.1177/0735633120929622

Daud, S. H. M., Teo, N. H. I., and Zain, N. H. M. (2020). E-java Chatbot for Learning Programming Language: A Post-pandemic Alternative Virtual Tutor. Int. J. Emerging Trends Eng. Res. 8 (7), 3290–3298. doi:10.30534/ijeter/2020/67872020

Davies, J. N., Verovko, M., Verovko, O., and Solomakha, I. (2020). “Personalization of E-Learning Process Using Ai-Powered Chatbot Integration,” in Selected Papers of 15th International Scientific-practical Conference, MODS, 2020: Advances in Intelligent Systems and Computing , Chernihiv, Ukraine , June 29–July 01, 2020 . Editors S. Shkarlet, A. Morozov, and A. Palagin, ( Springer ) Vol. 1265, 209–216. doi:10.1007/978-3-030-58124-4_20

Diachenko, A. V., Morgunov, B. P., Melnyk, T. P., Kravchenko, O. I., and Zubchenko, L. V. (2019). The Use of Innovative Pedagogical Technologies for Automation of the Specialists' Professional Training. Int. J. Higher Educ. 8 (6), 288–295. doi:10.5430/ijhe.v8n6p288

Dibitonto, M., Leszczynska, K., Tazzi, F., and Medaglia, C. M. (2018). “Chatbot in a Campus Environment: Design of Lisa, a Virtual Assistant to Help Students in Their university Life,” in 20th International Conference, HCI International 2018 , Las Vegas, NV, USA , July 15–20, 2018 , Lecture Notes in Computer Science. Editors M. Kurosu, (Springer), 103–116. doi:10.1007/978-3-319-91250-9

Durall, E., and Kapros, E. (2020). “Co-design for a Competency Self-Assessment Chatbot and Survey in Science Education,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, Berlin: Springer Vol. 12206, 13–23. doi:10.1007/978-3-030-50506-6_2

Duval, E., and Verbert, K. (2012). Learning Analytics. Eleed 8 (1).

Engel, J. D., Engel, V. J. L., and Mailoa, E. (2020). Interaction Monitoring Model of Logo Counseling Website for College Students' Healthy Self-Esteem. Int. J. Eval. Res. Educ. 9 (3), 607–613. doi:10.11591/ijere.v9i3.20525

Febriani, G. A., and Agustia, R. D. (2019). Development of Line Chatbot as a Learning Media for Mathematics National Exam Preparation. Elibrary.Unikom.Ac.Id . https://elibrary.unikom.ac.id/1130/14/UNIKOM_GISTY%20AMELIA%20FEBRIANI_JURNAL%20DALAM%20BAHASA%20INGGRIS.pdf .

Ferguson, R., and Sharples, M. (2014). “Innovative Pedagogy at Massive Scale: Teaching and Learning in MOOCs,” in 9th European Conference on Technology Enhanced Learning, EC-TEL 2014 , Graz, Austria , September 16–19, 2014 , Lecture Notes in Computer Science. Editors C. Rensing, S. de Freitas, T. Ley, and P. J. Muñoz-Merino, ( Berlin : Springer) Vol. 8719, 98–111. doi:10.1007/978-3-319-11200-8_8

Fryer, L. K., Ainley, M., Thompson, A., Gibson, A., and Sherlock, Z. (2017). Stimulating and Sustaining Interest in a Language Course: An Experimental Comparison of Chatbot and Human Task Partners. Comput. Hum. Behav. 75, 461–468. doi:10.1016/j.chb.2017.05.045

Fryer, L. K., Nakao, K., and Thompson, A. (2019). Chatbot Learning Partners: Connecting Learning Experiences, Interest and Competence. Comput. Hum. Behav. 93, 279–289. doi:10.1016/j.chb.2018.12.023

Fryer, L. K., Thompson, A., Nakao, K., Howarth, M., and Gallacher, A. (2020). Supporting Self-Efficacy Beliefs and Interest as Educational Inputs and Outcomes: Framing AI and Human Partnered Task Experiences. Learn. Individual Differences , 80. doi:10.1016/j.lindif.2020.101850

Gabrielli, S., Rizzi, S., Carbone, S., and Donisi, V. (2020). A Chatbot-Based Coaching Intervention for Adolescents to Promote Life Skills: Pilot Study. JMIR Hum. Factors 7 (1). doi:10.2196/16762


Galko, L., Porubän, J., and Senko, J. (2018). “Improving the User Experience of Electronic University Enrollment,” in 16th IEEE International Conference on Emerging eLearning Technologies and Applications, ICETA 2018 , Stary Smokovec, Slovakia , Nov 15–16, 2018 . Editors F. Jakab, (Piscataway, NJ: IEEE ), 179–184. doi:10.1109/ICETA.2018.8572054

Goda, Y., Yamada, M., Matsukawa, H., Hata, K., and Yasunami, S. (2014). Conversation with a Chatbot before an Online EFL Group Discussion and the Effects on Critical Thinking. J. Inf. Syst. Edu. 13, 1–7. doi:10.12937/EJSISE.13.1

Graesser, A. C., VanLehn, K., Rose, C. P., Jordan, P. W., and Harter, D. (2001). Intelligent Tutoring Systems with Conversational Dialogue. AI Mag. 22 (4), 39–51. doi:10.1609/aimag.v22i4.1591

Greller, W., and Drachsler, H. (2012). Translating Learning into Numbers: A Generic Framework for Learning Analytics. J. Educ. Tech. Soc. 15 (3), 42–57. doi:10.2307/jeductechsoci.15.3.42

Haristiani, N., and Rifa’i, M. M. (2020). Combining Chatbot and Social Media: Enhancing Personal Learning Environment (PLE) in Language Learning. Indonesian J. Sci. Tech. 5 (3), 487–506. doi:10.17509/ijost.v5i3.28687

Hattie, J., and Timperley, H. (2007). The Power of Feedback. Rev. Educ. Res. 77 (1), 81–112. doi:10.3102/003465430298487

Hattie, J. (2009). Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement . Abingdon, UK: Routledge .

Heller, B., Proctor, M., Mah, D., Jewell, L., and Cheung, B. (2005). “Freudbot: An Investigation of Chatbot Technology in Distance Education,” in Proceedings of ED-MEDIA 2005–World Conference on Educational Multimedia, Hypermedia and Telecommunications , Montréal, Canada , June 27–July 2, 2005 . Editors P. Kommers, and G. Richards, ( AACE ), 3913–3918.

Heo, J., and Lee, J. (2019). “CiSA: An Inclusive Chatbot Service for International Students and Academics,” in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 11786, 153–167. doi:10.1007/978-3-030-30033-3

Hobert, S. (2019a). “How Are You, Chatbot? Evaluating Chatbots in Educational Settings - Results of a Literature Review,” in 17. Fachtagung Bildungstechnologien, DELFI 2019 - 17th Conference on Education Technologies, DELFI 2019 , Berlin, Germany , Sept 16–19, 2019 . Editors N. Pinkwart, and J. Konert, 259–270. doi:10.18420/delfi2019_289

Hobert, S., and Meyer von Wolff, R. (2019). “Say Hello to Your New Automated Tutor - A Structured Literature Review on Pedagogical Conversational Agents,” in 14th International Conference on Wirtschaftsinformatik , Siegen, Germany , Feb 23–27, 2019 . Editors V. Pipek, and T. Ludwig, ( AIS ).

Hobert, S. (2019b). Say Hello to ‘Coding Tutor’! Design and Evaluation of a Chatbot-Based Learning System Supporting Students to Learn to Program in International Conference on Information Systems (ICIS) 2019 Conference , Munich, Germany , Dec 15–18, 2019 , AIS 2661, 1–17.

Hobert, S. (2020). Small Talk Conversations and the Long-Term Use of Chatbots in Educational Settings ‐ Experiences from a Field Study in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 : Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, ( Springer ) 11970, 260–272. doi:10.1007/978-3-030-39540-7_18

Hsieh, S.-W. (2011). Effects of Cognitive Styles on an MSN Virtual Learning Companion System as an Adjunct to Classroom Instructions. Edu. Tech. Society 2, 161–174.

Huang, J.-X., Kwon, O.-W., Lee, K.-S., and Kim, Y.-K. (2018). Improve the Chatbot Performance for the DB-CALL System Using a Hybrid Method and a Domain Corpus in Future-proof CALL: language learning as exploration and encounters–short papers from EUROCALL 2018 , Jyväskylä, Finland , Aug 22–25, 2018 . Editors P. Taalas, J. Jalkanen, L. Bradley, and S. Thouësny, ( Research-publishing.net ). doi:10.14705/rpnet.2018.26.820

Huang, W., Hew, K. F., and Gonda, D. E. (2019). Designing and Evaluating Three Chatbot-Enhanced Activities for a Flipped Graduate Course. Int. J. Mech. Engineer. Robotics. Research. 813–818. doi:10.18178/ijmerr.8.5.813-818

Ismail, M., and Ade-Ibijola, A. (2019). “Lecturer's Apprentice: A Chatbot for Assisting Novice Programmers,”in Proceedings - 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC) , Vanderbijlpark, South Africa , (IEEE), 1–8. doi:10.1109/IMITEC45504.2019.9015857

Jia, J. (2008). “Motivate the Learners to Practice English through Playing with Chatbot CSIEC,” in 3rd International Conference on Technologies for E-Learning and Digital Entertainment, Edutainment 2008 , Nanjing, China , June 25–27, 2008 , Lecture Notes in Computer Science, (Springer) 5093, 180–191. doi:10.1007/978-3-540-69736-7_20

Jia, J. (2004). “The Study of the Application of a Keywords-Based Chatbot System on the Teaching of Foreign Languages,” in Proceedings of SITE 2004--Society for Information Technology and Teacher Education International Conference , Atlanta, Georgia, USA . Editors R. Ferdig, C. Crawford, R. Carlsen, N. Davis, J. Price, R. Weber, and D. Willis, (AACE), 1201–1207.

Jivet, I., Scheffel, M., Drachsler, H., and Specht, M. (2017). “Awareness is not enough: Pitfalls of learning analytics dashboards in the educational practice,” in 12th European Conference on Technology Enhanced Learning, EC-TEL 2017 , Tallinn, Estonia , September 12–15, 2017 , Lecture Notes in ComputerScience. Editors E. Lavoué, H. Drachsler, K. Verbert, J. Broisin, and M. Pérez-Sanagustín, (Springer), 82–96. doi:10.1007/978-3-319-66610-5_7

Jung, H., Lee, J., and Park, C. (2020). Deriving Design Principles for Educational Chatbots from Empirical Studies on Human-Chatbot Interaction. J. Digit. Contents Society , 21, 487–493. doi:10.9728/dcs.2020.21.3.487

Kerly, A., and Bull, S. (2006). “The Potential for Chatbots in Negotiated Learner Modelling: A Wizard-Of-Oz Study,” in 8th International Conference on Intelligent Tutoring Systems, ITS 2006 , Jhongli, Taiwan , June 26–30, 2006 , Lecture Notes in Computer Science. Editors M. Ikeda, K. D. Ashley, and T. W. Chan, ( Springer ) 4053, 443–452. doi:10.1007/11774303

Kerly, A., Ellis, R., and Bull, S. (2008). CALMsystem: A Conversational Agent for Learner Modelling. Knowledge-Based Syst. 21, 238–246. doi:10.1016/j.knosys.2007.11.015

Kerly, A., Hall, P., and Bull, S. (2007). Bringing Chatbots into Education: Towards Natural Language Negotiation of Open Learner Models. Knowledge-Based Syst. , 20, 177–185. doi:10.1016/j.knosys.2006.11.014

Kumar, M. N., Chandar, P. C. L., Prasad, A. V., and Sumangali, K. (2016). “Android Based Educational Chatbot for Visually Impaired People,” in 2016 IEEE International Conference on Computational Intelligence and Computing Research , Chennai, India , December 15–17, 2016 , 1–4. doi:10.1109/ICCIC.2016.7919664

Lee, K., Jo, J., Kim, J., and Kang, Y. (2019). Can Chatbots Help Reduce the Workload of Administrative Officers? - Implementing and Deploying FAQ Chatbot Service in a University in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 1032, 348–354. doi:10.1007/978-3-030-23522-2

Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., and Bhogal, R. S. (1997). “The Persona Effect: Affective Impact of Animated Pedagogical Agents,” in Proceedings of the ACM SIGCHI Conference on Human factors in computing systems , Atlanta, Georgia, USA , March 22–27, 1997 , (ACM), 359–366.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration. J. Clin. Epidemiol. 62 (10), e1–e34. doi:10.1016/j.jclinepi.2009.06.006

Lin, M. P.-C., and Chang, D. (2020). Enhancing Post-secondary Writers’ Writing Skills with a Chatbot. J. Educ. Tech. Soc. 23, 78–92. doi:10.2307/26915408

Lin, Y.-H., and Tsai, T. (2019). “A Conversational Assistant on Mobile Devices for Primitive Learners of Computer Programming,” in TALE 2019 - 2019 IEEE International Conference on Engineering, Technology and Education , Yogyakarta, Indonesia , December 10–13, 2019 , (IEEE), 1–4. doi:10.1109/TALE48000.2019.9226015

Linden, A., and Fenn, J. (2003). Understanding Gartner’s Hype Cycles. Strategic Analysis Report No. R-20-1971 8. Stamford, CT: Gartner, Inc .

Liu, Q., Huang, J., Wu, L., Zhu, K., and Ba, S. (2020). CBET: Design and Evaluation of a Domain-specific Chatbot for mobile Learning. Univ. Access Inf. Soc. , 19, 655–673. doi:10.1007/s10209-019-00666-x

Mamani, J. R. C., Álamo, Y. J. R., Aguirre, J. A. A., and Toledo, E. E. G. (2019). “Cognitive Services to Improve User Experience in Searching for Academic Information Based on Chatbot,” in Proceedings of the 2019 IEEE 26th International Conference on Electronics, Electrical Engineering and Computing (INTERCON) , Lima, Peru , August 12–14, 2019 , (IEEE), 1–4. doi:10.1109/INTERCON.2019.8853572

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., and Delgado López-Cózar, E. (2018). Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories. J. Informetrics 12 (4), 1160–1177. doi:10.1016/j.joi.2018.09.002

Matsuura, S., and Ishimura, R. (2017). Chatbot and Dialogue Demonstration with a Humanoid Robot in the Lecture Class, in 11th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2017, held as part of the 19th International Conference on Human-Computer Interaction, HCI 2017 , Vancouver, Canada , July 9–14, 2017 , Lecture Notes in Computer Science. Editors M. Antona, and C. Stephanidis, (Springer) Vol. 10279, 233–246. doi:10.1007/978-3-319-58700-4

Matsuura, S., and Omokawa, R. (2020). Being Aware of One’s Self in the Auto-Generated Chat with a Communication Robot in UAHCI 2020 , 477–488. doi:10.1007/978-3-030-49282-3

McLoughlin, C., and Oliver, R. (1998). Maximising the Language and Learning Link in Computer Learning Environments. Br. J. Educ. Tech. 29 (2), 125–136. doi:10.1111/1467-8535.00054

Mendoza, S., Hernández-León, M., Sánchez-Adame, L. M., Rodríguez, J., Decouchant, D., and Meneses-Viveros, A. (2020). “Supporting Student-Teacher Interaction through a Chatbot,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, ( Springer ) 12206, 93–107. doi:10.1007/978-3-030-50506-6

Meyer, V., Wolff, R., Nörtemann, J., Hobert, S., and Schumann, M. (2020). “Chatbots for the Information Acquisition at Universities ‐ A Student’s View on the Application Area,“in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 , Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, (Springer) 11970, 231–244. doi:10.1007/978-3-030-39540-7

Na-Young, K. (2018c). A Study on Chatbots for Developing Korean College Students’ English Listening and Reading Skills. J. Digital Convergence 16. 19–26. doi:10.14400/JDC.2018.16.8.019

Na-Young, K. (2019). A Study on the Use of Artificial Intelligence Chatbots for Improving English Grammar Skills. J. Digital Convergence 17, 37–46. doi:10.14400/JDC.2019.17.8.037

Na-Young, K. (2018a). Chatbots and Korean EFL Students’ English Vocabulary Learning. J. Digital Convergence 16. 1–7. doi:10.14400/JDC.2018.16.2.001

Na-Young, K. (2018b). Different Chat Modes of a Chatbot and EFL Students’ Writing Skills Development. doi:10.16933/sfle.2017.32.1.263

Na-Young, K. (2017). Effects of Different Types of Chatbots on EFL Learners’ Speaking Competence and Learner Perception. Cross-Cultural Studies 48, 223–252. doi:10.21049/ccs.2017.48.223

Nagata, R., Hashiguchi, T., and Sadoun, D. (2020). Is the Simplest Chatbot Effective in English Writing Learning Assistance?, in 16th International Conference of the Pacific Association for Computational Linguistics , PACLING, Hanoi, Vietnam , October 11–13, 2019 , Communications in Computer and Information Science. Editors L.-M. Nguyen, S. Tojo, X.-H. Phan, and K. Hasida, ( Springer ) Vol. 1215, 245–246. doi:10.1007/978-981-15-6168-9

Nelson, T. O., and Narens, L. (1994). “Why Investigate Metacognition?,” in Metacognition: Knowing About Knowing. Editors J. Metcalfe, and A. P. Shimamura, (MIT Press) 13, 1–25.

Nghi, T. T., Phuc, T. H., and Thang, N. T. (2019). Applying Ai Chatbot for Teaching a Foreign Language: An Empirical Research. Int. J. Sci. Res. 8.

Ondas, S., Pleva, M., and Hládek, D. (2019). How Chatbots Can Be Involved in the Education Process. in ICETA 2019 - 17th IEEE International Conference on Emerging eLearning Technologies and Applications, Proceedings, Stary Smokovec , Slovakia , November 21–22, 2019 . Editors F. Jakab, (IEEE), 575–580. doi:10.1109/ICETA48886.2019.9040095

Pereira, J., Fernández-Raga, M., Osuna-Acedo, S., Roura-Redondo, M., Almazán-López, O., and Buldón-Olalla, A. (2019). Promoting Learners' Voice Productions Using Chatbots as a Tool for Improving the Learning Process in a MOOC. Tech. Know Learn. 24, 545–565. doi:10.1007/s10758-019-09414-9

Pérez, J. Q., Daradoumis, T., and Puig, J. M. M. (2020). Rediscovering the Use of Chatbots in Education: A Systematic Literature Review. Comput. Appl. Eng. Educ. 28, 1549–1565. doi:10.1002/cae.22326

Pérez-Marín, D. (2021). A Review of the Practical Applications of Pedagogic Conversational Agents to Be Used in School and University Classrooms. Digital 1 (1), 18–33. doi:10.3390/digital1010002

Pham, X. L., Pham, T., Nguyen, Q. M., Nguyen, T. H., and Cao, T. T. H. (2018). “Chatbot as an Intelligent Personal Assistant for mobile Language Learning,” in ACM International Conference Proceeding Series doi:10.1145/3291078.3291115

Quincey, E. de., Briggs, C., Kyriacou, T., and Waller, R. (2019). “Student Centred Design of a Learning Analytics System,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge , Tempe Arizona, USA , March 4–8, 2019 , (ACM), 353–362. doi:10.1145/3303772.3303793

Ram, A., Prasad, R., Khatri, C., Venkatesh, A., Gabriel, R., Liu, Q, et al. (2018). Conversational Ai: The Science behind the Alexa Prize, in 1st Proceedings of Alexa Prize (Alexa Prize 2017) . ArXiv [Preprint]. Available at: https://arxiv.org/abs/1801.03604 .

Rebaque-Rivas, P., and Gil-Rodríguez, E. (2019). Adopting an Omnichannel Approach to Improve User Experience in Online Enrolment at an E-Learning University, in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ), 115–122. doi:10.1007/978-3-030-23525-3

Robinson, C. (2019). Impressions of Viability: How Current Enrollment Management Personnel And Former Students Perceive The Implementation of A Chatbot Focused On Student Financial Communication. Higher Education Doctoral Projects.2 . https://aquila.usm.edu/highereddoctoralprojects/2 .

Ruan, S., Jiang, L., Xu, J., Tham, B. J.-K., Qiu, Z., Zhu, Y., Murnane, E. L., Brunskill, E., and Landay, J. A. (2019). “QuizBot: A Dialogue-based Adaptive Learning System for Factual Knowledge,” in 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019 , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–13. doi:10.1145/3290605.3300587

Sandoval, Z. V. (2018). Design and Implementation of a Chatbot in Online Higher Education Settings. Issues Inf. Syst. 19, 44–52. doi:10.48009/4.iis.2018.44-52

Sandu, N., and Gide, E. (2019). “Adoption of AI-Chatbots to Enhance Student Learning Experience in Higher Education in india,” in 18th International Conference on Information Technology Based Higher Education and Training , Magdeburg, Germany , September 26–27, 2019 , (IEEE), 1–5. doi:10.1109/ITHET46829.2019.8937382

Saygin, A. P., Cicekli, I., and Akman, V. (2000). Turing Test: 50 Years Later. Minds and Machines 10 (4), 463–518. doi:10.1023/A:1011288000451

Sinclair, A., McCurdy, K., Lucas, C. G., Lopez, A., and Gaševic, D. (2019). “Tutorbot Corpus: Evidence of Human-Agent Verbal Alignment in Second Language Learner Dialogues,” in EDM 2019 - Proceedings of the 12th International Conference on Educational Data Mining .

Smutny, P., and Schreiberova, P. (2020). Chatbots for Learning: A Review of Educational Chatbots for the Facebook Messenger. Comput. Edu. 151, 103862. doi:10.1016/j.compedu.2020.103862

Song, D., Rice, M., and Oh, E. Y. (2019). Participation in Online Courses and Interaction with a Virtual Agent. Int. Rev. Res. Open. Dis. 20, 44–62. doi:10.19173/irrodl.v20i1.3998

Stapić, Z., Horvat, A., and Vukovac, D. P. (2020). Designing a Faculty Chatbot through User-Centered Design Approach, in 22nd International Conference on Human-Computer Interaction,HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors C. Stephanidis, D. Harris, W. C. Li, D. D. Schmorrow, C. M. Fidopiastis, and P. Zaphiris, ( Springer ), 472–484. doi:10.1007/978-3-030-60128-7

Subramaniam, N. K. (2019). Teaching and Learning via Chatbots with Immersive and Machine Learning Capabilities. In International Conference on Education (ICE 2019) Proceedings , Kuala Lumpur, Malaysia , April 10–11, 2019 . Editors S. A. H. Ali, T. T. Subramaniam, and S. M. Yusof, 145–156.

Sugondo, A. F., and Bahana, R. (2019). “Chatbot as an Alternative Means to Access Online Information Systems,” in 3rd International Conference on Eco Engineering Development, ICEED 2019 , Surakarta, Indonesia , November 13–14, 2019 , IOP Conference Series: Earth and Environmental Science, (IOP Publishing) 426. doi:10.1088/1755-1315/426/1/012168

Suwannatee, S., and Suwanyangyuen, A. (2019). “Reading Chatbot” Mahidol University Library and Knowledge Center Smart Assistant,” in Proceedings for the 2019 International Conference on Library and Information Science (ICLIS) , Taipei, Taiwan , July 11–13, 2019 .

Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Kashavan, M. S., and Torous, J. B. (2019). Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can. J. Psychiatry 64 (7), 456–464. doi:10.1177/0706743719828977

Vijayakumar, B., Höhn, S., and Schommer, C. (2019). “Quizbot: Exploring Formative Feedback with Conversational Interfaces,” in 21st International Conference on Technology Enhanced Assessment, TEA 2018 , Amsterdam, Netherlands , Dec 10-11, 2018 . Editors S. Draaijer, B. D. Joosten-ten, and E. Ras, ( Springer ), 102–120. doi:10.1007/978-3-030-25264-9

Virtanen, M. A., Haavisto, E., Liikanen, E., and Kääriäinen, M. (2018). Ubiquitous Learning Environments in Higher Education: A Scoping Literature Review. Educ. Inf. Technol. 23 (2), 985–998. doi:10.1007/s10639-017-9646-6

Wildman, T. M., Magliaro, S. G., Niles, R. A., and Niles, J. A. (1992). Teacher Mentoring: An Analysis of Roles, Activities, and Conditions. J. Teach. Edu. 43 (3), 205–213. doi:10.1177/0022487192043003007

Wiley, D., and Edwards, E. K. (2002). Online Self-Organizing Social Systems: The Decentralized Future of Online Learning. Q. Rev. Distance Edu. 3 (1), 33–46.

Winkler, R., and Soellner, M. (2018). Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis. in Academy of Management Annual Meeting Proceedings 2018 (1), 15903. doi:10.5465/AMBPP.2018.15903abstract

Winne, P. H., and Hadwin, A. F. (2008). “The Weave of Motivation and Self-Regulated Learning,” in Motivation and Self-Regulated Learning: Theory, Research, and Applications . Editors D. H. Schunk, and B. J. Zimmerman, (Mahwah, NJ: Lawrence Erlbaum Associates Publishers ), 297–314.

Wisniewski, B., Zierer, K., and Hattie, J. (2019). The Power of Feedback Revisited: A Meta-Analysis of Educational Feedback Research. Front. Psychol. 10, 3087. doi:10.3389/fpsyg.2019.03087

Wolfbauer, I., Pammer-Schindler, V., and Rose, C. P. (2020). “Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot,” in Proceedings of the Impact Papers at EC-TEL 2020, co-located with the 15th European Conference on Technology-Enhanced Learning “Addressing global challenges and quality education” (EC-TEL 2020) , Virtual , Sept 14–18, 2020 . Editors T. Broos, and T. Farrell, 1–14.

Xiao, Z., Zhou, M. X., and Fu, W.-T. (2019). “Who should be my teammates: Using a conversational agent to understand individuals and help teaming,” in IUI’19: Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray , California, USA , March 17–20, 2019 , (ACM), 437–447. doi:10.1145/3301275.3302264

Xu, A., Liu, Z., Guo, Y., Sinha, V., and Akkiraju, R. (2017). “A New Chatbot for Customer Service on Social media,” in Proceedings of the 2017 CHI conference on human factors in computing systems , Denver, Colorado, USA , May 6–11, 2017 , ACM, 3506–3510. doi:10.1145/3025453.3025496

Yin, J., Goh, T.-T., Yang, B., and Xiaobin, Y. (2020). Conversation Technology with Micro-learning: The Impact of Chatbot-Based Learning on Students' Learning Motivation and Performance. J. Educ. Comput. Res. 59, 154–177. doi:10.1177/0735633120952067

Appendix A: A Concept Map of Chatbots in Education

Keywords: chatbots, education, literature review, pedagogical roles, domains

Citation: Wollny S, Schneider J, Di Mitri D, Weidlich J, Rittberger M and Drachsler H (2021) Are We There Yet? - A Systematic Literature Review on Chatbots in Education. Front. Artif. Intell. 4:654924. doi: 10.3389/frai.2021.654924

Received: 17 January 2021; Accepted: 10 June 2021; Published: 15 July 2021.


Copyright © 2021 Wollny, Schneider, Di Mitri, Weidlich, Rittberger and Drachsler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Sebastian Wollny, [email protected] ; Jan Schneider, [email protected]

This article is part of the Research Topic Intelligent Conversational Agents.

  • Review article
  • Open access
  • Published: 31 October 2023

Role of AI chatbots in education: systematic literature review

  • Lasha Labadze, ORCID: orcid.org/0000-0002-8884-2792
  • Maya Grigolia, ORCID: orcid.org/0000-0001-9043-7932
  • Lela Machaidze, ORCID: orcid.org/0000-0001-5958-5662

International Journal of Educational Technology in Higher Education, volume 20, Article number: 56 (2023)


A Correction to this article was published on 15 April 2024


AI chatbots shook the world not long ago with their potential to revolutionize education systems in a myriad of ways. AI chatbots can provide immediate support by answering questions, offering explanations, and providing additional resources. Chatbots can also act as virtual teaching assistants, supporting educators through various means. In this paper, we try to understand the full benefits of AI chatbots in education, their opportunities, challenges, potential limitations, concerns, and prospects of using AI chatbots in educational settings. We conducted an extensive search across various academic databases, and after applying specific predefined criteria, we selected a final set of 67 relevant studies for review. The research findings emphasize the numerous benefits of integrating AI chatbots in education, as seen from both students' and educators' perspectives. We found that students primarily gain from AI-powered chatbots in three key areas: homework and study assistance, a personalized learning experience, and the development of various skills. For educators, the main advantages are the time-saving assistance and improved pedagogy. However, our research also emphasizes significant challenges and critical factors that educators need to handle diligently. These include concerns related to AI applications such as reliability, accuracy, and ethical considerations.

Introduction

The traditional education system faces several issues, including overcrowded classrooms, a lack of personalized attention for students, varying learning paces and styles, and the struggle to keep up with the fast-paced evolution of technology and information. As the educational landscape continues to evolve, the rise of AI-powered chatbots emerges as a promising solution to effectively address some of these issues. Some educational institutions are increasingly turning to AI-powered chatbots, recognizing their relevance, while others are more cautious and do not rush to adopt them in modern educational settings. Consequently, a substantial body of academic literature is dedicated to investigating the role of AI chatbots in education, their potential benefits, and threats.

AI-powered chatbots are designed to mimic human conversation using text or voice interaction, providing information in a conversational manner. Chatbots’ history dates back to the 1960s and over the decades chatbots have evolved significantly, driven by advancements in technology and the growing demand for automated communication systems. Created by Joseph Weizenbaum at MIT in 1966, ELIZA was one of the earliest chatbot programs (Weizenbaum, 1966 ). ELIZA could mimic human-like responses by reflecting user inputs as questions. Another early example of a chatbot was PARRY, implemented in 1972 by psychiatrist Kenneth Colby at Stanford University (Colby, 1981 ). PARRY was a chatbot designed to simulate a paranoid patient with schizophrenia. It engaged in text-based conversations and demonstrated the ability to exhibit delusional behavior, offering insights into natural language processing and AI. Developed by Richard Wallace in 1995, ALICE (Artificial Linguistic Internet Computer Entity) was an early example of a chatbot using natural language processing techniques that won the Loebner Prize Turing Test in 2000–2001 (Wallace, 1995 ), which challenged chatbots to convincingly simulate human-like conversation. Later in 2001 ActiveBuddy, Inc. developed the chatbot SmarterChild that operated on instant messaging platforms such as AOL Instant Messenger and MSN Messenger (Hoffer et al., 2001 ). SmarterChild was a chatbot that could carry on conversations with users about a variety of topics. It was also able to learn from its interactions with users, which made it more and more sophisticated over time. In 2011 Apple introduced Siri as a voice-activated personal assistant for its iPhone (Aron, 2011 ). Although not strictly a chatbot, Siri showcased the potential of conversational AI by understanding and responding to voice commands, performing tasks, and providing information. In the same year, IBM's Watson gained fame by defeating human champions in the quiz show Jeopardy (Lally & Fodor, 2011 ). It demonstrated the power of natural language processing and machine learning algorithms in understanding complex questions and providing accurate answers. More recently, in 2016, Facebook opened its Messenger platform for chatbot development, allowing businesses to create AI-powered conversational agents to interact with users. This led to an explosion of chatbots on the platform, enabling tasks like customer support, news delivery, and e-commerce (Holotescu, 2016 ). Google Duplex, introduced in May 2018, was able to make phone calls and carry out conversations on behalf of users. It showcased the potential of chatbots to handle complex, real-time interactions in a human-like manner (Dinh & Thai, 2018 ; Kietzmann et al., 2018 ).

More recently, more sophisticated and capable chatbots have amazed the world with their abilities. Among them, ChatGPT and Google Bard are two of the most prominent AI-powered chatbots. ChatGPT is an artificial intelligence chatbot developed by OpenAI. It was first announced in November 2022 and is available to the general public. Its rival, the Google Bard chatbot developed by Google AI, was announced in early 2023 and made generally available in May 2023. Both Google Bard and ChatGPT are large language model chatbots trained on extensive datasets of text and code. They can generate text, create diverse creative content, and provide informative answers to questions, although their accuracy may not always be perfect. A key difference is that Google Bard draws on a dataset that includes current text from the internet, while ChatGPT is trained on a fixed dataset that includes text from books and articles. This means that Google Bard is more likely to be up to date on current events, while ChatGPT has been reported to be more accurate in its responses to factual questions (AlZubi et al., 2022 ; Rahaman et al., 2023 ; Rudolph et al., 2023 ).

Chatbots are now used across various sectors, including education. Most of the latest intelligent AI chatbots are web-based platforms that adapt to the behaviors of both instructors and learners, enhancing the educational experience (Chassignol et al., 2018 ; Devedzic, 2004 ; Kahraman et al., 2010 ; Peredo et al., 2011 ). AI chatbots have been applied in both instruction and learning within the education sector. Chatbots specialize in personalized tutoring, homework help, concept learning, standardized test preparation, discussion and collaboration, and mental health support. Some of the most popular AI-based tools/chatbots used in education are:

Bard, introduced in 2023, is a large language model chatbot created by Google AI. Its capabilities include generating text, language translation, producing various types of creative content, and providing informative responses to questions (Rudolph et al., 2023 ). Bard is still under development, but it has the potential to be a valuable tool for education.

ChatGPT, launched in 2022 by OpenAI, is a large language model chatbot that can generate text, produce diverse creative content, and deliver informative answers to questions (Dergaa et al., 2023 ; Khademi, 2023 ; Rudolph et al., 2023 ). However, as discussed in the results section of this paper, there are numerous concerns related to the use of ChatGPT in education, such as accuracy, reliability, ethical issues, etc.

Ada, launched in 2017, is a chatbot used to provide personalized tutoring to students. It can answer questions, provide feedback, and facilitate individualized learning for students (Kabiljo et al., 2020 ; Konecki et al., 2023 ). However, the Ada chatbot has limitations in understanding complex queries. It could misinterpret context and provide inaccurate responses.

Replika, launched in 2017, is an AI chatbot platform that is designed to be a friend and companion for students. It can listen to students' problems, offer advice, and help them feel less alone (Pentina et al., 2023 ; Xie & Pentina, 2022 ). However, given the personal nature of conversations with Replika, there are valid concerns regarding data privacy and security.

Socratic, launched in 2013, had the goal of creating a community that made learning accessible to all students. Currently, Socratic is an AI-powered educational platform that was acquired by Google in 2018. While not a chatbot per se, it has a chatbot-like interface and functionality designed to assist students in learning new concepts (Alsanousi et al., 2023 ; Moppel, 2018 ; St-Hilaire et al., 2022 ). Like with other chatbots, a concern arises where students might excessively rely on Socratic for learning. This could lead to a diminished emphasis on critical thinking, as students may opt to use the platform to obtain answers without gaining a genuine understanding of the underlying concepts.

Habitica, launched in 2013, is used to help students develop good study habits. It gamifies the learning process, making it more fun and engaging for students. Students can use Habitica to manage their academic tasks, assignments, and study schedules. By turning their to-do list into a game-like experience, students are motivated to complete their tasks and build productive habits (Sales & Antunes, 2021 ; Zhang, 2023 ). However, the gamified nature of Habitica could inadvertently introduce distractions, especially for students who are easily drawn into the gaming aspect rather than focusing on their actual academic responsibilities.

Piazza, launched in 2009, is used to facilitate discussion and collaboration in educational settings, particularly in classrooms and academic institutions. It provides a space for students and instructors to engage in discussions, ask questions, and share information related to course content and assignments (Ruthotto et al., 2020 ; Wang et al., 2020 ). Because discussions on Piazza are user-generated, the quality and accuracy of responses can vary. This variability may result in situations where students do not receive accurate and helpful information.

We will likely see even more widespread adoption of chatbots in education in the years to come as technology advances further. Chatbots have enormous potential to improve teaching and learning. A large body of literature is devoted to exploring the role, challenges, and opportunities of chatbots in education. This paper gathers and synthesizes this vast amount of literature, providing a comprehensive understanding of the current research status concerning the influence of chatbots in education. By conducting a systematic review, we seek to identify common themes, trends, and patterns in the impact of chatbots on education and provide a holistic view of the research, enabling researchers, policymakers, and educators to make evidence-based decisions. One of the main objectives of this paper is to identify existing research gaps in the literature to pinpoint areas where further investigation is needed, enabling researchers to contribute to the knowledge base and guide future research efforts. Firstly, we aim to understand the primary advantages of incorporating AI chatbots in education, focusing on the perspectives of students. Secondly, we seek to explore the key advantages of integrating AI chatbots from the standpoint of educators. Lastly, we endeavor to comprehensively analyze the major concerns expressed by scholars regarding the integration of AI chatbots in educational settings. Corresponding research questions are formulated in the section below. Addressing these research questions, we aim to contribute valuable insights that shed light on the potential benefits and challenges associated with the utilization of AI chatbots in the field of education.

The paper follows a structured outline comprising several sections. Initially, we provide a summary of existing literature reviews. Subsequently, we delve into the methodology, encompassing aspects such as research questions, the search process, inclusion and exclusion criteria, as well as the data extraction strategy. Moving on, we present a comprehensive analysis of the results in the subsequent section. Finally, we conclude by addressing the limitations encountered during the study and offering insights into potential future research directions.

Summary of existing literature reviews

Drawing from extensive systematic literature reviews, as summarized in Table 1 , AI chatbots possess the potential to profoundly influence diverse aspects of education. They contribute to advancements in both teaching and learning processes. However, it is essential to address concerns regarding the irrational use of technology and the challenges that education systems encounter while striving to harness its capacity and make the best use of it.

It is evident that chatbot technology has a significant impact on overall learning outcomes. Specifically, chatbots have demonstrated significant enhancements in learning achievement, explicit reasoning, and knowledge retention. The integration of chatbots in education offers benefits such as immediate assistance, quick access to information, enhanced learning outcomes, and improved educational experiences. However, there have been contradictory findings related to critical thinking, learning engagement, and motivation. Deng and Yu ( 2023 ) found that chatbots had a significant and positive influence on numerous learning-related aspects but did not significantly improve motivation among students. In contrast, Okonkwo and Ade-Ibijola ( 2021 ), as well as Wollny et al. ( 2021 ), find that using chatbots increases students’ motivation.

In terms of application, chatbots are primarily used in education to teach various subjects, including but not limited to mathematics, computer science, foreign languages, and engineering. While many chatbots follow predetermined conversational paths, some employ personalized learning approaches tailored to individual student needs, incorporating experiential and collaborative learning principles. Challenges in chatbot development include insufficient training datasets, a lack of emphasis on usability heuristics, ethical concerns, evaluation methods, user attitudes, programming complexities, and data integration issues.

Although existing systematic reviews have provided valuable insights into the impact of chatbot technology in education, it is essential to acknowledge that the field of chatbot development is continually evolving and requires timely, updated analysis to ensure that the information and assessments reflect the most recent advancements, trends, or developments in chatbot technology. The latest chatbot models have showcased remarkable capabilities in natural language processing and generation. Additional research is required to investigate the role and potential of these newer chatbots in the field of education. Therefore, our paper focuses on reviewing and discussing the findings of these new-generation chatbots' use in education, including their benefits and challenges from the perspectives of both educators and students.

There are a few aspects that appear to be missing from the existing literature reviews: (a) The existing findings focus on the immediate impact of chatbot usage on learning outcomes. Further research may delve into the enduring impacts of integrating chatbots in education, aiming to assess their sustainability and the persistence of the observed advantages over the long term. (b) The studies primarily discuss the impact of chatbots on learning outcomes as a whole, without delving into the potential variations based on student characteristics. Investigating how different student groups, such as age, prior knowledge, and learning styles, interact with chatbot technology could provide valuable insights. (c) Although the studies highlight the enhancements in certain learning components, further investigation could explore the specific pedagogical strategies employed by chatbots to achieve these outcomes. Understanding the underlying mechanisms and instructional approaches utilized by chatbots can guide the development of more effective and targeted educational interventions. (d) While some studies touch upon user attitudes and acceptance, further research can delve deeper into the user experience of interacting with chatbots in educational settings. This includes exploring factors such as usability, perceived usefulness, satisfaction, and preferences of students and teachers when using chatbot technology.

Addressing these gaps in the existing literature would significantly benefit the field of education. Firstly, further research on the impacts of integrating chatbots can shed light on their long-term sustainability and how their advantages persist over time. This knowledge is crucial for educators and policymakers to make informed decisions about the continued integration of chatbots into educational systems. Secondly, understanding how different student characteristics interact with chatbot technology can help tailor educational interventions to individual needs, potentially optimizing the learning experience. Thirdly, exploring the specific pedagogical strategies employed by chatbots to enhance learning components can inform the development of more effective educational tools and methods. Lastly, a deeper exploration of the user experience with chatbots, encompassing usability, satisfaction, and preferences, can provide valuable insights into enhancing user engagement and overall satisfaction, thus guiding the future design and implementation of chatbot technology in education.

Methodology

A systematic review follows a rigorous methodology, including predefined search criteria and systematic screening processes, to ensure the inclusion of relevant studies. This comprehensive approach ensures that a wide range of research is considered, minimizing the risk of bias and providing a broad overview of the impact of AI in education. Firstly, we define the research questions and corresponding search strategies, and then we filter the search results based on predefined inclusion and exclusion criteria. Secondly, we study the selected articles and synthesize the results; lastly, we report and discuss the findings. To improve the clarity of the discussion section, we employed a Large Language Model (LLM) for stylistic suggestions.

Research questions

Considering the limitations observed in previous literature reviews, we have developed three research questions for further investigation:

What are the key advantages of incorporating AI chatbots in education from the viewpoint of students?

What are the key advantages of integrating AI chatbots in education from the viewpoint of educators?

What are the main concerns raised by scholars regarding the integration of AI chatbots in education?

Exploring the literature that focuses on these research questions, with specific attention to contemporary AI-powered chatbots, can provide a deeper understanding of the impact, effectiveness, and potential limitations of chatbot technology in education while guiding its future development and implementation. This paper will help to better understand how educational chatbots can be effectively utilized to enhance education and address the specific needs and challenges of students and educators.

Search process

The search for the relevant literature was conducted in the following databases: ACM Digital Library, Scopus, IEEE Xplore, and Google Scholar. The search string was created using Boolean operators, and it was structured as follows: (“Education” or “Learning” or “Teaching”) and (“Chatbot” or “Artificial intelligence” or “AI” or “ChatGPT”). Initially, the search yielded a total of 563 papers from all four databases. Search filters were applied based on predefined inclusion and exclusion criteria, followed by a rigorous data extraction strategy as explained below.
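The same Boolean logic can also be applied programmatically when exported records from the four databases are merged and de-duplicated locally. The short Python sketch below is purely illustrative and not the authors' actual tooling; the record fields, term lists, and helper names are our own assumptions. It treats the two OR-groups of the search string as an AND of two keyword sets matched against titles and abstracts.

```python
import re

# Minimal sketch, not the authors' actual tooling: the Boolean search string
# ("Education" OR "Learning" OR "Teaching") AND ("Chatbot" OR
# "Artificial intelligence" OR "AI" OR "ChatGPT") applied to exported
# bibliographic records. Record fields and term lists are assumptions.

EDUCATION_TERMS = ["education", "learning", "teaching"]
CHATBOT_TERMS = ["chatbot", "artificial intelligence", "ai", "chatgpt"]

def contains_any(text, terms):
    """True if any term occurs in the text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)

def matches_search_string(record):
    """Both OR-groups must be satisfied (AND between the two groups)."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return contains_any(text, EDUCATION_TERMS) and contains_any(text, CHATBOT_TERMS)

# Hypothetical stand-ins for records exported from ACM, Scopus, IEEE Xplore, and Google Scholar.
records = [
    {"title": "ChatGPT as a tutor", "abstract": "An AI chatbot for personalized learning."},
    {"title": "Routing protocols survey", "abstract": "A review of wireless routing schemes."},
]

hits = [r for r in records if matches_search_string(r)]
print(f"{len(hits)} of {len(records)} records match the search string.")
```

In practice, the filtering described here was carried out through the databases' own search interfaces; a script of this kind is only useful for sanity-checking a merged export against the same query.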

Inclusion and exclusion criteria

In our review process, we carefully adhered to the inclusion and exclusion criteria specified in Table 2 . Criteria were determined to ensure the studies chosen are relevant to the research question (content, timeline) and maintain a certain level of quality (literature type) and consistency (language, subject area).

Data extraction strategy

All three authors collaborated to select the articles, ensuring consistency and reliability. Each article was reviewed by at least two co-authors. The article selection process involved the following stages: Initially, the authors reviewed the studies' metadata, titles, abstracts, and keywords, and eliminated articles that were not relevant to the research questions. This reduced the number of studies to 139. Next, the authors evaluated the quality of the studies by assessing research methodology, sample size, research design, and clarity of objectives, further refining the selection to 85 articles. Finally, the authors thoroughly read the entire content of the articles. Studies offering limited empirical evidence related to our research questions were excluded. This final step reduced the number of papers to 67. Figure 1 presents the article selection process.

Figure 1. Flow diagram of selecting studies.
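As a rough illustration of the bookkeeping behind such a multi-stage selection funnel (563 records identified, 139 after metadata screening, 85 after quality assessment, 67 included), the sketch below tallies how many articles survive each stage. The stage predicates and decision flags are hypothetical placeholders for the manual judgments made by the co-authors; only the counting logic is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative sketch of a staged screening funnel; the decision flags are
# hypothetical stand-ins for the reviewers' manual judgments.

@dataclass
class Stage:
    name: str
    keep: Callable[[Dict], bool]  # screening decision for one record

def run_funnel(records: List[Dict], stages: List[Stage]) -> List[Dict]:
    remaining = records
    print(f"records identified: {len(remaining)}")
    for stage in stages:
        remaining = [r for r in remaining if stage.keep(r)]
        print(f"after {stage.name}: {len(remaining)}")
    return remaining

stages = [
    Stage("title/abstract relevance screening", lambda r: r["relevant"]),
    Stage("methodological quality assessment", lambda r: r["quality_ok"]),
    Stage("full-text empirical-evidence check", lambda r: r["has_evidence"]),
]

# Toy records; in the review itself, 563 records were reduced to 67.
records = [
    {"id": 1, "relevant": True, "quality_ok": True, "has_evidence": True},
    {"id": 2, "relevant": True, "quality_ok": False, "has_evidence": False},
    {"id": 3, "relevant": False, "quality_ok": False, "has_evidence": False},
]
included = run_funnel(records, stages)
```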

Results

In this section, we present the results of the reviewed articles, focusing on our research questions, particularly with regard to ChatGPT. ChatGPT, as one of the latest AI-powered chatbots, has gained significant attention for its potential applications in education. Within just eight months of its launch in 2022, it had already amassed over 100 million users, setting new records for user and traffic growth. ChatGPT stands out among AI-powered chatbots used in education due to its advanced natural language processing capabilities and sophisticated language generation, enabling more natural and human-like conversations. It excels at capturing and retaining contextual information throughout interactions, leading to more coherent and contextually relevant conversations. Unlike some educational chatbots that follow predetermined paths or rely on predefined scripts, ChatGPT is capable of engaging in open-ended dialogue and adapting to various user inputs. Its adaptability allows it to write articles, stories, and poems, provide summaries, accommodate different perspectives, and even write and debug computer code, making it a valuable tool in educational settings (Baidoo-Anu & Owusu Ansah, 2023 ; Tate et al., 2023 ; Williams, 2023 ).

Advantages for students

Research question 1: What are the key advantages of incorporating AI chatbots in education from the viewpoint of students?

The integration of chatbots and virtual assistants into educational settings has the potential to transform support services, improve accessibility, and contribute to more efficient and effective learning environments (Chen et al., 2023 ; Essel et al., 2022 ). AI tools have the potential to improve student success and engagement, particularly among those from disadvantaged backgrounds (Sullivan et al., 2023 ). However, the existing literature highlights an important gap in the discussion from a student’s standpoint. A few existing research studies addressing the student’s perspective of using ChatGPT in the learning process indicate that students have a positive view of ChatGPT, appreciate its capabilities, and find it helpful for their studies and work (Kasneci et al., 2023 ; Shoufan, 2023 ). Students acknowledge that ChatGPT's answers are not always accurate and emphasize the need for solid background knowledge to utilize it effectively, recognizing that it cannot replace human intelligence (Shoufan, 2023 ). The benefits most commonly identified by scholars are:

Homework and Study Assistance. AI-powered chatbots can provide detailed feedback on student assignments, highlighting areas of improvement and offering suggestions for further learning (Celik et al., 2022 ). For example, ChatGPT can act as a helpful study companion, providing explanations and clarifications on various subjects. It can assist with homework questions, offering step-by-step solutions and guiding students through complex problems (Crawford et al., 2023 ; Fauzi et al., 2023 ; Lo, 2023 ; Qadir, 2023 ; Shidiq, 2023 ). According to an experiment by Sedaghat ( 2023 ), ChatGPT performed similarly to third-year medical students on medical exams and could write quite impressive essays. Students can also use ChatGPT to quiz themselves on various subjects, reinforcing their knowledge and preparing for exams (Choi et al., 2023 ; Eysenbach, 2023 ; Sevgi et al., 2023 ; Thurzo et al., 2023 ).

Flexible personalized learning. AI-powered chatbots in general are now able to provide individualized guidance and feedback to students, helping them navigate through challenging concepts and improve their understanding. These systems can adapt their teaching strategies to suit each student's unique needs (Fariani et al., 2023 ; Kikalishvili, 2023 ; Schiff, 2021 ). Students can access ChatGPT anytime, making it convenient. According to Kasneci et al. ( 2023 ), ChatGPT's interactive and conversational nature can enhance students' engagement and motivation, making learning more enjoyable and personalized. Khan et al. ( 2023 ) examine the impact of ChatGPT on medical education and clinical management, highlighting its ability to offer students tailored learning opportunities.

Skills development. AI chatbots can aid in the enhancement of writing skills (by offering suggestions for syntactic and grammatical corrections) (Kaharuddin, 2021 ), foster problem-solving abilities (by providing step-by-step solutions) (Benvenuti et al., 2023 ), and facilitate group discussions and debates (by furnishing discussion structures and providing real-time feedback) (Ruthotto et al., 2020 ; Wang et al., 2020 ).

It's important to note that some papers raise concerns about excessive reliance on AI-generated information, potentially leading to a negative impact on student’s critical thinking and problem-solving skills (Kasneci et al., 2023 ). For instance, if students consistently receive solutions or information effortlessly through AI assistance, they might not engage deeply in understanding the topic.

Advantages for educators

Research question 2: What are the key advantages of integrating AI chatbots in education from the viewpoint of educators?

With the current capabilities of AI and its future potential, AI-powered chatbots, like ChatGPT, can have a significant impact on existing instructional practices. Major benefits from educators’ viewpoint identified in the literature are:

Time-Saving Assistance. The administrative support capabilities of AI chatbots can help educators save time on routine tasks, including scheduling, grading, and providing information to students, allowing them to allocate more time for instructional planning and student engagement. For example, ChatGPT can successfully generate various types of questions and answer keys in different disciplines. However, educators should critically evaluate and customize such material to suit their unique teaching contexts. The expertise, experience, and comprehension of the teacher are essential in making informed pedagogical choices, as AI is not yet capable of replacing the role of a science teacher (Cooper, 2023 ).
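As a concrete illustration of this kind of time saving, the hedged sketch below uses the OpenAI API (which underlies ChatGPT) to draft multiple-choice questions and an answer key for a given topic. The model name, prompt wording, and helper function are our own assumptions rather than a prescribed workflow, and any generated items would still need the teacher's critical review.

```python
# Illustrative sketch only, assuming the openai Python package (v1+) is installed
# and the OPENAI_API_KEY environment variable is set. Model name and prompt are
# placeholder choices; outputs require the teacher's review before classroom use.
from openai import OpenAI

client = OpenAI()

def generate_quiz(topic: str, n_questions: int = 5, level: str = "undergraduate") -> str:
    """Ask the model for multiple-choice questions plus an answer key."""
    prompt = (
        f"Write {n_questions} multiple-choice questions (4 options each) about "
        f"'{topic}' for {level} students, followed by an answer key."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name; substitute as appropriate
        messages=[
            {"role": "system", "content": "You are a helpful teaching assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_quiz("photosynthesis"))
```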

Improved pedagogy. Educators can leverage AI chatbots to augment their instruction and provide personalized support. According to Herft ( 2023 ), there are various ways in which teachers can utilize ChatGPT to enhance their pedagogical approaches and assessment methods. For instance, educators can use the capabilities of ChatGPT to generate open-ended question prompts that align precisely with the targeted learning objectives and success criteria of the instructional unit. By doing so, teachers can tailor educational content to cater to the distinct needs, interests, and learning preferences of each student, offering personalized learning materials and activities (Al Ka’bi, 2023 ; Fariani et al., 2023 ).

Concerns raised by scholars

Research question 3: What are the main concerns raised by scholars regarding the integration of AI chatbots in education?

Scholars' opinions on using AI in this regard are varied and diverse. Some see AI chatbots as the future of teaching and learning, while others perceive them as a potential threat. The main arguments of skeptical scholars are threefold:

Reliability and Accuracy. AI chatbots may provide biased responses or inaccurate information (Kasneci et al., 2023 ; Sedaghat, 2023 ). If the chatbot provides incorrect information or guidance, it could mislead students and hinder their learning progress. According to Sevgi et al. ( 2023 ), although ChatGPT exhibited captivating and thought-provoking answers, it should not be regarded as a reliable information source. This is especially important in medical education, where it is crucial to guarantee the reliability and accuracy of the information chatbots provide (Khan et al., 2023 ). If the training data used to develop an AI chatbot contains biases, the chatbot may inadvertently reproduce those biases in its responses, potentially including skewed perspectives, stereotypes, discriminatory language, or biased recommendations. This is of particular concern in an educational context.

Fair assessments. One of the challenges that educators face with the integration of chatbots in education is the difficulty in assessing students' work, particularly when it comes to written assignments or responses. AI-generated text detection, while continually improving, is not yet foolproof and can produce false negatives or positives. This creates uncertainty and can undermine the credibility of the assessment process. Educators may struggle to discern whether the responses are genuinely student-generated or if they have been provided by an AI, affecting the accuracy of grading and feedback. This raises concerns about academic integrity and fair assessment practices (AlAfnan et al., 2023 ; Kung et al., 2023 ).

Ethical issues. The integration of AI chatbots in education raises several ethical implications, particularly concerning data privacy, security, and responsible AI use. AI chatbots interact with students and gather data during conversations, necessitating the establishment of clear guidelines and safeguards. For example, medical education frequently encompasses the acquisition of knowledge pertaining to delicate and intimate subjects, including patient confidentiality and ethical considerations within the medical field, and thus the ethical and proper utilization of chatbots holds significant importance (Masters, 2023 ; Miao & Ahn, 2023 ; Sedaghat, 2023 ; Thurzo et al., 2023 ).

For these and other geopolitical reasons, ChatGPT is banned in countries with strict internet censorship policies, like North Korea, Iran, Syria, Russia, and China. Several nations prohibited the usage of the application due to privacy apprehensions. Meanwhile, North Korea, China, and Russia, in particular, contended that the U.S. might employ ChatGPT for disseminating misinformation. Conversely, OpenAI restricts access to ChatGPT in certain countries, such as Afghanistan and Iran, citing geopolitical constraints, legal considerations, data protection regulations, and internet accessibility as the basis for this decision. Italy became the first Western country to ban ChatGPT (Browne, 2023 ) after the country’s data protection authority called on OpenAI to stop processing Italian residents’ data. They claimed that ChatGPT did not comply with the European General Data Protection Regulation. However, after OpenAI clarified the data privacy issues with the Italian data protection authority, ChatGPT returned to Italy. To avoid cheating on school homework and assignments, ChatGPT was also blocked on all New York City school devices and networks so that students and teachers could no longer access it (Elsen-Rooney, 2023 ; Li et al., 2023 ). These examples highlight the lack of readiness to embrace recently developed AI tools. There are numerous concerns that must be addressed in order to gain broader acceptance and understanding.

To summarize, incorporating AI chatbots in education brings personalized learning for students and time efficiency for educators. Students benefit from flexible study aid and skill development. However, concerns arise regarding the accuracy of information, fair assessment practices, and ethical considerations. Striking a balance between these advantages and concerns is crucial for responsible integration in education.

The integration of artificial intelligence (AI) chatbots in education has the potential to revolutionize how students learn and interact with information. One significant advantage of AI chatbots in education is their ability to provide personalized and engaging learning experiences. By tailoring their interactions to individual students' needs and preferences, chatbots offer customized feedback and instructional support, ultimately enhancing student engagement and information retention. However, fully replicating the experience of a human educator with chatbots is difficult. While they can provide customized instruction, chatbots may not match the emotional support and mentorship offered by human instructors. Understanding the importance of human engagement and expertise in education is crucial: a teacher's role encompasses more than just sharing knowledge. Teachers offer students guidance, motivation, and emotional support, elements that AI cannot completely replicate.

We find that AI chatbots may benefit students as well as educators in various ways; however, significant concerns need to be addressed in order to harness their capabilities effectively. Specifically, educational institutions should implement preventative measures. These include (a) creating awareness among students, focusing on topics such as digital inequality, the reliability and accuracy of AI chatbots, and associated ethical considerations; and (b) offering regular professional development training for educators. This training should initially focus on enabling educators to integrate diverse in-class activities and assignments into the curriculum, aimed at nurturing students' critical thinking and problem-solving skills while ensuring fair performance evaluation. Additionally, it should cover the capabilities and potential educational uses of AI chatbots, along with best practices for effectively integrating these tools into teaching methods.

As technology continues to advance, AI-powered educational chatbots are expected to become more sophisticated, providing accurate information and offering even more individualized and engaging learning experiences. They are anticipated to engage with humans using voice recognition, comprehend human emotions, and navigate social interactions. Consequently, their potential impact on future education is substantial, spanning activities such as establishing educational objectives, developing teaching methods and curricula, and conducting assessments (Latif et al., 2023). Considering Microsoft's extensive efforts to integrate ChatGPT into its products (Rudolph et al., 2023; Warren, 2023), it is likely that ChatGPT will become widespread soon. Educational institutions may need to rapidly adapt their policies and practices to guide and support students in using educational chatbots safely and constructively (Baidoo-Anu & Owusu Ansah, 2023). Educators and researchers must continue to explore the potential benefits and limitations of this technology to fully realize its potential.

The widespread adoption of chatbots and their increasing accessibility have sparked contrasting reactions across different sectors, leading to considerable confusion in the field of education. Among educators and learners there is a notable divide: while learners are excited about chatbot integration, educators' perceptions tend to be markedly more critical. This situation nevertheless presents a unique opportunity, accompanied by unprecedented challenges, and it has prompted a significant surge in research aiming to explore the impact of chatbots on education.

In this article, we present a systematic review of the latest literature with the objective of identifying the potential advantages and challenges associated with integrating chatbots in education. Through this review, we have been able to highlight critical gaps in the existing research that warrant further in-depth investigation. Addressing these gaps will be instrumental in optimizing the implementation of chatbots and harnessing their full potential in the educational landscape, thereby benefiting both educators and students alike. Further research will play a vital role in comprehending the long-term impact, variations based on student characteristics, pedagogical strategies, and the user experience associated with integrating chatbots in education.

From the viewpoint of educators, integrating AI chatbots in education brings significant advantages. AI chatbots provide time-saving assistance by handling routine administrative tasks such as scheduling, grading, and providing information to students, allowing educators to focus more on instructional planning and student engagement. Educators can improve their pedagogy by leveraging AI chatbots to augment their instruction and offer personalized support to students. By customizing educational content and generating prompts for open-ended questions aligned with specific learning objectives, teachers can cater to individual student needs and enhance the learning experience. Additionally, educators can use AI chatbots to create tailored learning materials and activities to accommodate students' unique interests and learning styles.
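As one hedged illustration of the prompt-generation use mentioned above, the sketch below builds a reusable prompt template that a teacher could paste into whatever chatbot their institution uses; the template text, function names, and example values are our own assumptions, not a feature of any particular system.

```python
# Sketch of a reusable prompt template for generating open-ended questions aligned
# with a learning objective. The resulting prompt would be pasted into (or sent to)
# whichever chatbot is available; no specific chatbot API is assumed here.

PROMPT_TEMPLATE = """You are assisting a {subject} teacher.
Learning objective: {objective}
Student level: {level}

Write {n} open-ended discussion questions that target this objective.
For each question, add one sentence explaining what kind of reasoning
a strong answer should demonstrate."""

def build_prompt(subject, objective, level, n=3):
    """Fill the template with course-specific details."""
    return PROMPT_TEMPLATE.format(subject=subject, objective=objective, level=level, n=n)

print(build_prompt(
    subject="biology",
    objective="explain how natural selection changes allele frequencies over generations",
    level="first-year undergraduate",
))
```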

Incorporating AI chatbots in education offers several key advantages from students' perspectives. AI-powered chatbots provide valuable homework and study assistance by offering detailed feedback on assignments, guiding students through complex problems, and providing step-by-step solutions. They also act as study companions, offering explanations and clarifications on various subjects. They can be used for self-quizzing to reinforce knowledge and prepare for exams. Furthermore, these chatbots facilitate flexible personalized learning, tailoring their teaching strategies to suit each student's unique needs. Their interactive and conversational nature enhances student engagement and motivation, making learning more enjoyable and personalized. Also, AI chatbots contribute to skills development by suggesting syntactic and grammatical corrections to enhance writing skills, providing problem-solving guidance, and facilitating group discussions and debates with real-time feedback. Overall, students appreciate the capabilities of AI chatbots and find them helpful for their studies and skill development, recognizing that they complement human intelligence rather than replace it.

The presence of AI chatbots has also raised considerable skepticism among scholars. While some see transformative potential, concerns loom over reliability, accuracy, fair assessments, and ethical dilemmas. Fears of misinformation, compromised academic integrity, and data privacy issues cast a shadow over the implementation of AI chatbots. Based on the findings of the reviewed papers, it is commonly concluded that some of the challenges related to the use of AI chatbots in education can be addressed by introducing preventative measures. More specifically, educational institutions must prioritize creating awareness among students about the risks associated with AI chatbots, focusing on essential aspects like digital inequality and ethical considerations. Simultaneously, investing in the continuous development of educators through targeted training is key. Empowering educators to effectively integrate AI chatbots into their teaching methods, fostering critical thinking and fair evaluation, will pave the way for a more effective and engaging educational experience.

The implications of the research findings for policymakers and researchers are extensive, shaping the future integration of chatbots in education. The findings emphasize the need to establish guidelines and regulations ensuring the ethical development and deployment of AI chatbots in education. Policies should specifically focus on data privacy, accuracy, and transparency to mitigate potential risks and build trust within the educational community. Additionally, investing in research and development to enhance AI chatbot capabilities and address identified concerns is crucial for a seamless integration into educational systems. Researchers are strongly encouraged to fill the identified research gaps through rigorous studies that delve deeper into the impact of chatbots on education. Exploring the long-term effects, optimal integration strategies, and addressing ethical considerations should take the forefront in research initiatives.

Availability of data and materials

The data and materials used in this paper are available upon request. The comprehensive list of included studies, along with relevant data extracted from these studies, is available from the corresponding author upon request.

Change history

15 April 2024

A Correction to this paper has been published: https://doi.org/10.1186/s41239-024-00461-6

Al Ka’bi, A. (2023). Proposed artificial intelligence algorithm and deep learning techniques for development of higher education. International Journal of Intelligent Networks, 4 , 68–73.


AlAfnan, M. A., Dishari, S., Jovic, M., & Lomidze, K. (2023). Chatgpt as an educational tool: Opportunities, challenges, and recommendations for communication, business writing, and composition courses. Journal of Artificial Intelligence and Technology, 3 (2), 60–68.


Alsanousi, B., Albesher, A. S., Do, H., & Ludi, S. (2023). Investigating the user experience and evaluating usability issues in ai-enabled learning mobile apps: An analysis of user reviews. International Journal of Advanced Computer Science and Applications , 14(6).

AlZubi, S., Mughaid, A., Quiam, F., & Hendawi, S. (2022). Exploring the Capabilities and Limitations of ChatGPT and Alternative Big Language Models. Artificial Intelligence and Applications .

Aron, J. (2011). How innovative is Apple’s new voice assistant. Siri, NewScientist , 212 (2836), 24

Baidoo-Anu, D., & Owusu Ansah, L. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Available at SSRN 4337484 .

Benvenuti, M., Cangelosi, A., Weinberger, A., Mazzoni, E., Benassi, M., Barbaresi, M., & Orsoni, M. (2023). Artificial intelligence and human behavioral development: A perspective on new skills and competencies acquisition for the educational context. Computers in Human Behavior, 148 , 107903.

Browne, R. (2023). Italy became the first Western country to ban ChatGPT. Here’s what other countries are doing . CNBC (Apr. 4, 2023).

Celik, I., Dindar, M., Muukkonen, H., & Järvelä, S. (2022). The promises and challenges of artificial intelligence for teachers: A systematic review of research. TechTrends, 66 (4), 616–630.

Chassignol, M., Khoroshavin, A., Klimova, A., & Bilyatdinova, A. (2018). Artificial Intelligence trends in education: A narrative overview. Procedia Computer Science, 136 , 16–24.

Chen, L., Chen, P., & Lin, Z. (2020). Artificial intelligence in education: A review. IEEE Access, 8 , 75264–75278.

Chen, Y., Jensen, S., Albert, L. J., Gupta, S., & Lee, T. (2023). Artificial intelligence (AI) student assistants in the classroom: Designing chatbots to support student success. Information Systems Frontiers, 25 (1), 161–182.

Choi, J. H., Hickman, K. E., Monahan, A., & Schwarcz, D. (2023). Chatgpt goes to law school. Available at SSRN .

Colby, K. M. (1981). PARRYing. Behavioral and Brain Sciences, 4 (4), 550–560.

Cooper, G. (2023). Examining science education in chatgpt: An exploratory study of generative artificial intelligence. Journal of Science Education and Technology, 32 (3), 444–452.

Crawford, J., Cowling, M., & Allen, K.-A. (2023). Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI). Journal of University Teaching and Learning Practice, 20 (3), 02.

Crompton, H., & Burke, D. (2023). Artificial intelligence in higher education: The state of the field. International Journal of Educational Technology in Higher Education, 20 (1), 1–22.

Deng, X., & Yu, Z. (2023). A meta-analysis and systematic review of the effect of chatbot technology use in sustainable education. Sustainability, 15 (4), 2940.

Dergaa, I., Chamari, K., Zmijewski, P., & Saad, H. B. (2023). From human writing to artificial intelligence generated text: Examining the prospects and potential threats of ChatGPT in academic writing. Biology of Sport, 40 (2), 615–622.

Devedzic, V. (2004). Web intelligence and artificial intelligence in education. Journal of Educational Technology and Society, 7 (4), 29–39.

Dinh, T. N., & Thai, M. T. (2018). AI and blockchain: A disruptive integration. Computer, 51 (9), 48–53.

Elsen-Rooney, M. (2023). NYC education department blocks ChatGPT on school devices, networks. Retrieved Jan , 25 , 2023.

Essel, H. B., Vlachopoulos, D., Tachie-Menson, A., Johnson, E. E., & Baah, P. K. (2022). The impact of a virtual teaching assistant (chatbot) on students’ learning in Ghanaian higher education. International Journal of Educational Technology in Higher Education, 19 (1), 1–19.

Eysenbach, G. (2023). The role of ChatGPT, generative language models, and artificial intelligence in medical education: A conversation with ChatGPT and a call for papers. JMIR Medical Education, 9 (1), e46885.

Fariani, R. I., Junus, K., & Santoso, H. B. (2023). A systematic literature review on personalised learning in the higher education context. Technology, Knowledge and Learning, 28 (2), 449–476.

Fauzi, F., Tuhuteru, L., Sampe, F., Ausat, A. M. A., & Hatta, H. R. (2023). Analysing the role of ChatGPT in improving student productivity in higher education. Journal on Education, 5 (4), 14886–14891.

Herft, A. (2023). A Teacher’s Prompt Guide to ChatGPT aligned with’What Works Best’ .

Hoffer, R., Kay, T., Levitan, P., & Klein, S. (2001). Smarterchild . ActiveBuddy.

Holotescu, C. (2016). MOOCBuddy: A Chatbot for personalized learning with MOOCs. RoCHI , 91–94.

Kabiljo, M., Vidas-Bubanja, M., Matic, R., & Zivkovic, M. (2020). Education system in the republic of serbia under COVID-19 conditions: Chatbot-acadimic digital assistant of the belgrade business and arts academy of applied studies. Knowledge-International Journal, 43 (1), 25–30.

Kaharuddin, A. (2021). Assessing the effect of using artificial intelligence on the writing skill of Indonesian learners of English. Linguistics and Culture Review, 5 (1), 288.

Kahraman, H. T., Sagiroglu, S., & Colak, I. (2010). Development of adaptive and intelligent web-based educational systems. In 2010 4th International Conference on Application of Information and Communication Technologies , 1–5.

Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., & Hüllermeier, E. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103 , 102274.

Khademi, A. (2023). Can ChatGPT and bard generate aligned assessment items? A reliability analysis against human performance. ArXiv Preprint ArXiv:2304.05372.

Khan, R. A., Jawaid, M., Khan, A. R., & Sajjad, M. (2023). ChatGPT-Reshaping medical education and clinical management. Pakistan Journal of Medical Sciences, 39 (2), 605.

Kietzmann, J., Paschen, J., & Treen, E. (2018). Artificial intelligence in advertising: How marketers can leverage artificial intelligence along the consumer journey. Journal of Advertising Research, 58 (3), 263–267.

Kikalishvili, S. (2023). Unlocking the potential of GPT-3 in education: Opportunities, limitations, and recommendations for effective integration. Interactive Learning Environments , 1–13.

Konecki, M., Konecki, M., & Biškupić, I. (2023). Using artificial intelligence in higher education. In Proceedings of the 15th International Conference on Computer Supported Education .

Krstić, L., Aleksić, V., & Krstić, M. (2022). Artificial intelligence in education: A review .

Kuhail, M. A., Alturki, N., Alramlawi, S., & Alhejori, K. (2023). Interacting with educational chatbots: A systematic review. Education and Information Technologies, 28 (1), 973–1018.

Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., et al. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digital Health, 2 (2), e0000198.

Lally, A., & Fodor, P. (2011). Natural language processing with prolog in the ibm watson system. The Association for Logic Programming (ALP) Newsletter , 9 , 2011.

Latif, E., Mai, G., Nyaaba, M., Wu, X., Liu, N., Lu, G., ... & Zhai, X. (2023). Artificial general intelligence (AGI) for education. arXiv preprint arXiv:2304.12479.

Li, L., Ma, Z., Fan, L., Lee, S., Yu, H., & Hemphill, L. (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media. ArXiv Preprint ArXiv:2305.02201.

Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13 (4), 410.

Masters, K. (2023). Ethical use of artificial intelligence in health professions education: AMEE Guide No. 158. Medical Teacher , 45 (6), 574–584.

Miao, H., & Ahn, H. (2023). Impact of ChatGPT on interdisciplinary nursing education and research. Asian/pacific Island Nursing Journal, 7 (1), e48136.

Moppel, J. (2018). Socratic chatbot . University Of Tartu, Institute of Computer Science, Bachelor’s Thesis.

Okonkwo, C. W., & Ade-Ibijola, A. (2021). Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence, 2 , 100033.

Pentina, I., Hancock, T., & Xie, T. (2023). Exploring relationship development with social chatbots: A mixed-method study of replika. Computers in Human Behavior, 140 , 107600.

Peredo, R., Canales, A., Menchaca, A., & Peredo, I. (2011). Intelligent Web-based education system for adaptive learning. Expert Systems with Applications, 38 (12), 14690–14702.

Pérez, J. Q., Daradoumis, T., & Puig, J. M. M. (2020). Rediscovering the use of chatbots in education: A systematic literature review. Computer Applications in Engineering Education, 28 (6), 1549–1565.

Qadir, J. (2023). Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. IEEE Global Engineering Education Conference (EDUCON), 2023 , 1–9.

Rahaman, M. S., Ahsan, M. M., Anjum, N., Rahman, M. M., & Rahman, M. N. (2023). The AI race is on! Google’s Bard and OpenAI’s ChatGPT head to head: An opinion article. Mizanur and Rahman, Md Nafizur, The AI Race Is On .

Rudolph, J., Tan, S., & Tan, S. (2023). War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. Journal of Applied Learning and Teaching, 6 (1).

Ruthotto, I., Kreth, Q., Stevens, J., Trively, C., & Melkers, J. (2020). Lurking and participation in the virtual classroom: The effects of gender, race, and age among graduate students in computer science. Computers & Education, 151 , 103854.

de Sales, A. B., & Antunes, J. G. (2021). Evaluation of educational games usage satisfaction. 2021 16th Iberian Conference on Information Systems and Technologies (CISTI) , 1–6.

Schiff, D. (2021). Out of the laboratory and into the classroom: the future of artificial intelligence in education. AI & Society, 36 (1), 331–348.

Sedaghat, S. (2023). Success through simplicity: What other artificial intelligence applications in medicine should learn from history and ChatGPT. Annals of Biomedical Engineering , 1–2.

Sevgi, U. T., Erol, G., Doğruel, Y., Sönmez, O. F., Tubbs, R. S., & Güngor, A. (2023). The role of an open artificial intelligence platform in modern neurosurgical education: A preliminary study. Neurosurgical Review, 46 (1), 86.

Shidiq, M. (2023). The use of artificial intelligence-based chat-gpt and its challenges for the world of education; from the viewpoint of the development of creative writing skills. Proceeding of International Conference on Education, Society and Humanity, 1 (1), 353–357.

Shoufan, A. (2023). Exploring Students’ Perceptions of CHATGPT: Thematic Analysis and Follow-Up Survey. IEEE Access .

St-Hilaire, F., Vu, D. D., Frau, A., Burns, N., Faraji, F., Potochny, J., Robert, S., Roussel, A., Zheng, S., & Glazier, T. (2022). A new era: Intelligent tutoring systems will transform online learning for millions. ArXiv Preprint ArXiv:2203.03724.

Sullivan, M., Kelly, A., & McLaughlan, P. (2023). ChatGPT in higher education: Considerations for academic integrity and student learning .

Tahiru, F. (2021). AI in education: A systematic literature review. Journal of Cases on Information Technology (JCIT), 23 (1), 1–20.

Tate, T., Doroudi, S., Ritchie, D., & Xu, Y. (2023). Educational research and AI-generated writing: Confronting the coming tsunami .

Thurzo, A., Strunga, M., Urban, R., Surovková, J., & Afrashtehfar, K. I. (2023). Impact of artificial intelligence on dental education: A review and guide for curriculum update. Education Sciences, 13 (2), 150.

Wallace, R. (1995). Artificial linguistic internet computer entity (alice). City .

Wang, Q., Jing, S., Camacho, I., Joyner, D., & Goel, A. (2020). Jill Watson SA: Design and evaluation of a virtual agent to build communities among online learners. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems , 1–8.

Warren, T. (2023). Microsoft is looking at OpenAI’s GPT for Word, Outlook, and PowerPoint. The Verge .

Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9 (1), 36–45.

Williams, C. (2023). Hype, or the future of learning and teaching? 3 Limits to AI’s ability to write student essays .

Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., & Drachsler, H. (2021). Are we there yet?—A systematic literature review on chatbots in education. Frontiers in Artificial Intelligence, 4 , 654924.

Xie, T., & Pentina, I. (2022). Attachment theory as a framework to understand relationships with social chatbots: A case study of Replika .

Zhang, Q. (2023). Investigating the effects of gamification and ludicization on learning achievement and motivation: An empirical study employing Kahoot! and Habitica. International Journal of Technology-Enhanced Education (IJTEE), 2 (1), 1–19.


Acknowledgements

Not applicable.

The authors declare that this research paper did not receive any funding from external organizations. The study was conducted independently and without financial support from any source. The authors have no financial interests or affiliations that could have influenced the design, execution, analysis, or reporting of the research.

Author information

Authors and Affiliations

Finance Department, American University of the Middle East, Block 6, Building 1, Egaila, Kuwait

Lasha Labadze

Statistics Department, American University of the Middle East, Block 6, Building 1, Egaila, Kuwait

Maya Grigolia

Caucasus School of Business, Caucasus University, 1 Paata Saakadze St, 0102, Tbilisi, Georgia

Lela Machaidze


Contributions

LL provided a concise overview of the existing literature and formulated the methodology. MG initiated the initial search process. LM authored the discussion section. All three authors collaborated on the selection of the final paper collection and contributed to crafting the conclusion. The final version of the paper received approval from all authors.

Corresponding author

Correspondence to Lasha Labadze .

Ethics declarations

Competing interests

The authors have no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: “A sentence has been added to the Methodology section of the article to acknowledge use of LLM”

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Labadze, L., Grigolia, M. & Machaidze, L. Role of AI chatbots in education: systematic literature review. Int J Educ Technol High Educ 20, 56 (2023). https://doi.org/10.1186/s41239-023-00426-1


Received: 22 August 2023

Accepted: 18 October 2023

Published: 31 October 2023

DOI: https://doi.org/10.1186/s41239-023-00426-1


  • Systematic literature review
  • Artificial intelligence
  • AI chatbots
  • Chatbots in education


A literature review on users' behavioral intention toward chatbots' adoption

Applied Computing and Informatics

ISSN: 2634-1964

Article publication date: 5 July 2022

Purpose

Although chatbots have been widely adopted in recent years, comprehensive literature review research focusing on individuals' intention to adopt chatbots remains scarce. In this respect, the present paper undertakes a literature review of empirical studies on this specific issue, drawn from nine scientific databases and covering 2017-2021. Specifically, it aims to classify extant empirical studies that focus on individuals' adoption intention toward chatbots.

Design/methodology/approach

The research is based on PRISMA methodology, which revealed a total of 39 empirical studies examining users' intention to adopt and utilize chatbots.

Findings

After a thorough investigation, distinct categorization criteria emerged, such as research field, applied theoretical models, research types, methods and statistical measures, factors affecting intention to adopt and further use chatbots, the countries/continents where these surveys took place, as well as relevant research citations and year of publication. In addition, the paper highlights research gaps in the examined issue and proposes future research directions in such a promising information technology solution.

Originality/value

To the best of the authors' knowledge, there has been no other comprehensive literature review focused on previous empirical studies of users' intentions to adopt and use chatbots in the aforementioned period. The present paper is therefore the first attempt in the field to demonstrate broad literature review data from relevant empirical studies.

  • Literature review
  • Adoption intention
  • Artificial intelligence

Gatzioufa, P. and Saprikis, V. (2022), "A literature review on users' behavioral intention toward chatbots' adoption", Applied Computing and Informatics , Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/ACI-01-2022-0021

Emerald Publishing Limited

Copyright © 2022, Paraskevi Gatzioufa and Vaggelis Saprikis

Published in Applied Computing and Informatics . Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

The widespread use of the Internet and the development of modern technologies have brought about significant changes, including artificial intelligence (AI) agents or chatbots. Chatbots are programs which, using AI, can answer users' questions usually during a text-based conversation [ 1–4 ]. Thus, in many cases they replace employees in customer service transactions, who, in the context of interaction with customers, answer their questions, propose solutions and redefine suggestions according to preferences and choices [ 5–7 ].

Chatbots have been variously defined in the international literature. They are frequently described as “software agents that facilitate automated conversation through natural language processing” [ 8 ], or as “an artificial construct that is designed to converse with human beings using natural language as input and output” [ 9 ], and “Artificial Conversational Entities or computer programs, based on AI, which are very interactive and conduct a conversation via auditory or textual method” [ 10 ].

Text chatbots have transformed communication and interaction between businesses and customers, by providing immediate response to requests, without time or space constraints and without human intervention.

In recent years, the use of chatbots has been widely adopted as part of companies' marketing strategies [ 11 ]. Their utilization has improved customer service by reducing response time to requests and increasing loyalty. Typically, chatbots have been used to provide customers with entertainment and useful information, easily and quickly, 24 hours a day, with personalized help, saving both costs and manpower [ 3 ].

On the other hand, conventional customer service practices have not been abandoned. It is worth noting that a potential significant deterrent to the adoption of chatbots by users is the fact that a large number of customers tend to use traditional communication channels (i.e. mail, website and telephone) when communicating with companies, mainly because of security and privacy of personal data, which are critical issues requiring special attention in terms of their management [ 11 ]. Thus, as research has shown, trust and privacy concerns affect customers [ 12 ].

In the extant literature, various researchers have focused on a number of aspects related to chatbots. Remarkably, a significant number of researchers have focused on the intention to adopt and use chatbots by investigating factors which affect users in specific research areas, such as health [ 13 ], financial services [ 14–16 ], tourism [ 5, 17–20 ], customer service (e.g. Refs [ 1 ,  21–34 ]), mobile commerce [ 35–37 ], business [ 38, 39 ], insurance [ 12, 40 ] and education [ 41, 42 ].

The purpose of the present research is to provide a comprehensive literature review of the existing empirical studies in the field regarding individuals' intentions to adopt and use chatbots. More specifically, the research intends to categorize these studies in terms of a number of criteria, such as applied research methods, areas of chatbots' utilization, theoretical models, influential factors, the countries/continents where most studies have been carried out in the specific field and relevant research citations and year of publication. These classifications are expected to provide a cumulative and better view of the examined topic. As there has not been any other comprehensive literature review research to focus on examining previous empirical studies of users' intentions to adopt and use chatbots, the present paper is the first attempt in the field which demonstrates literature review data of relevant empirical studies.

In this context, the paper addresses the following research questions:

RQ1. Which behavioral theories have been most frequently used in the research of individuals' intention to adopt and use chatbots?

RQ2. Which are the most commonly observed factors that influence users toward the adoption and use of chatbots?

RQ3. In which sectors is the use of chatbots most frequently observed?

RQ4. In which countries/continents has extensive and focused research been carried out?

Apart from the specific issues in relation to chatbot adoption and use, the paper aims to identify research gaps in the context of individuals' adoption intention toward chatbots, as well as reveal future research prospects in such a promising information technology solution for contemporary e- and m-business models.

The rest of the paper is divided into four sections. Section 2 discusses the applied research methodology, whereas Section 3 provides a literature review classification. Finally, Section 4 includes the conclusions drawn from the relevant literature review as well as the potential limitations before recommending a series of suggestions for future research and practice.

2. Research methodology

As already mentioned, the present paper conducts its literature review using the PRISMA methodology (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), a suitable methodological tool for the objectives of this study, encompassing all empirical studies concerning individuals' intention to adopt and use chatbots during 2017-2021 (i.e. from January 2017 to September 2021). The researched papers were selected according to specific inclusion and exclusion criteria, such as language (all texts should be written in English) and specific search keywords (chatbot, intention, adoption, usage, text-bot and AI), appearing either in the paper title, abstract or keywords. Specifically, the search statement was as follows: (“chatbot” OR “text-bot” OR “AI”) AND (“intention” OR “adoption” OR “usage”). Moreover, it was decided to focus on the most contemporary period in which chatbots have been adopted and utilized, which is why the research includes empirical studies from the last five years (2017-2021). Finally, it should be mentioned that the survey itself took place between February 2020 and September 2021.
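As a minimal sketch of how such criteria translate into a screening step, the code below applies the search statement, language and period filters to records exported from bibliographic databases. The record fields and sample entries are our own illustrative assumptions; the authors do not describe their screening as scripted.

```python
import re

# Illustrative screening filter for the stated inclusion criteria:
# (chatbot OR text-bot OR AI) AND (intention OR adoption OR usage),
# English language, published 2017-2021. Record structure is assumed.

KEYWORD_TERMS = {"chatbot", "chatbots", "text-bot", "ai"}
INTENT_TERMS = {"intention", "adoption", "usage"}

def matches_search(record):
    """Check the search statement against title, abstract and author keywords."""
    text = " ".join([record["title"], record["abstract"], " ".join(record["keywords"])]).lower()
    tokens = set(re.findall(r"[a-z][a-z\-]*", text))
    return bool(tokens & KEYWORD_TERMS) and bool(tokens & INTENT_TERMS)

def include(record):
    """Apply the language and publication-period criteria on top of the search statement."""
    return (
        record["language"] == "English"
        and 2017 <= record["year"] <= 2021
        and matches_search(record)
    )

records = [
    {"title": "Predicting chatbot adoption in banking", "abstract": "Survey of bank customers.",
     "keywords": ["chatbot", "UTAUT"], "language": "English", "year": 2020},
    {"title": "Voice assistants at home", "abstract": "A usage study.",
     "keywords": [], "language": "English", "year": 2016},
]
print([r["title"] for r in records if include(r)])  # only the 2020 chatbot paper passes
```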

Collection of data was based on the following procedure: initially, a search of the scientific databases was carried out, during which 90 empirical research papers emerged, 20 of which were readily excluded due to duplicate registrations. Next, 10 papers were rejected; eight of them because they did not meet the search criteria and three because they were not written in English. Full access to the text was possible for 59 papers. Of these 59 papers, and after an in-depth investigation, 20 were not taken into consideration since they were not related to the scope of the paper. Of the remaining 39 sources, eight were conference papers, 27 were published in journals, one was an MSc dissertation, one a workshop paper, one a symposium paper and one a book chapter. Our research relied on the following online academic databases: Science Direct, Emerald, Taylor and Francis, Elsevier, Research Gate, Wiley, IEEE Explore, ACM Digital Libraries and Google Scholar. The quality of the papers included in these databases supports the trustworthiness of the results of this literature review study.

All researched papers were carefully reviewed, and through their examination, a classification of the prevalent categories emerged. Notably, most research studies include all categories. The classification was based on eight different criteria: Types of Data Analysis, Research Methods, Statistical Methods of Analysis, Field of Study, Behavioral Theories Used, Factors which Affect Adoption Intention, Citations and Year of Publication, Country/Continent, all of which correspond to the specific research questions. The results of the classification were then organized into tables, followed by related comments, aiming to answer the research questions.

The use of certain criteria reflects the studies which were conducted, and describes the issues examined, distinguishing several categories, in accordance with those discussed by Misirlis and Vlachopoulou [ 43 ]. The review raises the research questions which reveal the trends in the specific field of individuals' behavioral intention to adopt and use chatbots. The applied methodology is shown in Figure 1 below.

3. Literature classification

The literature review classification was conducted in terms of the following eight criteria: types of data analysis, research methods, statistical methods of analysis, field of study, behavioral theories used by previous researchers, factors affecting the adoption and use of AI agents, number of citations and year of publication of the empirical research and countries/continents where the specific studies were conducted. Following previous literature review studies (e.g. Ref. [ 43 ]), the selection of these categorizations is expected to better present the extant and most contemporary empirical studies in the examined issue as well as reveal potential research gaps in the context of individuals' adoption intention toward chatbots.

3.1 Types of data analysis

Overall, the most common type of analysis regarding users' intention to adopt and use text chatbots is quantitative research ( N  = 20, 51.2%), whereas qualitative research was used to a much lesser extent ( N  = 4, 10.2%). Most researchers have applied quantitative analysis methods to investigate the intention to adopt and use chatbots, such as the work of Van den Broeck et al. [ 21 ]; in that empirical study, 245 Facebook users were asked to rate their experience of using a chatbot (Cinebot). Qualitative methods were used by authors such as Mogaji et al. [ 15 ], who investigated the interaction of 36 Nigerians with chatbots in the banking sector.

However, there have also been mixed methods that include both qualitative and quantitative research ( N  = 15, 38.4%). For example, Cardona et al. [ 40 ] examined the factors which affect the adoption of chatbots in the insurance sector in Germany, using a sample of 300 respondents via email and social networks, as well as seven interviews with experts.

Summarizing the information presented, it can be deduced that quantitative studies seem to be more suitable for investigations of individuals' behavioral intention toward chatbot adoption. The breakdown of analysis types into quantitative, qualitative and mixed methods is presented in Table 1 .
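For readers who want to reproduce the kind of breakdown shown in Table 1, the short sketch below recomputes the shares from the counts reported in the text (39 reviewed studies in total); it assumes pandas is available, and small differences from the quoted percentages are rounding effects.

```python
import pandas as pd

# Table 1 style breakdown recomputed from the counts reported in the text.
TOTAL = 39
counts = {"Quantitative": 20, "Qualitative": 4, "Mixed methods": 15}

table1 = pd.DataFrame({
    "N": counts,
    "% of studies": {k: round(100 * v / TOTAL, 1) for k, v in counts.items()},
})
print(table1)
```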

3.2 Research methods

Regarding the research methods applied, e-questionnaires were the main data collection method, used in nearly half of the empirical studies ( N  = 19, 48.7%). For example, Soni and Pooja [ 26 ] developed an e-questionnaire to determine the factors affecting the adoption of chatbots by generation Z.

With regard to interviews, only three studies (7.7%) utilized them as a data collection method. Interviews were used by researchers such as Folstad et al. [ 44 ] and Mogaji et al. [ 15 ], who conducted semi-structured interviews to obtain answers to specific research questions.

However, several research studies utilized mixed research methods, such as experimental investigations and questionnaires (e.g. Refs [ 27, 28 ]) or a combination of interviews and experimental methods [ 29 ].

By combining these methods, the researchers tried to investigate the interaction with chatbots and the effect they had on the subsequent intention to use chatbots (e.g. Refs [ 45, 46 ]), as shown in Table 2 .

The above discussion demonstrated that e-questionnaires have various key advantages compared to other research methods, such as easy, fast and inexpensive access to a broad base of potential respondents, as well as anonymity, which makes it ideal for relevant empirical studies. Thus, it was applied by the majority of previous researchers, whereas a considerable number of studies preferred a combination of e-questionnaires with experimental investigations. In effect, the interaction of individuals with a chatbot, combined with a set of relevant questions, is another attractive way to get and analyze data about a number of users who interact for the first time with a new chatbot agent. Such studies can significantly help companies to further improve their emerging or utilized AI solution.

3.3 Statistical methods of analysis

Concerning data analyses, the researched quantitative surveys used statistical processing methods to draw valuable conclusions. The principal method was PLS-SEM ( N  = 13, 33.3%), followed by descriptive statistics measures ( N  = 10, 25.6%). Methods with lower frequency of utilization were ANOVA ( N  = 9, 23%), regression analysis ( N  = 8, 20.5%), and t-tests and chi-square tests ( N  = 7, 18%). Factor analysis ( N  = 4, 10.2%) and correlation analysis ( N  = 5, 12.8%) were used to a lesser extent, as shown in Table 3 . Finally, four research papers ( N  = 4, 10.2%) did not use statistical methods of analysis; instead, they applied content analysis to interview data.

It is worth noting that in several research papers a combination of statistical methods and analyses was applied. For example, Malik et al. [ 30 ] applied ANOVA and factor analysis. The predominance of PLS-SEM as the most frequently applied method can be attributed to the fact that it can analyze a large number of factors via several simultaneous regressions, estimating both the direct and indirect effects of the examined factors within a single structural model.

3.4 Field of study

As discussed in the introduction, although chatbots have been applied in various areas, the largest percentage of previous empirical studies focused on e-commerce and customer service ( N  = 17, 43.6%). Companies, especially those with strong customer interaction, have realized the importance of incorporating modern technologies such as chatbots into their operations. They compete with each other to offer the highest quality of customer experience, and their marketing practices have been restructured and oriented toward new tactics, including chatbots, which offer an alternative to traditional customer service by providing an additional level of support anytime, anywhere (e.g. Refs [ 1, 21 ]).

A smaller percentage surveyed the intention to use chatbots in the tourism industry ( N  = 5, 12.8%), where this technology is widely applied and where the sector has benefited greatly from it (e.g. Refs [ 5, 17 ]).

The research also showed adoption of agents by students in 5.1% ( N  = 2) of the cases; similarly, factors that lead to chatbot adoption via mobile phones were examined in 7.7% ( N  = 3) of the cases [ 35–37 ]. Overall, mobile commerce is an easy and convenient process, with no space or time constraints. As all consumers who have a mobile phone can make purchases, chat-based marketing is one of the most popular digital tools.

In addition, 5.1% ( N  = 2) of the papers examined the acceptance of chatbots among employees in the banking sector [ 14–16 ]. An increasing percentage of financial institutions have already adopted chatbot technology with the aim of supporting their clients' financial decisions and transactions. With regard to insurance companies, two studies (5.1%) were identified. Finally, one survey in public transport [ 47 ], one in the health industry [ 13 ] and one in veterinary medicine [ 48 ] (2.5% each) were also identified ( Table 4 ). This discussion of the sectors in which chatbots have been examined relates to the 3rd research question of the paper.

3.5 Behavioral theories

The empirical studies examined in the present study were based on well-known behavioral theories, schemes and models to investigate specific factors. The most frequently used approach in previous studies is UTAUT (Unified Theory of Acceptance and Use of Technology) and UTAUT2 ( N  = 5, 12.8%), proposed by Venkatesh et al. [ 49 ] to explain user intentions and subsequent use behavior in relation to information systems. According to this theory, the factors which affect the intention to use technology are performance expectancy, effort expectancy, social influence and facilitating conditions. Venkatesh et al. [ 50 ] expanded it into UTAUT2 by adding three intrinsic factors affecting usage intention: hedonic motivation, price value and habit. Given these characteristics, the theory fitted well and was applied in two research papers of this investigation [ 19, 42 ].
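To illustrate how the UTAUT determinants listed above are typically related to behavioral intention in the reviewed surveys, the sketch below fits a plain regression of intention on the four core constructs. It is only a simplified stand-in: the reviewed studies mostly estimated structural models (often PLS-SEM), and the variable names and randomly generated Likert-style data here are our own assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simplified stand-in for a UTAUT-style analysis: regress behavioral intention on the
# four core determinants. Data are random placeholders so the example runs end to end.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "performance_expectancy": rng.integers(1, 8, n),   # 7-point Likert-style scores
    "effort_expectancy": rng.integers(1, 8, n),
    "social_influence": rng.integers(1, 8, n),
    "facilitating_conditions": rng.integers(1, 8, n),
})
# Synthetic outcome, constructed only so the regression has something to recover
df["intention"] = (
    0.4 * df["performance_expectancy"]
    + 0.2 * df["effort_expectancy"]
    + 0.2 * df["social_influence"]
    + rng.normal(0, 1, n)
)

model = smf.ols(
    "intention ~ performance_expectancy + effort_expectancy"
    " + social_influence + facilitating_conditions",
    data=df,
).fit()
print(model.summary())
```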

Another theoretical model observed in the research papers is TAM (Technology Acceptance Model), formulated by Davis et al. [ 51 ] to predict users' behavior toward adopting a new technology. TAM in its original form was utilized by two studies [ 12, 18 ], but it was more widely applied in combination with other theories. In particular, five studies used combined theories (TAM and DOI; TAM, ECM and ISS; TAM and SST; TAM, DOI and TOE) [ 1, 29, 35, 37, 40 ], whereas seven (18%) research papers applied information system (IS) continuance models such as SOR, U&G, TPB, CAT, TRA, SERVQUAL and the extended post-acceptance model. Cheng et al. [ 52 ] used SOR theory to explain the behavior of consumers toward chatbots in the context of e-commerce.

Finally, it is worth noting that 51.3% of the research papers did not employ a common theoretical model ( N  = 20). These empirical studies instead base their proposed frameworks on combinations of factors used in a number of past papers, whose validity and reliability have already been established. Although UTAUT(2) and TAM are the most widely applied behavioral theories in previous empirical studies, it should be emphasized that a large number of other theories have been applied in the examined field as well. The results are presented in Table 5 and are associated with the 1st research question of the paper.

3.6 Factors affecting adoption intention

Regarding the second research question, the results ( Table 6 ) demonstrate that the main determinants of TAM and UTAUT (both versions), the behavioral theories on which previous researchers most often based their empirical studies, have been the most commonly applied. More specifically, perceived usefulness, a key factor of TAM, tops the list, followed by performance expectancy, trust and attitude. Apart from these factors, effort expectancy, habit, perceived enjoyment, perceived ease of use and social influence were also examined and confirmed in various relevant empirical studies. It should be emphasized, however, that these factors have a direct effect on individuals' intention to adopt and use chatbots.

On the other hand, there are factors which indirectly affect users toward their behavioral intention to adopt and use chatbots (i.e. perceived completeness and communication style) [ 53 ]. There is also a significant number of factors with both direct and indirect impact on individuals in the examined topic, such as perceived usefulness, perceived enjoyment, perceived ease of use, trust and attitude.
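The distinction between direct and indirect effects can be made concrete with a classic two-step mediation layout, sketched below with made-up variables (communication style -> trust -> intention). The reviewed studies estimated such paths with SEM or PLS-SEM; the two ordinary regressions here are only a simplified stand-in.

```python
import numpy as np
import statsmodels.api as sm

# Direct vs. indirect effects in a toy mediation model:
# communication_style -> trust (path a), trust -> intention (path b),
# plus a remaining direct path communication_style -> intention (c').
rng = np.random.default_rng(1)
n = 300
communication_style = rng.normal(size=n)
trust = 0.6 * communication_style + rng.normal(size=n)                      # path a
intention = 0.5 * trust + 0.1 * communication_style + rng.normal(size=n)    # paths b and c'

# Path a: predictor -> mediator
a = sm.OLS(trust, sm.add_constant(communication_style)).fit().params[1]

# Paths b (mediator -> outcome) and c' (remaining direct effect), estimated jointly
X = sm.add_constant(np.column_stack([trust, communication_style]))
fit = sm.OLS(intention, X).fit()
b, c_direct = fit.params[1], fit.params[2]

print(f"direct effect  c' = {c_direct:.2f}")
print(f"indirect effect a*b = {a * b:.2f}")
```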

3.7 Citations and year of publication

The research identified the top five most cited papers as of October 7, 2021 ( Table 7 ). The research article by Brandtzaeg and Folstad [ 8 ] ranks first with 410 citations, followed by the empirical work of Ciechanowski et al. [ 45 ] and of Go and Sundar [ 54 ].

As shown in Figure 2 , research in the examined topic has significantly increased since 2019, peaking in 2020. Two papers were carried out in 2017 and 2018, nine in 2019, 15 in 2020 and 11 until September of 2021. Therefore, the results show an increase in the published papers during the last two years.

3.8 Researched countries/continents

With regard to the countries where these empirical studies were undertaken, the vast majority of investigations on users' behavioral intention to adopt and use chatbots took place in the United States ( N  = 7, 18%) and India ( N  = 6, 15.4%), followed by the United Kingdom ( N  = 5, 12.8%). These three countries are followed by Germany, Italy, the Netherlands, Norway and China ( N  = 2, 5.1%), whereas in Poland, Japan, South Korea, Indonesia, Nigeria, Spain, Taiwan and the Philippines only one research paper was identified in each (2.6%). In addition, one empirical investigation was conducted in two countries, the United States and the Netherlands, whereas in four (12.8%) studies the countries where the investigations took place were not mentioned.

As regards the continents where the greatest number of surveys has been carried out, Europe ranks first with 15 surveys, followed by Asia with 13 empirical studies, whereas in the Americas no investigations outside the USA were identified. Similarly, only one empirical study was observed in Africa (Nigeria), and there are no such studies in Oceania, which is quite surprising. Taking into consideration the broad and continuous adoption of chatbots by numerous e- and m-business models worldwide, it is very surprising that there have not been analogous studies in a large number of countries so far. Table 8 summarizes the relevant information, which relates to the 4th research question of the paper.

4. Conclusions

The present paper is a literature review study concerning the empirical investigation of users' behavioral intention to adopt and use chatbots during the last five years. By analyzing key characteristic points of these empirical research studies, a number of significant findings were drawn.

According to the research results in terms of the theoretical models applied, the most commonly used approaches are UTAUT(2) and TAM. Regarding the factors which affect the intention to adopt and use chatbots, performance expectancy, effort expectancy, social influence, trust and attitude are the most significant considerations, in line with the behavioral theories on which the studies were based. In relation to the areas on which most of the research work was focused, customer service ranks first by far. It can thus be concluded that an increasing number of companies are focusing their marketing strategies on adopting such technologies to provide rapid and effective services through websites or mobile apps, as a great number of consumers spend most of their time online, either for fun or for informational and/or professional purposes.

The present research discussed surveys based mostly on quantitative data collection methods through e-questionnaires, whereas most researchers employed the PLS-SEM statistical method. As regards the countries and continents where empirical research has been carried out, most surveys have taken place in Europe and Asia. Notably, however, there have been no such studies in Oceania and in a large number of developed countries, such as France, Canada, Sweden, Switzerland, etc., where chatbots have been widely applied and greatly welcomed by users in various e- and m-business models. Therefore, this is a significant research gap, on which researchers are expected to focus attention in the upcoming period.

Furthermore, taking into consideration that there is a great interest in using chatbots mostly in customer service, companies should understand their role, especially in the context of the pandemic and the ensuing social distancing, so that they would be able to meet their customers' needs promptly, safely and beyond geographical or time constraints. However, despite the fact that most empirical studies have focused on customer service, there is a significant research gap in specific fields, such as the telecommunication industry, where chatbots have been already applied. Thus, a number of studies on such fields might provide useful insights for chatbots' adoption and further use.

The present literature review highlights the need for academia to further investigate additional factors and dimensions related to the application and use of chatbots in e- and m-commerce that have not been identified to date. These factors may provide the ground for empirical studies in digital contexts. The resulting categorization should stimulate further efforts for development and evaluation, improving and advancing the relevant research.

Moreover, the research aims to emphasize developments in the relevant field by highlighting the factors that lead users to adopt AI agents, and it provides an appropriate theoretical background for both academia and industry. It also underscores additional components which have not yet been researched and may play a role in e-commerce and m-commerce, such as response time, the efficiency and effectiveness of the user experience, the convenience of mobile apps, internal barriers, pre- and post-use behavior and how it relates to intention, and satisfaction resulting from the quality of chatbot service.

The implications of the survey for practitioners are associated with the policies they must follow to enhance the factors which affect the intention to use chatbots, as well as create the conditions that will make individuals integrate and further utilize chatbots in their transactions with companies. In addition, as already discussed, a significant number of such investigations should take place in countries and sectors where chatbots have been applied or intended to be used. Hence, companies could definitely have a more comprehensive view about their chatbot investments and how they can derive better outcomes from this promising IT solution.

With regard to future research recommendations, particular attention should be paid to the generalization of the research results, as generalizability depends on various factors covering multi-dimensional, sometimes contradictory perspectives. The most common determinants are the socio-economic and cultural environment of each country and its level of development, as well as its degree of digitization and Internet access and use. An additional key factor of differentiation is individuals' personality and innovativeness: a person who is reluctant to accept anything new and innovative will not adopt chatbots as readily as a person who is eager to try new and innovative e-solutions. A further factor is building trust and ensuring service quality between individuals and companies as far as chatbots are concerned. Companies, especially those involved in customer service, make efforts to respond as quickly and efficiently as possible to their customers' requests, aiming to satisfy them and subsequently gain their loyalty; their e-readiness to integrate such marketing practices into their business strategy is therefore worth further investigation. Chatbots have already become widely accepted in some countries, and an important issue is how to derive the most benefit from their capabilities. In contrast, as mentioned above, a number of countries lag behind because of socio-economic factors and technology-related determinants. To sum up, a multi-dimensional future examination of the aforementioned factors is expected to yield useful insights and possibly reveal novel chatbot perspectives for both academia and industry.

Figure 1. PRISMA methodology procedure

Figure 2. Number of references per year

Table 1. Types of data analysis

Table 2. Research methods

Table 3. Statistical methods of analysis

Table 4. Field of study

Table 5. Behavioral theories

Table 6. Factors influencing intention

Table 7. List of the top-5 most cited papers, as of October 7th, 2021

Table 8. Chatbot adoption-intention papers per continent and country



  • Open access
  • Published: 11 December 2021

A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss

  • Yoo Jung Oh   ORCID: orcid.org/0000-0002-7829-8535 1 ,
  • Jingwen Zhang 1 , 2 ,
  • Min-Lin Fang 3 &
  • Yoshimi Fukuoka 4  

International Journal of Behavioral Nutrition and Physical Activity, volume 18, Article number: 160 (2021)


This systematic review aimed to evaluate AI chatbot characteristics, functions, and core conversational capacities and investigate whether AI chatbot interventions were effective in changing physical activity, healthy eating, weight management behaviors, and other related health outcomes.

In collaboration with a medical librarian, six electronic bibliographic databases (PubMed, EMBASE, ACM Digital Library, Web of Science, PsycINFO, and IEEE) were searched to identify relevant studies. Only randomized controlled trials or quasi-experimental studies were included. Studies were screened by two independent reviewers, and any discrepancy was resolved by a third reviewer. The National Institutes of Health quality assessment tools were used to assess risk of bias in individual studies. We applied the AI Chatbot Behavior Change Model to characterize components of chatbot interventions, including chatbot characteristics, persuasive and relational capacity, and evaluation of outcomes.

The database search yielded 1692 unique citations, of which 9 studies met the inclusion criteria. Of the 9 studies, 4 were randomized controlled trials and 5 were quasi-experimental studies. Five of the seven studies that assessed physical activity suggest that chatbot interventions are promising strategies for increasing physical activity; in contrast, the number of studies focusing on changing diet and weight status was limited. Outcome assessments, however, were reported inconsistently across the studies. Eighty-nine percent of the studies specified a chatbot name and thirty-three percent specified a chatbot gender (i.e., woman). Over half (56%) of the studies used a constrained (i.e., rule-based) chatbot, while the remaining studies used unconstrained chatbots that resemble human-to-human communication.

Chatbots may improve physical activity, but we were not able to make definitive conclusions regarding the efficacy of chatbot interventions on physical activity, diet, and weight management/loss. Application of AI chatbots is an emerging field of research in lifestyle modification programs and is expected to grow exponentially. Thus, standardization of designing and reporting chatbot interventions is warranted in the near future.

Systematic review registration

International Prospective Register of Systematic Reviews (PROSPERO): CRD42020216761 .

Artificial Intelligence (AI) chatbots, also called conversational agents, employ dialogue systems to enable natural language conversations with users by means of speech, text, or both [ 1 ]. Powered by natural language processing and cloud computing infrastructures, AI chatbots can participate in a broad range of conversations, from constrained (i.e., rule-based) to unconstrained (i.e., resembling human-to-human communication) [ 1 ]. According to a Pew Research Center survey, 46% of American adults interact with voice-based chatbots (e.g., Apple’s Siri and Amazon’s Alexa) on smartphones and other devices [ 2 ]. The use of AI chatbots in business and finance is rapidly increasing; however, their use in lifestyle modification and health promotion programs remains limited.

Physical inactivity, poor diet, and obesity are global health issues [ 3 ]. They are well-known modifiable risk factors for cardiovascular diseases, type 2 diabetes, certain types of cancers, cognitive decline, and premature death [ 3 , 4 , 5 , 6 ]. However, despite years of attempts to raise awareness about the importance of physical activity (PA) and healthy eating, individuals often neither get enough PA nor maintain healthy eating habits [ 7 , 8 ], resulting in an increasing prevalence of obesity [ 9 , 10 ]. With emerging digital technologies, there has been an increasing number of programs aimed at promoting PA, healthy eating, and/or weight loss that utilize the internet, social media, and mobile devices in diverse populations [ 11 , 12 , 13 , 14 ]. Several systematic reviews and meta-analyses [ 15 , 16 , 17 , 18 , 19 ] have shown that these digital technology-based programs resulted in increased PA and reduced body weight, at least for a short duration. While digital technologies may not address environmental factors that constrain an individual’s health environment, technology-based programs can provide instrumental help in finding healthier alternatives or facilitating the creation of supportive social groups [ 13 , 14 ]. Moreover, these interventions do not require traditional in-site visits and thus help reduce participants’ time and financial costs [ 16 ]. Despite this potential, current research programs are still constrained in their capacity to personalize the intervention, deliver tailored content, or adjust the frequency and timing of the intervention based on individual needs in real time.

These limitations can be overcome by utilizing AI chatbots, which have great potential to increase the accessibility and efficacy of personalized lifestyle modification programs [ 20 , 21 ]. Enabling AI chatbots to communicate with individuals via web or mobile applications can make these personalized programs available 24/7 [ 21 , 22 ]. Furthermore, AI chatbots provide new communication modalities for individuals to receive, comprehend, and utilize information, suggestions, and assistance on a personal level [ 20 , 22 ], which can help overcome one’s lack of self-efficacy or social support [ 20 ]. AI chatbots have been utilized in a variety of health care domains such as medical consultations, disease diagnoses, mental health support [ 1 , 23 ], and more recently, risk communications for the COVID-19 pandemic [ 24 ]. Results from a few systematic reviews and meta-analyses suggest that chatbots have a high potential for healthcare and psychiatric use, such as promoting antipsychotic medication adherence as well as reducing stress, anxiety, and/or depression symptoms [ 1 , 25 , 26 ]. However, to the best of our knowledge, none of these studies have focused on the efficacy of AI chatbot-based lifestyle modification programs and the evaluation of chatbot designs and technologies.

Therefore, this systematic review aimed to describe AI chatbot characteristics, functions (e.g., the chatbot’s persuasive and relational strategies), and core conversational capacities, and investigate whether AI chatbot interventions were effective in changing PA, diet, weight management behaviors, and other related health outcomes. We applied the AI Chatbot Behavior Change Model [ 22 ], designed to inform the conceptualization, design, and evaluation of chatbots, to guide our review. The systematic review provides new insights about the strengths and limitations in current AI chatbot-based lifestyle modification programs and can assist researchers and clinicians in building scalable and personalized systems for diverse populations.

The protocol of this systematic review was registered at the International Prospective Register of Systematic Reviews (PROSPERO) (ID: CRD42020216761). The systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis guidelines.

Eligibility criteria

Table  1 shows the summary of the inclusion and exclusion criteria of the study characteristics based on the PICOS framework (i.e., populations/participants, interventions and comparators, outcome(s) of interest, and study designs/type) [ 27 ]. We included peer-reviewed papers or conference proceedings that were available in full-text written in English. Review papers, protocols, editorials, opinion pieces, and dissertations were excluded.

Information sources and search strategy

In consultation with a medical librarian (MF), pre-planned systematic search strategies were used for six electronic databases (PubMed, EMBASE, ACM Digital Library, Web of Science Core Collection, PsycINFO, and IEEE). A combination of MeSH/Emtree terms and keyword searches were used to identify studies on AI chatbot use in lifestyle changes; the comprehensive search strategies for each database are provided in Additional file  1 . Further, hand-searching was done to ensure that relevant articles were not missed during the data collection. The searches were completed on November 14, 2020. No date limits were applied to the searches.
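The full, database-specific search strategies are provided in Additional file 1 and are not reproduced here. As a purely illustrative sketch, the snippet below shows how a combined keyword block for a PubMed-style boolean search might be assembled; the term lists are assumptions, not the authors' actual strategy.

```python
# Illustrative only: the authors' exact strategies are in Additional file 1.
# The term lists below are assumptions used to demonstrate the structure of
# a combined chatbot/behavior boolean query.
chatbot_terms = ['"chatbot*"', '"conversational agent*"', '"dialogue system*"']
behavior_terms = ['"physical activity"', '"healthy diet"', '"weight loss"']

def or_block(terms):
    """Join a list of search terms into a parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join([or_block(chatbot_terms), or_block(behavior_terms)])
print(query)
```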

Study selection

All retrieved references were imported into the Endnote reference management software [ 28 ], and duplicates were removed. The remaining references were imported into the Covidence systematic review software [ 29 ], and additional duplicates were removed. Before screening the articles, three researchers (YO, JZ, and YF) met to discuss the procedure for title and abstract screening using 20 randomly selected papers. In the first phase of screening, two researchers (YO and JZ) independently assessed all study titles and abstracts against the eligibility criteria in Table 1 . The agreement in the abstract and title screening between the two reviewers was 97.4% (Cohen’s Kappa = .725). Then, they (YO and JZ) read the remaining studies in full length. The agreement for full text screening was 91.9% (Cohen’s Kappa = .734). Discrepancies at each stage were resolved through discussion with a third researcher (YF).
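The agreement figures reported above (raw agreement and Cohen's kappa) are standard inter-rater statistics that can be computed directly from the two reviewers' screening decisions. The sketch below uses scikit-learn's cohen_kappa_score on a short, hypothetical decision list; the actual screening data are not published, so the output will not match the reported values.

```python
# Minimal sketch of the inter-rater agreement statistics reported above.
# The decision lists are hypothetical stand-ins for the reviewers' data.
from sklearn.metrics import cohen_kappa_score

reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]

raw_agreement = sum(a == b for a, b in zip(reviewer_1, reviewer_2)) / len(reviewer_1)
kappa = cohen_kappa_score(reviewer_1, reviewer_2)  # chance-corrected agreement

print(f"Raw agreement: {raw_agreement:.1%}, Cohen's kappa: {kappa:.3f}")
```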

Data collection process and data items

Data extraction forms were developed based on the AI Chatbot Behavior Change Model [ 22 ], which provides a comprehensive framework for analyzing and evaluating chatbot designs and technologies. This model consists of four major components that provide guidelines to develop and evaluate AI chatbots for health behavior changes: 1) designing chatbot characteristics and understanding user background, 2) building relational capacity, 3) building persuasive capacity, and 4) evaluating mechanisms and outcomes. Based on the model, the data extraction forms were initially drafted by YF and discussed among the research team members. One researcher (YO) extracted information on study and sample characteristics, chatbot characteristics, intervention characteristics, outcome measures and results for main outcomes (i.e., PA, diet, and weight loss) and secondary outcomes (i.e., engagement, acceptability/satisfaction, adverse events, and others). Study and sample characteristics consisted of study aim, study design, theoretical framework, sample size, age, sex/gender, race/ethnicity, education, and income. Chatbot characteristics included the systematic features the chatbots were designed with (i.e., chatbot name and gender, media, user input, conversation initiation, relational capacity, persuasion capacity, safety, and ethics discussion). Intervention characteristics included information such as intervention duration and frequency, intervention components, and technological features (e.g., system infrastructure, platform). Two researchers (YF and JZ) independently validated the extracted data.

Quality assessment and risk of bias

Two reviewers (YO and JZ) independently evaluated the risk of bias of included studies using the two National Institutes of Health (NIH) quality assessment tools [ 30 ]. Randomized controlled trials (RCTs) were assessed for methodological quality using the NIH Quality Assessment of Controlled Intervention Studies. For quasi-experimental studies, the NIH Quality Assessment Tool for Before-After (Pre-Post) Studies with No Control Group was used. Using these tools, the quality of each study was categorized into three groups (“good,” “fair,” and “poor”). These tools were used to assess confidence in the evaluations and conclusions of this systematic review. We did not use these tools to exclude the findings of poor quality studies. It should be noted that the studies included in this systematic review were behavioral intervention trials targeting individual-level outcomes. Therefore, criteria asking 1) whether participants did not know which treatment group they were assigned to and 2) the statistical analyses of group-level data were considered inapplicable.
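The NIH tools are criterion checklists, and the overall good/fair/poor rating is a reviewer judgment rather than a fixed score. The sketch below shows one way such checklist responses could be summarized; the numeric thresholds are illustrative assumptions and are not part of the NIH tools or of this review's procedure.

```python
# Hedged sketch of summarizing checklist responses into an overall rating.
# The thresholds are assumptions for illustration; the NIH tools leave the
# overall good/fair/poor judgement to the raters.
def summarize_quality(responses):
    """responses: dict mapping criterion -> 'yes' | 'no' | 'NR' | 'NA'."""
    applicable = {k: v for k, v in responses.items() if v != "NA"}
    met = sum(v == "yes" for v in applicable.values())
    share = met / len(applicable)
    if share >= 0.80:
        return "good"
    if share >= 0.50:
        return "fair"
    return "poor"

example = {
    "randomization described": "yes",
    "dropout rates reported": "no",
    "power analysis reported": "no",
    "outcomes prespecified": "yes",
    "group-level analysis": "NA",   # inapplicable for individual-level trials
}
print(summarize_quality(example))   # -> "fair"
```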

Synthesis of results

Due to the heterogeneity in the types of study outcomes, outcome measures, and clinical trial designs, we qualitatively evaluated and synthesized the results of the studies. We did not conduct a meta-analysis and did not assess publication bias.

Figure  1 shows the study selection process. The search yielded 2360 references in total, from which 668 duplicates were removed. A total of 1692 abstracts were then screened, among which 1630 were judged ineligible, leaving 62 papers to be read in full text. In total, 9 papers met the eligibility criteria and were included.
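The screening counts above are internally consistent, as the short arithmetic check below shows (the number excluded at full text, 53, is implied rather than stated).

```python
# Consistency check of the screening counts reported in the text.
retrieved = 2360
duplicates_removed = 668
screened = retrieved - duplicates_removed              # 1692 titles/abstracts
excluded_at_screening = 1630
full_text_reviewed = screened - excluded_at_screening  # 62 papers
included = 9
excluded_at_full_text = full_text_reviewed - included  # 53 (implied)

assert screened == 1692 and full_text_reviewed == 62
print(f"{excluded_at_full_text} papers excluded at the full-text stage")
```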

Figure 1. Flow diagram of the article screening process

Summary of study designs and sample characteristics

The 9 included papers had been recently published (3 were published in 2020 [ 20 , 31 , 32 ], 4 in 2019 [ 21 , 33 , 34 , 35 ], and 2 in 2018 [ 36 , 37 ]). Table  2 provides details of the characteristics of each study. Two studies [ 21 , 37 ] were conducted in the United States and the remaining 7 were conducted in Switzerland [ 31 , 33 , 36 ], Australia [ 20 ], South Korea [ 32 ], and Italy [ 34 ] (1 not reported [ 35 ]). In total, 891 participants were represented in the 9 studies, with sample sizes ranging from 19 to 274 participants. The mean age of the samples ranged from 15.2 to 56.2 years (SD range  = 2.0 to 13.7), and females/women represented 42.1 to 87.9% of the sample. One study [ 21 ] solely targeted an adolescent population, whereas most studies targeted an adult population [ 20 , 31 , 32 , 33 , 34 , 35 , 37 ]. One study [ 36 ] did not report the target population’s age. Participants’ race/ethnicity information was not reported in 8 out of the 9 studies. The study [ 21 ] that reported participants’ race/ethnicity information included 43% Hispanic, 39% White, 9% Black, and 9% Asian participants. Participants’ education and income backgrounds were not reported in 5 out of the 9 studies. Among the 4 studies [ 31 , 34 , 35 , 37 ] that reported the information, the majority included undergraduate students or people with graduate degrees. Overall, reporting of participants’ sociodemographic information was inconsistent and insufficient across the studies.

Five studies employed quasi-experimental designs [ 20 , 21 , 35 , 36 , 37 ], and 4 were RCTs [ 31 , 32 , 33 , 34 ]. Only 5 studies [ 21 , 31 , 32 , 35 , 37 ] used at least one theoretical framework. One was guided by 3 theories [ 35 ] and another by 4 theories [ 21 ]. The theories used in the 5 studies included the Health Action Process Approach ( n  = 2), the Habit Formation Model ( n  = 1), the Technology Acceptance Model ( n  = 1), the AttrakDiff Model ( n  = 1), Cognitive Behavioral Therapy ( n  = 1), Emotionally Focused Therapy ( n  = 1), Behavioral Activation ( n  = 1), Motivational Interviewing ( n  = 1), and the Structured Reflection Model ( n  = 1). It is notable that most of these theories were used to design the intervention contents for inducing behavioral changes. Only the Technology Acceptance Model and the AttrakDiff Model were relevant for guiding the designs of the chatbot characteristics and their technological platforms, independent from intervention contents.

Summary of intervention and chatbot characteristics

Figure  2 provides a visual summary of AI chatbot characteristics and intervention outcomes, and Table  3 provides more detailed information. The 9 studies varied in intervention and program length, lasting from 1 week to 3 months. For most studies ( n  = 8), the chatbot was the only intervention component for delivering contents and engaging with the participants. One study used multi-intervention components, and participants had access to an AI chatbot along with a study website with educational materials [ 20 ]. A variety of commercially available technical platforms were used to host the chatbot and deliver the interventions, including Slack ( n  = 2), KakaoTalk ( n  = 1), Facebook messenger ( n  = 3), Telegram messenger ( n  = 1), WhatsApp ( n  = 1), and short messaging services (SMS) ( n  = 2). One study used 4 different platforms to deliver the intervention [ 21 ], and 2 studies used a chatbot app (i.e., Ally app) that was available on both Android and iOS systems [ 31 , 33 ].

Figure 2. Summary of chatbot characteristics and intervention outcomes

Following the AI Chatbot Behavior Change Model [ 22 ], we extracted features of the chatbot and intervention characteristics (Table 3 ). Regarding chatbot characteristics, identity features, such as specific names ( n  = 8) [ 20 , 21 , 31 , 32 , 33 , 35 , 36 , 37 ] and chatbot gender ( n  = 3) [ 20 , 31 , 33 ], were specified. Notably, the chatbot gender was woman in the 3 studies that reported it [ 20 , 31 , 33 ]. All 9 chatbots delivered messages in text format. In addition to text, 3 chatbots used graphs [ 31 , 33 , 37 ], 2 used images [ 32 , 35 ], 1 used voice [ 21 ], and 1 used a combination of graphs, images, and videos [ 36 ].

In 5 studies, the chatbots were constrained (i.e., users could only select pre-programmed responses in the chat) [ 31 , 33 , 34 , 35 , 36 ], and in 4, the chatbots were unconstrained (i.e., users could freely type or speak to the chatbot) [ 20 , 21 , 32 , 37 ]. Six chatbots [ 31 , 32 , 33 , 34 , 36 , 37 ] delivered daily intervention messages to the study participants. One chatbot communicated only on a weekly basis [ 20 ], and 1 communicated daily, weekly, on weekends or weekdays or at a scheduled date and time [ 35 ]. One study did not specify when and how often the messages were delivered [ 21 ]. Only 3 chatbots [ 20 , 21 , 32 ] were available on-demand so that users could initiate conversation at any preferred time. Most chatbots were equipped with relational capacity ( n  = 8; i.e., conversation strategy to establish, maintain, or enhance social relationships with users) and persuasive capacity ( n  = 9; i.e., conversation strategy to change user’s behaviors and behavioral determinants), meaning that the conversations were designed to induce behavioral changes while engaging with users socially. While only 1 study [ 21 ] documented data security, none of the studies provided information on participant safety or ethics (i.e., ethical principle or standards with which the chatbot is designed).

Summary of outcome measures and changes in outcomes

Figure 2 also illustrates the outcome measures and changes in the main and secondary outcomes reported in both RCTs and quasi-experimental studies. Among 7 studies that measured PA [ 20 , 21 , 31 , 32 , 33 , 35 , 37 ], 2 used objective measures [ 31 , 33 ], 4 used self-reported measures [ 20 , 21 , 32 , 35 ], and 1 used both [ 37 ]. Self-reported dietary intake was measured in 4 studies [ 20 , 34 , 35 , 36 ]. Only 1 study assessed objective changes in weight in a research office visit [ 20 ]. Details of intervention outcomes, including direction of effects, statistical significance, and magnitude, are presented in Table  4 .

Sample sizes of the 4 RCT studies ranged from 106 to 274 and a priori power analyses were reported in 3 [ 31 , 32 , 34 ], which showed that the sample sizes had sufficient power for analyzing the specified outcomes. Of the 4 RCT studies [ 31 , 32 , 33 , 34 ], 3 reported PA outcomes using daily step count [ 31 , 33 ] and a self-reported habit index [ 32 ]. In these RCTs, the AI chatbot intervention group resulted in a significant increase in PA, as compared to the control group, over the respective study period (6 weeks to 3 months). In terms of dietary change, 1 study [ 34 ] reported that participants in the intervention group showed higher self-reported intention to reduce red and processed meat consumption compared to the control group during a 2-week period.
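Three of the RCTs reported a priori power analyses. For a two-arm trial with a continuous outcome, such a calculation can be done as in the sketch below with statsmodels; the effect size, alpha, and power values are illustrative assumptions and are not taken from the reviewed studies.

```python
# Illustrative a priori power calculation for a two-arm trial.
# Effect size, alpha, and power below are assumptions, not values used
# by the reviewed RCTs.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.4,          # assumed Cohen's d
    alpha=0.05,               # two-sided significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(f"About {n_per_group:.0f} participants are needed per group")
```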

In contrast, sample sizes for the 5 quasi-experimental studies were small, ranging from 19 to 36 participants, suggesting that these studies may lack statistical power to detect potential intervention effects. Among the 5 quasi-experimental studies, 2 [ 21 , 37 ] reported only PA change outcomes, 1 [ 36 ] reported only diet change outcomes, and 2 [ 20 , 35 ] reported both outcomes. With regard to PA-related outcomes, 2 studies reported statistically significant improvements [ 20 , 37 ]. Specifically, [ 20 ] observed increased moderate and vigorous PA over the study period, and [ 37 ] found a significant increase in the habitual action of PA. One study [ 35 ] found no difference in PA intention within the intervention period; although it did not observe a statistically significant increase in PA intention, it revealed that among participants with either high or low intervention adherence, PA intention showed an increasing trend over the study period. [ 21 ] reported only descriptive statistics and showed that participants made positive progress toward PA goals 81% of the time.

Among the quasi-experimental studies, only 1 study [ 20 ] reported a statistically significant increase in diet adherence, over 12 weeks. [ 35 ] reported no difference in healthy diet intention over 3 weeks; in this study, participants with high intervention adherence showed a marginal increase, whereas those with low adherence showed decreased healthy diet intention. [ 36 ] reported that participants’ meal consumption improved in 65% of cases. The only study [ 20 ] reporting pre-post weight change outcomes using objective weight measures showed that participants experienced a significant weight loss (1.3 kg) from baseline to 12 weeks. To summarize, non-significant findings and a lack of statistical reporting were more prevalent in the quasi-experimental studies, but the direction of intervention effects was similar to that reported in the RCTs.

Engagement, acceptability/satisfaction, and safety measures were reported as secondary outcomes in 7 studies [ 20 , 21 , 31 , 33 , 35 , 36 , 37 ]. Five studies reported engagement [ 20 , 21 , 31 , 33 , 37 ] using various types of measurements, such as user response rate to chatbot messages [ 31 ], frequency of users’ weekly check-ins [ 20 ], and length of conversations between the chatbot and users [ 21 ]. Three studies measured acceptability/satisfaction of the chatbot [ 21 , 35 , 36 ] using measures such as technology acceptance [ 35 ], helpfulness of the chatbot [ 21 ], and perceived efficiency of chatbot communications [ 36 ]. Regarding reporting of adverse events (e.g., experiencing side effects from interventions), only 1 study reported that no adverse events related to study participation were experienced [ 20 ]. Three studies reported additional measures, including feasibility of subject enrollment [ 20 ], using the AttrakDiff questionnaire for measuring four aspects of the chatbot (i.e., pragmatic, hedonic, appealing, social) [ 35 ], and assessing perceived mindfulness about own behaviors [ 37 ].

Among 5 studies that reported engagement [ 20 , 21 , 31 , 33 , 37 ], only 1 [ 33 ] reported statistical significance of the effects of intrinsic (e.g., age, personality traits) and extrinsic factors (e.g., time and day of the delivery, location) on user engagement (e.g., conversation engagement, response delay). Among 3 studies [ 21 , 35 , 36 ] that reported acceptability/satisfaction, 1 study [ 35 ] found that the acceptability of the chatbot was significantly higher than the middle score corresponding to “neutral” (i.e., 4 on a 7-point scale). One study that reported the safety of the intervention did not include statistical significance [ 20 ]. Three studies reported other measures [ 20 , 35 , 37 ], and 1 found that pragmatic, hedonic, appealing, and social ratings of the chatbot were significantly higher than the middle score [ 35 ]. Another study [ 37 ] found no significant changes in the perceived mindfulness between pre- and post-study.

Summary of quality assessment and risk of bias

The results of the risk of bias assessments of the 9 studies are reported in Additional file 2. Of the 4 RCT studies [ 31 , 32 , 33 , 34 ], 3 were rated as fair [ 31 , 32 , 34 ] and 1 was rated as poor [ 33 ] because it failed to report several critical items: it did not report overall dropout rates or the differential dropout rates between treatment groups, did not show that the sample size was sufficiently large to detect differences between groups (i.e., no power analysis), and did not prespecify outcomes for hypothesis testing. Of the 5 quasi-experimental studies [ 20 , 21 , 35 , 36 , 37 ], 1 was rated as fair [ 20 ] and 4 were rated as poor [ 21 , 35 , 36 , 37 ] due to flaws on several critical criteria: these studies reported neither a power analysis to ensure that the sample size was sufficiently large nor follow-up rates after baseline, and their statistical methods did not examine pre-to-post changes in outcome measures or report statistical significance.

This systematic review aimed to evaluate the characteristics and potential efficacy of AI chatbot interventions to promote PA, healthy diet, and/or weight management. Most studies focused on changes in PA, and the majority [ 20 , 31 , 32 , 33 , 37 ] reported significant improvements in PA-related behaviors. The number of studies aiming to change diet and weight status was small. Two studies [ 20 , 34 ] found significant improvements in diet-related behaviors. Although only 1 study [ 20 ] reported weight-related outcomes, it found a significant weight change after the intervention. In sum, chatbots may improve PA, but this review was not able to make definitive conclusions on the potential efficacy of chatbot interventions for promoting PA, healthy eating, or weight loss.

This qualitative synthesis of effects needs to be interpreted with caution, given that the reviewed studies lack consistent usage of measurements and reporting of outcome evaluations. These studies used different measurements and statistical methods to evaluate PA and diet outcomes. For example, 1 study [ 20 ] measured participants’ self-reported change in moderate-to-vigorous PA during the intervention period to gauge the efficacy of the intervention, whereas another study [ 31 ] used step-goal achievement as a measure of intervention efficacy. Two quasi-experimental studies did not report the statistical significance of pre-post changes in PA or diet outcomes [ 21 , 36 ]. Such inconsistency in evaluating the potential efficacy of interventions has been reported in previous systematic reviews [ 1 , 38 ]. To advance the application of chatbot interventions in lifestyle modification programs and to demonstrate the rigor of their efficacy, future studies should examine multiple behavior change indicators, ideally incorporating objectively measured outcomes.

Consistent with other systematic reviews of chatbot interventions in health care and mental health [ 1 , 38 ], reporting of participants’ engagement, acceptability/satisfaction, and adverse events was limited in the studies. In particular, engagement, acceptability, and satisfaction measures varied across the studies, impeding the systematic summarization and assessment of the various intervention implementations. For instance, 1 study [ 33 ] used user response rates and user response delay as engagement measures, whereas another study [ 21 ] used the duration of conversation and the ratio of chatbot-initiated to patient-initiated conversations to assess the level of user engagement. Inconsistent reporting of user engagement, acceptability, and satisfaction measures is problematic because it complicates the interpretation and comparison of results across different chatbot systems [ 1 ]. Therefore, standardization of these measures should be pursued in future research. For example, as suggested in previous studies [ 39 , 40 ], conversational turns per session can be a viable, objective, and quantitative metric of user engagement. Regarding adverse events, despite the recommendation of the Consolidated Standards of Reporting Trials Group to report adverse events in clinical trials [ 41 ], only 1 study [ 20 ] reported adverse events. Future studies should consistently assess and report any unexpected events resulting from the use of AI chatbots in order to prevent side effects or potential harm to participants.
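The "conversational turns per session" metric suggested above can be computed directly from chat logs. The sketch below assumes a simple hypothetical log format of (session_id, speaker) pairs; real systems would extract these from their own message stores.

```python
# Sketch of the conversational-turns-per-session engagement metric.
# The log format and contents are hypothetical.
from collections import defaultdict

log = [("s1", "user"), ("s1", "bot"), ("s1", "user"), ("s1", "bot"),
       ("s2", "user"), ("s2", "bot")]

turns_per_session = defaultdict(int)
for session_id, speaker in log:
    if speaker == "user":                 # count each user message as a turn
        turns_per_session[session_id] += 1

mean_turns = sum(turns_per_session.values()) / len(turns_per_session)
print(dict(turns_per_session), f"mean turns/session: {mean_turns:.1f}")
```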

Theoretical frameworks for designing and evaluating a chatbot system are essential for understanding the rationale behind participants’ motivation, engagement, and behaviors. However, theoretical frameworks were not reported in many of the studies included in this systematic review. The lack of theoretical foundations of existing chatbot systems has also been noted in previous literature [ 42 ]. In this review, we found that the majority of AI chatbots were equipped with persuasion strategies (e.g., setting personalized goals) and relational strategies (e.g., showing empathy) to establish, maintain, or enhance social relationships with participants. Applying theoretical frameworks will help guide the development of effective communicative strategies that can be implemented in chatbot designs. For example, chatbots designed with personalized messages can be more effective than those delivering non-tailored, standardized messages [ 43 , 44 ]. For relational strategies, future studies can benefit from drawing on the literature on human-computer interaction and relational agents (e.g., [ 45 , 46 ]) and interpersonal communication theories (e.g., Social Penetration Theory [ 47 ]) to develop strategies that facilitate relationship formation between participants and chatbots.

Regarding designs of chatbot characteristics and dialogue systems, the rationale behind using human-like identity features (e.g., gender selection) on chatbots was rarely discussed. Only 1 study [ 31 ] referred to literature on human-computer interaction [ 48 ] and discussed the importance of using human-like identity features on chatbots to facilitate successful human-chatbot relationships. Additionally, only one chatbot [ 21 ] was able to deliver spoken outputs. This is inconsistent with a previous systematic review on chatbots used in health care, in which spoken chatbot output was identified as the most common delivery mode across the studies [ 1 ].

With regard to user input, over half of the studies [ 31 , 33 , 34 , 35 , 36 ] used a constrained AI chatbot, while the remaining [ 20 , 21 , 32 , 37 ] used unconstrained AI chatbots. Constrained AI chatbots are rule-based, well-structured, and easy to build, control, and implement, thus ensuring the quality and consistency in the structure and delivery of content [ 42 ]. However, they are not able to adapt to participants’ inquiries and address emergent questions, and are, thus, not suitable for sustaining more natural and complex interactions with participants [ 42 ]. In contrast, unconstrained AI chatbots are known to simulate naturalistic human-to-human communication and may strengthen interventions in general, particularly in the long-term, due to their flexibility and adaptability in conversations [ 1 , 38 , 42 ]. With increasing access to large health care datasets, advanced technologies [ 49 ], and new developments in machine learning that allow for complex dialogue management methods and conversational flexibility [ 1 ], employing unconstrained chatbots to yield long-term efficacy may become more feasible in future research. For instance, increasing the precision of natural language understanding and generation will allow for AI chatbots to better engage users in conversations and follow up with tailored intervention messages.
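To make the constrained/unconstrained distinction concrete, the toy sketch below contrasts a rule-based turn, where the user can only pick a pre-programmed reply, with a free-text turn that must be mapped to an intent. The prompts, options, and keyword rule are hypothetical and stand in for the natural language understanding components used by real unconstrained chatbots.

```python
# Toy contrast between constrained (rule-based) and unconstrained input.
# Options and keyword rules are hypothetical illustrations only.
OPTIONS = ["Completed my step goal", "Skipped it today", "Partly done"]

def constrained_turn(choice_index):
    """Rule-based: the user may only pick one of the pre-programmed replies."""
    return f"You selected: {OPTIONS[choice_index]}"

def unconstrained_turn(free_text):
    """Free input: an NLU step must map arbitrary text to an intent.
    A crude keyword rule stands in for a trained language model here."""
    if "walk" in free_text.lower() or "steps" in free_text.lower():
        return "intent: report_activity"
    return "intent: unknown"

print(constrained_turn(0))
print(unconstrained_turn("I walked 8,000 steps before lunch"))
```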

Safety and data security criteria are essential in designing chatbots. However, only 1 study provided descriptions of these criteria. Conversations between study participants and chatbots should be carefully monitored since erroneous chatbot responses may result in unintended harm. In particular, as conversational flexibility increases, there may be an increase in potential errors associated with natural language understanding or response generation [ 1 ]. Thus, using unconstrained chatbots should be accompanied with careful monitoring of participant and chatbot interactions, and of safety functions.

Strengths and limitations

This review has several strengths. First, to the best of our knowledge, this is the first review to systematically examine the characteristics and potential efficacy of AI chatbot interventions in lifestyle modification, thereby providing crucial insights for identifying gaps and future directions for research and clinical practice. Second, we developed comprehensive search strategies with a medical librarian for six electronic databases to increase the sensitivity and comprehensiveness of our search. Despite these strengths, several limitations also need to be acknowledged. First, we did not search gray literature in this systematic review. Second, we limited our search to peer-reviewed studies published as full text in English only. Lastly, due to the heterogeneity of outcome measures and the limited number of RCT designs, we were not able to conduct a meta-analysis or draw firm conclusions about the potential efficacy of chatbot interventions. In addition, the small sample sizes of the included studies make it difficult to generalize the results to broader populations. More RCTs with larger sample sizes and longer study durations are needed to determine the efficacy of AI chatbot interventions in improving PA, diet, and weight loss.

Conclusions

AI chatbot technologies and their commercial applications continue to develop rapidly, as does the number of studies about them. Chatbots may improve PA, but this review was not able to make definitive conclusions about the potential efficacy of chatbot interventions on PA, diet, and weight management/loss. Despite the rapid increase in publications about chatbot designs and interventions, standard measures for evaluating chatbot interventions and theory-guided chatbots are still lacking. Thus, future studies should use standardized criteria for evaluating chatbot implementation and efficacy. Additionally, theoretical frameworks that capture the unique factors of human-chatbot interaction for behavior change need to be developed and used to guide future AI chatbot interventions. Lastly, as increased adoption of chatbots is expected across diverse populations, future research needs to consider equity and equality in designing and implementing chatbot interventions. For target populations with different sociodemographic backgrounds (e.g., living environment, race/ethnicity, cultural background), specifically tailored designs and sub-group evaluations need to be employed to ensure adequate delivery and optimal intervention impact.

Availability of data and materials

Not applicable.

Abbreviations

AI: Artificial Intelligence

PROSPERO: International Prospective Register of Systematic Reviews

RCT: Randomized controlled trial

PA: Physical activity

Medical librarian

Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. 2018;25(9):1248–58.


Pew Research Center. Nearly half of Americans use digital voice assistants, mostly on their smartphones. 2017. Available from: https://www.pewresearch.org/fact-tank/2017/12/12/nearly-half-of-americans-use-digital-voice-assistants-mostly-on-their-smartphones/ .


Farhud DD. Impact of lifestyle on health. Iran J Public Health. 2015;44(11):1442.


Cecchini M, Sassi F, Lauer JA, Lee YY, Guajardo-Barron V, Chisholm D. Tackling of unhealthy diets, physical inactivity, and obesity: health effects and cost-effectiveness. Lancet. 2010;376(9754):1775–84.


Wagner K-H, Brath H. A global view on the development of non communicable diseases. Prev Med. 2012;54:S38–41.

Bennett JE, Stevens GA, Mathers CD, Bonita R, Rehm J, Kruk ME, et al. NCD countdown 2030: worldwide trends in non-communicable disease mortality and progress towards sustainable development goal target 3.4. Lancet. 2018;392(10152):1072–88.


Clarke T, Norris T, Schiller JS. Early release of selected estimates based on data from the National Health Interview Survey. Natl Center Health Stat. 2019.

Department of Health and Human Services. Physical Activity Guidelines for Americans, 2nd edition. Washington, DC: U.S. Department of Health and Human Services; 2018. Available from: https://health.gov/sites/default/files/2019-09/Physical_Activity_Guidelines_2nd_edition.pdf .

Hales C, Carroll M, Fryar C, Ogden C. Prevalence of obesity and severe obesity among adults: United States, 2017–2018. NCHS Data Brief, no 360. Hyattsville: National Center for Health Statistics; 2020.

Prentice AM. The emerging epidemic of obesity in developing countries. Int J Epidemiol. 2006;35(1):93–9.

Vandelanotte C, Müller AM, Short CE, Hingle M, Nathan N, Williams SL, et al. Past, present, and future of eHealth and mHealth research to improve physical activity and dietary behaviors. J Nutr Educ Behav. 2016;48(3):219–28. e1.

Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of smartphone applications and wearable devices for tracking physical activity data. Jama. 2015;313(6):625–6.


Zhang J, Brackbill D, Yang S, Becker J, Herbert N, Centola D. Support or competition? How online social networks increase physical activity: a randomized controlled trial. Prev Med Rep. 2016;4:453–8.

Zhang J, Brackbill D, Yang S, Centola D. Efficacy and causal mechanism of an online social media intervention to increase physical activity: results of a randomized controlled trial. Prev Med Rep. 2015;2:651–7.

Mateo GF, Granado-Font E, Ferré-Grau C, Montaña-Carreras X. Mobile phone apps to promote weight loss and increase physical activity: a systematic review and meta-analysis. J Med Internet Res. 2015;17(11):e253.

Manzoni GM, Pagnini F, Corti S, Molinari E, Castelnuovo G. Internet-based behavioral interventions for obesity: an updated systematic review. Clin Pract Epidemiol Ment Health. 2011;7:19.

Beleigoli AM, Andrade AQ, Cançado AG, Paulo MN, Maria De Fátima HD, Ribeiro AL. Web-based digital health interventions for weight loss and lifestyle habit changes in overweight and obese adults: systematic review and meta-analysis. J Med Internet Res. 2019;21(1):e298.

Laranjo L, Arguel A, Neves AL, Gallagher AM, Kaplan R, Mortimer N, et al. The influence of social networking sites on health behavior change: a systematic review and meta-analysis. J Am Med Inform Assoc. 2015;22(1):243–56.

Laranjo L, Ding D, Heleno B, Kocaballi B, Quiroz JC, Tong HL, et al. Do smartphone applications and activity trackers increase physical activity in adults? Systematic review, meta-analysis and metaregression. Br J Sports Med. 2021;55(8):422–32.

Maher CA, Davis CR, Curtis RG, Short CE, Murphy KJ. A physical activity and diet program delivered by artificially intelligent virtual health coach: proof-of-concept study. JMIR mHealth uHealth. 2020;8(7):e17558.

Stephens TN, Joerin A, Rauws M, Werk LN. Feasibility of pediatric obesity and prediabetes treatment support through Tess, the AI behavioral coaching chatbot. Transl Behav Med. 2019;9(3):440–7.

Zhang J, Oh YJ, Lange P, Yu Z, Fukuoka Y. Artificial intelligence Chatbot behavior change model for designing artificial intelligence Chatbots to promote physical activity and a healthy diet. J Med Internet Res. 2020;22(9):e22845.

Pereira J, Díaz Ó. Using health chatbots for behavior change: a mapping study. J Med Syst. 2019;43(5):135.

Miner AS, Laranjo L, Kocaballi AB. Chatbots in the fight against the COVID-19 pandemic. NPJ Digit Med. 2020;3(1):1–4.

Gentner T, Neitzel T, Schulze J, Buettner R. A Systematic literature review of medical chatbot research from a behavior change perspective. In 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE; 2020. p. 735-40.

Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB. Chatbots and conversational agents in mental health: a review of the psychiatric landscape. Can J Psychiatry. 2019;64(7):456–64.

Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Making. 2007;7(1):1–6.

Clarivate Analytics. Endnote x9 2019. Available from: https://endnote.com/ .

Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation. Available from: www.covidence.org .

NIH National Heart, Lung, and Blood Institute. Study quality assessment tools. Available from: https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools .

Kramer J-N, Künzler F, Mishra V, Smith SN, Kotz D, Scholz U, et al. Which components of a smartphone walking app help users to reach personalized step goals? Results from an optimization trial. Ann Behav Med. 2020;54(7):518–28.

Piao M, Ryu H, Lee H, Kim J. Use of the healthy lifestyle coaching Chatbot app to promote stair-climbing habits among office workers: exploratory randomized controlled trial. JMIR mHealth uHealth. 2020;8(5):e15085.

Künzler F, Mishra V, Kramer J-N, Kotz D, Fleisch E, Kowatsch T. Exploring the state-of-receptivity for mhealth interventions. Proc ACM Interact Mobile Wearable Ubiquitous Technol. 2019;3(4):1–27.

Carfora V, Bertolotti M, Catellani P. Informational and emotional daily messages to reduce red and processed meat consumption. Appetite. 2019;141:104331.

Fadhil A, Wang Y, Reiterer H. Assistive conversational agent for health coaching: a validation study. Methods Inf Med. 2019;58(1):009–23.

Casas J, Mugellini E, Khaled OA, editors. Food diary coaching chatbot. Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers; 2018.

Kocielnik R, Xiao L, Avrahami D, Hsieh G. Reflection companion: a conversational system for engaging users in reflection on physical activity. Proc ACM Interact Mobile Wearable Ubiquitous Technol. 2018;2(2):1–26.

Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The effectiveness of artificial intelligence conversational agents in health care: systematic review. J Med Internet Res. 2020;22(10):e20346.

Abd-Alrazaq A, Safi Z, Alajlani M, Warren J, Househ M, Denecke K. Technical metrics used to evaluate health care Chatbots: scoping review. J Med Internet Res. 2020;22(6):e18301.

Shum H-Y, He X-d, Li D. From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering. 2018;19(1):10–26.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux P, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol. 2010;63(8):e1–e37.

Fadhil A. Can a chatbot determine my diet?: Addressing challenges of chatbot application for meal recommendation. arXiv preprint arXiv:180209100. 2018.

Kreuter MW, Wray RJ. Tailored and targeted health communication: strategies for enhancing information relevance. Am J Health Behav. 2003;27(1):S227–S32.

Noar SM, Harrington NG, Aldrich RS. The role of message tailoring in the development of persuasive health communication messages. Ann Int Commun Assoc. 2009;33(1):73–133.

Bickmore TW, Caruso L, Clough-Gorr K, Heeren T. ‘It’s just like you talk to a friend’relational agents for older adults. Interact Comput. 2005;17(6):711–35.

Sillice MA, Morokoff PJ, Ferszt G, Bickmore T, Bock BC, Lantini R, et al. Using relational agents to promote exercise and sun protection: assessment of participants’ experiences with two interventions. J Med Internet Res. 2018;20(2):e48.

Altman I, Taylor DA. Social penetration: the development of interpersonal relationships. New York: Holt, Rinehart & Winston; 1973.

Nass C, Steuer J, Tauber ER, editors. Computers are social actors. Proceedings of the SIGCHI conference on Human factors in computing systems; 1994.

Murdoch TB, Detsky AS. The inevitable application of big data to health care. Jama. 2013;309(13):1351–2.


Acknowledgements

This project was supported by a grant (K24NR015812) from the National Institute of Nursing Research (Dr. Fukuoka) and the Team Science Award by the University of California, San Francisco Academic Senate Committee on Research. Publication made possible in part by support from the UCSF Open Access Publishing Fund. The study sponsors had no role in the study design; collection, analysis, or interpretation of data; writing the report; or the decision to submit the report for publication.

Author information

Authors and affiliations.

Department of Communication, University of California Davis, Davis, USA

Yoo Jung Oh & Jingwen Zhang

Department of Public Health Sciences, University of California Davis, Davis, USA

Jingwen Zhang

Education and Research Services, University of California, San Francisco (UCSF) Library, UCSF, San Francisco, USA

Min-Lin Fang

Department of Physiological Nursing, UCSF, San Francisco, USA

Yoshimi Fukuoka


Contributions

YF, JZ, and YO contributed to the conception and design of the review; MF and YO developed the search strategies; YF, JZ, and YO contributed to the screening of papers and synthesizing the results into tables; YF, JZ, and YO wrote sections of the systematic review. All authors contributed to manuscript revision, read, and approved the submitted version. YF is the guarantor of the review.

Corresponding author

Correspondence to Yoo Jung Oh .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Search strategies for PubMed, EMBASE, ACM Digital Library, Web of Science, PsycINFO, and IEEE.

Additional file 2.

Summary of quality assessment and risk of bias.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Oh, Y.J., Zhang, J., Fang, ML. et al. A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss. Int J Behav Nutr Phys Act 18 , 160 (2021). https://doi.org/10.1186/s12966-021-01224-6

Download citation

Received : 31 May 2021

Accepted : 10 November 2021

Published : 11 December 2021

DOI : https://doi.org/10.1186/s12966-021-01224-6

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Artificial intelligence
  • Conversational agent
  • Weight loss
  • Weight maintenance
  • Sedentary behavior
  • Systematic review

International Journal of Behavioral Nutrition and Physical Activity

ISSN: 1479-5868

  • Submission enquiries: Access here and click Contact Us
  • General enquiries: [email protected]



Enago Read: Your all-in-one AI-powered Reading Assistant

A Reading Space to Ideate, Create Knowledge, and Collaborate on Your Research

  • Smartly organize your research
  • Receive recommendations that cannot be ignored
  • Collaborate with your team to read, discuss, and share knowledge


From Surface-Level Exploration to Critical Reading - All in one Place!

Fine-tune your literature search.

Our AI-powered reading assistant saves time spent on the exploration of relevant resources and allows you to focus more on reading.

Select phrases or specific sections and explore more research papers related to the core aspects of your selections. Pin the useful ones for future references.

Our platform brings you the latest research related to your projects and research work.

Speed up your literature review

Quickly generate a summary of key sections of any paper with our summarizer.

Make informed decisions about which papers are relevant, and where to invest your time in further reading.

Get key insights from the paper, quickly comprehend the paper’s unique approach, and recall the key points.

Bring order to your research projects

Organize your reading lists into different projects and maintain the context of your research.

Quickly sort items into collections and tag or filter them according to keywords and color codes.

Experience the power of sharing by finding all the shared literature at one place.

Decode papers effortlessly for faster comprehension

Highlight what is important so that you can retrieve it faster next time.

Select any text in the paper and ask Copilot to explain it to help you get a deeper understanding.

Ask questions and follow-ups from AI-powered Copilot.

Collaborate to read with your team, professors, or students

Share and discuss literature and drafts with your study group, colleagues, experts, and advisors. Recommend valuable resources and help each other for better understanding.

Work in shared projects efficiently and improve visibility within your study group or lab members.

Keep track of your team's progress by being constantly connected and engaging in active knowledge transfer by requesting full access to relevant papers and drafts.

Find papers from across the world's largest repositories


Privacy and security of your research data are integral to our mission.


Everything you add or create on Enago Read is private by default. It is visible if and when you share it with other users.

Copyright

You can put Creative Commons license on original drafts to protect your IP. For shared files, Enago Read always maintains a copy in case of deletion by collaborators or revoked access.

Security

We use state-of-the-art security protocols and algorithms, including MD5 hashing, SSL, and HTTPS, to secure your data.


AI Literature Review Generator

Generate high-quality literature reviews fast with AI.

  • Academic Research: Create a literature review for your thesis, dissertation, or research paper.
  • Professional Research: Conduct a literature review for a project, report, or proposal at work.
  • Content Creation: Write a literature review for a blog post, article, or book.
  • Personal Research: Conduct a literature review to deepen your understanding of a topic of interest.



We generate robust evidence fast

What is Silvi.ai?

Silvi is an end-to-end screening and data extraction tool supporting Systematic Literature Review and Meta-analysis.

Silvi helps create systematic literature reviews and meta-analyses that follow Cochrane guidelines in a highly reduced time frame, giving a fast and easy overview. It supports the user through the full process, from literature search to data analysis. Silvi is directly connected to databases such as PubMed and ClinicalTrials.gov and is always updated with the latest published research. It also supports RIS files, making it possible to upload a search string from your favorite search engine (e.g., Ovid). Silvi has a tagging system that can be tailored to any project.
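
The RIS support mentioned above is easy to picture: an RIS export is a plain-text file of two-letter tags, with one reference per block. The sketch below is a minimal, hypothetical illustration of reading such a file in Python; the file name and the tags printed are arbitrary examples, not Silvi's actual importer.

```python
# Minimal sketch of parsing an RIS export (hypothetical example, not Silvi's importer).
# Each reference is a block of "TAG  - value" lines terminated by an "ER  -" line.
def parse_ris(path):
    records, current = [], {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("ER  -"):              # end of one reference
                records.append(current)
                current = {}
            elif len(line) > 6 and line[2:6] == "  - ":
                tag, value = line[:2], line[6:]
                current.setdefault(tag, []).append(value)
    return records

# Example: print titles (TI) and years (PY) from an exported search (file name is illustrative)
for rec in parse_ris("ovid_export.ris"):
    print(rec.get("TI", ["<no title>"])[0], rec.get("PY", [""])[0])
```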

Silvi is transparent, meaning it documents and stores the choices (and the reasons behind them) the user makes. Whether publishing the results from the project in a journal, sending them to an authority, or collaborating on the project with several colleagues, transparency is optimal to create robust evidence.

Silvi is developed with the user experience in mind. The design is intuitive and easily available to new users. There is no need to become a super-user. However, if any questions should arise anyway, we have a series of super short, instructional videos to get back on track.

To see Silvi in use, watch our short introduction video.



Learn more about Silvi’s specifications here.

"I like that I can highlight key inclusions and exclusions which makes the screening process really quick - I went through 2000+ titles and abstracts in just a few hours"

Eishaan Kamta Bhargava 

Consultant Paediatric ENT Surgeon, Sheffield Children's Hospital

"I really like how intuitive it is working with Silvi. I instantly felt like a superuser."

Henriette Kristensen

Senior Director, Ferring Pharmaceuticals

"The idea behind Silvi is great. Normally, I really dislike doing literature reviews, as they take up huge amounts of time. Silvi has made it so much easier! Thanks."

Claus Rehfeld

Senior Consultant, Nordic Healthcare Group

"AI has emerged as an indispensable tool for compiling evidence and conducting meta-analyses. Silvi.ai has proven to be the most comprehensive option I have explored, seamlessly integrating automated processes with the indispensable attributes of clarity and reproducibility essential for rigorous research practices."

Martin Södermark

M.Sc. Specialist in clinical adult psychology


Silvi.ai was founded in 2018 by Professor in Health Economic Evidence, Tove Holm-Larsen, and expert in Machine Learning, Rasmus Hvingelby. The idea for Silvi stemmed from their own research, and the need to conduct systematic literature reviews and meta-analyses faster.

The ideas behind Silvi were originally a component of a larger project. In 2016, Tove founded the group “Evidensbaseret Medicin 2.0” in collaboration with researchers from Ghent University, Technical University of Denmark, University of Copenhagen, and other experts. EBM 2.0  wanted to optimize evidence-based medicine to its highest potential using Big Data and Artificial Intelligence, but needed a highly skilled person within AI.

Around this time, Tove met Rasmus, who shared the same visions. Tove teamed up with Rasmus, and Silvi.ai was created.


Duke University Libraries

Literature Reviews


Introduction to AI


Generative AI tools have been receiving a lot of attention lately because they can create content like text, images, and music. These tools employ machine learning algorithms that can produce unique and sometimes unexpected results. Generative AI has opened up exciting possibilities in different fields, from language models like GPT to image generators.

However, students need to approach these tools with awareness and responsibility. Here are some key points to consider:

Novelty and Creativity : Generative AI tools can produce content that is both innovative and unexpected. They allow users to explore new ideas, generate unique artworks, and even compose original music. This novelty is one of their most exciting aspects.

Ethical Considerations : While generative AI offers creative potential, it also raises ethical questions. Students should be aware of potential biases, unintended consequences, and the impact of their generated content. Responsible use involves considering the broader implications.

Academic Integrity : When using generative AI tools for academic purposes, students should consult their instructors. Policies regarding the use of AI-generated content may vary across institutions. Always seek guidance to ensure compliance with academic integrity standards.

In summary, generative AI tools are powerful and fascinating, but students should approach them thoughtfully, seek guidance, and adhere to institutional policies. Please refer to the Duke Community Standard  for questions related to ethical AI use.



Research Rabbit is a literature mapping tool that takes one paper and performs backward- and forward citation searching in addition to recommending "similar work." It scans the Web for publicly available content to build its "database" of work.

Best suited for...

Disciplines whose literature is primarily published in academic journals.

Considerations

  • Integrates with Zotero
  • Works mostly with just journal articles
  • Potential for bias in citation searching/mapping

»   researchrabbit.ai   «


Elicit is a tool that semi-automates time-intensive research processes, such as summarizing papers, extracting data, and synthesizing information. Elicit pulls academic literature from Semantic Scholar, an academic search engine that also uses machine learning to summarize information (a sketch of querying Semantic Scholar directly follows this entry).

Empirical research (e.g., the sciences, especially biomedicine).

  • Both free and paid versions
  • Doesn't work well in identifying facts or in theoretical/non-empirical research (e.g., the humanities)
  • Potential biases in the natural language processing (NLP) algorithms
  • Summarized information and extracted data will still need to be critically analyzed and verified for accuracy by the user

»   elicit.com   «
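
As a rough illustration of the corpus Elicit builds on, the publicly documented Semantic Scholar Graph API can also be queried directly. The sketch below is only an example; the query text, field list, and result handling are illustrative, so consult the current API documentation before relying on it.

```python
# Hedged sketch: querying the Semantic Scholar Graph API directly (the same corpus
# Elicit draws on). Query string, fields, and limit are arbitrary examples.
import requests

resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={"query": "social media bot detection", "fields": "title,year,abstract", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for paper in resp.json().get("data", []):
    print(paper.get("year"), "-", paper.get("title"))
```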


Think of Consensus as ChatGPT for research! Consensus is "an AI-powered search engine designed to take in research questions, find relevant insights within research papers, and synthesize the results using the power of large language models" ( Consensus.app ).  Consensus runs its language model over its entire body of scientific literature (which is sourced from Semantic Scholar ) and extracts the “key takeaway” from every paper.

The social sciences and sciences (non-theoretical disciplines).

  • Free and paid versions
  • Similar to Elicit, Consensus should not be used to ask questions about basic facts
  • Consensus recommends that you ask questions related to research that has already been conducted by scientists
  • Potential for biases in the input data from participants

»   consensus.app   «


Dubbed the "AI-powered Swiss Army Knife for information discovery," Perplexity is used for answering questions (including basic facts, a function that many other AI tools are not adept at doing), exploring topics in depth utilizing Microsoft's Copilot, organizing your research into a library, and interacting with your data (including asking questions about your files).

Perplexity has wide-reaching applications and could be useful across disciplines.

  • Free and paid pro versions (the pro version utilizes Microsoft's Copilot AI tool)
  • Available in desktop, iOS, and Android apps
  • See  Perplexity's blog for more info
  • Your personal information and data on how you use the tool are stored for analytical purposes (however, this feature can be turned off in settings)
  • Features a browser plug-in, Perplexity Companion , that is essentially a blend of Google and ChatGPT

»   perplexity.ai   «

Did you know that as Duke faculty, staff, and students, we have free access to ChatGPT4 via Microsoft Copilot ?

Log in with your Duke credentials to start using it today.


The OG of generative AI tools, ChatGPT-4 is the latest iteration of the popular chatbot, answering questions and generating text that sounds like it was written by a human. While not a replacement for conducting research, it can be helpful when it comes to brainstorming topics or research questions and also as a writing tool (rewriting or paraphrasing content, assessing tone, etc.).

All users across all disciplines.

  • ChatGPT-3.5 is the default version of free and paid-tier chat users.
  • Since it can't verify its sources, be wary of hallucinations (or made-up citations) that can look very real.
  • It is not 100% accurate ! While ChatGPT-4 is touted as being 40% more accurate than its predecessor, users are still expected to verify the information generated by it.
  • There is always the potential for bias since ChatGPT was trained on a massive dataset of websites, articles, books, etc. (much of which is inherently biased since it was created by humans).

For ChatGPT-4 (access provided by Duke and requires login) »   copilot.microsoft.com   «

For ChatGPT-3.5 (free) »   chat.openai.com   «


Machine learning-based social media bot detection: a comprehensive literature review

  • Original Article
  • Open access
  • Published: 05 January 2023
  • Volume 13, article number 20 (2023)



Malak Aljabri, Rachid Zagrouba, Afrah Shaahid, Fatima Alnasser, Asalah Saleh & Dorieh M. Alomari


In today’s digitalized era, Online Social Networking platforms have become a vital part of individuals’ daily lives. The vast amount of information they hold and their open nature attract cybercriminals, who create malicious bots. Malicious bots on these platforms are automated or semi-automated entities used in nefarious ways while simulating human behavior. Such bots pose serious cyber threats and security concerns for society and public opinion. They are used to exploit vulnerabilities for illicit benefit, including spamming, fake profiles, spreading inappropriate or false content, click farming, hashtag hijacking, and much more. Cybercriminals and researchers are locked in an arms race as new and updated bots are created to thwart ever-evolving detection technologies. This literature review compiles and compares the most recent advancements in Machine Learning-based techniques for the detection and classification of bots on five primary social media platforms, namely Facebook, Instagram, LinkedIn, Twitter, and Weibo. We give a concise overview of the supervised, semi-supervised, and unsupervised methods, along with details of the datasets provided by the researchers, and a thorough breakdown of the extracted feature categories. Furthermore, this study showcases a brief rundown of the challenges and opportunities encountered in this field, along with prospective research directions and promising angles to explore.



1 Introduction

Online Social Networks (OSNs) such as Twitter, Facebook, Instagram, and LinkedIn have become a crucial part of everyday life (Albayati and Altamimi 2019). They radically shape daily social interactions, with users and their communities forming the base for online growth, commerce, and information sharing. Different social networks offer unique value chains and target different user segments. Twitter, for instance, is the best-known microblogging network for rapid updates and breaking news; Instagram is used mainly by celebrities and businesses for marketing (Meshram et al. 2021); and LinkedIn serves professional communities. As social networks' popularity grows, the vast amount of personal information that users share makes the very features that benefit ordinary people a tempting target for malicious entities (Adikari and Dutta 2020). Bots are thought to be the most prevalent form of malware on social media networks (Aldayel and Magdy 2022; Cai, Li, and Zengi 2017b). Some bots are benign, but the majority are used for malicious activities such as fabricating accounts, faking engagement, social spamming, phishing, and spreading rumors to manipulate public opinion. Such activities not only disturb genuine users' experience but also harm public and individual security. As a result, researchers have in recent years dedicated significant attention to social media bot detection (Ali and Syed 2022; Ferrara 2018; Rangel and Rosso 2019; Yang et al. 2012) and prevention (Thakur and Breslin 2021).

1.1 Social media platforms

OSNs have revolutionized communication technologies and are now an essential component of the modern web. The most popular social networks globally as of January 2022 are shown in Fig. 1, ordered by the number of monthly active users in millions. The social media platforms included in the scope of our study are Twitter, Facebook, Instagram, LinkedIn, and Weibo. On these platforms, user growth and popularity have been increasing at an exponential rate, and they enable users to produce and exchange user-generated content (Kaplan and Haenlein 2010). For instance, 2.375 billion people were using Facebook in the first quarter of 2019 (Siddiqui 2019), representing one-third of the world population (Caers et al. 2013). One of the most widespread and extensively used OSNs by people from all walks of life is Twitter. Twitter allows individuals to express their sentiments on different topics such as entertainment, the stock market, politics, and sports (Wald et al. 2013). It is one of the fastest means of circulating information and, as a result, strongly affects people's perspectives; over the past few years, Twitter has become a replacement for mainstream media as a source of news (Wald et al. 2013). Instagram, in turn, is an OSN for sharing photos and videos that has been available on both Android and iOS since 2012. As of May 2019, more than a billion users were registered on Instagram (Thejas et al. 2019). Facebook is an online social networking site that makes it convenient for people to connect and share with family and friends. It was developed in 2004, initially for students, by Mark Zuckerberg, and with more than 1 billion users globally it is one of the biggest social networks today (Santia et al. 2019). One of the most well-known professional social networks is LinkedIn, a platform that focuses on professional networking and career advancement (Dinath 2021). Sina-Weibo, also known as the Chinese Twitter, was launched in 2009 and is one of China's biggest social media platforms. It offers a plethora of features, including posting images, instant messaging, Weibo stories, location-based hashtags, and trending topics, and it also allows businesses to set up accounts for advertising and services (Tenba Group 2022).

Figure 1: The most popular social networks worldwide as of January 2022, ranked by number of monthly active users (source: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/).

1.2 Social media security

Security and trustworthiness among users, service providers, platform owners, and third-party supervisors are critical factors for social media platforms' success and stable existence (Zhang and Gupta 2018). According to recent surveys (Shearer and Mitchell 2022), a considerable segment of the population prefers social networks to TV, newspapers, and other traditional media when looking for information, and trust in social networks as a source of information is predicted to grow rapidly (Kolomeets and Chechulin 2021). As a result, social bots can pose significant security risks by influencing public opinion and disseminating false information (Shao et al. 2017), spreading rumors and conspiracy theories (Ferrara 2020), creating fake reputations, and suppressing political competitors (Pierri et al. 2020; Benkler et al. 2017). Although bots are extensively used, little research has examined how they affect the social media environment. Estimates suggest that nearly 48 million Twitter accounts are bots, and Facebook has acknowledged that 270 million of its accounts are fake (Sheehan 2018). Further, there is evidence that social media bots were used in attempts to influence political communication during the 2010 US midterm elections, and there were also allegations that social bots on Twitter played a significant role in the 2016 US presidential election (Cresci et al. 2017; Mahesh 2020; Sedhai and Sun 2015). Bots can be employed to spread misinformation to promote a particular view of a public person, grow an account's following, and repost user-generated content. Bot detection on OSNs is therefore the security feature most frequently requested by businesses and law enforcement organizations (Kolomeets and Chechulin 2021). The dearth of publicly accessible datasets for OSNs such as Facebook, Instagram, and LinkedIn is one of the greatest obstacles in this research area; unlike Twitter, these OSNs have restrictive data collection policies.

1.3 Types of bots on social media

The term "bot" refers to a robot, a computer program that works more quickly than humans at recurring, automated tasks. More precise terminology can be used to define bots in OSNs "a computer software that generates content automatically and engages with users of social media to replicate and possibly modify their behavior" (Benkler et al. 2017 ). Bots can be used for useful or harmful reasons and often replicate human behavior to some degree (Fonseca Abreu et al. 2020 ). Good bots can significantly reduce the need for human customer service representatives for some businesses, such as chatbots and news bots that automatically upload new articles or news for journalists or bloggers. Bots can be employed for negative as well as positive purposes. According to (Gorwa and Guilbeault 2020 ), bots are responsible for a sizable portion of online activity, are used to manipulate algorithms and recommender systems, stifle or promote political speech, and can be crucial in the spread of hyper-partisan "fake news." According to (Benkler et al. 2017 ), there are four different categories of social media bots: spambots, social bots, sybil bots, and cyborgs. Promoter bots, URL spambots, and false followers are only a few examples of the various types of spambots that spread harmful links, uninvited messages, and hijack popular subjects on social networks (Meshram et al. 2021 ). On the other hand, social bots are algorithmically controlled user accounts that mimic the activity of human users but carry out their tasks at a considerably faster rate while successfully concealing their robotic identity (Ferrara 2018 ). While cyborgs bots are half-human, half-bot accounts that exist between people and bots, sybil bots are anonymous identities, i.e., user accounts, utilized for a significantly big effect (Gorwa and Guilbeault 2020 ). In this review, the collected papers were incorporating three categories of bots which are social bots spambots, and sybil bots.

1.4 The different machine learning-based techniques and algorithms

The development of algorithms that allow a computer to learn on its own from data and prior experience is the core of ML, a subfield of artificial intelligence (AI). Arthur Samuel was the first to coin the term "Machine Learning" (Wiederhold and McCarthy 1992). An ML system develops prediction models from previous data and refines its predictions as new data are gathered; the amount of data used to create a model strongly influences its accuracy (Saranya Shree et al. 2021). The main types of ML techniques are supervised, semi-supervised, unsupervised, and reinforcement learning; we include only the first three in our study. The supervised category comprises classification, regression, and forecasting. Some of the most popular supervised algorithms include Random Forest (RF), Naïve Bayes (NB), Decision Trees (DT), Logistic Regression (LR), Support Vector Machine (SVM), Neural Networks (NN), and many more. Deep learning (DL) is a subset of supervised ML techniques that employs multiple layers to gradually extract higher-order features from the input data; this AI technology mimics the actions and processes of the human brain to create patterns and process data (Gannarapu et al. 2020). Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Generative Adversarial Networks (GANs) are some of the well-known DL algorithms. Unsupervised learning algorithms, which mainly deal with unlabeled data, are categorized into clustering and association; commonly used examples include K-means clustering, Principal Component Analysis (PCA), and association rule mining. Finally, semi-supervised learning uses a small amount of labeled data together with a large amount of unlabeled data, resulting in a hybrid of supervised and unsupervised learning (Mahesh 2020).
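
To make the supervised setting concrete, the following minimal sketch trains a Random Forest, one of the classifiers listed above, to separate bot from human accounts. The data and feature names are synthetic placeholders, not drawn from any of the reviewed studies.

```python
# Minimal sketch of supervised bot classification (synthetic data, illustrative feature names).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2000
X = pd.DataFrame({
    "followers_count": rng.integers(0, 10_000, n),
    "friends_count": rng.integers(0, 10_000, n),
    "statuses_per_day": rng.exponential(5, n),
    "account_age_days": rng.integers(1, 4000, n),
})
y = rng.integers(0, 2, n)  # 1 = bot, 0 = human (random labels, purely for illustration)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```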

1.5 Machine learning implementation on social media security

The way people use social media is evolving as a result of the proliferation of ML techniques on social media and the increased sophistication of cyberattacks on computer information systems (Aljabri et al. 2021a, b). Numerous ML techniques have been employed on social networking sites such as Facebook and Twitter: existing ML algorithms can determine a user's location, carry out sentiment analysis (Aljabri et al. 2021a, b), offer recommendations, and much more. Diverse ML methods have also been successfully deployed to address wide-ranging problems in cybersecurity, including detecting malicious URLs, classifying firewall log data, and detecting phishing attacks (Aljabri et al. 2022a, b, c; Aljabri and Mirza 2022). However, malicious software can be used to target social media platforms and carry out cyber-attacks, and several efforts have investigated ML techniques to detect such malware. For instance, (Alom et al. 2020) detected Twitter spammers using DL techniques, and the study conducted by (Kantartopoulos et al. 2020) addressed the effects of hostile attacks and utilized KNN as a measure to tackle the problem. (Gupta and Kaushal 2017) presented a methodology that uses SVM and ensemble algorithms to effectively detect cyberbullying. Additionally, models have been developed for social media systems' access control (Carminati et al. 2011). Yet bot and fake account detection on social media platforms remains one of the primary challenges for cybersecurity researchers (Thuraisingham 2020).

1.6 Key contributions

This section first puts forth the existing literature reviews on different social media platforms, as shown in Table 1, and briefly discusses the previously used taxonomies along with the prevailing gaps. Starting with literature reviews on Twitter, (Alothali et al. 2019) covered literature from 2010 to 2018 based on various techniques, including graph-based, crowdsourcing, and ML approaches. They analyzed common aspects such as the datasets, classifiers, and selected features employed, and also addressed the challenges present in the domain. (Derhab et al. 2021) discussed existing techniques and put forth a taxonomy covering state-of-the-art tweet-based bot detection techniques from 2010 to 2020. They described the main features utilized for tweet-based bot detection, as well as big data analytics, shallow, and DL techniques and their performance results. Finally, they presented and discussed the challenges and open issues in the area of tweet-based bot detection (Derhab et al. 2021). Furthermore, (Orabi et al. 2020) discussed studies from 2010 to 2019 on graph-based, ML-based, crowdsourcing, and anomaly-based approaches. Their research revealed gaps in the literature, such as the fact that studies mainly discussed Twitter, that unsupervised ML is rarely used, and that the majority of publicly available datasets are either inaccurate or insufficiently large. (Gheewala and Patel 2018) contributed a review of ML-based Twitter spam detection for the years 2010–2017 based on clustering, classification, and hybrid algorithms. Among the issues identified were degraded spam detection results caused by feature fabrication, class imbalance, and spam drift. The study by (Ezarfelix et al. 2022) was based only on the Instagram platform and analyzed and evaluated studies from 2018 to 2021, concluding that NNs are the most effective method for detecting fake accounts. (Rao et al. 2021) presented a comprehensive review of social spam detection techniques studied from 2015 to 2020, covering honeypot/honeynet-based techniques, URL-list-based spam filtering techniques, and ML and DL techniques. Numerous feature analysis and dimensionality reduction techniques used by different researchers were outlined, and a thorough analysis was given describing the datasets, features, ML/DL models, and performance measures used, along with the pros and cons of each model.

To the best of our knowledge, no study in the literature has carried out a comprehensive analysis of the existing studies in the time period (2015–2022) in the domain of applying ML-based techniques for social media bot detection (social bots, spambots, sybil bots). For this specific timeline, the existing reviews have studied either only ML or DL-based studies or only addressed a specific bot type. We perceived that there was a need for a recent literature review to be conducted so that researchers could identify the findings and gaps in this field and use that information as a roadmap for future research directions and further in-depth study. In response to this demand, in this study, we discuss what is currently known and being researched regarding the several concepts, theories, and techniques linked to bot detection on social media platforms.

This paper makes the following key contributions:

Provide summaries and analysis of the used ML-based (supervised, semi-supervised, and unsupervised) classification techniques to detect various types of bots on some particular social media platforms.

Provide a unique taxonomy based on the various ML-based techniques which has not been provided in the existing literature.

Identify and analyze the most commonly extracted and used features on each social media platform.

Study which social media platforms are most affected by malicious bots and which classes of bots are mostly found on these platforms. Additionally, highlight the most studied social platforms and analyze the gaps in research on other platforms.

Examine and analyze the popular public datasets used for each platform and the methods used for the self-collected datasets.

Highlight challenges and gaps in existing research thereby providing potential directions for further research.

The rest of this paper is structured as follows: Sect. 2 presents the methodology adopted for this paper. Section 3 puts forth a detailed analysis, including tables and figures, of the ML-based techniques used in the existing literature. Section 4 discusses the datasets used, features extracted, and algorithms implemented across all reviewed studies, thereby performing an extensive analysis. Section 5 sheds light on the insights gained and presents a discussion of the challenges and opportunities in existing research, thereby providing future research directions. Section 6 concludes and summarizes our literature review.

2 Methodology

The objective of this review is to study the existing literature from 2015 to 2022 in the domain of bot detection and classification using ML techniques on various social media platforms. We searched for social media bot detection-related papers in various well-known databases, mainly Google Scholar, Mendeley, IEEE Xplore, ResearchGate, ScienceDirect, Elsevier, acm.org, arxiv.org, SpringerLink, and MDPI. A total of 105 papers were reviewed; all of them are summarized and discussed in this paper. Figure 2 demonstrates the range of the reviewed papers.

Figure 2: Bar chart showing the range of the reviewed papers.

Figure 3 shows the taxonomy created for this paper. The first tier is based on the ML-based technique, the second tier on the type of social media platform, and the third tier on the type of social media bot, which includes social bots, spambots, and sybil bots. The logic behind this taxonomy is mainly to identify the social media platforms most affected by bots and the class of bots mostly found on those platforms, and to highlight the most studied social platforms while analyzing the gaps in research on other platforms. This differs from most existing literature reviews, which focus on the ML techniques and algorithms used; that framing can make it inefficient to highlight findings and identify gaps, since many studies apply several techniques and algorithms to a single platform.

Figure 3: Taxonomy of social media bot detection using ML-based techniques.

3 Machine learning-based techniques for detecting bots on social media platforms

Numerous studies have been published addressing the use of ML-based techniques for bot detection. This section reviews the existing research and its findings. The summaries are organized by the three ML types, then by social media platform, and finally by the bot type affecting it.

3.1 Using supervised ML

Most of the studies we reviewed implemented supervised ML and DL to detect social bots, spambots, and sybil bots, as discussed below.

3.1.1 Facebook—detecting social bots

Very few studies were found that used the supervised approach to detect social bots on Facebook. To improve classification accuracy, (Wanda et al. 2020) built a supervised learning architecture based on a CNN. The network combined convolutional layers with several fully connected hidden layers, used gradient descent to minimize the objective function over the model's parameters, and employed a pooling layer to reduce and accelerate training time. With Stochastic Gradient Descent (SGD, m = 0.5) as the optimizer, the model showed a training loss of 0.5058 and a testing loss of 0.5060.
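
The paper gives only a coarse description of that architecture, so the following is a hedged Keras reconstruction rather than the authors' code: a small 1-D CNN with a pooling layer and dense hidden layers, trained with SGD and momentum 0.5 on synthetic stand-in data (layer sizes and feature count are invented).

```python
# Hedged sketch of a CNN-style bot classifier trained with SGD (momentum = 0.5).
# All data, layer sizes, and the 20-feature input are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")     # 1000 accounts, 20 numeric features (synthetic)
y = rng.integers(0, 2, size=1000).astype("float32")   # 1 = bot, 0 = human (synthetic labels)

model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Reshape((20, 1)),                            # treat the feature vector as a 1-D sequence
    layers.Conv1D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),                   # pooling layer to shrink the representation
    layers.Flatten(),
    layers.Dense(32, activation="relu"),                # fully connected hidden layer
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.5),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
```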

Secondly, 4.4 million publicly generated Facebook posts were collected and described in a dataset by (Dewan and Kumaraguru 2017). On their dataset of harmful posts, they used two different filtering techniques: one based on URL blacklists and another based on human annotations. They used supervised learning methods including NB, DT, RF, and SVM models, all based on a set of 44 publicly accessible attributes. After evaluation, RF showed the highest accuracy at over 80%. Based on their findings, they proceeded to develop Facebook Inspector (FBI), a browser plug-in that uses a Representational State Transfer (REST) API to identify harmful Facebook posts in real time.

3.1.2 Facebook—detecting spambots

Some studies have identified spambots on Facebook using various data collection techniques. Due to the restrictive security policies on Facebook, accessing and acquiring relevant data is challenging.

Sahoo and Gupta (2020) implemented a spammer detection system for Facebook. The Particle Swarm Optimization (PSO) algorithm was used to determine content popularity and to perform feature selection. The dataset included 1600 profile posts in total. Twelve profile- and content-based features were chosen after the collected content underwent data pre-processing, and the PSO algorithm used these features as input parameters to find fraudulent accounts. The classifiers RF, RT, Bagging, JRip, J48, and AdaBoost were utilized, and the best-performing classifier achieved an accuracy of 99.5%.

This was followed by (Rathore et al. 2018), who introduced an efficient spammer detection method called SpamSpotter that uses an intelligent decision support system (IDSS) to differentiate spammers from real Facebook users. A dataset made up of 1000 profiles was employed, and the framework made use of profile- and content-based features. Eight supervised ML classifiers were used: Bayesian Network (BN), RF, Decorate (DE), J48, JRip, KNN, SVM, and LR. The BN classifier outperformed all others with an accuracy of 0.984.

3.1.3 Facebook—detecting sybil bots

We found a reasonable number of studies that succeeded in recognizing sybils (fake profiles) on Facebook. (Albayati and Altamimi 2019) proposed a smart system known as FBChecker that checks whether a profile is fake. The system analyzed and classified a set of behavioral and informational attributes using a data mining approach. Four data mining algorithms were used: KNN, DT, SVM, and NB, implemented on the RapidMiner data science platform. A dataset of 200 profiles was prepared by the authors. A Receiver Operating Characteristic (ROC) curve comparison was used to assess accuracy; all classifiers showed a high accuracy rate, but SVM performed best with an accuracy of 98%.
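
A comparison of this kind can be sketched with standard tooling; the snippet below scores the same four classifier families by cross-validated ROC AUC on synthetic data. It illustrates the evaluation style only and is not a reproduction of FBChecker.

```python
# Sketch of comparing KNN, DT, SVM, and NB by ROC AUC (synthetic data;
# the original study used 200 labeled Facebook profiles).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
models = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),   # probability estimates enable ROC AUC scoring
    "NB": GaussianNB(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```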

Subsequently, (Hakimi et al. 2019) proposed supervised ML techniques based on only five characteristics that play a key role in distinguishing fake from genuine users on Facebook: Average Post Likes Received, Average Post Comments, Average Post Comments Received, Average Post Liked, and Average Friends. A sample dataset of 800 users was generated with Mockaroo, and the data were categorized into four clusters: Inactive User, Assumed Fake Account User, Fake Account User, and Real User. The classifiers KNN, SVM, and NN were implemented, and KNN performed best with an accuracy of 0.829. It was concluded that the "likes" and "remarks" features add significant value to the detection task.

Moreover, (Singh and Banerjee 2019) created a Facebook dataset using the Graph API to be utilized for sybil account detection and performed a comparative analysis of various algorithms over the dataset. The dataset contained 995 accounts, both real and fake. Twenty-nine features were extracted, including textual, categorical, and numerical features. AdaBoost, Bagging, XGBoost, Gradient Boost (GB), RF, LR, Linear Support Vector Classifier (LinearSVC), and ExtraTree algorithms were applied for evaluation. AdaBoost was the best-performing algorithm with a 99% F1-score.

(Saranya Shree et al. 2021) suggested Natural Language Processing (NLP) pre-processing techniques and ML algorithms such as SVM and NB to classify fake and genuine profiles on Facebook. A dataset of 516 profiles was used, and the model was trained for 30 epochs. It correctly predicted 91.5% of fake accounts and 90.2% of genuine accounts.

Another strategy for identifying sybils on Facebook was presented by (Babu et al. 2021). Using the Facebook Graph API, they gathered a dataset of 500 users from a survey of 500 Facebook users in order to better understand the nature and distinguishing characteristics of sybil accounts. The NB classifier, using seven profile-based features, was applied to the test dataset to identify fake profiles, and their suggested solution reached a 98% efficiency rate. In turn, (Gupta and Kaushal 2017) described an approach to detect fake accounts. The key contributions of the authors' work include the collection of a private dataset using the Facebook API through Python wrappers. After data collection, a set of 17 features was shortlisted, including likes, comments, shares, tags, app usage, etc. A total of 12 supervised ML classification algorithms were used (from Weka), namely k-Nearest Neighbor, Naive Bayes, and Decision Tree classifiers (J48, C5.0, Reduced Error Pruning Trees Classification (REPT), Random Tree, Random Forest), among others. Two types of validation were performed, namely the holdout method and tenfold cross-validation, and a classification accuracy of 79% was achieved. User activities contributed the most to the detection of fake accounts.
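
The two validation schemes mentioned above can be illustrated in a few lines; the snippet below runs a holdout split and tenfold cross-validation on synthetic data standing in for the 17 shortlisted features.

```python
# Sketch of the two validation schemes: a holdout split and tenfold cross-validation
# (synthetic stand-in for the 17 profile features; classifier choice is illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=1000, n_features=17, random_state=1)
clf = RandomForestClassifier(random_state=1)

# Holdout: train on 70% of the data, evaluate once on the remaining 30%
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
print("holdout accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# Tenfold cross-validation: average accuracy over ten train/test partitions
print("10-fold accuracy:", cross_val_score(clf, X, y, cv=10).mean())
```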

3.1.4 Instagram—detecting social bots

Only one study, by (Sen et al. 2018), aimed to detect fake likes on Instagram and thereby detect social bots. A dataset of 151,117 fake and genuine likes was captured and labeled manually by the authors; a limitation of this study was the noisiness of the dataset. Various types of features were extracted for extensive analysis, namely Network Effect, Internet Overlap, Liking Frequency, Influential Poster, Hashtag, and User-based features. LR, RF, SVM, AdaBoost, XGBoost, NN, and Multilayer Perceptron (MLP) algorithms were applied. MLP showed the best results with 83% precision and 81% recall (AUC of 89%). According to the authors, the model's main strength is its high efficacy in capturing the parameters that influence genuine liking behavior.

3.1.5 Instagram—detecting spambots

Two studies were found that used an ML approach for fake and automated account detection on Instagram. Firstly, (Akyon and Esat Kalfaoglu 2019) contributed by generating two labeled public datasets: one for fake accounts (1203 accounts) and another for bots (1400 accounts). Both datasets had problems: the fake-accounts dataset had an uneven number of real and fake accounts, so the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC) was applied, while a cost-sensitive genetic algorithm was used to correct the unnatural bias in the automated-accounts dataset. Profile-centric features were fed into NB, LR, SVM, and NN algorithms. SVM and NN provided promising F1-scores for both datasets: 94% with oversampling on the fake-accounts dataset and 86% on the automated-accounts dataset.
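
SMOTE-NC, used here for mixed nominal and continuous account features, is available in the imbalanced-learn package. The sketch below rebalances a synthetic, skewed real/fake dataset; the feature layout and imbalance ratio are invented for illustration.

```python
# Sketch of rebalancing a skewed real/fake account dataset with SMOTE-NC,
# which handles mixed nominal and continuous features (requires imbalanced-learn).
import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.default_rng(0)
n = 1203
X = np.column_stack([
    rng.integers(0, 2, n),        # column 0: has_profile_picture (nominal, illustrative)
    rng.integers(0, 5000, n),     # column 1: follower count (continuous)
    rng.exponential(3, n),        # column 2: posts per day (continuous)
])
y = (rng.random(n) < 0.15).astype(int)   # ~15% fake accounts: heavily imbalanced labels

smote_nc = SMOTENC(categorical_features=[0], random_state=0)
X_res, y_res = smote_nc.fit_resample(X, y)
print("class counts before:", np.bincount(y), "after:", np.bincount(y_res))
```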

Similarly, a method to identify spam posts was presented by (Zhang and Sun 2017). A manually labeled dataset was made up of 1983 user profiles and 953,808 media posts. Profile-based, Color Difference Histogram-based, and media-post-based feature vectors were extracted from the user profiles and media posts. Near-duplicate posts were grouped into the same clusters using two-pass clustering techniques, MinHash clustering and K-medoids clustering. The best configuration, an RF with maxDepth = 8, numTrees = 20, and entropy impurity, achieved an accuracy of 96.27%.

3.1.6 Instagram—detecting sybil bots

Many studies were able to detect sybil bots. (Meshram et al. 2021) proposed an automated methodology for fake profile detection. The authors collected 1203 real and fake accounts using the Instagram API and extracted a list of eight content- and behavior-based features. Due to the uneven real-to-fake account ratio, the dataset had to be oversampled using SMOTE-NC before applying any algorithm. Afterward, NN, SVM, and RF algorithms were applied, and RF produced the best results with an accuracy of 97%.

Using the same records and features, (Sheikhi 2020) presented a bagging classifier and performed a comparative analysis with several well-known ML algorithms, namely RT, J48, SVM, Radial Basis Function (RBF), MLP, Hoeffding Tree, and NB, with 10-fold cross-validation. The bagging classifier showed better performance, successfully classifying 98% of the accounts. Moreover, the author presented the best feature types for different dataset sizes.

Additionally, (Dey et al. 2019) also assessed fake and real Instagram accounts. A publicly labeled dataset of sixteen accounts was obtained from Kaggle, and twelve profile-based features were extracted from the sample dataset. Missing value treatment, outlier detection, and bivariate analysis were carried out as part of the exploratory data analysis, with median imputation used to deal with the outliers. For the scope of this paper, two supervised classification algorithms, LR and RF, were used; RF showed the better performance with 92.5% accuracy.

Subsequently, the research of (Purba et al. 2020) aimed to identify fake users' behavior and proposed two classification schemes: 2-class (authentic, fake) and 4-class (authentic, spammer, active fake user, inactive fake user). The total number of fake and authentic users in the dataset was 32,460. Seventeen features based on metadata, media info, media tags, media similarity, and engagement were used with the RF, MLP, LR, NB, and J48 algorithms and showed promising results; RF reached an accuracy of up to 91.76% for the 4-class classification. Moreover, the analysis showed that metadata and statistical features are the foremost predictors for classification.

Nevertheless, (Kesharwani et al. 2021) utilized a six-layer deep NN model to classify fake and genuine Instagram accounts. The designed model used 12 profile-based features. An open dataset of 696 Instagram users available on Kaggle, collected using a crawler, was used for this experiment; the dataset itself had 10 profile-based features. The model was trained for 20 epochs and achieved an accuracy of 93.63%.

Quite interestingly, (Bazm and Asadpour 2020 ) proposed a behavioral-based model. A labeled dataset was collected by the authors including 2000 accounts of both fake and genuine users. Seven behavioral features were extracted from the dataset. KNN, DT, SVM, RF, and AdaBoost algorithms were tested and analyzed. AdaBoost showed the best-performing results with an accuracy of 95%. Additionally, the Max feature was identified as the most effective for classification followed by standard deviation, following count, and entropy. Three of the above-mentioned most effective features were behavioral.

Lastly, the work of (Thejas et al. 2019) also focused on detecting valid and fake likes of Instagram posts by applying single and ensemble learning models. A labeled dataset of 10,346 observations and 37 features was composed, and the authors used numeric and text-based features to perform an extensive analysis of fake-like patterns. Various single classifiers were used, such as LR, SVM, KNN, NB, and NN in different versions, alongside ensemble classifiers such as RF in multiple versions. Moreover, bot detection using an autoencoder was also explored. RF showed the highest performance with 97% accuracy.
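
The autoencoder idea mentioned last can be sketched as an anomaly detector: train the network to reconstruct genuine samples only, then flag inputs with unusually high reconstruction error. The example below is a hedged Keras illustration on synthetic data (37 features echo the study's feature count, but nothing else is taken from it).

```python
# Hedged sketch of autoencoder-based fake-like detection: train on genuine samples
# only and flag inputs whose reconstruction error is unusually high. Data are synthetic.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
genuine = rng.normal(0, 1, size=(5000, 37)).astype("float32")   # synthetic "genuine" patterns
suspect = rng.normal(3, 1, size=(200, 37)).astype("float32")    # synthetic "fake like" patterns

autoencoder = models.Sequential([
    layers.Input(shape=(37,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),      # bottleneck
    layers.Dense(16, activation="relu"),
    layers.Dense(37, activation=None),       # linear reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(genuine, genuine, epochs=10, batch_size=64, verbose=0)

def reconstruction_error(x):
    return np.mean((autoencoder.predict(x, verbose=0) - x) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(genuine), 95)    # simple cutoff choice
print("fraction flagged as fake:", np.mean(reconstruction_error(suspect) > threshold))
```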

3.1.7 LinkedIn—detecting sybil bots

Only two studies were found for this platform. To detect sybils, (Adikari and Dutta 2020) proposed a methodology for identifying bot-generated profiles based on limited publicly available profile data using data mining techniques. Much existing research assumes the availability of static and dynamic profile data, which is not the case for LinkedIn, as its more restrictive privacy policies impede access to dynamic data. The profile features were extracted from a dataset of only 74 profiles; thirty-four fake accounts were collected by searching blogs and websites for known LinkedIn fake accounts, and the lack of verified fake accounts was a limitation of this research. NN, SVM, PCA, and weighted-average algorithms were used in several combinations for detecting fake profiles. SVM showed the highest accuracy (87.34%) when employing PCA-selected features with a polynomial kernel.
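
The PCA-plus-polynomial-kernel-SVM combination reported as best can be expressed as a single scikit-learn pipeline; the sketch below uses synthetic data of roughly the same small size (74 samples), with the number of components and kernel degree chosen arbitrarily.

```python
# Sketch of a PCA + polynomial-kernel SVM pipeline (synthetic stand-in for the
# small LinkedIn profile dataset; component count and degree are arbitrary choices).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=74, n_features=12, n_informative=6, random_state=0)
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),              # keep a handful of PCA-selected components
    SVC(kernel="poly", degree=3),     # polynomial kernel, as in the reported best setup
)
print("cv accuracy:", cross_val_score(model, X, y, cv=5).mean())
```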

Furthermore, (Xiao et al. 2015) proposed a scalable offline pipeline to identify clusters of fake accounts on LinkedIn. Fake accounts are identified at the cluster level rather than the account level so that they can be detected rapidly after registration. Accounts were grouped into clusters based on statistical features generated at or after registration time, such as name, email address, and company, and cluster-level features were fed into RF, LR, and SVM models. The authors collected a set of labeled data for 260,644 LinkedIn accounts. The RF algorithm clearly provided the best results across all metrics: an AUC of 0.95 and a recall of 0.72 at 95% precision on out-of-sample test data.

3.1.8 Twitter—detecting social bots

Numerous studies were able to detect social bots on Twitter. (Echeverra et al. 2018) tested 20 unseen bot classes of varying sizes and characteristics using bot classifiers. Two datasets consisting of 2.5 million accounts were collected using Twitter's API, and twenty-nine profile- and content-based features were employed for classification. The classifiers tested were GB trees (XGBoost and LightGBM (LGBM)), RF, DT, and AdaBoost. LGBM showed the highest accuracy, 97.84%, on both subsamples used (C30K and C500).

Moreover, (Fonseca Abreu et al. 2020) examined whether a reduced feature set for Twitter bot detection yields outcomes comparable to large feature sets. Five profile-based features were used for classification on a dataset of 4565 records of both social bots and genuine users. The ML algorithms tested were RF, SVM, NB, and one-class SVM. All multiclass classifiers obtained AUCs greater than 0.9, but RF exhibited the best results with an AUC of 0.9999.

Varol et al. (2017) used more than a thousand features based on user metadata, friends, tweet content, sentiment, network patterns, and activity time series. A publicly accessible dataset of 31K Twitter accounts manually verified as bots or real was used to train the model, whose accuracy was evaluated using RF, AdaBoost, LR, and DT classifiers. The best performance was achieved by RF with an AUC of 0.95. Furthermore, it was concluded that the most significant sources of data are user metadata and content features.

(Knauth 2019) extracted twenty-eight features based on profile, tweets, and behavior, focusing mainly on language-agnostic features for easy future portability. LR, SVM, RF, AdaBoost, and MLP classifiers were used for the experiments, and AdaBoost outperformed all competitors with an accuracy of 0.988. Smaller quantities of training data were also analyzed, showing that a few expressive characteristics provide good practical benefits for bot identification.

In this study, after a long process of feature extraction and data pre-processing, (Kantepe and Gañiz 2017) employed ML techniques. Data on one thousand eight hundred accounts were obtained with the Twitter API and Apache Spark and then used to extract 62 different features, mainly profile-based, Twitter-specific, and periodic features. Four classifiers were used: LR, Multinomial Naïve Bayes (MNB), SVM, and GB. The highest accuracy, 86%, was achieved by the GB trees.

The research conducted by (Barhate et al. 2020) used two approaches for detecting bots and analyzed their influence in trending a hashtag on Twitter. First, the bot probability of a user was calculated using a supervised ML technique and a new "bot score" feature. A total of 13 features were extracted for data pre-processing and Estimation of Distribution Algorithms (EDA). The data were trained using an RF classifier, which produced an AUC of 0.96. The study also concluded that bots have a high friend-to-follower ratio and a low follower growth rate.
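
Signals like the friend-to-follower ratio and a model-derived "bot score" are straightforward to engineer; the sketch below computes them on synthetic account data and uses an RF's predicted probability as the score. Column names and labels are invented, not taken from the study.

```python
# Sketch of engineered account signals: friend-to-follower ratio, follower growth,
# and a "bot score" taken from an RF's predicted probability (synthetic data).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "friends_count": rng.integers(0, 5000, n),
    "followers_count": rng.integers(1, 5000, n),          # at least 1 to avoid division by zero
    "followers_gained_30d": rng.integers(0, 500, n),
    "is_bot": rng.integers(0, 2, n),                       # synthetic labels
})
df["friend_follower_ratio"] = df["friends_count"] / df["followers_count"]
df["follower_growth_rate"] = df["followers_gained_30d"] / df["followers_count"]

features = ["friend_follower_ratio", "follower_growth_rate", "friends_count", "followers_count"]
clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(df[features], df["is_bot"])
df["bot_score"] = clf.predict_proba(df[features])[:, 1]    # probability of the bot class
print(df[["friend_follower_ratio", "bot_score"]].head())
```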

The dataset acquired by (Pratama and Rakhmawati 2019) came from supporters of Indonesian presidential candidates on Twitter. The top five hashtags for each candidate were used to collect tweets, which were then manually labeled with the accounts' bot characteristics, resulting in a limit of about 4,000 tweets. Two ML models, SVM and RF, were utilized for bot detection and trained with ten-fold cross-validation to improve the overall score. Of the two, RF achieved the higher overall score of 74% in F1-score, accuracy, and AUC. Comparing the 10 retrieved features from the dataset, they discovered that the account creation year provided the greatest separation between humans and bots.

Davis et al. ( 2016 ) made use of an RF classifier to evaluate and detect social bots by creating a system called BotOrNot. A public dataset of 31 K accounts was used to train the model. The framework collected more than 1000 features from six main groups of characteristics: network, user, friend, temporal, content, and sentiment features. Several classifiers, one for each category of features and one for the overall score, were trained using the extracted features. The system performance was assessed using ten-fold cross-validation, and an AUC value of 95% was obtained.

Likewise, a Twitter bot identification technique was also presented by (Shevtsov et al. 2022 ). Their Twitter dataset contained 15.6 million tweets in total from 3.2 million accounts, posted during the US elections. The XGBoost algorithm was used to pick 229 features from approximately 337 user-extracted features. Their suggested ML pipeline involves training and validating three ML models: SVM, RF, and XGBoost. XGBoost performed best, and their findings indicate that it generalizes well from the training split to the collected dataset, with the F1 score dropping only from 0.916 to 0.896 and the ROC-AUC from 0.98 to 0.977.
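
A hedged illustration of the feature-reduction step follows: XGBoost importances are used to shrink a large feature set before retraining, in the spirit of the pipeline above. The feature counts and synthetic data are illustrative, not the authors' exact setup.

```python
# Sketch: reducing a large feature set with XGBoost feature importances, then retraining.
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=2000, n_features=300, n_informative=50,
                           random_state=0)                      # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = XGBClassifier(n_estimators=300, eval_metric="logloss").fit(X_tr, y_tr)
top = np.argsort(full.feature_importances_)[::-1][:100]         # keep the 100 most important features

reduced = XGBClassifier(n_estimators=300, eval_metric="logloss").fit(X_tr[:, top], y_tr)
print("F1 (reduced feature set):", round(f1_score(y_te, reduced.predict(X_te[:, top])), 3))
```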

Additionally, SPY-BOT, a post-filtering method based on ML for social network behavior analysis, was introduced by (Rahman et al. 2021 ). Six hundred training samples were used to extract eleven features. The authors contrasted the two ML algorithms LR and SVM during the training phase; after comparing outcomes, the tuned SVM performed best. Their method achieved up to 92.7% accuracy on the validation dataset and up to 90.1% accuracy on the testing dataset. As a result, they suggest that the proposed approach is able to classify users' behavior in the Social Network-Integrated Industrial Internet of Things (SN-IIoT).

A real-time streaming framework called Shot Boundary Determination (SBD) was also suggested by (Alothali, Alashwal, et al. 2021a ) as a way to detect social bots before they launch an attack, in order to protect users. To gather tweets and extract user profile features, the system uses the Twitter API. They used a publicly available Twitter dataset from Kaggle, with a total of 37,438 records, as their offline dataset. Friends count, followers count, favorites count, status count, account age in days, and average tweets per day were the six features that were extracted and used as input to their ML model. They used an RF algorithm to differentiate between bot and human accounts. The outcomes of their methodology demonstrated its effectiveness in retrieving and publishing the data and monitoring the estimates.

Shukla et al. ( 2022 ) proposed TweezBot, a novel AI-driven multi-layer condition-based social media bot detection framework. The authors also performed a comparative analysis with several existing models, together with an extensive study of features and exploratory data analysis. The proposed method analyzed Twitter-specific user profile features and activity-centric characteristics, such as profile name, location, description, verification status, and listed count. These features were extracted from 2789 distinct user profiles in a public labeled dataset from Kaggle. The ML models used for comparative evaluation were RF, DT, Bernoulli Naïve Bayes (BNB), CNB, SVC, and MLP. TweezBot attained a maximum accuracy of 99.00049%.

Since bots are also used to manipulate political activities, (Fernquist et al. 2018 ) presented a study on political Twitter bots and their impact on the September 2018 Swedish general election. To identify automatic behavior, an ML model that is independent of language was developed. The training data consisted of both bots and genuine accounts, and three different datasets (Cresci et al. 2015 ; Gilani et al. 2017 ; Varol et al. 2017 ) were used to train the classification model. Furthermore, 140 user metadata, tweet, and time features were extracted. Various algorithms such as AdaBoost, LR, SVM, and NB were tested, and RF outperformed them with an accuracy of 0.957.

Similarly, (Beğenilmiş and Uskudarli 2018 ) made use of collective behavior features in hashtag-based tweet sets, which were compiled by searching for relevant hashtags. A dataset of 850 records was used to train the model with algorithms including RF, SVM, and LR. From tweets collected during the 2016 US presidential election, 299 features were retrieved. To capture coordinated behavior, the features represent user and temporal synchronization characteristics. These models were developed to distinguish between organic and inorganic, political and non-political, and pro-Trump, pro-Hillary, or neither tweet-set behavior. RF displayed the best outcomes, with an F-measure of 0.95. In conclusion, this study found that media utilization and tweets marked as favorites were the most dominant features, and that user-based features were the most valuable.

On the other hand, (Rodríguez-Ruiz et al. 2020 ) suggested a one-class classification approach. One benefit of one-class classifiers is that they do not need examples of abnormal behavior, such as bot accounts. The public dataset of (Cresci et al. 2017 ) was used. Bagging-TPMiner (BTPM), Bagging-RandomMiner (BRM), One-Class K-means with Randomly projected features Algorithm (OCKRA), one-class SVM, and NB were the classifiers taken into consideration. Only 13 numerical features were extracted for classification. With an average AUC value of 0.921, Bagging-TPMiner outperformed all other classifiers over a number of experiments.

Moreover, (Attia et al. 2022 ) proposed a new content-based bot detection model built on a multi-input DNN technique. They used 6760 records from the public PAN 2019 Bots and Gender Profiling Task dataset (Rangel and Rosso 2019 ). The proposed multi-input model includes three phases. The first phase feeds the first input, an N-gram representation forming a 3D matrix of 100*8*300, into a two-dimensional CNN. The second phase feeds a vector of length M (100 tweets) into a one-dimensional CNN. The final phase combines the two previous models with fully connected neural networks. Each model was trained using suitable hyper-parameter values. Their model achieved a detection accuracy of 93.25% and outperformed other newly proposed bot detection models.

In the work of (Sayyadiharikandeh et al. 2020 ), the authors recommended training specialized classifiers for each class of bots and combining their decisions using the maximum rule. They also produced the Ensemble of Specialized Classifiers (ESC) used in the most recent version of Botometer. The authors used 18 different public labeled datasets from the Bot Repository, and over 1200 features were extracted. Features were divided into categories including metadata, retweet/mention networks, temporal features, content information, and sentiment features. A cross-domain performance comparison and analysis was performed using all 18 datasets. The authors recommend considering the three bot classes defined in the (Cresci et al. 2017 ) dataset. Moreover, they provided a list of the most informative features per bot class in the public datasets used.

A comprehensive comparative analysis was conducted by (Shukla et al. 2021 ) to determine the optimal feature encoding, feature selection, and ensembling method. From the Kaggle repository, a total of 37,438 records comprising the training and testing dataset were acquired. Scaling of numerical attributes and encoding of categorical attributes were two steps in the pre-processing of the dataset. A total of 19 attributes were extracted. The model used the classifiers: RF, Adaboost, NN, SVM, and KNN. It was determined that employing RF for blending produced the best results and the highest AUC score of 93%. Since the proposed approach uses Twitter profile metadata, it can detect bots more quickly than a system that analyzes an account's behavior. However, the system's reliance on static analysis reduces its efficiency.
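
As a rough sketch of such a pre-processing and ensembling pipeline, the snippet below scales numerical attributes, one-hot encodes categorical ones, and combines base classifiers with an RF meta-learner; scikit-learn's StackingClassifier is used here as a stand-in for the blending step, and the column names and toy data are hypothetical.

```python
# Sketch: scale numeric attributes, encode categorical ones, and ensemble with an RF meta-learner.
# StackingClassifier stands in for blending; column names and toy data are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

numeric = ["followers_count", "friends_count", "statuses_count"]   # assumed columns
categorical = ["lang", "verified"]                                  # assumed columns

pre = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
ensemble = StackingClassifier(
    estimators=[("ada", AdaBoostClassifier()),
                ("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier())],
    final_estimator=RandomForestClassifier(n_estimators=200),
)
model = Pipeline([("pre", pre), ("ensemble", ensemble)])

# Toy labeled DataFrame purely for demonstration.
df = pd.DataFrame({
    "followers_count": [10, 5000, 3, 120, 40, 9000] * 20,
    "friends_count":   [2000, 300, 1500, 80, 2500, 100] * 20,
    "statuses_count":  [50000, 1200, 90000, 400, 70000, 800] * 20,
    "lang":            ["en", "en", "es", "en", "ar", "en"] * 20,
    "verified":        ["no", "yes", "no", "no", "no", "yes"] * 20,
    "bot":             [1, 0, 1, 0, 1, 0] * 20,
})
model.fit(df[numeric + categorical], df["bot"])
print("Training accuracy:", model.score(df[numeric + categorical], df["bot"]))
```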

Ramalingaiah et al. ( 2021 ) presented an effective text-based bag-of-words (BoW) model. BoW produces a numerical vector that can be used as input to different ML algorithms. Using the features resulting from the feature selection process, different ML algorithms such as DT, KNN, LR, and NB were implemented to calculate their accuracies and compare them with the authors' classifier, which uses the BoW model to detect Twitter bots from the given training data. The dataset, from Kaggle, contained 2792 training entries and 576 testing entries for evaluating the models. Among the baseline algorithms, the decision tree gave the highest accuracy, which was further increased by a bag-of-bots algorithm. The authors' BoW-based classifier performed best, yielding an accuracy of over 99% on the test data.
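
A minimal sketch of the BoW idea is shown below: tweet text is converted into count vectors and several classical classifiers are compared by accuracy. The toy tweets and labels are placeholders, not the Kaggle data used in the study.

```python
# Sketch: bag-of-words tweet vectors fed to several classical classifiers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

tweets = ["Win a free iPhone now!!! click here", "Had a great coffee this morning"] * 50
labels = [1, 0] * 50                          # 1 = bot/spam, 0 = genuine (toy labels)

X = CountVectorizer().fit_transform(tweets)   # numerical BoW vectors

for name, clf in [("DT", DecisionTreeClassifier()),
                  ("KNN", KNeighborsClassifier()),
                  ("LR", LogisticRegression(max_iter=1000)),
                  ("NB", MultinomialNB())]:
    acc = cross_val_score(clf, X, labels, cv=5, scoring="accuracy").mean()
    print(name, round(acc, 3))
```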

An ML method based on benchmarking was proposed by (Pramitha et al. 2021 ) to choose the best model for bot account detection. The dataset, obtained from Kaggle, contained 24,631 records; scraping was then performed using the Twitter API to obtain profile features. Furthermore, over-sampling using SMOTE was applied to overcome imbalanced data and improve the models' accuracy. Both RF and XGBoost algorithms were evaluated, and XGBoost outperformed RF with an accuracy of 0.8908. Additionally, after ranking fifteen different features, they discovered that three significant features (verified, network, and geo-enable) can distinguish between human and bot accounts.
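
The snippet below sketches the SMOTE-then-benchmark procedure under assumptions: the minority class of a synthetic, imbalanced dataset is over-sampled on the training split only, after which RF and XGBoost are compared.

```python
# Sketch: SMOTE over-sampling on the training split, then benchmarking RF vs. XGBoost.
# Requires the imbalanced-learn and xgboost packages; the data are synthetic.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=15, weights=[0.9, 0.1],
                           random_state=0)              # imbalanced stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance the training data only

for name, clf in [("RF", RandomForestClassifier(n_estimators=200)),
                  ("XGBoost", XGBClassifier(eval_metric="logloss"))]:
    clf.fit(X_res, y_res)
    print(name, "accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 4))
```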

Many studies implemented effective DL algorithms instead of classical ML, such as the Behavior-enhanced Deep Model (BeDM) proposed by (Cai, Li, and Zeng 2017b ) for bot detection, using a real-world public labeled dataset of 5658 accounts and 5,122,000 tweets from Twitter that had been collected with honeypots. The model fused tweet content, treated as temporal text data, with user posting-behavior information, applying a DNN to detect bots. The DL frameworks used in BeDM are CNN and LSTM. Compared to boosting baselines (Gilani et al. 2016 ; Lee et al. 2006 ; Morstatter et al. 2016 ), BeDM attained the highest F1 score of 87.32%, which demonstrated the efficacy of the model.

Later in the same year, (Cai, Li, and Zeng 2017a ) proposed analogous work. However, their novel Deep Bot Detection Model (DBDM) avoids laborious feature engineering and automatically learns both behavioral and content representations based on the user representation. Additionally, DBDM takes into account endogenous and exogenous factors that influence user behavior. DBDM achieved better results, with an F1-score of 88.30%.

Additionally, (Hayawi et al. 2022 ) proposed a DL framework, DeeProBot, which used eleven user-profile metadata-based features. Five training and five testing datasets from the Bot Repository were used. The text feature was embedded using GloVe, which enhanced learning from the features. To detect bots, DeeProBot employed a hybrid deep NN model and achieved an AUC of 0.97 on the hold-out test set.

In a novel framework called GANBOT, (Najari et al. 2022 ) modified the Generative Adversarial Network (GAN) concept. The generator and classifier were connected via an LSTM layer as a shared channel between them, reducing the convergence limitation on Twitter. By raising the likelihood of bot identification, the suggested framework outperformed the existing contextual LSTM technique. A total of 8386 accounts from the Cresci2017 dataset were used. Results were assessed for four distinct vector dimensions (25D, 50D, 100D, and 200D), with the highest result of 0.949/0.951 obtained for 200D.

A total of seventeen state-of-the-art DL-based methods for bot detection were described by (Kenyeres and Kovács 2022 ). They classified Twitter feeds as bots or humans based solely on the textual content of the accounts' tweets. The PAN 2019 Bots and Gender Profiling task dataset (Rangel and Rosso 2019 ), consisting of 11,560 labeled users, was used. The core of seven models was based on LSTM networks, four were based on Bidirectional Encoder Representations from Transformers (BERT) models, and one combined the two. For tweet classification, the best accuracy of 0.828 was obtained using a fine-tuned BERT model, while for account classification the AdaBoost model achieved the best accuracy of 0.9. Their findings demonstrate that, even with a small dataset, DL models can compete with Classical Machine Learning (CML) methods.

Moreover, (Martin-Gutierrez et al. 2021 ) provided a multilingual DL method for detecting suspect Twitter accounts. The dataset used in their work, collected using the Twitter API, comprised 37,438 Twitter accounts. Several experiments were conducted using different combinations of word embeddings to obtain a single vector for the text-based features of each user account. These features were then concatenated with the rest of the metadata to build the input vector for a dense network denoted Bot-DenseNet. The comparison of these experiments showed that Bot-DenseNet, when using the RoBERTa Transformer as part of the input feature vector, produces the best trade-off between performance and feasibility, with an F1-score of 0.77.

In this research, (Ping and Qin 2019 ) proposed DeBD, a social bot detection model for Twitter based on a CNN-LSTM DL architecture. DeBD used a CNN to extract the joint features of tweet content and their relationships, and an LSTM to extract the latent temporal features of the tweet metadata; the temporal features were then fused with the joint content features to detect social bots. For the experiments, a dataset of 5132 accounts drawn from (Cresci et al. 2017 ) was created, and all the experiments achieved a detection accuracy of more than 99%.
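
To illustrate the general shape of such a CNN-LSTM fusion, the sketch below builds a small Keras model with a CNN branch over tweet-content tokens and an LSTM branch over temporal metadata, merged before a sigmoid output. All dimensions and layer sizes are illustrative assumptions, not the DeBD configuration.

```python
# Sketch: CNN branch over tweet tokens fused with an LSTM branch over temporal metadata.
# Sizes are illustrative assumptions only.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, SEQ_LEN, META_STEPS, META_DIM = 20000, 50, 30, 4   # assumed sizes

content_in = layers.Input(shape=(SEQ_LEN,), name="tweet_tokens")
x = layers.Embedding(VOCAB, 128)(content_in)
x = layers.Conv1D(64, 3, activation="relu")(x)             # joint content features
x = layers.GlobalMaxPooling1D()(x)

meta_in = layers.Input(shape=(META_STEPS, META_DIM), name="tweet_metadata")
t = layers.LSTM(32)(meta_in)                                # temporal metadata features

fused = layers.concatenate([x, t])
out = layers.Dense(1, activation="sigmoid")(fused)          # bot vs. human

model = Model([content_in, meta_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```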

Daouadi et al. ( 2019 ) showed that a Deep Forest algorithm combined with thirteen metadata-based features is sufficient to accurately identify bot accounts on Twitter. Two datasets published by (Lee et al. 2006 ; Subrahmanian et al. 2016 ) were used, with the Twitter API employed to gather the data. More than 30 conventional algorithms were implemented, including Bagging, MLP, AdaBoost, RF, and SL. With an accuracy of 97.55%, the Deep Forest method surpassed the other conventional supervised learning techniques.

In this paper, (Cable and Hugh 2019 ) implemented the algorithms NB, LR, kernel SVM, RF, and LSTM-NN to identify political trolls on Twitter and compared their accuracies. A dataset of tweet IDs related to the 2016 elections was used; the Twitter API was scraped to obtain a total of 142,560 unique tweets. The features were extracted using several methods: word count, TF-IDF, and word embeddings. The LSTM-NN obtained a test accuracy of 0.957.

Since it is important to determine the best features for enhancing the detection of social bots, (Alothali, Hayawi, et al. 2021b ) offered a hybrid feature selection (FS) technique to locate these ideal features. This method evaluates profile metadata features using random forest, naive Bayes, support vector machines, and neural networks. Using a public Kaggle dataset with a total of 18 profile metadata features, they investigated four feature selection approaches, employing filter and wrapper methods to find the best feature subset. They discovered that, compared to the other FS methods, cross-validation attribute evaluation performed best. According to their findings, the random forest classifier achieves the best score using six optimal features: favorites count, verified, statuses count, average tweets per day, lang, and ID.
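
A hedged sketch of a hybrid filter-plus-wrapper selection step is given below: a mutual-information filter first narrows the candidate metadata features, then recursive feature elimination with an RF picks the final subset. The feature counts are assumptions, not the authors' exact choices.

```python
# Sketch: filter stage (mutual information) followed by a wrapper stage (RFE with RF).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=18, n_informative=6,
                           random_state=0)                  # 18 stand-in metadata features

X_filtered = SelectKBest(mutual_info_classif, k=12).fit_transform(X, y)    # filter stage
wrapper = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=6).fit(X_filtered, y)                   # wrapper stage
print("Selected feature mask:", wrapper.support_)
```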

Lastly, (Sengar et al. 2020 ) proposed both ML and DL approaches to distinguish bots from genuine users on Twitter. This was done by gathering user activity and profile-based features and then applying supervised ML and NLP. A labeled Twitter dataset containing more than 5000 users and 200,000 tweets was used to train the classifiers. After analysis and feature engineering, eight features were extracted. Different learning models, namely KNN, DT, RF, AdaBoost, GB, Gaussian Naive Bayes (GNB), MNB, and MLP, were compared and analyzed to determine the best-performing bot detection system. Results showed that the NN-based MLP algorithm gave the most accurate prediction, with an accuracy of 95.08%. For tweet-level analysis, a CNN architecture combining user and tweet metadata was proposed, using the MIB dataset (Cresci et al. 2017 ); this novel approach gave a substantial improvement, with RF and GB reaching the highest accuracy of 99.54%.

3.1.9 Twitter—detecting spambots

Some studies demonstrate the detection of spammers, starting with a hybrid method for identifying automated spammers based on their interactions with their followers (Fazil and Abulaish 2018 ). Nineteen distinct features were retrieved, integrating community-based features with those from other categories such as metadata-, content-, and interaction-based features. A real public dataset of 11,000 labeled users was used. The performance was analyzed using three supervised ML techniques, namely RF, DT, and BN, implemented in Weka. RF achieved the best values on all three metrics: a DR of 0.976, an FPR of 0.017, and an F-score of 0.979. Lastly, after executing a feature ablation test and examining the discriminative capability of the various features, interaction- and community-based features were determined to be the most successful for spammer identification.

Oentaryo et al. ( 2016 ) categorized bots based on their behavior as broadcast, consumption, and spambots. A systematic profiling framework was developed which included a set of features and a classifier bank. Numeric, categorical, and series features were taken into consideration. The private, manually labeled dataset consisted of 159 K bot and non-bot accounts. Four supervised ML algorithms were employed: NB, RF, SVM, and LR. LR outperformed the other classifiers with an F1 score of 0.8228.

In the research conducted by (Heidari et al. 2020 ), the authors first created a new public dataset containing profile-based features for more than 6900 Twitter accounts from the (Cresci et al. 2017 ) dataset, where the input feature set consisted of age, gender, personality, and education derived from users' online posts. To build their system, they compared the following classifiers: RF, LR, AdaBoost, Feed-forward NN (FFNN), and SGD. The results showed that the FFNN model, with 97% accuracy, provided the best results. Lastly, a new bot detection model was introduced which uses a contextualized representation of each tweet by applying Embeddings from Language Models (ELMo) and Global Vectors for Word Representation (GloVe) in the word embedding phase, giving a complete representation of each tweet's text. The model stacked multiple FFNN models on top of multilayer bidirectional LSTM models to extract different aspects of a tweet's text. It distinguished bots from human accounts, even when they shared the same user profile, and achieved 94% prediction accuracy on two different testing datasets.

A spam detection AI approach for the Twitter social network was proposed by (Prabhu Kavin et al. 2022 ). The dataset (7973 accounts) was collected using the Twitter REST API and combined with the public dataset "The Fake Project" (Cresci et al. 2015 ). For pre-processing, tokenization, stop-word removal, and stemming were applied. User-based and content-based features were extracted from the dataset. To develop the model, a variety of ML methods, including SVM, ANN, and RF, were applied. With user-based features, the findings showed that SVM had the highest precision (97.45%), recall (98.19%), and F-measure (97.32%).

In this research, (Eshraqi et al. 2016 ) used a clustering algorithm that identified spam tweets (treated as an anomaly problem) on the basis of the data stream. The dataset consisted of 50,000 Twitter user accounts and 14 million tweets. Pre-processing was done in RapidMiner, and the data were then transferred into Massive Online Analysis (MOA) for implementation. The features extracted were based on graphs, content, time, and keywords. When using the DenStream algorithm (Cao et al. 2006 ), its parameters needed to be regulated properly. The model successfully identified 89% of the available spam tweets, and the results showed an accuracy of 99%.

Mateen et al. ( 2017 ) used 13 user-, content-, and graph-based features to classify human and spam profiles. The real public dataset used for this study was provided by (Gu 2022 ) and consisted of approximately 11 K user accounts and 400 K tweets. Three classifiers, namely J48, DE, and NB, were used for evaluation. J48 and DE outperformed NB using the hybrid technique of combined features, showing 97.6% precision. Results showed that, for the dataset employed, the hybrid technique significantly improved precision and recall. Additionally, compared to content- and graph-based features, which demonstrated 92% accuracy, user- and graph-based features correctly classified only 90% of cases.

Moreover, (Chen et al. 2017a , b ) found that the statistical characteristics of spam tweets in their labeled dataset changed over time, which impacted the effectiveness of existing ML classifiers; this phenomenon is known as Twitter spam drift. Using Twitter's Streaming API, a public dataset of 2 million tweets was gathered, and the Web Reputation Technology from Trend Micro was used to identify the tweets considered spam. The Lfun system, which learns from unlabeled tweets, was proposed. With Day 1 training and Day 2 to Day 9 testing, RF only obtained a DR ranging from 45 to 80%, whereas RF-Lfun increased it to 90%. The detection rate of RF was roughly 85% from Day 2 training to Day 10 testing, but that of RF-Lfun was over 95%.

Kumar and Rishiwal ( 2020 ) explored and provided a framework for identifying spammers, content polluters, and bots using an ML approach based on NNs. A dataset of 5572 tweets, containing the text messages and their category labels, was used. Various algorithms were trained, mainly MNB, Bernoulli NB, SVM, and Complement NB. The most effective classification of spam accounts was achieved by MNB with an accuracy of 99%.

In this study, (Güngör et al. 2020 ) used a dataset of 714 tweets that had been manually labeled and retrieved through the Twitter API. Eight profile-based features and five tweet-based features were extracted and analyzed. Additionally, a set of rules was derived by adding the follower-friend (FF) rate, and spam accounts were detected accordingly. For this experiment, the algorithms NB, J48, and LR were used, and J48 performed best, achieving an accuracy of 97.2%. In conclusion, the accuracy rate increased as a result of using both tweet- and profile-based features.

By utilizing a dataset of 82 accounts of Twitter users who tweet in both Arabic and English, (Al-Zoubi et al. 2017 ) improved spam identification. J48, MLP, KNN, and NB were the algorithms used and compared with tenfold cross-validation and stratified sampling as the training/testing methodology. With an accuracy of 94.9%, J48 demonstrated the best spam detection ability using the top seven features discovered by ReliefF.

For bot detection, (Heidari et al. 2021 ) analyzed the sentiment features of tweet content for each account to measure their impact on the accuracy of ML algorithms. The authors used the (Cresci et al. 2017 ) dataset of 12,736 accounts and 6,637,615 tweets. Their bot detection methodology centers on the number of tweets of an individual account that concentrate on extreme opinions; whether the opinions are overly negative, positive, or neutral, such a concentration indicates the user is a bot. ML models such as RF, NN, SVM, and LR were examined using the proposed sentiment features. The highest result was achieved using Support Vector Regression (SVR), with an F1-score of 0.930.
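
As a simple illustration of account-level sentiment features, the sketch below uses the VADER scorer (a generic choice, not necessarily the authors' tool) to compute, for one account, the share of strongly positive, strongly negative, and neutral tweets; these values could then be fed to the classifiers mentioned above.

```python
# Sketch: per-account sentiment features with VADER (generic sentiment scorer).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import numpy as np

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def account_sentiment_features(tweets):
    """Share of strongly positive, strongly negative, and neutral tweets for one account."""
    scores = np.array([sia.polarity_scores(t)["compound"] for t in tweets])
    return [np.mean(scores > 0.5), np.mean(scores < -0.5),
            np.mean(np.abs(scores) <= 0.05)]

print(account_sentiment_features([
    "I absolutely LOVE this, best thing ever!!!",
    "This is terrible, worst experience of my life",
    "Posting the daily weather update.",
]))
```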

The research work of (Rodrigues et al. 2022 ) focused on identifying live tweets as spam or ham and performed sentiment analysis on both live and stored tweets to classify them as positive, negative, or neutral. The proposed methodology used two different datasets from Kaggle. Vectorizers such as TF-IDF and BoW models were used to extract sentiment features, which were then fed into a variety of ML and DL classifiers. The LSTM achieved the highest accuracy in both spam detection (98.74%) and sentiment analysis (73.81%).

The work of (Andriotis and Takasu 2019 ) proposed a content-based approach to identify spambots. Four public datasets were used in this study (Cresci et al. 2017 ; Varol et al. 2017 ; Yang et al. 2012 , 2013 ). Collectively, the datasets contain tweets from nearly 20 K accounts of both bots and genuine users. The proposed methodology employed metadata, content, and sentiment features. Furthermore, the performance of the KNN, DT, NB, SVM, RF, and AdaBoost algorithms was tested, and AdaBoost showed the best result with a 0.95 F1-score. Additionally, the study showed that sentiment features add value to bot detection algorithms when combined with known features.

Also, (Sadineni 2020 ) detected spam using a Kaggle dataset that included 950 users and ten content-based attributes, demonstrating that SVM and RF outperform NB in terms of performance.

On the other hand, (Kudugunta and Ferrara 2018 ) presented a contextual LSTM architecture based on a DNN that uses account metadata and tweet text to identify bots at the tweet level. The tweet text served as the primary input for the model: it was tokenized and converted into a series of GloVe vectors before being fed into the LSTM, whose output was passed to a 2-layer NN with ReLU activations. High classification accuracy can be attained using the suggested model. Additionally, the compared techniques for account-level bot identification that used synthetic minority oversampling reached over 99% AUC.

In this study, (Alhassun and Rassam 2022 ) detected Arabic spam accounts using text-based data with CNN models and metadata with NN models. Utilizing Twitter's premium API, a dataset of 1.25 million tweets was collected, and data labeling was carried out by flagging terminated accounts. Thirteen features based on tweets, accounts, and graphs were retrieved. The findings demonstrated that the suggested combined framework reached an accuracy of 94.27% using premium features, and that spam detection performance improved when premium features were used compared to standard Twitter features.

An efficient technique for spam identification was introduced by (Inuwa-Dutse et al. 2018 ). They suggested an SPD-Optimized set of features that are independent of historical tweets, focusing on user-related attributes, user accounts, and paired user engagement. MaxEnt, Random Forest, ExtraTrees, SVM, GB, MLP, and MLP+ were among the classification models utilized and evaluated on three datasets: Honeypot (Lee et al. 2006 ), SPDautomated, and SPDmanual. Performance reached a peak of 99.93% when using GB on the SPD-Optimized set. This technique can be used in real time as the first step in a social media data gathering pipeline to increase the validity of research data.

Instead of employing the LCS method, (Sheeba et al. 2019 ) detected spam using the RF classifier technique. The study used a dataset of 100,000 tweets. After the RF classifier had identified an account as a spambot, Latent Semantic Analysis (LSA) was used to further verify it. The proposed approach delivered benefits in terms of time consumption, high accuracy, and cost effectiveness.

An approach to spam identification based on DL methods was developed by (Alom et al. 2020 ). A CNN architecture was utilized for the text-based classifier, while CNN and NN were merged in the combined classifier to process tweet text and metadata, respectively. On two distinct real-world public datasets, Honeypot (Lee et al. 2006 ) and 1KS-10KN (Yang et al. 2013 ), the suggested approach's performance was compared with that of five ML-based and two DL-based state-of-the-art approaches, attaining accuracies of 99.68% and 93.12%, respectively.

In this research, (Reddy et al. 2021 ) implemented several supervised classification algorithms to detect spammers on Twitter. Information was obtained via the Tweepy API and comprised 2798 accounts in the training set and 578 accounts in the test set. Eighteen profile-based features were extracted. In terms of accuracy, the Extreme Learning Machine (ELM) obtained the best result of 87.5%.

3.1.10 Twitter—detecting sybil bots

Firstly, (Narayan 2021 ) used ML algorithms for the detection and successful identification of bogus Twitter accounts/bots. The algorithms used were DT, RF, and MNB. The dataset used included 447 Twitter accounts, and the Twitter API was used to extract the data. DT was found to be more accurate than RF and MNB.

In their work, (Bindu et al. 2022 ) proposed three efficient methods to successfully detect fake accounts. The classification algorithms used were linear and radial SVM, RF, and KNN, and the dataset used contained a total of 3964 records. RF gave the most accurate predictions while overcoming the overfitting problem; its k-fold cross-validation scores had a mean of 0.979812 and a standard deviation of 0.019682. In comparison, radial SVM did not perform well and gave more false negatives. However, higher accuracy was achieved using the ensemble approach.

Likewise, (Alarifi et al. 2016 ) studied the features used for detecting sybil accounts. Twitter4j was used to gather a manually labeled sample dataset of 2000 Twitter accounts (humans, bots, and hybrid accounts with both human and bot tweets). Eight content-based features were selected. Supervised ML algorithms including J48 (C4.5), Logistic Model Tree, RF, LogitBoost, BN, SMO-P, SMO-R, and a multilayer NN were used. RF performed best, with a DR of 91.39 for two-class and 88.00 for three-class classification. Lastly, in order to maximize the use of the classifier, the authors developed an efficient browser plug-in.

David et al. ( 2017 ) leveraged a public labeled dataset from the BoteDeTwitter project to build half of their dataset, which is related to Spanish politics. Using the Twitter API, a sample of 853 bot profiles and the most recent 1000 tweets from each user's timeline were collected. To create an initial feature set, 71 features based on profiles, metadata, and content were extracted. The following supervised ML methods were compared: RF, SVM, NB, DT, and NNET. Even though the gains were not significant after the first six features, RF achieved the highest average accuracy of 94% using 19 features.

In the paper by (van der Walt and Eloff 2018 ), Twitter data were mined using the Twitter4J API and stored in a non-relational database, yielding a total of 169,517 accounts. Engineered features that had previously been used to successfully identify fraudulent accounts created by bots were applied to a sample of human accounts. Without relying on behavioral data, these features were fed to several supervised ML models, enabling training on very little data. The results show that engineered features previously employed to identify fake accounts created by bots could only reasonably predict fake accounts created by humans, with an F1 score of 49.75%.

Kondeti et al. ( 2021 ) implemented ML to detect fake accounts on the Twitter platform. Different ML algorithms such as SVM, LR, RF, and KNN were used along with six account metadata features: lang-code, sex-code, status-count, friends-count, followers-count, and favorites-count. To further improve these algorithms' accuracy, they used two different normalization techniques, Z-score and Min-Max. Their approach achieved a high accuracy of 98% for both the RF and KNN models.
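
The snippet below sketches the normalization comparison under assumptions: Z-score and Min-Max scaling are each paired with KNN and RF on synthetic metadata-like features, and cross-validated accuracy is reported.

```python
# Sketch: comparing Z-score and Min-Max normalization before KNN and RF (synthetic data).
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=6, random_state=0)

for scaler_name, scaler in [("Z-score", StandardScaler()), ("Min-Max", MinMaxScaler())]:
    for clf_name, clf in [("KNN", KNeighborsClassifier()),
                          ("RF", RandomForestClassifier(n_estimators=100))]:
        acc = cross_val_score(make_pipeline(scaler, clf), X, y, cv=5).mean()
        print(scaler_name, clf_name, round(acc, 3))
```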

Khaled et al. ( 2019 ) suggested a new algorithm, SVM-NN, to efficiently detect sybil bots. Four public labeled datasets were used by the authors; combining them resulted in a total of 4456 accounts of both fake and human classes. Sixteen user-based numerical features were extracted from the datasets after applying feature reduction, and they were then fed into the SVM, NN, and SVM-NN algorithms. The authors assert that their novel SVM-NN uses fewer features than existing models. SVM-NN was the best-performing algorithm, showing an accuracy of around 98%.

In their study, (Ersahin et al. 2017 ) collected their own dataset of fake and real accounts using the Twitter API. The dataset consisted of 1000 accounts and was later pre-processed using Entropy Minimization Discretization (EMD) on sixteen user-based numerical features. NB with EMD showed the best result, with 90.41% accuracy.

In order to predict sybil bots on Twitter using deep-regression learning, (Al-Qurishi et al. 2018 ) introduced a new model. The authors used two publicly available labeled datasets that had been generated during the 2016 US election and collected using the Twitter API. The first dataset consisted of 39,467 profiles and 42,856,800 tweets, whereas the second consisted of 3140 profiles and 4,152,799 tweets. The authors extracted 80 online and offline features based on profile, content (temporal, topic, quality, and emotion-based), and graph information. The features were fed into the Deep Learning Component (DLC), an FFNN. Even when fed with noisy and unclear data, the model achieved an accuracy of 86%. Categorical features showed clear segregation: all sybil bots disable their geographical location and have unverified accounts, while numerical features showed that sybil bots have a noticeably young account age (recently created). Additionally, the numbers of reposts and mentions are significantly higher in sybil accounts.

Gao et al. ( 2020 ) proposed a content-based method to detect sybils. The proposed method included three main phases: a CNN, a bi-SN-LSTM, and a dense layer with a softmax classifier stacked to output the classification results. The proposed bi-SN-LSTM network, in contrast to the bi-LSTM, employs SELU as the activation function of its recurrent step, enabling unlimited modifications of the state value. The proposed model achieved a high F1-score of 99.31% on the "My Information Bubble" dataset (Cresci et al. 2015 ).

3.1.11 Weibo—detecting social bots

Data collection, feature extraction, and detection modules were all included in the DL technique known as TPBot, proposed by (Yang et al. 2022 ). To begin with, the data collection module used a web crawler to obtain user data from Sina Weibo based on the dataset collected by (Wu et al. 2021 ). Then, depending on each user's profile, the feature extraction module extracted temporal-semantic and temporal-metadata features. Finally, in the detection module, a detection model based on BiGRU was developed. TPBot outperformed the baselines, achieving an F1-score of 98.37%. Additionally, experiments were carried out on two Twitter datasets (Cresci et al. 2015 , 2017 ) to assess the generalization capability of TPBot, and it outperformed the baselines on both datasets.

Behavioral analysis and a feature study were performed by (Dan and Jieqi 2017 ) to extract effective features of Weibo accounts and build a supervised model to detect bots. A dataset of 5840 accounts from the Sina Weibo data warehouse was used to discriminate between real and bot users. Eleven user behavioral features were extracted and fed into DT, C4.5, and RF algorithms. The RF algorithm performed measurably better, with a 0.944 F-measure.

Moreover, (Huang et al. 2016 ) built a classifier on Weibo that combined NB with a Genetic Algorithm. The genetic algorithm was used to create an optimal threshold matrix, which efficiently increased the precision of the model by improving the conditional probability matrix. Two models were built using two different datasets: one dataset (1000 accounts) was crawled using R, and the other consisted of spammers purchased from a sales platform (600) and legitimate users crawled from friends and relatives (400). Nine profile-based features were set as attributes. In a performance comparison with LR, DT, and NB, the proposed classifier showed a higher precision of 0.92.

3.1.12 Weibo—detecting spambots

In this paper, for effective spammer detection, an ELM-based supervised ML approach was proposed by (Zheng, Zhang, et al. 2016b ). The study started by crawling Weibo data to create a labeled dataset, from which 1000 messages, both spam and normal, were chosen. Message content and user behavior-based features were then extracted, for a total of 18 features, which were fed into the classification algorithm. With TPRs for spammers and non-spammers reaching 99% and 99.95%, respectively, the experiment and evaluation demonstrated that the suggested approach offers good performance.

Zheng, Wang, et al. ( 2016a ) proposed a two-phase spambot detection approach. In the first phase, the authors built on existing work on user features; in the second phase, they introduced content mining for spambot detection. Using web crawlers, a dataset of 517 accounts and 381,139 tweets was collected, and eighteen behavioral and content-based features were extracted. The experimental results were compared with the SVM, DT, NB, and BN algorithms, and the proposed two-phase method performed better than these algorithms with an accuracy of 90.67%.

However, (Wu et al. 2021 ) used a DNN and active learning (DABot) as a technique to detect bots. They classified bots into three types: spammers, bots that engage with accounts to increase impressions, and bots involved in politics. Thirty features were extracted and classified as metadata, interaction, content, and time features. A dataset of 20 K users and 214,506 posts was produced by the authors manually labeling the user accounts. The modeled architecture was made up of different stages: data input for each user, a ResNet block, a BiGRU block, an Attention layer, and an Interference layer.

Another spam detection technique was put forth by (Xu et al. 2021 ), relying on a self-attention Bi-LSTM NN model in conjunction with ALBERT. Two datasets were employed in the experiment: one self-collected (582 accounts) and the other microblogPCU (2000 accounts). The text from social network sites was converted into word vectors using ALBERT, and those word vectors were then input into the Bi-LSTM layer. The final feature vector was created after feature extraction and combination with the information focus of the self-attention layer, and a softmax classifier performed the final classification.

3.1.13 Weibo—detecting sybil bots

In this research, (Bhattacharya et al. 2021 ) suggested a detection model that improved the prediction of fake Weibo accounts using a variety of ensemble ML algorithms. The public Weibo dataset, made up of 918 HTML pages, was obtained from Kaggle, and data scraping was used to construct the fake accounts dataset. Content-based attributes were extracted. Five supervised models (RF, SVC, NB, LR, and GB) were taken into consideration. The RF classifier achieved the highest F1 score of 0.93, and its precision and recall were taken into account for determining the final result. Finally, a confusion matrix plot revealed inaccurate predictions for 44 accounts, providing an opportunity for additional research.

3.2 Using semi-supervised ML

A few studies, on only two platforms, have implemented semi-supervised ML to detect spambots and sybil bots; these are discussed below.

3.2.1 Twitter—detecting spambots

Sedhai and Sun ( 2018 ) were the earliest to utilize a semi-supervised approach for spam detection. Their proposed S3D approach contains two main components: a spam detection component operating in real-time mode and a model update component operating in batch mode to periodically update the detection models. For spam detection, they apply four detectors: a blacklisted domain detector using blacklisted URLs, a near-duplicate detector that labels near-duplicate tweets using clustering, a reliable ham detector that labels tweets posted by trusted users and not containing spammy words, and a multi-classifier using NB, LR, and RF models to label the remaining tweets. Their approach achieved good accuracy for spam detection on the public HSpam14 dataset using four types of features to represent tweets and hashtag clusters, and it was also effective in detecting new forms of spamming.

In this research work, (Alharthi et al. 2019 ) proposed a semi-supervised ML technique that classified Twitter accounts as spam or genuine based on their behavior and profile information. A dataset of 500 active Arab users was collected through the Twitter API and manually labeled. Label spreading and label propagation algorithms were implemented using 16 extracted features. The average number of tweets, the ratio of an account's followers to its friends, the tweet source, and whether all tweets have the same source proved to be the most effective features. The proposed model achieved an F-measure of 0.89, an accuracy of 0.91, and an AUC of 0.90.
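
For readers unfamiliar with these algorithms, the following sketch shows label spreading and label propagation applied to a mostly unlabeled synthetic dataset (unlabeled samples are marked with -1); it is an illustration of the technique, not the authors' pipeline.

```python
# Sketch: label spreading and label propagation with mostly unlabeled accounts (-1 = unlabeled).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading, LabelPropagation
from sklearn.metrics import f1_score

X, y_true = make_classification(n_samples=500, n_features=16, random_state=0)
rng = np.random.RandomState(0)
y = y_true.copy()
y[rng.rand(len(y)) < 0.9] = -1            # keep labels for only ~10% of accounts

for name, model in [("LabelSpreading", LabelSpreading()),
                    ("LabelPropagation", LabelPropagation())]:
    model.fit(X, y)
    print(name, "F1 on all accounts:", round(f1_score(y_true, model.transduction_), 3))
```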

3.2.2 Twitter—detecting sybil bots

In this study, (Zeng et al. 2021 ) used semi-supervised self-training learning on a Kaggle dataset of real and fake Twitter accounts. In the suggested technique, a self-training method was applied to automatically classify Twitter accounts. Further, to effectively reduce the impact of class imbalance on identification, a resampling technique was incorporated into the self-training process. The proposed framework displayed good identification results with six different base classifiers, particularly when starting from a small batch of labeled Twitter accounts.
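
A minimal sketch of combining resampling with self-training is shown below, assuming synthetic data: the scarce labeled accounts are over-sampled (random over-sampling stands in for whatever resampling the authors used) and appended to the unlabeled pool before a self-training classifier is fitted.

```python
# Sketch: self-training combined with over-sampling of the scarce labeled accounts.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier
from imblearn.over_sampling import RandomOverSampler

X, y_true = make_classification(n_samples=1000, n_features=10, weights=[0.85, 0.15],
                                random_state=0)
rng = np.random.RandomState(0)
labeled = rng.rand(len(y_true)) < 0.1                  # only ~10% of accounts are labeled

# Over-sample the labeled minority class, then append the unlabeled pool (marked with -1).
X_lab, y_lab = RandomOverSampler(random_state=0).fit_resample(X[labeled], y_true[labeled])
X_all = np.vstack([X_lab, X[~labeled]])
y_all = np.concatenate([y_lab, -np.ones((~labeled).sum(), dtype=int)])

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000)).fit(X_all, y_all)
print("Predicted bot share:", model.predict(X).mean())
```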

3.2.3 Weibo—detecting spambots

Only a single study based on a semi-supervised approach by (Ren et al. 2018 ) detected spambots on Weibo. The authors have collected the dataset (31,147 users and 754,112 tweets) using a crawler. Behavioral and Content-based features were utilized to feed the model. Compared to NB, LR, SVM, and J48 algorithms, the proposed approach showed better results in all the evaluation metrics applied.

3.3 Using unsupervised ML

A few studies, on only three platforms, have implemented unsupervised ML to detect social bots, spambots, and sybil bots; these are discussed below.

3.3.1 Facebook—detecting spambots

Sohrabi and Karimi ( 2018 ) developed a spam filtering mechanism for posts and comments on the Facebook platform. Different exploration and optimization techniques, including PSO, simulated annealing, ant colony optimization, and Differential Evolution (DE), could be used with the suggested filtering strategy. Seven metadata features were extracted from the dataset, which was made up of 200,000 wall posts and the comments on them. They examined three algorithms with PSO-based feature selection: a DE clustering method evaluated with the DB index, SVM, and DT. The hybrid algorithm created by integrating SVM and clustering techniques produced the best outcomes.

3.3.2 Facebook—detecting sybil bots

Detection of fake Facebook profiles using a group of supervised and unsupervised mining algorithms was performed by (Albayati and Altamimi 2020 ). The main components were the crawler and analyzer modules. A dataset of 982 profiles and a set of 12 behavioral and profile-based features were used. In the analyzer module, using the mining tool RapidMiner Studio, they implemented two unsupervised algorithms, K-Means and K-Medoids, along with three supervised algorithms: ID3, KNN, and SVM. The performance evaluation revealed that the supervised algorithms outperformed the unsupervised ones in terms of accuracy. With a 97.7% accuracy rate, ID3 surpassed the other classifiers.

3.3.3 Instagram—detecting sybil bots

In this paper, (Munoz and Paul Guillen Pinto 2020 ) detected fake profiles on Instagram. Web scraping techniques were used to extract data from a third-party Instagram site, and a dataset of 1086 true and false profiles was designed. Seventeen features were extracted based on metadata and multimedia information. Various ML algorithms such as DT, LR, RF, MLP, AdaBoost, GNB, Quadratic Discriminant Analysis, Gaussian process classification, SVM, and NN were deployed. RF obtained the best accuracy of 0.96 as well as the best true and false prediction precision.

3.3.4 Twitter—detecting social bots

A bot detection technique was put forth by (Chen et al. 2017a , b ) that looked for a particular class of malicious bots which use shortened URLs and tweet almost duplicate content over an extended period of time. This method automatically gathered bot groups from real-time Twitter streams, as opposed to earlier work. The following nine URL shortening services were investigated: bit.ly, ift.tt, ow.ly, goo.gl, tinyurl.com, dlvr.it, dld.bz, viid.me, and ln.is. The model is made up of four sequentially operating parts: a crawler, a duplicate filter, a collector, and a bot detector. To conduct the experiment, 500,000 tweets were collected. According to the experiments, bot networks and accounts made up a mean of 10.5% of all accounts that employed shortened URLs.

Interestingly, (Mazza et al. 2019 ) presented a visualization technique named Retweet Tweet (RTT) for gaining insights into the retweeting behavior of Twitter accounts. For the purpose of identifying retweeting social bots, Retweet-Buster (RTBUST), an unsupervised group-analysis method, was employed. Using the Twitter Premium Search API, a dataset of 10 M Italian retweets shared by 1,446,250 unique users was compiled. RTBUST was built around an LSTM variational autoencoder, and based on the results of the Hierarchical Density-Based Spatial Clustering (HDBSCAN) algorithm, it was decided whether an account was a bot or legitimate. Compared with variants using PCA and TICA, the proposed RTBUST technique using the VAE produced the best detection performance, with an F1 of 0.87.

Anwar and Yaqub ( 2020 ) proposed a quick way to isolate bots from the Twitter discussion space. The dataset used was unlabeled data collected through the Twitter Search API during the 2019 Canadian elections, consisting of 103,791 accounts and 546,728 tweets. Thirteen metadata features were extracted, and PCA was applied before K-means clustering. Results showed that bots have a higher retweet percentage, more daily tweets, and a higher daily favorite count, which is consistent with the known characteristics of bots.
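
The snippet below sketches this unsupervised recipe under assumptions: metadata-like features are standardized, reduced with PCA, and grouped with K-means, after which cluster sizes can be inspected; the synthetic data and cluster count are illustrative.

```python
# Sketch: PCA-reduced account metadata clustered with K-means, then cluster inspection.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=2000, n_features=13, centers=4, random_state=0)  # stand-in data

X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_pca)

for c in np.unique(labels):
    print("cluster", c, "size:", (labels == c).sum())
```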

In this paper, to enhance the detection accuracy of social bots, (Wu et al. 2020 ) proposed an improved conditional GAN to extend imbalanced datasets prior to training classifiers. The Gaussian kernel density peak clustering algorithm (GKDPCA), an unsupervised modified clustering algorithm, was put into practice. Data from 2433 users were compiled into a dataset. On the basis of six different feature types (user metadata, sentiment, friends, content, network, and timing), eleven different features were retrieved. With an F1 score of 97.56%, the enhanced CGAN performed better than the three popular oversampling methods.

Khalil et al. ( 2020 ) used two unsupervised clustering algorithms, DBSCAN and K-means. Six publicly available datasets (2232, 3465, and 1969) mentioned in (Kantartopoulos et al. 2020 ) were used, and eight profile-based features were extracted. It was concluded that DBSCAN performed better, achieving an accuracy of 97.7%.

The second contribution of (Barhate et al. 2020 ) used an unsupervised ML approach. Hashtag data from the Twitter API were mined and a dataset of 140 K users was created. Using PCA and the K-means clustering algorithm, users were divided into four groups based on activity-related features, which enabled the analysis of each cluster's bot percentage. The authors also plotted the age distribution of users in a trending hashtag.

3.3.5 Twitter—detecting spambots

Some analyses were able to detect spammers successfully using unsupervised learning methods. For instance, (Cresci et al. 2016 ) put forth a novel behavior-based unsupervised approach for spambot account detection, inspired by biological DNA. The proposed methodology extracts and analyzes digital DNA sequences from users' actions. The authors manually created a dataset (4929 accounts) of verified spambot and genuine accounts, and each account was associated with a string encoding its behavioral information. Compared to other benchmark work, the DNA-fingerprinting model achieved the highest result, with an MCC of 0.952.
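
As a toy illustration of the digital-DNA idea (not the authors' exact encoding), the sketch below maps each account's timeline to a string over a small behavioral alphabet and measures the longest common behavioral substring between two accounts, which is the kind of similarity signal such approaches exploit.

```python
# Sketch: encoding timelines as "digital DNA" strings over a toy behavioral alphabet
# (A = plain tweet, C = reply, T = retweet) and comparing two accounts.
from difflib import SequenceMatcher

def dna_sequence(timeline):
    """Map a list of actions to a behavioral string."""
    alphabet = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(alphabet.get(action, "A") for action in timeline)

acc1 = dna_sequence(["tweet", "retweet", "retweet", "tweet", "reply", "retweet"])
acc2 = dna_sequence(["tweet", "retweet", "retweet", "tweet", "reply", "tweet"])

match = SequenceMatcher(None, acc1, acc2).find_longest_match(0, len(acc1), 0, len(acc2))
print("Longest common behavioral substring length:", match.size)
```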

Furthermore, (Koggalahewa et al. 2022 ) proposed an unsupervised spammer detection approach. In Stage 1, clustering based on user interest distribution was performed; in Stage 2, spam detection was performed based on peer acceptance. Lastly, by assessing a user's peer acceptability against a threshold, the user was categorized as spam or genuine. Three datasets were used, namely Social Honeypot (Lee et al. 2006 ), HSpam14 (14 million tweets), and The Fake Project (Cresci et al. 2017 ). Detection accuracies showed that the three features Local Outlier Standard Score (LOSS), Global Outlier Standard Score (GOSS), and entropy gave the best results when combined. SMD performed best, with an accuracy of approximately 0.98 on the three datasets.

4 Discussion

To begin with, from all the reviewed studies we noticed that Twitter is the most researched platform with a total of 71 studies carried out, followed by 12 studies on Facebook, 11 studies on Instagram, 9 studies on Weibo, and lastly only 2 studies were conducted on LinkedIn. Appendix Table 2 summarizes all the reviewed ML-based studies focusing on the dataset used, feature’s type, best-performing algorithm, and the highest result obtained, respectively. With respect to the most detected type of bot on each platform, Twitter had 36 studies on social bots, Facebook had 7 studies on sybil bots, Instagram had 8 studies on sybils, Weibo had 5 studies on spambots, and lastly, LinkedIn had only 2 studies which were on sybils.

Researchers in the reviewed papers used different datasets, both publicly available and self-created, to evaluate their models for classifying bots and humans on the five addressed social media platforms. A summary of the 38 publicly available datasets is provided in Appendix Table 3 . According to Appendix Table 3 , the most widely used datasets are the MIB datasets, Cresci2017 and Cresci2015. Cresci2017 was the most used dataset because it includes five distinct types of accounts, namely genuine accounts, social spambots, traditional spambots, fake followers (sybils), and a test set consisting of a mix of genuine accounts and social spambots. Besides this variety of bot types, it is a relatively recent and large labeled dataset, consisting of 12,736 accounts and 6,637,615 tweets in total, which may have attracted researchers to use it to detect spam and social bots on Twitter. Cresci2015, in turn, includes three fake-follower datasets and two human-account datasets, making it more suitable for the detection of sybil bots on Twitter. The Fake Project dataset, part of Cresci2015, is frequently used together with the Honeypot dataset to detect spambots on Twitter, and various public Kaggle datasets were used to detect different types of bots on Twitter. Because the majority of papers address Twitter, more of its datasets are publicly available rather than self-collected (private), whereas platforms such as Facebook and Instagram rely more on self-created (private) datasets. Weibo has roughly equal shares of both types, while LinkedIn has only self-created datasets. Figure  4 illustrates the public and self-collected datasets on each platform.

Figure 4. Datasets distribution on each platform for the reviewed papers

Despite the fact that numerous datasets are available, some of them contain only human or bot IDs and labels. As a result, scraping is done using the appropriate collection API or method to obtain profile features or other information from an ID or account. For instance, the Twitter API is used to gather real-time datasets from publicly accessible Twitter data (Rodrigues et al. 2022 ). Many researchers have created their own datasets using these collection methods on different platforms, as shown in Appendix Table 4 . On Twitter, the Twitter API was the most used collection method, while methods like Twitter4j, Tweepy, ML, the Twitter Premium Search API, and the REST API were less used. Instagram datasets were collected using the Instagram API, the Selenium WebDriver tool, third-party Instagram websites, and, in some cases, manually. For Facebook, the Facebook Graph API was mostly used to collect data, while web crawlers were mostly used on the Weibo platform. Lastly, in the only two LinkedIn studies, the datasets were collected manually.

To distinguish between human and automated users on social media platforms, it is critical to identify an ideal collection of attributes (Alothali, Hayawi, et al. 2021b ). A general observation was made that bots have a high friend-to-follower ratio and a low follower growth rate. Detection can rely on a variety of features that have been reported in various studies. On the basis of the extracted features in all the reviewed papers, the features were classified into the following categories: Content/Language, User (Profile), Metadata, Behavioral, Network (Community/Interaction), Sentiment, Timing/Temporal, Graph, Numeric/Categorical/Textual/Series, Statistical, User Friends, Media and Engagement, Entity and Link, Keywords, Interest Overlap, Hashtag, and Periodic features. Content features are based on linguistic cues computed through NLP, mainly part-of-speech tagging. User features are based on properties of the users' accounts and users' relationships. User metadata features are information regarding the profile's characteristics; locating an information source via metadata is known to be effective. Behavioral features are calculated from statistical properties of the data. Different aspects of information diffusion patterns are captured by network features. General-purpose and Twitter-specific sentiment analysis algorithms are used to build sentiment features. Time features include statistics of time. Graph features are extracted by modeling the social media platform as a social graph. Descriptive statistics relative to an account's social contacts are included in user friend features. Interest overlap features capture the overlap between two users, such as topical affinity. Appendix Table 5 provides examples of features from the reviewed studies as well as a summary of the features used on the various social media platforms. According to the table, the most popular feature types across all the reviewed papers are content-based, profile-based, metadata-based, and behavioral-based features, on essentially all types of platforms. Content-based features were utilized in 44 studies, followed by user/profile-based features in 42 studies, metadata-based in 27 studies, and behavioral-based in 16 studies. User friend, media, engagement, and keyword features are among the less popular feature types on the various platforms.

In regard to the Twitter platform, 34 studies used profile-based features, followed by 32 studies that used content-based features and achieved high results. Metadata-based features were used in 17 studies. Features based on timing, statistics, keywords, interactions, periodicity, latent representations, and numeric, categorical, and series attributes were each used by only single studies and achieved reasonable results. Four studies (Alhassun and Rassam 2022 ; Al-Qurishi et al. 2018 ; Mateen et al. 2017 ; Eshraqi et al. 2016 ) utilized graph-based features; notably, when (Eshraqi et al. 2016 ) combined graph-based features with content, time, and keyword features, a very high accuracy of 0.99 was achieved. Only 5 studies (Wu et al. 2020 ; Davis et al. 2016 ; Inuwa-Dutse et al. 2018 ; Sayyadiharikandeh et al. 2020 ; Varol et al. 2017 ) made use of network-based features; (Inuwa-Dutse et al. 2018 ) combined network- and profile-based features and achieved the highest result, an AUC of 99.93%. The number of features utilized across the 71 studies ranged from as few as 5 to as many as 1000. (Varol et al. 2017 ) and (Davis et al. 2016 ) used approximately 1000 features and achieved an AUC of 0.95, whereas (Fonseca Abreu et al. 2020 ) used only 5 profile-based features and still obtained an AUC of 0.999. Regarding crucial features, interaction- and community-based features hold high value in spambot detection (Fazil and Abulaish 2018 ).

On the Facebook platform, 7 out of 12 studies utilized profile-based features, followed by content-based features, used in 5 studies. Moreover, examining the results of the studies on this platform shows that the highest results were achieved when profile-based and content-based features were combined, with an accuracy of 0.984 in the research conducted by Rathore et al. (2018). Notably, textual-, categorical-, and numerical-based features were used in only one study (Singh and Banerjee 2019) but gave promising results (F1-score 0.99); features such as "likes", "remarks", and "user activities" contributed the most to the detection of sybils.

Moving on to the third-most researched platform, Instagram, behavior-, content-, and profile-based features were each used in 4 out of 11 studies. The combination of behavior- and content-based features showed the highest performance, with an accuracy of 0.9845. In Weibo-based studies, content-based features were the most widely used, in addition to behavior-based features; timing and semantic features were the least used but nevertheless gave the highest results. Only two studies were found on LinkedIn, and both made use of profile and statistical features. This platform needs to be explored more extensively using other feature types, including content-, metadata-, and behavioral-based features.

In terms of the different ML-based (supervised, semi-supervised, and unsupervised) techniques utilized in the reviewed papers to detect and compare different types of social bots, Appendix Table 6 lists all the papers that utilized each algorithm and highlights, for each algorithm, the bot type for which the highest performance was achieved. Figure 5 shows the number of papers that utilized each ML algorithm; an illustrative comparison sketch follows the figure. As shown, RF is both the best-performing and the most applied algorithm, SVM is the second most applied, followed by NB, DT, and AdaBoost, while GNB, ELM, bi-SN-LSTM, and the clustering algorithms were the least applied. For supervised learning, the best-performing classical ML algorithms were RF, JRip, and AdaBoost, with accuracies reaching up to 99.5%, while the least utilized were ID3 and GNB. As for DL algorithms, most performed well, with CNN achieving the highest accuracy of 99.68%; the least utilized and least popular algorithm was ELM, even though it is simple and requires less training time. Moreover, it was noticed that classical ML classifiers work well with small datasets and DL algorithms with large datasets. However, no algorithm can be considered inherently good or bad, as performance depends on factors such as the dataset size, the data pre-processing, and the number and type of features.

Figure 5: Bar chart for ML algorithms (number of papers utilizing each algorithm).
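
As an illustration of the kind of algorithm comparison summarized above, the following sketch cross-validates several of the commonly applied classifiers (RF, SVM, NB, DT, AdaBoost) on the same synthetic feature matrix; the numbers it prints are illustrative only and are unrelated to the accuracies reported in the reviewed papers.

```python
# Cross-validated comparison of commonly used classifiers on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:8s} mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```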

Further, in terms of semi-supervised learning, despite such techniques being powerful for discovering patterns in big data, only four studies were found: three on Twitter and one on Weibo (Sedhai and Sun 2018; Alharthi et al. 2019; Zeng et al. 2021). Since the large datasets derived from the Twitter platform make labeling an expensive and time-consuming process, semi-supervised techniques such as label propagation and label spreading have the potential to be applied more often. Moreover, integrating resampling with self-training is a helpful way to reduce the impact of class imbalance when using semi-supervised learning. For the unsupervised learning approach, the DenStream clustering algorithm achieved the highest result compared with other clustering algorithms such as K-Means and DBSCAN. Although this approach has less popularity and lower performance compared with the supervised approach (Albayati and Altamimi 2020), it offers the benefit of not requiring a labeled dataset. Since only one paper applied this approach, additional investigation into this particular algorithm is not possible.
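
As a concrete illustration of the label-spreading idea mentioned above, the sketch below hides 90% of the labels in a synthetic dataset (scikit-learn marks unlabeled samples with -1) and propagates labels over the feature space; it is a toy setup, not a reproduction of any reviewed study.

```python
# Semi-supervised label spreading on synthetic account features with mostly missing labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading
from sklearn.metrics import accuracy_score

X, y_true = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)

rng = np.random.default_rng(0)
y_partial = y_true.copy()
unlabeled = rng.random(len(y_true)) < 0.9   # hide 90% of the labels
y_partial[unlabeled] = -1                   # -1 means "unlabeled" in scikit-learn

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
acc = accuracy_score(y_true[unlabeled], model.transduction_[unlabeled])
print(f"Accuracy on the originally unlabeled accounts: {acc:.3f}")
```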

To conclude, this analysis provides little evidence that the most researched bot types or the most researched social platforms are necessarily the ones most affected by social bots.

5 Challenges and opportunities

In this section, we put forth an elaborate discussion of challenges and future research directions based on our study and analysis. The findings showed that social bot detection is challenging, and the challenge is aggravated as the social network volume increases. To begin with, the most detected and researched bot types are social bots (42 studies), followed by spambots (34 studies), and lastly sybil bots (29 studies). Evidently, Twitter is the most studied social network, with a large number of bots of all types, especially social bots, mainly because of how easy it is to collect data through its API and the vast collection of accessible public datasets. However, social networks such as Instagram, LinkedIn, and Weibo need further in-depth study. Specifically, there is a dearth of studies on Facebook and LinkedIn due to the difficulty of obtaining publicly available datasets, which is caused by the strict privacy policies of those networks. LinkedIn, in particular, has few recent studies, and only sybil bots were found in the publicly available LinkedIn datasets; with slight modifications, the ML techniques used for Instagram have the potential to be applied to LinkedIn. From our analysis, we conclude that Cresci2017 is the most used dataset in social media bot research because it classifies bots by their types. Instagram, in turn, has a greater number of sybil bots and two studies based on fake engagements; user- and content-based features are the most frequently used for Instagram and show high-accuracy results, yet there is scope for more research on this platform. In terms of features, on Twitter, Fonseca Abreu et al. (2020) showed that high results can be achieved even with 5 significant features; therefore, new studies can be carried out using as few features as possible. On Facebook, since profile-, content-, textual-, categorical-, and numerical-based features contributed high value in various studies, a new research direction can be explored by combining all five of these feature types. LinkedIn needs to be extensively explored using feature types including content-, metadata-, and behavioral-based features.

In terms of the reviewed algorithms, RF is the best performing in terms of accuracy and the most applied algorithm across all social media platforms in the conducted study. SVM is the second most applied algorithm, followed by NB, DT, and AdaBoost. The DenStream unsupervised clustering algorithm achieved the highest result compared with other clustering algorithms such as K-Means and DBSCAN; although this approach is less popular, it has the added advantage of not requiring a labeled dataset. Different algorithms based on Bayes' theorem, such as NB, MNB, and GNB, were used to classify spam and social bots, with MNB outperforming the others. Different types of decision-tree algorithms were used, including basic DT, J48, JRip, and ID3; DT and J48 were the most applied forms, yet the JRip algorithm achieved the best performance among them on spam detection. Different boosting algorithms were applied, such as AdaBoost, XGBoost, and GradientBoost; AdaBoost was the most applied, whereas GradientBoost performed the best among them on social bot detection. The DL approach was mostly applied to detect social bots on the Twitter platform; among the different DL algorithms, CNN and LSTM were the highest-performing and most promising in terms of accuracy.
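
To illustrate the Naive Bayes comparison mentioned above, the sketch below contrasts MultinomialNB on token-count features with GaussianNB on the same (densified) matrix; the tiny corpus is invented for illustration and is not drawn from any reviewed dataset.

```python
# Comparing Naive Bayes variants on token-count features from an invented tiny corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.model_selection import cross_val_score

texts = [
    "win free followers now click here", "limited offer click link free",
    "buy cheap followers instantly", "free gift card click now",
    "had a great lunch with friends", "watching the game tonight",
    "new photos from our trip", "coffee and a good book today",
] * 5  # repeat so 5-fold cross-validation has enough samples
labels = ([1] * 4 + [0] * 4) * 5   # 1 = spam/bot-like, 0 = benign

X_counts = CountVectorizer().fit_transform(texts)

mnb = cross_val_score(MultinomialNB(), X_counts, labels, cv=5).mean()
gnb = cross_val_score(GaussianNB(), X_counts.toarray(), labels, cv=5).mean()
print(f"MultinomialNB accuracy: {mnb:.3f}  GaussianNB accuracy: {gnb:.3f}")
```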

Comparing algorithms across platforms, RF achieved the best accuracy on the Weibo and Instagram platforms, AdaBoost achieved the highest detection rate on the Facebook platform, and CNN and ANN achieved the highest accuracy on the Twitter platform.

Moving on, future researchers are encouraged to investigate and conduct studies on as-yet-unstudied social media platforms, such as TikTok and Telegram, which are known to host bots. As seen above, only four studies employed semi-supervised learning techniques and only a few used unsupervised techniques; therefore, these fields need more exploration and contribution. The semi-supervised approach gives unlabeled instances the same weight as labeled ones while also minimizing the cost of labeling the data. More importantly, it is advised that researchers make their datasets available to the scientific community, which will support the training of new models, their testing, and the evaluation of existing models; additionally, new public datasets that contain the most recent types of bots are needed. The main gap observed in the collected research is that only a few papers proposed an ML technique to detect bots at registration or account creation in real time, and none of the existing research is designed to catch bots and act before they make connections with real users. In practice, however, it is desirable to detect bots as soon as possible after registration in order to prevent them from interacting with real users. This has its own challenges, as detection must then rely only on the basic information provided at registration time.
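
As a sketch of what registration-time detection could look like, the toy example below trains a classifier using only information plausibly available at sign-up; the feature names (username length, digits in the username, presence of a profile photo, an email-domain reputation score, sign-up hour) and the tiny dataset are hypothetical and chosen purely for illustration.

```python
# Hypothetical registration-time bot detection using only sign-up information.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Invented registration-time records; no behavioral or content history is available yet.
df = pd.DataFrame({
    "username_len":       [8, 15, 6, 14, 7, 16, 9, 13],
    "digits_in_username": [0, 6, 1, 5, 0, 7, 1, 4],
    "has_profile_photo":  [1, 0, 1, 0, 1, 0, 1, 0],
    "email_domain_score": [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.9, 0.3],
    "signup_hour":        [14, 3, 10, 4, 18, 2, 12, 3],
    "is_bot":             [0, 1, 0, 1, 0, 1, 0, 1],
})

X, y = df.drop(columns="is_bot"), df["is_bot"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), zero_division=0))
```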

Lastly, as the novelist Patricia Briggs writes, "Knowledge is a better weapon than a sword." Users of the various social media platforms need to gain cybersecurity awareness in order not to be deceived, to be able to distinguish between bots and benign accounts, and, if a malicious bot is recognized, to report it to the platform immediately.

6 Conclusion

This paper has made an effort to provide a comprehensive review of the existing studies on utilizing ML for bot detection on social media platforms affected by three types of bots (social, spam, and sybil bots), in order to provide a starting point for researchers to identify the knowledge gaps in this field and conduct future in-depth research.

Furthermore, the usage of supervised, semi-supervised, and unsupervised ML-based approaches was summarized. Numerous ML and DL methods were analyzed for bot detection, including KNN, RF, DT, NB, SVM, LSTM, ANN, and others. Visual aids were created to analyze the reviewed papers based on the nature of their datasets, the various categories of features, and the performance of the employed algorithms. From this analysis, we found that RF exhibited the highest performance in terms of accuracy and is the most frequently used ML algorithm, whereas CNN and LSTM were the highest-performing and most promising DL algorithms in terms of accuracy. Last but not least, we addressed and listed some of the challenges, limitations, and recommendations that can be utilized by future researchers to add more value and thereby contribute to the field of cybersecurity.

Adikari S, Dutta K (2020) Identifying fake profiles in LinkedIn

Akyon FC, Esat Kalfaoglu M (2019) Instagram fake and automated account detection. In: Proceedings—2019 innovations in intelligent systems and applications conference, ASYU 2019. https://doi.org/10.1109/ASYU48272.2019.8946437

Alarifi A, Alsaleh M, Al-Salman AM (2016) Twitter turing test: identifying social machines. Inf Sci. https://doi.org/10.1016/j.ins.2016.08.036


Albayati MB, Altamimi AM (2019) An empirical study for detecting fake Facebook profiles using supervised mining techniques. Inf Slovenia. https://doi.org/10.31449/inf.v43i1.2319

Albayati M, Altamimi A (2020) MDFP: a machine learning model for detecting fake Facebook profiles using supervised and unsupervised mining techniques. Int J Simul Syst Sci Technol. https://doi.org/10.5013/ijssst.a.20.01.11

Aldayel A, Magdy W (2022) Characterizing the role of bots’ in polarized stance on social media. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-022-00858-z

Alharthi R, Alhothali A, Moria K (2019) Detecting and characterizing Arab spammers campaigns in Twitter. Proc Comput Sci 163:248–256. https://doi.org/10.1016/j.procs.2019.12.106

Alhassun AS, Rassam MA (2022) A combined text-based and metadata-based deep-learning framework for the detection of spam accounts on the social media platform Twitter. Processes. https://doi.org/10.3390/pr10030439

Ali A, Syed A (2022) Cyberbullying detection using machine learning. Pak J Eng Technol 3(2):45–50. https://doi.org/10.51846/vol3iss2pp45-50


Aljabri M, Aljameel SS, Mohammad RMA, Almotiri SH, Mirza S, Anis FM, Aboulnour M, Alomari DM, Alhamed DH, Altamimi HS (2021a) Intelligent techniques for detecting network attacks: review and research directions. In Sens. https://doi.org/10.3390/s21217070

Aljabri M, Chrouf SM, Alzahrani NA, Alghamdi L, Alfehaid R, Alqarawi R, Alhuthayfi J, Alduhailan N (2021b) Sentiment analysis of Arabic tweets regarding distance learning in Saudi Arabia during the covid-19 pandemic. Sensors 21(16):5431. https://doi.org/10.3390/s21165431

Aljabri M, Altamimi HS, Albelali SA, Al-Harbi M, Alhuraib HT, Alotaibi NK, Alahmadi AA, Alhaidari F, Mohammad RM, Salah K (2022a) Detecting malicious URLs using machine learning techniques: review and research directions. IEEE Access 10:121395–121417. https://doi.org/10.1109/access.2022.3222307

Aljabri M, Alhaidari F, Mohammad RM, Mirza S, Alhamed DH, Altamimi HS, Chrouf SM (2022b) An assessment of lexical, network, and content-based features for detecting malicious urls using machine learning and deep learning models. Comput Intell Neurosci 2022:1–14. https://doi.org/10.1155/2022/3241216

Aljabri M, Alahmadi AA, Mohammad RM, Aboulnour M, Alomari DM, Almotiri SH (2022c) Classification of firewall log data using multiclass machine learning models. Electronics 11(12):1851. https://doi.org/10.3390/electronics11121851

Aljabri M, Mirza S (2022) Phishing attacks detection using machine learning and Deep Learning Models. In: 2022 7th international conference on data science and machine learning applications (CDMA). https://doi.org/10.1109/cdma54072.2022.00034

Alom Z, Carminati B, Ferrari E (2020) A deep learning model for Twitter spam detection. Online Soc Netw Media. https://doi.org/10.1016/j.osnem.2020.100079

Alothali E, Alashwal H, Salih M, Hayawi K (2021a) Real time detection of social bots on Twitter using machine learning and Apache Kafka. In: 2021a 5th cyber security in networking conference, CSNet 2021a. https://doi.org/10.1109/CSNet52717.2021.9614282

Alothali E, Hayawi K, Alashwal H (2021b) Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-021-00786-4

Alothali E, Zaki N, Mohamed EA, Alashwal H (2019) Detecting social bots on Twitter: a literature review. In: Proceedings of the 2018 13th international conference on innovations in information technology, IIT 2018. https://doi.org/10.1109/INNOVATIONS.2018.8605995

Al-Qurishi M, Alrubaian M, Rahman SMM, Alamri A, Hassan MM (2018) A prediction system of Sybil attack in social network using deep-regression model. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2017.08.030

Al-Zoubi AM, Alqatawna J, Faris H (2017) Spam profile detection in social networks based on public features. In: 2017 8th international conference on information and communication systems, ICICS 2017. https://doi.org/10.1109/IACS.2017.7921959

Andriotis P, Takasu A (2019) Emotional bots: content-based spammer detection on social media. In: 10th IEEE international workshop on information forensics and security, WIFS 2018. https://doi.org/10.1109/WIFS.2018.8630760

Anwar A, Yaqub U (2020) Bot detection in twitter landscape using unsupervised learning. ACM Int Conf Proc Series. https://doi.org/10.1145/3396956.3401801

Attia SM, Mattar AM, Badran KM (2022) Bot detection using multi-input deep neural network model in social media. In: 2022 13th international conference on electrical engineering (ICEENG), p 71–75. https://doi.org/10.1109/ICEENG49683.2022.9781863

Barhate S, Mangla R, Panjwani D, Gatkal S, Kazi F (2020) Twitter bot detection and their influence in hashtag manipulation. In: 2020 IEEE 17th India council international conference, INDICON 2020. https://doi.org/10.1109/INDICON49873.2020.9342152

Bazm M, Asadpour M (2020) Behavioral modeling of Persian Instagram users to detect bots. https://doi.org/10.48550/arXiv.2008.03951

Beğenilmiş E, Uskudarli S (2018) Organized behavior classification of tweet sets using supervised learning methods. ACM Int Conf Proc Series. https://doi.org/10.1145/3227609.3227665

Benkler Y et al (2017) Partisanship, propaganda, and disinformation: online media and the 2016 U.S. presidential election, search issue lab. Issue lab. Available at: https://search.issuelab.org/resource/partisanship-propaganda-and-disinformation-online-media-and-the-2016-u-s-presidential-election.html . Accessed 9 Oct 2022

Bhattacharya A, Bathla R, Rana A, Arora G (2021) Application of machine learning techniques in detecting fake profiles on social media. In: 2021 9th international conference on reliability, Infocom technologies and optimization (trends and future directions), ICRITO 2021. https://doi.org/10.1109/ICRITO51393.2021.9596373

Bindu K et al (2022) Detection of fake accounts in Twitter using data science. Int Res J Mod Eng Technol Sci 4(5), pp. 3552-3556.

Cable J, Hugh G (2019) Bots in the net: applying machine learning to identify social media trolls. Report. Available at: http://cs229.stanford.edu/proj2019spr/report/74.pdf

Caers R, de Feyter T, de Couck M, Stough T, Vigna C, du Bois C (2013) Facebook: a literature review. New Media Soc. https://doi.org/10.1177/1461444813488061

Cai C, Li L, Zeng D (2017a) Detecting social bots by jointly modeling deep behavior and content information. Int Conf Inf Knowl Manag Proc Part F131841. https://doi.org/10.1145/3132847.3133050

Cai C, Li L, Zengi D (2017b) Behavior enhanced deep bot detection in social media. In: 2017b IEEE international conference on intelligence and security informatics: security and big data, ISI 2017b. https://doi.org/10.1109/ISI.2017.8004887

Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Proceedings of the sixth SIAM international conference on data mining, 2006. https://doi.org/10.1137/1.9781611972764.29

Carminati B, Ferrari E, Heatherly R, Kantarcioglu M, Thuraisingham B (2011) Semantic web-based social network access control. Comput Secur 30(2–3):108–115. https://doi.org/10.1016/j.cose.2010.08.003

Chen C, Wang Y, Zhang J, Xiang Y, Zhou W, Min G (2017a) Statistical features-based real-time detection of drifted Twitter spam. IEEE Trans Inf Forensics Secur. https://doi.org/10.1109/TIFS.2016.2621888

Chen Z, Tanash RS, Stoll R, Subramanian D (2017b) Hunting malicious bots on twitter: an unsupervised approach. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 10540 LNCS. https://doi.org/10.1007/978-3-319-67256-4_40

Cresci S, di Pietro R, Petrocchi M, Spognardi A, Tesconi M (2015) Fame for sale: efficient detection of fake Twitter followers. Decis Support Syst. https://doi.org/10.1016/j.dss.2015.09.003

Cresci S, di Pietro R, Petrocchi M, Spognardi A, Tesconi M (2016) DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intell Syst. https://doi.org/10.1109/MIS.2016.29

Cresci S, Spognardi A, Petrocchi M, Tesconi M, di Pietro R (2017) The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: 26th international world wide web conference 2017, WWW 2017 companion. https://doi.org/10.1145/3041021.3055135

Dan J, Jieqi T (2017) Study of bot detection on Sina-Weibo based on machine learning. In: 14th international conference on services systems and services management, ICSSSM 2017—Proceedings. https://doi.org/10.1109/ICSSSM.2017.7996292

Daouadi KE, Rebaï RZ, Amous I (2019) Bot detection on online social networks using deep forest. Adv Intell Syst Comput. https://doi.org/10.1007/978-3-030-19810-7_30

David I, Siordia OS, Moctezuma D (2017) Features combination for the detection of malicious Twitter accounts. In: 2016 IEEE international autumn meeting on power, electronics and computing, ROPEC 2016. https://doi.org/10.1109/ROPEC.2016.7830626

Davis CA, Varol O, Ferrara E, Flammini A, Menczer F (2016) BotOrNot. In: Proceedings of the 25th international conference companion on World Wide Web (WWW). https://doi.org/10.1145/2872518.2889302

Derhab A, Alawwad R, Dehwah K, Tariq N, Khan FA, Al-Muhtadi J (2021) Tweet-based bot detection using big data analytics. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3074953

Dewan P, Kumaraguru P (2017) Facebook Inspector (FbI): towards automatic real-time detection of malicious content on Facebook. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-017-0434-5

Dey A, Reddy H, Dey M, Sinha N (2019) Detection of fake accounts in Instagram using machine learning. Int J Comput Sci Inf Technol. https://doi.org/10.5121/ijcsit.2019.11507

Dinath W (2021) Linkedin: a link to the knowledge economy. In: Proceedings of the European conference on knowledge management, ECKM. https://doi.org/10.34190/EKM.21.178

Echeverra J, de Cristofaro E, Kourtellis N, Leontiadis I, Stringhini G, Zhou S (2018) LOBO. In: Proceedings of the 34th annual computer security applications conference, p 137–146. https://doi.org/10.1145/3274694.3274738

Ersahin B, Aktas O, Kilinc D, Akyol C (2017) Twitter fake account detection. Int Conf Comput Sci Eng (UBMK) 2017:388–392. https://doi.org/10.1109/UBMK.2017.8093420

Eshraqi N, Jalali M, Moattar MH (2016) Detecting spam tweets in Twitter using a data stream clustering algorithm. In: 2nd international congress on technology, communication and knowledge, ICTCK 2015. https://doi.org/10.1109/ICTCK.2015.7582694

Ezarfelix J, Jeffrey N, Sari N (2022) Systematic literature review: Instagram fake account detection based on machine learning. Eng Math Comput Sci J. https://doi.org/10.21512/emacsjournal.v4i1.8076

Fazil M, Abulaish M (2018) A hybrid approach for detecting automated spammers in Twitter. IEEE Trans Inf Forensics Secur. https://doi.org/10.1109/TIFS.2018.2825958

Fernquist J, Kaati L, Schroeder R (2018) Political bots and the Swedish general election. In: 2018 IEEE international conference on intelligence and security informatics, ISI 2018. https://doi.org/10.1109/ISI.2018.8587347

Ferrara E (2018) Measuring social spam and the effect of bots on information diffusion in social media. Computational Social Sciences, p 229–255. https://doi.org/10.1007/978-3-319-77332-2_13

Ferrara E (2020) What types of COVID-19 conspiracies are populated by Twitter bots? First Monday 25(6). https://doi.org/10.5210/fm.v25i6.10633

Fonseca Abreu JV, Ghedini Ralha C, Costa Gondim JJ (2020) Twitter bot detection with reduced feature set. In: Proceedings—2020 IEEE international conference on intelligence and security informatics, ISI 2020. https://doi.org/10.1109/ISI49825.2020.9280525

Gannarapu S, Dawoud A, Ali RS, Alwan A (2020) Bot detection using machine learning algorithms on social media platforms. In: CITISIA 2020—IEEE conference on innovative technologies in intelligent systems and industrial applications, proceedings. https://doi.org/10.1109/CITISIA50690.2020.9371778

Gao T, Yang J, Peng W, Jiang L, Sun Y, Li F (2020) A content-based method for Sybil detection in online social networks via deep learning. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2975877

Gheewala S, Patel R (2018) Machine learning based twitter spam account detection: a review. In: Proceedings of the 2nd international conference on computing methodologies and communication, ICCMC 2018. https://doi.org/10.1109/ICCMC.2018.8487992

Gilani Z, Wang L, Crowcroft J, Almeida M, Farahbakhsh R (2016) Stweeler: a framework for Twitter bot analysis. In: WWW 2016 companion—proceedings of the 25th international conference on World Wide Web. https://doi.org/10.1145/2872518.2889360

Gilani Z, Farahbakhsh R, Tyson G, Wang L, Crowcroft J (2017) Of bots and humans (on Twitter). In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, p 349–354. https://doi.org/10.1145/3110025.3110090

Gorwa R, Guilbeault D (2020) Unpacking the social media bot: a typology to guide research and policy. Policy Internet 12(2):225–248. https://doi.org/10.1002/poi3.184

Güngör KN, Ayhan Erdem O, Doğru İA (2020) Tweet and account based spam detection on Twitter, p 898–905. https://doi.org/10.1007/978-3-030-36178-5_79

Guofei Gu (no date) Welcome to Guofei Gu's Homepage. Available at: https://people.engr.tamu.edu/guofei/index.html . Accessed 12 Oct 2022

Gupta A, Kaushal R (2017) Towards detecting fake user accounts in facebook. In: ISEA Asia security and privacy conference 2017, ISEASP 2017. https://doi.org/10.1109/ISEASP.2017.7976996

Hakimi AN, Ramli S, Wook M, Mohd Zainudin N, Hasbullah NA, Abdul Wahab N, Mat Razali NA (2019) Identifying fake account in facebook using machine learning. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 11870 LNCS. https://doi.org/10.1007/978-3-030-34032-2_39

Hayawi K, Mathew S, Venugopal N, Masud MM, Ho PH (2022) DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-022-00869-w

Heidari M, Jones JH, Uzuner O (2020) Deep contextualized word embedding for text-based online user profiling to detect social bots on Twitter. In: IEEE international conference on data mining workshops, ICDMW, 2020-November. https://doi.org/10.1109/ICDMW51313.2020.00071

Heidari M, Jones JHJ, Uzuner O (2021) An empirical study of machine learning algorithms for social media bot detection. In: 2021 IEEE international IOT, electronics and mechatronics conference, IEMTRONICS 2021—Proceedings. https://doi.org/10.1109/IEMTRONICS52119.2021.9422605

Huang Y, Zhang M, Yang Y, Gan S, Zhang Y (2016) The Weibo spammers' identification and detection based on Bayesian-algorithm. In: Proceedings of the 2016 2nd workshop on advanced research and technology in industry applications. https://doi.org/10.2991/wartia-16.2016.271

Inuwa-Dutse I, Liptrott M, Korkontzelos I (2018) Detection of spam-posting accounts on Twitter. Neurocomputing. https://doi.org/10.1016/j.neucom.2018.07.044

Kantartopoulos P, Pitropakis N, Mylonas A, Kylilis N (2020) Exploring adversarial attacks and defences for fake Twitter account detection. Technologies. https://doi.org/10.3390/technologies8040064

Kantepe M, Gañiz MC (2017) Preprocessing framework for Twitter bot detection. In: 2nd international conference on computer science and engineering, UBMK 2017. https://doi.org/10.1109/UBMK.2017.8093483

Kaplan AM, Haenlein M (2010) Users of the world, unite! The challenges and opportunities of social media. Bus Horiz. https://doi.org/10.1016/j.bushor.2009.09.003

Kenyeres A, Kovács G (2022) Twitter bot detection using deep learning. In: XVIII. Conference on Hungarian computational linguistics. Available at: https://www.researchgate.net/publication/358801180_Twitter_bot_detection_using_deep_learning

Kesharwani M, Kumari S, Niranjan V (2021) Detecting fake social media account using deep neural networking. Int Res J Eng Technol (IRJET) 8(7):1191–1197

Khaled S, El-Tazi N, Mokhtar HMO (2019) Detecting fake accounts on social media. In: Proceedings—2018 IEEE international conference on big data, big data 2018. https://doi.org/10.1109/BigData.2018.8621913

Khalil H, Khan MUS, Ali M (2020) Feature selection for unsupervised bot detection. In: 2020 3rd international conference on computing, mathematics and engineering technologies: idea to innovation for building the knowledge economy, ICoMET 2020. https://doi.org/10.1109/iCoMET48670.2020.9074131

Knauth J (2019) Language-agnostic twitter bot detection. In: International conference recent advances in natural language processing, RANLP, 2019-September. https://doi.org/10.26615/978-954-452-056-4_065

Koggalahewa D, Xu Y, Foo E (2022) An unsupervised method for social network spammer detection based on user information interests. J Big Data. https://doi.org/10.1186/s40537-021-00552-5

Kolomeets M, Chechulin A (2021) Analysis of the malicious bots market. In: Conference of open innovation association, FRUCT, 2021-May. https://doi.org/10.23919/FRUCT52173.2021.9435421

Kondeti P, Yerramreddy LP, Pradhan A, Swain G (2021) Fake account detection using machine learning, p 791–802. https://doi.org/10.1007/978-981-15-5258-8_73

Kudugunta S, Ferrara E (2018) Deep neural networks for bot detection. Inf Sci. https://doi.org/10.1016/j.ins.2018.08.019

Kumar G, Rishiwal V (2020) Machine learning for prediction of malicious or SPAM users on social networks. Int J Sci Technol Res, 9(2), pp. 926-932

Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on Twitter. In: ICWSM 2011

Mahesh B (2020) Machine learning algorithms: a review. Int J Sci Res (IJSR) 9(1):381–386. https://doi.org/10.21275/ART20203995

Martin-Gutierrez D, Hernandez-Penaloza G, Hernandez AB, Lozano-Diez A, Alvarez F (2021) A deep learning approach for robust detection of bots in Twitter using transformers. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3068659

Mateen M, Iqbal MA, Aleem M, Islam MA (2017) A hybrid approach for spam detection for Twitter. In: Proceedings of 2017 14th international bhurban conference on applied sciences and technology, IBCAST 2017. https://doi.org/10.1109/IBCAST.2017.7868095

Mazza M, Cresci S, Avvenuti M, Quattrociocchi W, Tesconi M (2019) RTbust: exploiting temporal patterns for botnet detection on twitter. In: WebSci 2019—proceedings of the 11th ACM conference on web science. https://doi.org/10.1145/3292522.3326015

Meshram EP, Bhambulkar R, Pokale P, Kharbikar K, Awachat A (2021) Automatic detection of fake profile using machine learning on Instagram. Int J Sci Res Sci Technol. https://doi.org/10.32628/ijsrst218330

Morstatter F, Wu L, Nazer TH, Carley KM, Liu H (2016) A new approach to bot detection: striking the balance between precision and recall. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), p 533–540. https://doi.org/10.1109/ASONAM.2016.7752287

Munoz SD, Paul Guillen Pinto E (2020) A dataset for the detection of fake profiles on social networking services. In: Proceedings—2020 international conference on computational science and computational intelligence, CSCI 2020. https://doi.org/10.1109/CSCI51800.2020.00046

Najari S, Salehi M, Farahbakhsh R (2022) GANBOT: a GAN-based framework for social bot detection. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-021-00800-9

Narayan N (2021) Twitter bot detection using machine learning algorithms. In: 2021 4th international conference on electrical, computer and communication technologies, ICECCT 2021. https://doi.org/10.1109/ICECCT52121.2021.9616841

Naveen Babu M, Anusha G, Shivani A, Kalyani C, Meenakumari J (2021) Fake profile identification using machine learning. Int J Recent Adv Multidiscip Topics 2(6):273–275


Oentaryo RJ, Murdopo A, Prasetyo PK, Lim EP (2016) On profiling bots in social media. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), p 10046 LNCS. https://doi.org/10.1007/978-3-319-47880-7_6

Orabi M, Mouheb D, al Aghbari Z, Kamel I (2020) Detection of bots in social media: a systematic review. Inf Process Manag. https://doi.org/10.1016/j.ipm.2020.102250

Pierri F, Artoni A, Ceri S (2020) Investigating Italian disinformation spreading on Twitter in the context of 2019 European elections. PLoS ONE. https://doi.org/10.1371/journal.pone.0227821

Ping H, Qin S (2019) A social bots detection model based on deep learning algorithm. In: Int Conf Commun Technol Proc, ICCT, 2019-October. https://doi.org/10.1109/ICCT.2018.8600029

Prabhu Kavin B, Karki S, Hemalatha S, Singh D, Vijayalakshmi R, Thangamani M, Haleem SLA, Jose D, Tirth V, Kshirsagar PR, Adigo AG (2022) Machine learning-based secure data acquisition for fake accounts detection in future mobile communication networks. Wirel Commun Mob Comput. https://doi.org/10.1155/2022/6356152

Pramitha FN, Hadiprakoso RB, Qomariasih N, Girinoto (2021) Twitter bot account detection using supervised machine learning. In: 2021 4th international seminar on research of information technology and intelligent systems, ISRITI 2021. https://doi.org/10.1109/ISRITI54043.2021.9702789

Pratama PG, Rakhmawati NA (2019) Social bot detection on 2019 Indonesia president candidate’s supporter’s tweets. Proc Comput Sci. https://doi.org/10.1016/j.procs.2019.11.187

Purba KR, Asirvatham D, Murugesan RK (2020) Classification of instagram fake users using supervised machine learning algorithms. Int J Electr Comput Eng. https://doi.org/10.11591/ijece.v10i3.pp2763-2772

Rahman MA, Zaman N, Asyhari AT, Sadat SMN, Pillai P, Arshah RA (2021) SPY-BOT: machine learning-enabled post filtering for social network-integrated industrial internet of things. Ad Hoc Netw. https://doi.org/10.1016/j.adhoc.2021.102588

Ramalingaiah A, Hussaini S, Chaudhari S (2021) Twitter bot detection using supervised machine learning. J Phys Conf Series 1950(1):012006. https://doi.org/10.1088/1742-6596/1950/1/012006

Rangel F, Rosso P (2019) Overview of the 7th author profiling task at Pan 2019: Bots and gender profiling in twitter. In: CEUR workshop proceedings, p 2380

Rao S, Verma AK, Bhatia T (2021) A review on social spam detection: challenges, open issues, and future directions. Exp Syst Appl. https://doi.org/10.1016/j.eswa.2021.115742

Rathore S, Loia V, Park JH (2018) SpamSpotter: an efficient spammer detection framework based on intelligent decision support system on Facebook. Appl Soft Comput J. https://doi.org/10.1016/j.asoc.2017.09.032

Reddy PM, Venkatesh K, Bhargav D, Sandhya M (2021) Spam detection and fake user identification methodologies in social networks using extreme machine learning. SSRN Electron J. https://doi.org/10.2139/ssrn.3920091

Ren H, Zhang Z, Xia C (2018) Online social spammer detection based on semi-supervised learning. ACM Int Conf Proc Series. https://doi.org/10.1145/3302425.3302429

Rodrigues AP, Fernandes R, Shetty A, Lakshmanna K, Shafi RM (2022) Real-time Twitter spam detection and sentiment analysis using machine learning and deep learning techniques. Comput Intell Neurosci 2022:1–14. https://doi.org/10.1155/2022/5211949

Rodríguez-Ruiz J, Mata-Sánchez JI, Monroy R, Loyola-González O, López-Cuevas A (2020) A one-class classification approach for bot detection on Twitter. Comput Secur. https://doi.org/10.1016/j.cose.2020.101715

Sadineni PK (2020) Machine learning classifiers for efficient spammers detection in Twitter OSN. SSRN Electron J. https://doi.org/10.2139/ssrn.3734170

Sahoo SR, Gupta BB (2020) Popularity-based detection of malicious content in facebook using machine learning approach. Adv Intell Syst Comput. https://doi.org/10.1007/978-981-15-0029-9_13

Santia GC, Mujib MI, Williams JR (2019) Detecting social bots on facebook in an information veracity context. In: Proceedings of the 13th international conference on web and social media, ICWSM 2019

Saranya Shree S, Subhiksha C, Subhashini R (2021) Prediction of fake Instagram profiles using machine learning. SSRN Electron J. https://doi.org/10.2139/ssrn.3802584

Sayyadiharikandeh M, Varol O, Yang KC, Flammini A, Menczer F (2020) Detection of novel social bots by ensembles of specialized classifiers. Int Conf Inf Knowl Manag Proc. https://doi.org/10.1145/3340531.3412698

Sedhai S, Sun A (2015) Hspam14: a collection of 14 million tweets for hashtag-oriented spam research. In: SIGIR 2015—proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. https://doi.org/10.1145/2766462.2767701

Sedhai S, Sun A (2018) Semi-supervised spam detection in Twitter stream. IEEE Trans Comput Soc Syst 5(1):169–175. https://doi.org/10.1109/tcss.2017.2773581

Sen I, Singh S, Aggarwal A, Kumaraguru P, Mian S, Datta A (2018) Worth its weight in likes: towards detecting fake likes on instagram. In: WebSci 2018—proceedings of the 10th ACM conference on web science. https://doi.org/10.1145/3201064.3201105

Sengar SS, Kumar S, Raina P (2020) Bot detection in social networks based on multilayered deep learning approach. Sens Transducers 244(5):37–43

Shao C, Ciampaglia GL, Varol O, Yang K, Flammini A, Menczer F (2017) The spread of low-credibility content by social bots. Nat Commun. https://doi.org/10.1038/s41467-018-06930-7

Shearer E, Mitchell A (2022) News use across social media platforms in 2020, Pew Research Center's Journalism Project. Available at: https://www.journalism.org/2021/01/12/news-use-across-social-media-platforms-in-2020 . Accessed 9 Oct 2022

Sheeba JI, Pradeep Devaneyan S (2019) Detection of spambot using random forest algorithm. SSRN Electron J. https://doi.org/10.2139/ssrn.3462968

Sheehan BT (2018) Customer service chatbots: anthropomorphism adoption and word of mouth. Griffith University, University of Queensland, Queensland

Sheikhi S (2020) An efficient method for detection of fake accounts on the instagram platform. Revue Intell Artif. https://doi.org/10.18280/ria.340407

Shevtsov A, Tzagkarakis C, Antonakaki D, Ioannidis S (2022) Explainable machine learning pipeline for Twitter bot detection during the 2020 US Presidential Elections. Softw Impacts 13:100333. https://doi.org/10.1016/j.simpa.2022.100333

Shukla R, Sinha A, Chaudhary A (2022) TweezBot: an AI-driven online media bot identification algorithm for Twitter social networks. Electron (switzerland). https://doi.org/10.3390/electronics11050743

Shukla H, Jagtap N, Patil B (2021) Enhanced Twitter bot detection using ensemble machine learning. In: Proceedings of the 6th international conference on inventive computation technologies, ICICT 2021. https://doi.org/10.1109/ICICT50816.2021.9358734

Siddiqui A (2019) Facebook 2019 Q1 earnings: The social media giant boasts 2.7 billion monthly active users on its all services, Digital Information World. Available at: https://www.digitalinformationworld.com/2019/04/facebook-q1-2019-report.html . Accessed 9 Oct 2022

Singh Y, Banerjee S (2019) Fake (sybil) account detection using machine learning. SSRN Electron J. https://doi.org/10.2139/ssrn.3462933

Sohrabi MK, Karimi F (2018) A feature selection approach to detect spam in the Facebook social network. Arab J Sci Eng. https://doi.org/10.1007/s13369-017-2855-x

Subrahmanian VS, Azaria A, Durst S, Kagan V, Galstyan A, Lerman K, Zhu L, Ferrara E, Flammini A, Menczer F (2016) The DARPA Twitter bot challenge. Computer 49(6):38–46. https://doi.org/10.1109/MC.2016.183

Tenba Group (2022) What is Sina Weibo? Know your Chinese social media!, Tenba Group. Available at: https://tenbagroup.com/what-is-sina-weibo-know-your-chinese-social-media . Accessed 9 Oct 2022

Thakur S, Breslin JG (2021) Rumour prevention in social networks with layer 2 blockchains. Soc Netw Anal Mining. https://doi.org/10.1007/s13278-021-00819-y

Thejas GS, Soni J, Chandna K, Iyengar SS, Sunitha NR, Prabakar N (2019) Learning-based model to fight against fake like clicks on Instagram posts. In: Conference proceedings—IEEE SOUTHEASTCON, 2019-April. https://doi.org/10.1109/SoutheastCon42311.2019.9020533

Thuraisingham B (2020) The role of artificial intelligence and cyber security for social media. In: Proceedings—2020 IEEE 34th international parallel and distributed processing symposium workshops, IPDPSW 2020. https://doi.org/10.1109/IPDPSW50202.2020.00184

van der Walt E, Eloff J (2018) Using machine learning to detect fake identities: bots vs humans. IEEE Access. https://doi.org/10.1109/ACCESS.2018.2796018

Varol O, Ferrara E, Davis CA, Menczer F, Flammini A (2017) Online human-bot interactions: detection, estimation, and characterization. In: Proceedings of the 11th international conference on web and social media, ICWSM 2017

Wald R, Khoshgoftaar TM, Napolitano A, Sumner C (2013) Predicting susceptibility to social bots on Twitter. In: Proceedings of the 2013 IEEE 14th international conference on information reuse and integration, IEEE IRI 2013. https://doi.org/10.1109/IRI.2013.6642447

Wanda P, Hiswati ME, Jie HJ (2020) DeepOSN: bringing deep learning as malicious detection scheme in online social network. IAES Int J Artif Intell. https://doi.org/10.11591/ijai.v9.i1.pp146-154

Wiederhold G, McCarthy J (1992) Arthur Samuel: Pioneer in machine learning. IBM J Res Dev 36(3):329–331. https://doi.org/10.1147/rd.363.0329

Wu B, Liu L, Yang Y, Zheng K, Wang X (2020) Using improved conditional generative adversarial networks to detect social bots on Twitter. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2975630

Wu Y, Fang Y, Shang S, Jin J, Wei L, Wang H (2021) A novel framework for detecting social bots with deep neural networks and active learning. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2020.106525

Xiao C, Freeman DM, Hwa T (2015). Detecting clusters of fake accounts in online social networks. In: AISec 2015—proceedings of the 8th ACM workshop on artificial intelligence and security, co-located with CCS 2015. https://doi.org/10.1145/2808769.2808779

Xu G, Zhou D, Liu J (2021) Social network spam detection based on ALBERT and combination of Bi-LSTM with self-attention. Secur Commun Netw. https://doi.org/10.1155/2021/5567991

Yang C, Harkreader R, Gu G (2013) Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans Inf Forensics Secur. https://doi.org/10.1109/TIFS.2013.2267732

Yang Z, Chen X, Wang H, Wang W, Miao Z, Jiang T (2022) A new joint approach with temporal and profile information for social bot detection. Secur Commun Netw 2022:1–14. https://doi.org/10.1155/2022/9119388

Yang C, Harkreader R, Zhang J, Shin S, Gu G (2012) Analyzing spammers’social networks for fun and profit: A case study of cyber criminal ecosystem on Twitter. In: WWW’12—proceedings of the 21st annual conference on World Wide Web. https://doi.org/10.1145/2187836.2187847

Zeng Z, Li T, Sun S, Sun J, Yin J (2021) A novel semi-supervised self-training method based on resampling for Twitter fake account identification. Data Technol Appl 56(3):409–428. https://doi.org/10.1108/dta-07-2021-0196

Zhang W, Sun HM (2017) Instagram spam detection. In: Proceedings of IEEE Pacific Rim international symposium on dependable computing, PRDC. https://doi.org/10.1109/PRDC.2017.43

Zhang Z, Gupta BB (2018) Social media security and trustworthiness: overview and new direction. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2016.10.007

Zheng X, Zhang X, Yu Y, Kechadi T, Rong C (2016b) ELM-based spammer detection in social networks. J Supercomput 72(8):2991–3005. https://doi.org/10.1007/s11227-015-1437-5

Zheng X, Wang J, Jie F, Li L (2016a) Two phase based spammer detection in Weibo. In: Proceedings—15th IEEE international conference on data mining workshop, ICDMW 2015. https://doi.org/10.1109/ICDMW.2015.22

Acknowledgements

We would like to thank SAUDI ARAMCO Cybersecurity Chair at Imam Abdulrahman Bin Faisal University (IAU) for supporting and funding this research work.

Author information

Authors and Affiliations

Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, 21955, Saudi Arabia

Malak Aljabri

SAUDI ARAMCO Cybersecurity Chair, Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia

Afrah Shaahid, Fatima Alnasser & Asalah Saleh

SAUDI ARAMCO Cybersecurity Chair, Department of Computer Information Systems, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia

Rachid Zagrouba

SAUDI ARAMCO Cybersecurity Chair, Department of Computer Engineering, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia

Dorieh M. Alomari


Contributions

Conceptualization, MA, RZ, AS, FA, AS, DA; Methodology, MA, RZ, AS, FA, AS, DA; Formal Analysis, MA, RZ, AS, FA, AS, DA; Writing-Reviewing, MA, RZ, AS, FA, AS, DA; Project Administration, MA. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Malak Aljabri .

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 2, 3, 4, 5 and 6.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Aljabri, M., Zagrouba, R., Shaahid, A. et al. Machine learning-based social media bot detection: a comprehensive literature review. Soc. Netw. Anal. Min. 13 , 20 (2023). https://doi.org/10.1007/s13278-022-01020-5

Download citation

Received : 24 October 2022

Revised : 30 November 2022

Accepted : 20 December 2022

Published : 05 January 2023

DOI : https://doi.org/10.1007/s13278-022-01020-5


Keywords

  • Social media security
  • Bot detection
  • Machine learning
  • Social bots
  • Feature engineering
  • Cybersecurity


An Appraisal

Alice Munro, a Literary Alchemist Who Made Great Fiction From Humble Lives

The Nobel Prize-winning author specialized in exacting short stories that were novelistic in scope, spanning decades with intimacy and precision.



By Gregory Cowles

Gregory Cowles is a senior editor at the Book Review.

The first story in her first book evoked her father’s life. The last story in her last book evoked her mother’s death. In between, across 14 collections and more than 40 years, Alice Munro showed us in one dazzling short story after another that the humble facts of a single person’s experience, subjected to the alchemy of language and imagination and psychological insight, could provide the raw material for great literature.


And not just any person, but a girl from the sticks. It mattered that Munro, who died on Monday night at the age of 92, hailed from rural southwestern Ontario, since so many of her stories, set in small towns on or around Lake Huron, were marked by the ambitions of a bright girl eager to leave, upon whom nothing is lost. There was the narrator of “Boys and Girls,” who tells herself bedtime stories about a world “that presented opportunities for courage, boldness and self-sacrifice, as mine never did.” There was Rose, from “The Beggar Maid,” who wins a college scholarship and leaves her working-class family behind. And there was Del Jordan, from “Lives of Girls and Women” — Munro’s second book, and the closest thing she ever wrote to a novel — who casts a jaundiced eye on her town’s provincial customs as she takes the first fateful steps toward becoming a writer.

Does it seem reductive or limiting to derive a kind of artist’s statement from the title of that early book? It shouldn’t. Munro was hardly a doctrinaire feminist, but with implacable authority and command she demonstrated throughout her career that the lives of girls and women were as rich, as tumultuous, as dramatic and as important as the lives of men and boys. Her plots were rife with incident: the threatened suicide in the barn, the actual murder at the lake, the ambivalent sexual encounter, the power dynamics of desire. For a writer whose book titles gestured repeatedly at love (“The Progress of Love,” “The Love of a Good Woman,” “Hateship, Friendship, Courtship, Loveship, Marriage”), her narratives recoiled from sentimentality. Tucked into the stately columns of The New Yorker, where she was a steady presence for decades, they were far likelier to depict the disruptions and snowballing consequences of petty grudges, careless cruelties and base impulses: the gossip that mattered.

Munro’s stories traveled not as the crow flies but as the mind does. You got the feeling that, if the GPS ever offered her a shorter route, she would decline. Capable of dizzying swerves in a line or a line break, her stories often spanned decades with intimacy and sweep; that’s partly what critics meant when they wrote of the novelistic scope she brought to short fiction.

Her sentences rarely strutted or flaunted or declared themselves; but they also never clanked or stumbled — she was an exacting and precise stylist rather than a showy one, who wrote with steely control and applied her ambitions not to language but to theme and structure. (This was a conscious choice on her part: “In my earlier days I was prone to a lot of flowery prose,” she told an interviewer when she won the Nobel Prize in 2013. “I gradually learned to take a lot of that out.”) In the middle of her career her stories started to grow roomier and more contemplative, even essayistic; they could feel aimless until you approached the final pages and recognized with a jolt that they had in fact been constructed all along as intricately and deviously as a Sudoku puzzle, every piece falling neatly into place.

There was a signature Munro tone: skeptical, ruminative, given to a crucial and artful ambiguity that could feel particularly Midwestern. Consider “The Bear Came Over the Mountain,” which — thanks in part to Sarah Polley’s Oscar-nominated film adaptation, “ Away From Her ” (2006) — may be Munro’s most famous story; it details a woman’s descent into senility and her philandering husband’s attempt to come to terms with her attachment to a male resident at her nursing home. Here the husband is on a visit, confronting the limits of his knowledge and the need to make peace with uncertainty, in a characteristically Munrovian passage:

She treated him with a distracted, social sort of kindness that was successful in holding him back from the most obvious, the most necessary question. He could not demand of her whether she did or did not remember him as her husband of nearly 50 years. He got the impression that she would be embarrassed by such a question — embarrassed not for herself but for him. She would have laughed in a fluttery way and mortified him with her politeness and bewilderment, and somehow she would have ended up not saying either yes or no. Or she would have said either one in a way that gave not the least satisfaction.

Like her contemporary Philip Roth — another realist who was comfortable blurring lines — Munro devised multilayered plots that were explicitly autobiographical and at the same time determined to deflect or undermine that impulse. This tension dovetailed happily with her frequent themes of the unreliability of memory and the gap between art and life. Her stories tracked the details of her lived experience both faithfully and cannily, cagily, so that any attempt at a dispassionate biography (notably, Robert Thacker’s scholarly and substantial “Alice Munro: Writing Her Lives,” from 2005) felt at once invasive and redundant. She had been in front of us all along.

Until, suddenly, she wasn’t. That she went silent after her book “Dear Life” was published in 2012, a year before she won the Nobel, makes her passing now seem all the more startling — a second death, in a way that calls to mind her habit of circling back to recognizable moments and images in her work. At least three times she revisited the death of her mother in fiction, first in “The Peace of Utrecht,” then in “Friend of My Youth” and again in the title story that concludes “Dear Life”: “The person I would really have liked to talk to then was my mother,” the narrator says near the end of that story, in an understated gut punch of an epitaph that now applies equally well to Munro herself, but she “was no longer available.”


Gregory Cowles is the poetry editor of the Book Review and senior editor of the Books desk.


Book Reviews

Two new novels investigate what makes magic, what is real and imagined.

Marcela Davison Avilés


In an enchanted world, where does mystery begin? Two authors pose this question in new novels out this spring.

In Pages of Mourning by the Mexican magical realism interrogator-author Diego Gerard Morrison, the protagonist is a Mexican writer named Aureliano Más II who is at war with his memory of familial sorrow and — you guessed it — magical realism. And the protagonist Alma Cruz in Julia Alvarez's latest novel, The Cemetery of Untold Stories, is also a writer. Alma seeks to bury her unpublished stories in a graveyard of her own making, in order to find peace in their repose — and meaning from the vulnerability that comes from unheard stories.

Both of these novels, one from an emerging writer and one from a long celebrated author, walk an open road of remembering love, grief, and fate. Both find a destiny not in death, but in the reality of abandonment and in dreams that come from a hope for reunion. At this intersection of memory and meaning, their storytelling diverges.

Pages of Mourning

Pages of Mourning, out this month, is set in 2017, three years after 43 students disappear from the Ayotzinapa Rural Teachers' College after being abducted in Iguala , Guerrero, Mexico. The main character, Aureliano, is attempting to write the Great Mexican Novel that reflects this crisis and his mother's own unexplained disappearance when he was a boy. He's also struggling with the idea of magical realism as literary genre — he holds resentment over being named after the protagonist in 100 Years of Solitude, which fits squarely within it. He sets out on a journey with his maternal aunt to find his father, ask questions about his mother, and deal with his drinking problem and various earthquakes.

Morrison's voice reflects his work as a writer, editor and translator based in Mexico City, who seeks to interrogate "the concept of dissonance" through blended art forms such as poetry and fiction, translation and criticism. His story could be seen as an archetype, criticism, or a reflection through linguistic cadence on Pan American literature. His novel name drops and alludes to American, Mexican and Latin American writers including Walt Whitman, Juan Rulfo, Gabriel Garcia Márquez — and even himself. There's an earnest use of adjectives to accompany the lived dissonance of his characters.

There's nothing magical, in the genre sense, in Morrison's story. There are no magical rivers, enchanted messages, babies born with tails. Morrison's dissonance is real — people get disappeared, they suffer addictions, writer's block, crazy parents, crazier shamans, blank pages, corruption, the loss of loved ones. In this depiction of real Pan-American life — because all of this we are also explicitly suffering up North — Morrison finds his magic. His Aureliano is our Aureliano. He's someone we know. Probably someone we loved — someone trying so hard to live.

The Cemetery of Untold Stories

From the author of In the Time of the Butterflies and How the García Girls Lost Their Accents , The Cemetery of Untold Stories is Julia Alvarez's seventh novel. It's a story that's both languorous and urgent in conjuring a world from magical happenings. The source of these happenings, in a graveyard in the Dominican Republic, is the confrontation between memories and lived agendas. Alvarez is an acclaimed storyteller and teacher, a writer of poetry, non-fiction and children's books, honored in 2013 with the National Medal of Arts . She continues her luminous virtuosity with the story of Alma Cruz.

Julia Alvarez: Literature Tells Us 'We Can Make It Through'

Author Interviews

Julia alvarez: literature tells us 'we can make it through'.

Alma, the writer at the heart of The Cemetery of Untold Stories , has a goal - not to go crazy from the delayed promise of cartons of unpublished stories she has stored away. When she inherits land in her origin country — the Dominican Republic — she decides to retire there, and design a graveyard to bury her manuscript drafts, along with the characters whose fictional lives demand their own unrequited recompense. Her sisters think she's nuts, and wasting their inheritance. Filomena, a local woman Alma hires to watch over the cemetery, finds solace in a steady paycheck and her unusual workplace.

Alma wants peace for herself and her characters. But they have their own agendas and, once buried, begin to make them known: They speak to each other and Filomena, rewriting and revising Alma's creativity in order to reclaim themselves.

How Julia Alvarez Wrote Her Many Selves Into Existence

Code Switch

How julia alvarez wrote her many selves into existence.

In this new story, Alvarez creates a world where everyone is on a quest to achieve a dream — retirement, literary fame, a steady job, peace of mind, authenticity. Things get complicated during the rewrites, when ambitions and memories bump into the reality of no money, getting arrested, no imagination, jealousy, and the grace of humble competence. Alma's sisters, Filomena, the townspeople — all make a claim over Alma's aspiration to find a final resting place for her memories. Alvarez sprinkles their journey with dialogue and phrases in Spanish and one — " no hay mal que por bien no venga " (there is goodness in every woe) — emerges as the oral talisman of her story. There is always something magical to discover in a story, and that is especially true in Alvarez's landing place.

Marcela Davison Avilés is a writer and independent producer living in Northern California.
