
  • Open access
  • Published: 19 April 2023

Voice assistants in private households: a conceptual framework for future research in an interdisciplinary field

  • Bettina Minder   ORCID: orcid.org/0000-0002-5874-4999 1 ,
  • Patricia Wolf 2 , 3 ,
  • Matthias Baldauf 4 &
  • Surabhi Verma 2 , 5  

Humanities and Social Sciences Communications, volume 10, Article number: 173 (2023)


  • Business and management
  • Criminology
  • Information systems and information technology
  • Science, technology and society

The present study identifies, organizes, and structures the available scientific knowledge on the recent use and the prospects of Voice Assistants (VA) in private households. The systematic review of 207 articles from the Computer, Social, and Business and Management research domains combines bibliometric analysis with qualitative content analysis. The study contributes to earlier research by consolidating the as yet dispersed insights from scholarly research, and by conceptualizing linkages between research domains around common themes. We find that, despite advances in the technological development of VA, research largely lacks cross-fertilization between these advances and findings from the Social and Business and Management Sciences. Such cross-fertilization is needed for developing and monetizing meaningful VA use cases and solutions that match the needs of private households. Few articles show that future research is well-advised to make interdisciplinary efforts to create a common understanding from complementary findings—e.g., what necessary social, legal, functional, and technological extensions could integrate social, behavioral, and business aspects with technological development. We identify future VA-based business opportunities and propose integrated future research avenues for aligning the different disciplines’ scholarly efforts.


Introduction

Scholarly research across disciplines agrees that technological advancement is one of the important drivers of economic development because it brings about efficiency gains for all players of an economic system (Grossman and Helpman, 1991 ; Kortum, 1997 ; Dercole et al., 2008 ). Digitization and emerging technologies thus usually draw intense scholarly interest and are studied with the hope that their adoption will enable companies to generate “new capabilities, new products, and new markets” (Bhat, 2005 , p. 457) based on new business models, specifically designed for digitalized life spheres (Chao et al., 2007 ; Sestino et al., 2020 ; Antonopoulou and Begkos, 2020 ).

One of the recent emergent digital technologies promising companies substantial future revenues from innovative user services is voice assistants (VAs). They are “speech-driven interaction systems” (Ammari et al., 2019 , p. 3) that offer new interaction modalities (Rzepka et al., 2022 ).

Partly based on the integration of complementary Artificial Intelligence (AI) technology, they allow users’ speech to be processed, interpreted, and responded to in a meaningful way. In private households, we witness a rapid adoption rate of VAs in the form of smart speakers such as Amazon Echo, Apple HomePod, and Google Home (Pridmore and Mols, 2020) which, particularly in combination with customization of IoT home systems, provide a higher level of control over the smart home experience compared to a traditional setting (Papagiannidis and Davlembayeva, 2022). Smart speakers have been available in the United States (US) since 2014 and in Europe since September 2016 (Trenholm, 2016; Hern, 2017); by 2018, already 15.4% of the US and 5.9% of the German population owned an Amazon Echo (Brandt, 2018). Overall, private household purchases grew by 116% in the third quarter of 2018 compared to 2017 (Tung, 2018) and, according to a recent research report from the IoT analyst firm Berg Insight (Berg Insight, 2022), the number of smart homes in Europe and North America reached 105 million in 2021. We realize that, at present, VAs represent an emergent technology that has its challenges (Clark et al., 2022), similar to the Internet of Things (IoT) or big data analytics technology, and it has triggered an enormous amount of diverse scholarly research resulting “in a mass of disorganized knowledge” (Sestino et al., 2020, p. 1). For both scholars and managers, the sheer quantity of disorganized information makes it hard to predict the characteristics of future technology use cases that fit users’ needs or to use this information for strategy development processes (Brem et al., 2019; Antonopoulou and Begkos, 2020). While Computer Science scholars already debate the technological feasibility of specific and complex VA applications, Social Science research points to VA-related market acceptance risks resulting, for example, from biased choices offered by VAs (Rabassa et al., 2022) or from not identifying and implementing the privacy protection measures required by younger people (Shin et al., 2018), motivated by frequent user privacy leaks (Fathalizadeh et al., 2022) and worries about adverse incidents (Shank et al., 2022). Recent studies also specifically emphasize the need to shift the focus to user-centric product value (Nguyen et al., 2022) in the pursuit of the most beneficial solutions in terms of social acceptance and legal requirements (Clemente et al., 2022). For the most beneficial solutions, a collaboration between companies or even industries is likely to be necessary (Struckell et al., 2021).

There are, to the best of our knowledge, no systematic review papers focusing on VAs from a single discipline’s perspective that we could draw from. We did find an exploration of recent papers about the use of virtual assistants in healthcare that highlights some critical points (e.g., VA limitations concerning the ability to maintain continuity over multiple conversations; Clemente et al., 2022), and a review focusing on different interaction modalities in the era of Industry 4.0 that highlights the need for strong voice recognition algorithms and coded voice commands (Kumar and Lee, 2022). In sum, the research that might allow for strategizing around VA solutions that match the needs of private households is scattered and needs to be organized and made sense of from an interdisciplinary perspective to shed “light on current challenges and opportunities, with the hope of informing future research and practice” (Sestino et al., 2020). This paper thus sets out to identify, organize, and structure the available scientific knowledge on the recent use and the prospects of VAs in private households and to propose integrated future research avenues for aligning the different disciplines’ scholarly efforts and leading research on consistent, interdisciplinarily informed paths. We use a systematic literature review approach that combines bibliometric and qualitative content analysis to gain an overview of the still dispersed insights from scholarly research in different disciplines and to conceptualize topical links and common themes. Research on emerging technologies acknowledges that the adoption of these technologies depends on more factors than just technological maturity: social aspects (e.g., social norms) and economic maturity (e.g., whether a product can be produced and sold cost-effectively) also play an important role (Birgonul and Carrasco, 2021; Xi and Hamari, 2021). Research particularly emphasizes that emerging technologies need to not only be creatively and economically explored—but also grounded in the users’ perspectives (Grossman-Kahn and Rosensweig, 2012) and serve longer enduring needs (Patnaik and Becker, 1999). IDEO conceptualized these requirements into the three dimensions of feasibility, viability, and desirability (IDEO, 2009).

Feasibility covers all aspects of VA innovation management that ensure that the solution will be technically feasible and scalable; this also includes ensuring that legal and regulatory requirements are met (Brenner et al., 2021). The viability lens focuses on economic success. Desirability ensures that the solutions and services are accepted by the target groups and, more generally, desired by society (Brenner et al., 2021). While IDEO’s focus on innovation development processes relates to a different context, the main reasoning about the relevance of these three dimensions (technical, social, and management) also applies when looking for research literature that helps find strategies around VA solutions that correspond to people’s needs in private homes. To cover these three dimensions, we focus on studies from Computer Science (CS), Social Science (SS), and Business and Management Science (BMS) to advance our knowledge of the still dispersed insights from scholarly research and highlight shared topics and common themes.

With this conceptual approach, we contribute an in-depth analysis and systematic overview of interdisciplinary scholarly work that allows cross-fertilization between different disciplines’ findings. Based on our findings, we develop several propositions and a framework for future research in the interest of aligning the various scholarly efforts and leading research on consistent, interdisciplinarily informed paths. This will help realize VA’s potential in people’s everyday lives. We moreover identify potential future VA-based business opportunities.

This paper is structured as follows: the section “Business opportunities related to VA use in private households” summarizes the research on potential business opportunities related to the use of VAs in private households. The research methodology, i.e., our approach of combining a bibliometric literature analysis with qualitative content analysis in a literature review, is presented in the section “Methods”. The section “Thematic clusters in recent VA research” identifies nine thematic clusters in recent VA research, and the section “Analysis and conceptualization of research streams” analyzes and conceptually integrates them into four interdisciplinary research streams. The section “Discussion: Propositions and a framework for future research, and related business opportunities” identifies future business opportunities and proposes future directions for integrated research, and the section “Conclusions” concludes with contributions that should help both scholars and managers use this research to predict the characteristics of future technology use cases that fit users’ needs and to use this information for their strategy development processes around VAs.

Business opportunities related to VA use in private households

Sestino et al. (2020, p. 7) argue that when new technologies emerge, “companies will need to assess the positives and negatives of adopting these technologies”. The positives of VA adoption lie mainly in the projection of large new consumer markets offering products and services where text-based human–computer interaction will be replaced by voice-activated input (Pridmore and Mols, 2020, p. 1), checkout-free stores such as Amazon Go, and the use of VAs (Batat, 2021). Marketing studies predict high adoption rates in private households due to potential efficiency gains from managing household systems and devices by voice commands anytime from anywhere (Celebre et al., 2015; Chan and Shum, 2018; Jabbar et al., 2019; Vishwakarma et al., 2019), as well as the high potential of health check apps for improving communication with patients (Abdel-Basset et al., 2021) or realizing self-care solutions (Clemente et al., 2022). A study by Microsoft and Bing (Olson and Kemery, 2019) substantiates that claim for smart homes by revealing that, already today, 54% of the 5000 responding US users use their smart speakers to manage their homes, especially for controlling lighting and thermostats. In surveys, users state that they envision a future in which they will increasingly use voice commands to control household appliances from the microwave to the bathtub or from curtains to toilet controls (Kunath et al., 2019). CS scholars discuss how to design complementary Internet of Things (IoT) technology features and systems to bring about such benefits (Hamill, 2006; Druga et al., 2017; Pradhan et al., 2018; Gnewuch et al., 2018; Tsiourti et al., 2018a/b; Azmandian et al., 2019; Lee et al., 2019; Pyae and Scifleet, 2019; Sanders and Martin-Hammond, 2019). BMS research additionally debates how companies should proceed to capture, organize, and analyze the (big) user data that become potentially available once VAs are commonly used in private households, and to identify new business opportunities (Krotov, 2017; Sestino et al., 2020) and future VA applications, such as communication and monitoring services in pandemics (Abdel-Basset et al., 2021).

However, many recent studies also mention the negatives of VA usage, like worrying trends emerging from the so-called surveillance economy (Zuboff, 2019), or debate future questions, such as what happens when technology fails or what the rights of fully automated technological beings would be (Harwood and Eaves, 2020). Of the 5000 respondents to the Microsoft and Bing study, 2050 reported concerns related to voice-enabled technology, especially about data security (52%) and passive listening (41%). The “significant new production of situated and sensitive data” (Pridmore and Mols, 2020, p. 1) in private environments and the unclear legal situation related to the usage of these data seem to act as inhibitors to the adoption of more complex VA applications by users. Thus, many of the imaginable future use cases, such as advanced smart home controls (Lopatovska and Oropeza, 2018; Lopatovska et al., 2019) or personal virtual shopping assistance (Omale, 2020; Sestino et al., 2020), are still a long way off. Although more complex applications are technologically feasible and partly already available, today’s users employ VAs for simple tasks, such as “searching for a quick fact, playing music, looking for directions, getting the news and weather” (Olson and Kemery, 2019). Therefore, companies are warned against too high expectations of fast returns. Moreover, some technical issues remain, and only the not-yet-mature integration of further AI-enabled services into VAs is expected to be the game changer leading to growth in the deployment of voice-based solutions (Gartner, 2019; Columbus, 2020).

At a meta-level, BMS research advises companies to explore and implement new technologies in their products, services, or business processes, because that might result in a considerable competitive first-mover advantage (Drucker, 1988; Porter, 1990; Carayannis and Turner, 2006; Hofmann and Orr, 2005; Bhat, 2005). At the same time, Macdonald and Jinliang (1994) have shown that industrial gestation (or the impact of science on society), the evolution in the demand for technology, and the emergence of a set of competitors go hand in hand. Consequently, the adoption of an emergent technology by “the ultimate affected customer base” (Bhat, 2005, p. 462) becomes of utmost importance when looking at how company investments pay off (Pridmore and Mols, 2020). This is particularly the case for VAs, where companies are greatly dependent on the adoption of the respective hardware—typically the aforementioned smart speakers (Herring and Roy, 2007)—or of new services, such as the envisioned digital assistants (Sestino et al., 2020, p. 7), by private users. VAs differ from other emergent technologies that allow companies to reap the benefits by implementing them in their own organization and reorganizing business or production processes, like RFID technology (Chao et al., 2007), nanotechnology (Bhat, 2005), or IoT-based business process (re)engineering (Sestino et al., 2020). Hence—although VAs are among the most prominent emerging technologies discussed in current mass media—this might be one of the reasons why there is as yet very limited BMS research studying VA-related challenges and opportunities that could inform companies.

High-tech companies striving to develop VA-related business models need to consider and integrate scholarly knowledge from disciplines as different as CS, SS, and BMS to meet the requirements of “a secure conversational landscape where users feel safe” (Olson and Kemery, 2019, p. 24). However, such interdisciplinary perspectives are as yet hardly available—instead, we see a large amount of scattered disciplinary scholarly knowledge. This situation makes it difficult to assess opportunities for future VA-related services and to develop sustainable business models that offer a potential competitive advantage. In this paper, we set out to contribute to such an assessment by organizing and making sense of the scholarly knowledge from CS, SS, and BMS. We follow earlier research in assuming that assessing the state of emergent technologies and making sense of available knowledge on new phenomena requires an interdisciplinary perspective (Bhat, 2005; Melkers and Xiao, 2010; Sestino et al., 2020) to pin down and forecast the technology’s future impact and to advise companies in their technology adoption decisions (Leahey et al., 2017; Demidova, 2018; McLean and Osei-Frimpong, 2019). The literature review we present here is therefore additionally aimed at substantiating the call for interdisciplinarity in research into emerging technologies that aims to offer insights about business opportunities.

Our aim of making sense of a large amount of disorganized scholarly knowledge on VAs, assessing challenges and opportunities for businesses, and identifying avenues for future interdisciplinary research made a systematic literature review appear to be the most appropriate research strategy: literature reviews enable systematic in-depth analyses of the theoretical advancement of an area (Callahan, 2014). Earlier research with similar aims that studied other emerging technologies found the method “useful for making sense of the noise” (Sestino et al., 2020, p. 1) in a fast-growing body of scholarly literature (Fig. 1).

Fig. 1: Innovation dimensions by IDEO: feasibility-viability-desirability (after IDEO, 2009).

For our research, we decided to combine a conventional literature review that applies qualitative content analysis with a bibliometric analysis. The bibliometric analysis provides an overview of connections between research articles and the intersection of different research areas (Singh et al., 2020). The qualitative content analysis-based literature review offers a more in-depth overview of the current state of the literature (Petticrew and Roberts, 2006). Earlier scholarly work indicates that such a combination is particularly useful for analyzing the current state of technology trends and the significance of forecasts (Chao et al., 2007; Li et al., 2019). Figure 2 depicts the methodological research approach of this study.

Fig. 2: Overview of the methodological research approach of this study.

In the following, we describe the methodological approach in detail.

Article identification and screening

The literature search employed the Scopus database, as the coverage of the Scopus and Web of Science databases is similar (Harzing and Alakangas, 2016). In the literature search, we employed the keyword “voice assistant” and its synonyms (“Voice assistant” OR “Virtual assistant” OR “intelligent personal assistant” OR “voice-activated personal assistant” OR “conversational agent” OR “SIRI” OR “Alexa” OR “Google Assistant” OR “Bixby” OR “Smart Loudspeaker” OR “Echo” OR “Smart Speaker”) combined with “home” and its synonyms (“home” OR “house” OR “household”). The automated bibliometric analysis scanned the titles, abstracts, and keywords of the articles for these terms; we used the search field “theme”, which includes title, abstract, and keywords (compare the section “Data analysis step 1: Bibliometric literature review”). Due to the focus of the research, the search was restricted to articles published in the CS, SS, and BMS areas, written in English, and published before May 2020.
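For illustration only, and not as a description of the authors’ actual tooling, the short Python sketch below assembles a Boolean search string from the two keyword groups described above; the Scopus field name TITLE-ABS-KEY and the helper function are assumptions of this sketch.

```python
# Hypothetical sketch: building the Boolean query described in the text for a
# Scopus title/abstract/keyword search. The field name TITLE-ABS-KEY and the
# helper below are illustrative assumptions, not the authors' actual tooling.

VA_TERMS = [
    "Voice assistant", "Virtual assistant", "intelligent personal assistant",
    "voice-activated personal assistant", "conversational agent", "SIRI",
    "Alexa", "Google Assistant", "Bixby", "Smart Loudspeaker", "Echo",
    "Smart Speaker",
]
HOME_TERMS = ["home", "house", "household"]

def or_group(terms):
    """Join a list of synonyms into one parenthesized OR group of quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Combine the two synonym groups with AND, as in the search described above.
query = f"TITLE-ABS-KEY({or_group(VA_TERMS)} AND {or_group(HOME_TERMS)})"
print(query)
```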

We adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines proposed by Moher et al. (2009) for the bibliometric literature review. The initial search yielded 428 articles in CS, 356 articles in SS, and 40 articles in BMS. After scanning the abstracts of all documents in the list of each field, further articles were excluded based on their relevance to our topic. The most frequent reason for excluding an article was that it was not about VAs—e.g., articles found with the keyword “echo” referred to acoustic phenomena. Table 1 displays the descriptive results of the bibliometric literature review.

The final dataset included 267 articles in CS, 52 articles in SS, and 20 articles in BMS.


Tables 2 and 3 present the top countries of origin of the articles from CS and SS. There was no information related to the countries of origin of the BMS articles. In view of the many (regionally differing) legal and regulatory issues, it is important to see that, while the US leads the list by a large margin, the discussion is also spread over countries from different continents.

Data analysis step 1: Bibliometric literature review

The final dataset consisted of bibliometric information including the author names, affiliations, titles, abstracts, publication dates, and citation information. The bibliometric analysis was conducted for each discipline separately using the VOSviewer software. For each discipline, we visualized common knowledge patterns through co-occurrence networks in the VA literature. A co-occurrence network may contain keywords with similar meanings, which can distort the analyses. Therefore, synonyms were grouped into topics using the VOSviewer thesaurus to ensure a rigorous analysis. For example, the keywords “voice assistant”, “virtual assistant”, “intelligent personal assistant”, “voice-activated personal assistant”, “conversational agent”, “SIRI”, “Alexa”, “Google Assistant”, “Bixby”, “smart loudspeaker”, “Echo”, and “smart speaker” were replaced with the main term “voice assistant”. Also, keywords were standardized to ensure uniformity and consistency (e.g., singular and plural forms). Further, a few keywords were deleted from the thesaurus to keep the review focused on the research questions of this study.
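As a minimal illustration of this synonym-grouping step, keywords can be normalized to a canonical term before co-occurrence counting; the mapping and function below are assumptions for the sketch, not the authors’ actual thesaurus file.

```python
# Illustrative sketch of the normalization performed via the VOSviewer thesaurus:
# synonyms and spelling variants are mapped onto one canonical keyword before
# the co-occurrence counts are computed. The mapping below is an assumption.
THESAURUS = {
    "virtual assistant": "voice assistant",
    "intelligent personal assistant": "voice assistant",
    "voice-activated personal assistant": "voice assistant",
    "conversational agent": "voice assistant",
    "siri": "voice assistant",
    "alexa": "voice assistant",
    "google assistant": "voice assistant",
    "bixby": "voice assistant",
    "smart loudspeaker": "voice assistant",
    "echo": "voice assistant",
    "smart speaker": "voice assistant",
    "voice assistants": "voice assistant",  # singular/plural standardization
}

def normalize(keyword: str) -> str:
    """Map a raw author/index keyword to its canonical form (or keep it as is)."""
    key = keyword.strip().lower()
    return THESAURUS.get(key, key)

print(normalize("Google Assistant"))  # -> "voice assistant"
```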

Scopus provides Subject Areas, and we used these areas to generate the bibliometric analysis (e.g., selecting CS to analyze all papers from that area). When cleaning the dataset—e.g., excluding non-relevant papers—some papers could be assigned to more than one area by checking the authors’ affiliations. The co-occurrence networks (Figs. 3–5) of the keywords were obtained automatically from scanning the titles, abstracts, and keywords of the articles in the final cleaned datasets. The networks present similarities between frequently co-occurring keywords (themes or topics) in the literature (Van Eck and Waltman, 2010). The co-occurrence number of two keywords is the number of articles that contain both keywords (Van Eck and Waltman, 2014). VOSviewer places these keywords in the network and identifies clusters with similar themes, with each color representing one cluster (Van Eck and Waltman, 2010). The colors, therefore, reflect topical links and common themes. Boundaries between these clusters are fluid: ‘affordance’, for example, is in the light green cluster in Fig. 4, denoting research on VA systems, but it is also connected to the red cluster, which discusses security issues. The assignment to the green cluster happens because of more frequent links to this topic. The co-occurrence networks for our three scholarly disciplines are displayed in Figs. 3–5. By discussing the clusters, nine topic themes for our research emerged (compare the next section).
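To make the counting rule concrete, a small self-contained sketch with made-up keyword sets (not our data) shows how the co-occurrence weight of two keywords equals the number of articles containing both; VOSviewer then lays out and clusters the resulting weighted network.

```python
from collections import Counter
from itertools import combinations

# Toy example (invented keyword sets, not our dataset): each article is
# represented by its normalized keywords. The co-occurrence number of two
# keywords is the number of articles containing both -- these counts are the
# edge weights of the network that VOSviewer lays out and clusters.
articles = [
    {"voice assistant", "smart home", "privacy"},
    {"voice assistant", "privacy", "technology adoption"},
    {"voice assistant", "smart home", "iot"},
]

cooccurrence = Counter()
for keywords in articles:
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

print(cooccurrence[("privacy", "voice assistant")])  # -> 2 (articles 1 and 2)
```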

Fig. 3: The frequently co-occurring keywords, themes, or topics in research in the CS field on VAs in private households.

Fig. 4: The frequently co-occurring keywords, themes, or topics in research in the SS field on VAs in private households.

Fig. 5: The frequently co-occurring keywords, themes, or topics in research in the BMS field on VAs in private households.

We can see that the networks and the topics covered differ across the three scientific areas. By studying and grouping the research topics that were revealed in the co-occurrence analysis within and across scientific areas, we identified nine thematic clusters in VA research. We labeled these clusters as “Smart devices” (cluster 1), “Human–computer interaction (HCI) and user experience (UX)” (cluster 2), “Privacy and technology adoption” (cluster 3), “VA marketing strategies” (cluster 4), “Technical challenges in VA applications development” (cluster 5), “Potential future VAs and augmented reality (AR) applications and developments” (cluster 6), “Efficiency increase by VA use” (cluster 7), “VAs providing legal evidence” (cluster 8), and “VAs supporting assisted living” (cluster 9). The clusters emerged from discussing the different research areas displayed in Figs. 3–5 in relation to our research question on strategies around VA solutions in private households. Essentially, the process of finding appropriate clusters for our research involved scanning the research areas, listing, and discussing possible groupings until the four researchers of this paper agreed on a final set of nine clusters. The nine clusters encompass different areas and terms in the figures—e.g., cluster 1 (smart devices) covered the areas ‘virtual assistants’, ‘conversational agents’, ‘intelligent assistants’, ‘home automation’, ‘smart speakers’, and ‘smart technology’. Cluster 2 (HCI and UX) includes areas such as ‘voice user interfaces’, ‘chatbots’, ‘human–computer interaction’, and ‘hands-free speakers’. Some of the clusters we identified in this process contained only a small number of areas, such as cluster 4 (marketing strategies), which essentially covers the research areas ‘marketing’ and ‘advertising’.

Data analysis step 2: Qualitative content analysis

It can be difficult to derive qualitative conclusions from quantitative data, which is why, in this study, we additionally conducted a qualitative content analysis of the 267 articles in the cleaned dataset. The objective of this second step was to rigorously assess the results from the bibliometric review, ensuring that the nine themes identified in stage 1 are in accordance with the main tenets presented in the literature. Any qualitative content analysis of literature suffers, to a certain extent, from the subjective opinions of the authors. However, the benefits of this method are indisputable, and it follows a well-established approach used in past studies of a similar kind. To counter the risk of subjectivity in data analysis, we involved three researchers in it, thereby triangulating investigators (Denzin, 1989; Flick, 2009). We adopted Krippendorff’s (2013) content analysis methodology to ensure a robust analysis and to help with the contextual dimensions of each research field.

In the first step, the nine clusters identified by using VOSviewer were evaluated by the three researchers independently by assigning each of the 267 articles to one of the nine thematic clusters. During this process, it became apparent that the qualitative content analysis confirmed the bibliometric analysis to a large extent, i.e., most of the articles belonged to the clusters proposed in the bibliometric analysis. However, we excluded 60 articles in this process step, since many of the less obvious thematic mismatches of the articles could only be found in a more in-depth cleaning of the dataset: 5 were duplicates (4 allocated to CS, 1 to SS), and 55 papers (46 from CS, 2 from SS, and 6 from BMS) were not about VAs in private households. This left us with an overall sample of 207 articles (see the list in the Appendix).

Moreover, we identified articles that belonged to other clusters than suggested in the bibliometric analysis, and assigned them, after discussions with the research team, to the correct cluster. For example, the bibliometric analyses had originally not classified any of the articles in cluster 2 (“HCI and UX”) as belonging to the BMS area, while we identified such articles during the qualitative content analysis. Table 4 below displays the distribution of articles in the final dataset.

After having accomplished this data cleaning, we developed short descriptions summarizing the content of the research in each of the nine clusters (see the section “Thematic clusters in recent VA research”).

In a final step, we condensed the nine clusters into four meaningful streams, representing distinguishable VA research topics that can support the emergence of interdisciplinary perspectives in research that studies VAs in private households. We applied the following procedure to obtain the streams and allocate papers from the clusters to them: first, three researchers independently conceptualized topical research streams. Then, all researchers discussed these streams and agreed on topical headlines reflecting the terminology used in the respective research. Next, they allocated—again first working independently and later together—papers to the four research streams presented in the section “Analysis and conceptualization of research streams”. Our aim of finding meaningful streams that can support the emergence of interdisciplinary research on VAs in private households made a qualitative procedure appear to be the most appropriate strategy for this step in the analysis. Qualitative analysis helps organize data in meaningful units (Miles and Huberman, 1994).

Thematic clusters in recent VA research

From our analysis, recent research on VA in private households can be divided into nine thematic clusters. In the following, we briefly present these clusters and elaborate on connections between the contributions from the three research areas we considered.

Cluster 1: Smart device solutions

Cluster 1 comprises publications on smart device solutions in smart home settings and their potential in orchestrating various household devices (Amit et al., 2019 ). Many CS papers present prototypes of web-based smart home solutions that can be controlled with voice commands, like household devices enabling location-independent access to IoT-based systems (Thapliyal et al., 2018 ; Amit et al., 2019 ; Jabbar et al., 2019 ). A research topic that appears in both the CS and SS areas relates to users’ choices, decisions, and concerns (Pridmore and Mols, 2020 ). Concerns studied relate to privacy issues (Burns and Igou, 2019 ) or the impact of VA use on different age groups of children (Sangal and Bathla, 2019 ).

A topic researched in all three scientific domains is the potential of VAs for overcoming the limitations of home automation systems. CS papers typically cover suggestions for resolving mainly technical limitations, such as those concerning language options (Pyae and Scifleet, 2019 ), wireless transmission range (Jabbar et al., 2019 ), security (Thapliyal et al., 2018 ; Parkin et al., 2019 ), learning from training with humans (Demidova, 2018 ), or sound-based context information (Alrumayh et al., 2019 ). SS research mostly investigates the limitations of VAs in acting as an interlocutor and social contact for humans (Lopatovska and Oropeza, 2018 ; Hoy, 2018 ; Pridmore and Mols, 2020 ), or identifies requirements for more user-friendly and secure systems (Vishwakarma et al., 2019 ). Finally, BMS papers focus on studying efficiency gains from using VAs, for example in the context of saving energy (Vishwakarma et al., 2019 ).

Cluster 2: Human–computer interaction and user experience

Cluster 2 contains human–computer interaction (HCI) research on users’ experience of VA technology. Researchers investigate user challenges that result from unmet expectations concerning VA-enabled services (Santos-Pérez et al., 2011; Han et al., 2018; Komatsu and Sasayama, 2019). Papers from the SS area typically discuss language issues (Principi et al., 2013; King et al., 2017).

A central topic covered in both the CS and BMS publications is trust in and user acceptance of VAs (e.g., Hamill, 2006; Hashemi et al., 2018; Lackes et al., 2019). From the BMS perspective, researchers find that trust and perceived (dis)advantages are factors influencing user decisions on buying or utilizing VAs (Lackes et al., 2019). Complementarily, CS researchers find that the usefulness of human–VA interactions and access to one’s own household data impact the acceptance of VAs (e.g., Pridmore and Mols, 2020). The combination of these two scientific disciplines discussing a topic without SS entering the debate is unique in our data material.

‘Humanized VAs’ is a topic discussed both in CS and SS research. In CS, this includes quasi-human voice-enabled assistants acting as buddies or companions for older adults living alone (Tsiourti et al., 2018a , b ) or technical challenges with implementing human characteristics (Hamill, 2006 ; Lopatovska and Oropeza, 2018 ; Jacques et al., 2019 ). Two papers from both CS and SS contributed to the theory of anthropomorphism in the VA context (Lopatovska and Oropeza, 2018 ; Pradhan et al., 2019 ). SS additionally offers findings about user needs, like the preferred level of autonomy and anthropomorphism for VAs (Hamill, 2006 ).

Cluster 3: Privacy and technology adoption

Cluster 3 consists predominantly of CS research into privacy-related aspects like the security risks of VA technology and corresponding technical solutions to minimize them (e.g., Dörner, 2017; Furey and Blue, 2018; Pradhan et al., 2019; Sudharsan et al., 2019). An exception concerns the user-perceived privacy risks and concerns that are studied in all three scientific domains. Related papers discuss these topics with a focus on user attitudes towards VA technology, the resulting technology adoption, and factors motivating VA application (e.g., Demidova, 2018; Fruchter and Liccardi, 2018; Lau et al., 2018; Pridmore and Mols, 2020): perceived privacy risks are found to negatively influence user adoption rates (McLean and Osei-Frimpong, 2019). In CS studies, researchers predominantly propose more efficient VA solutions that users would want to bring into their homes (Seymour, 2018; Parkin et al., 2019; Vishwakarma et al., 2019). These should be equipped with standardized frameworks for data collection and processing (Bytes et al., 2019), or with technological countermeasures and detection features to establish IoT security and privacy protection (Stadler et al., 2012; Sudharsan et al., 2019; Javed and Rajabi, 2020). Complementarily, SS researchers investigate measures for protecting the privacy of VA users beyond technical approaches, such as legislation ensuring privacy protection (Pfeifle, 2018; Dunin-Underwood, 2020).

Cluster 4: VA marketing strategies

Cluster 4 comprises research developing strategies for advertising the use of VAs in private households. We find here articles exclusively from BMS. Scholars address various aspects of VA marketing strategies, such as highlighting security improvements or enhanced user-friendliness and intelligence of the devices (e.g., Burns and Igou, 2019 ; Vishwakarma et al., 2019 ). Others study how to measure user satisfaction with VA technology (e.g., Hashemi et al., 2018 ).

Cluster 5: Technical challenges in VA applications development

Cluster 5 contains predominantly CS research papers investigating and proposing solutions for technical challenges in VA application development. Recent work focuses on extensions and improvements for the technologically relatively mature mass-market VAs (e.g., Liciotti et al., 2014 ; Azmandian et al., 2019 ; Jabbar et al., 2019 ; Mavropoulos et al., 2019 ). Some research investigates ways to overcome the technical challenges of VAs in household environments: For example, King et al. ( 2017 ) work on more robust speech recognition, and Ito ( 2019 ) proposes an audio watermarking technique to avoid the misdetection of utterances from other VAs in the same room. Further research on technological improvements includes work on knowledge graphs (Dong, 2019 ), on cross-lingual dialog scenarios (Liu et al., 2020 ), on fog computing for detailed VA data analysis (Zschörnig et al., 2019 ), and on the automated integration of new services based on formal specifications and error handling via follow-up questions (Stefanidi et al., 2019 ).

We identify a complementarity between CS and SS research within the research topic of “affective computing”. In both research domains, researchers strive to identify ways to create more empathic VAs. For example, Tao et al. (2018) propose a framework that conceptualizes several dimensions of emotion and VA use. SS research contributes to a virtual caregiver prototype aware of the patient’s emotional state (Tironi et al., 2019). However, scholarly contributions in the two areas are not related to each other.

Cluster 6: Potential future VA applications and developments

Cluster 6 investigates the future of VA research, particularly the technological advancements we can expect and suggestions for future research avenues. Most CS papers introduce prospective technical applications in many different areas, such as medical treatment and therapy (Shamekhi et al., 2017; Pradhan et al., 2018; Patel and Bhalodiya, 2019) or VA content creation and retrieval (Martin, 2017; Kita et al., 2019). A sub-group of papers also proposes functional prototypes (e.g., Yaghoubzadeh et al., 2015; Freed et al., 2016; Tielman et al., 2017).

We identify three topics that are discussed in both SS and CS publications. The first focuses on language and VAs and represents an area where CS research relates to SS findings: while SS identifies open language issues in dialogs with VAs (Martin, 2017; Ong et al., 2018; Huxohl et al., 2019), CS researchers investigate how to approach them, not only at the technological level of speech recognition but also in terms of what it means to have a conversation with a machine (Yaghoubzadeh et al., 2015; Ong et al., 2018; Santhanaraj and Barkathunissa, 2020). A second focus is on near-future use scenarios (Hoy, 2018; Seymour, 2018; Tsiourti et al., 2019; Burns and Igou, 2019), such as VA library services, VA services for assisted living, or VAs supporting emergency detection and handling. The third common topic is about identifying future differences between the use of VAs in private households and in other environments like public spaces (Lopatovska and Oropeza, 2018; Robinson et al., 2018).

Cluster 7: Efficiency increase by VA use

Cluster 7 consists of papers about efficiency increases through VA use, with a focus on smart home automation systems. Papers in BMS discuss the increased efficiency of home automation systems through the use of VAs (Vishwakarma et al., 2019). CS papers study and appraise the efficiency of home automation solutions and use cases, more efficient VA automation systems and interface device solutions (Liciotti et al., 2014; Jabbar et al., 2019; Jacques et al., 2019), effective activity assistance (Freed et al., 2016; Palumbo et al., 2016; Tielman et al., 2017), care for elderly people (Donaldson et al., 2005; Wallace and Morris, 2018; Tsiourti et al., 2019), and smart assistive user interfaces and systems of the future (Shamekhi et al., 2017; Pradhan et al., 2018; Mokhtari et al., 2019). SS has not yet contributed to this cluster.

Cluster 8: VAs providing legal evidence

Cluster 8 addresses the rather novel topic of digital forensics in papers from the CS and SS domains. The research studies how VA activities can inform court cases. Researchers investigate which information can be gathered, derived, or inferred from IoT-collected data, and what approaches and tools are available and required to analyze them (Shin et al., 2018 ; Yildirim et al., 2019 ).

Cluster 9: VAs supporting assisted living

Cluster 9 comprises papers on VAs supporting assisted living. CS papers explore and describe technical solutions for the application of VAs in households and everyday task planning (König et al., 2016 ; Tsiourti et al., 2018a ; Sanders and Martin-Hammond, 2019 ), for improving aspects of companionship (Donaldson et al., 2005 ), for stress management in relation to chronic pain (Shamekhi et al., 2017 ), and for the recognition of distress calls (Principi et al., 2013 ; Liciotti et al., 2014 ). CS scholars also study user acceptance and the usability of VA for elderly people (Kowalski et al., 2019 ; Purao and Meng, 2019 ).

CS and SS both share a research focus on VAs helping people maintain a self-determined lifestyle (Yaghoubzadeh et al., 2015 ; Mokhtari et al., 2019 ) and on their potential and limitations for home care-therapy (Lopatovska and Oropeza, 2018 ; Kowalski et al., 2019 ; Turner-Lee, 2019 ), but without relating findings to each other.

Analysis and conceptualization of research streams

When comparing the bibliometric and the qualitative content analysis, the clusters found in the bibliometric analysis were confirmed to a large extent. The comparison did, however, also lead to the allocation of some articles to different areas. The content analysis particularly helped subsume the nine clusters under four principal research streams. The overview that we gained based on the four streams points to interdisciplinary research topics that need to be studied by scholars wanting to help realize VA potential through applications perceived as safe by users.

What all research domains share to a certain extent is a focus on users’ perceived privacy risks and concerns and on the impact of perceived risks or concerns on the adoption of VA technology. At the same time, our findings confirm our assumption that these complementarities are generally not well used for advancing the field: in CS, researchers predominantly study future application development and technological advancements, but—except for language issues (cluster 6)—they do not relate this work much to solving the challenges identified in SS and BMS research. In the following, we first present an overview of the four deduced research streams and, in the next section, the propositions and the conceptual model for future interdisciplinary research that we developed based on our analysis.

The four major research streams into which we consolidated the nine thematic clusters identified in our literature review are labeled “Conceptual foundation of VA research” (stream 1), “Systemic challenges, enabling technologies and implementation” (stream 2), “Efficiency” (stream 3), and “VA applications and (potential) use cases” (stream 4). The streams were obtained in a qualitative procedure, in which three researchers conceptualized streams independently and then discussed potentially meaningful streams together (compare the section “Data analysis step 2: Qualitative content analysis”). Table 5 provides an overview of the four main streams identified in the VA literature and presents selected publications for each of the streams.

The streams systematize the scattered body of VA research in a way that offers clearly distinguishable interdisciplinary research avenues to assist in strategizing around and realizing VA technology potential with applications that are perceived as safe and make a real difference in the everyday life of users. The first stream includes all papers offering theoretical and conceptual knowledge; papers, for example, conceptualize challenges for VA user perceptions or develop security and privacy protection concepts. Systemic challenges and enabling technologies form the second stream in VA research. This particularly includes systemic security and UX challenges, and legal issues. Efficiency presents the third research stream, in which scholars particularly investigate private people’s awareness of how VAs can make their homes more efficient and ask how VAs can be advertised to private households. Finally, VA applications and potential use cases form the fourth research stream. It investigates user expectations and presents prototypes for greater VA use in future home automation systems, medical care, or IoT forensics.

The overview that we gain based on the four streams enables us to frame the contributions of the research domains to VA research more clearly than the nine clusters alone would. We find that all research areas contribute publications to all streams. However, the number of contributions varies: CS acts as the main driver of current developments, with the most publications in all research streams. CS research predominantly addresses systemic challenges, enabling technologies, and technology implementation. We recognize increasing scholarly attention on user-oriented VA applications and on VA systems for novel applications beyond their originally intended usage—such as exploiting the microphone array for sensing a user’s gestures and tracking exercises (Agarwal et al., 2018; Tsiourti et al., 2018a/b), or using VA data for forensics (Dorai et al., 2018; Shin et al., 2018)—which indicates that the fundamental technical challenges in the development of this emergent technology are solved. SS has so far mainly contributed to the theoretical foundation of VA design principles and use affordance (Yusri et al., 2017), and to the theory that supports developing concrete applications. It also conceptualizes the potential or desirable impact of VAs in real-life settings, such as increasing comfort and quality of life through low-cost smart home automation systems combining VAs and smartphones (Kodali et al., 2019), or VAs adding to content creation (Martin, 2017). The contributions by BMS scholars are mainly aimed at researching and promoting efficiency increases from using VAs.

Discussion: Propositions and a framework for future research, and related business opportunities

In this paper, we used a systematic literature review approach combining bibliometric and qualitative content analysis to structure the dispersed insights from scholarly research on VAs in CS, SS, and BMS, and to conceptualize linkages and common themes between them. We identified four major research streams and specified the contributions of researchers from the different disciplines to them in a conceptual overview. Our research allows us to confirm advances in the technological foundations of VAs (Pyae and Joelsson, 2018; Lee et al., 2019; McLean and Osei-Frimpong, 2019), and some concrete VAs like Alexa, Google Assistant, and Siri have already arrived in the mass market. Still, more technologically robust and user-friendly solutions that meet the legal requirements for data security will be needed to spark broader user interest (Kuruvilla, 2019; Pridmore and Mols, 2020).

Propositions for future research

We find that recent research from the three domains contributes to addressing the challenges that the literature identified as hindering broader user adoption of VAs, in different ways and with different foci. Table 6 summarizes the identified challenges and domain-specific research contributions.

However, to advance VA adoption in private households, more complex VA solutions will need to convince users that the perceived privacy risks are solved (Kowalczuk, 2018; Lackes et al., 2019). To this end, all three research domains will need to contribute: CS is required to define comprehensible frameworks for data collection and processing (Bytes et al., 2019) and solutions to ensure data safety (Mirzamohammadi et al., 2017; Sudharsan et al., 2019; Javed and Rajabi, 2020). Complementarily, SS should identify the social and legal conditions under which users perceive VA use in private households as safe (Pfeifle, 2018; Dunin-Underwood, 2020). Finally, BMS is urged to identify user advantages that go beyond simple efficiency gains, investigate the benefits of accessing one’s own data, and find metrics for user trust in technology applications (Lackes et al., 2019). In particular, SS research provides potentially valuable insights into users’ perceptions and into use case areas such as home medical care or assisted living that would be worth taking into account by CS scholars developing advanced solutions; SS research would, vice versa, benefit from taking available technical solutions into consideration. Similarly, BMS scholarly research exhibits a rather narrow focus on increasing the efficiency of activities by using VA applications and on how to market these solutions to private households. CS scholars complement this focus with technical solutions aimed at increasing the efficiency of automated home systems, but the research efforts from the two domains are not well aligned. VA security-related issues and solutions, limitations of VA applications for assisted living, and effects of humanization and anthropomorphism seem to be under-investigated topics in BMS.

Thus, our first proposition reads as follows:

Proposition 1: To advance users’ adoption of complex VA applications in private households, domain-specific disciplinary efforts of CS, SS, and BMS need to be integrated by interdisciplinary research.

Our study has shown that this is particularly important to arrive at the necessary insights into how to overcome the VA security issues and VA technological development constraints that CS works on and, at the same time, deal with the effects of VA humanization (SS research) and develop VA-related business opportunities (BMS research) in smart home systems, assisted living, medical home therapy, and digital forensics. Therefore, we define the following three sub-propositions:

Proposition 1.1: In order to realize VA potential for medical care solutions that are perceived as safe by users, research insights from studies on VA perception and on perceived security issues from SS need to be integrated with CS research aimed at resolving the technical constraints of VA applications and with BMS research about the development of use cases desirable for private households and related business models.

Proposition 1.2: To advance smart home system efficiency and arrive at regulations that make users perceive the usage of more complex applications as safe, research insights from studies on systemic integration and security-related technical solutions from CS need to be studied and developed.

Proposition 1.3: In order to increase our knowledge of the social and economic conditions for VA adoption in private households, BMS and SS research needs to integrate insights from user research with VA prototypes and from research about near-future scenarios of VA use to model and test valid business cases that are not based on mere assumptions of efficiency gains.

In our four streams, we moreover recognize a common interest in studying VAs beyond isolated voice-enabled ‘butlers’. In essence, VAs are increasingly investigated as gateways to smart home systems that enable interaction with entire ecosystems. This calls, next to the development of more complex technical applications in CS, mainly for more future research into the social (SS) and economic (BMS) conditions enabling the emergence of such ecosystems—from the necessary changes in regulations, to insurance and real estate issues, to designing marketing strategies for VA health applications in the home (Olson and Kemery, 2019; Bhat, 2005; Melkers and Xiao, 2010; Sestino et al., 2020). The above is not only true for the three scientific domains we looked at, but also calls for the integration of complementary VA-related research in adjacent disciplines, such as law, policy, or real estate. Our second proposition thus reads as follows:

Proposition 2: To advance users’ adoption of complex VA applications in private households, research needs to perform interdisciplinary efforts to study and develop ways to overcome ecosystem-related technology adoption challenges.

Conceptual framework for future research

As outlined above, future research wishing to contribute to increasing user acceptance and awareness and to generate use cases that make sense for private households in everyday life is urged to make interdisciplinary efforts to integrate complementary findings.

The conceptual framework (Fig. 6) presents avenues for future research. The figure highlights Propositions 1 and 2, which emphasize the need to advance user adoption through interdisciplinary research that can help overcome challenges arising from complex VA applications (Proposition 1) and ecosystem-related technology adoption challenges (Proposition 2). Furthermore, the figure reflects the three sub-propositions that summarize relevant avenues for interdisciplinary work that can help solve VA-related security issues, generate security and privacy protection concepts, and advance frameworks for legal regulations. The first sub-proposition concerns research that helps find solutions for home medical care in which VA limitations and security issues are solved. Sub-proposition 2 consists of research needed to advance systemic integration and security-related solutions for the efficiency and regulation of smart home systems. The third sub-proposition involves research that can help define social and economic conditions for VAs and create business opportunities by including insights from user research with VA prototypes and from research with near-future scenarios that can model and test valid business cases that are not based on mere assumptions of efficiency gains.

Fig. 6: The framework highlights the focus of Propositions 1 and 2 and reflects Propositions 1.1, 1.2, and 1.3.

Identified business opportunities that will help realize VA potential

Overall, we confirm that VA is not a technology that enables companies to profit from implementing it in their own organizations or from making business processes more efficient, as other technological innovations do (Bhat, 2005; Chao et al., 2007; Sestino et al., 2020). Instead, we find that companies need to build business models around VA-related products and services that users perceive as safe and beneficial. Table 7 below provides an overview of potential areas offering such business opportunities, the technology maturity of these areas, and the social and business-related challenges that need to be solved to fully realize VA potential for the everyday life of users.

As shown, the three areas in which we identified business opportunities from the literature, i.e., smart home systems (Freed et al., 2016; Thapliyal et al., 2018; Jabbar et al., 2019), assisted living and medical home therapy (König et al., 2016; Tsiourti et al., 2018a/b; Sanders and Martin-Hammond, 2019), and digital forensics (Shin et al., 2018; Yildirim et al., 2019), exhibit different levels of technology, social system, and business model maturity. It is relevant to note that, although cluster 8 (‘digital forensics’) consisted of only two papers in our review, we can expect it to become an increasingly salient cluster in the next few years due to the importance of the topic for governmental bodies and society.

Designing appropriate business models will require companies, in the first step, to develop a deep understanding of the potential design of future ecosystems, i.e. of “the evolving set of actors, activities, and artifacts, including complementary and substitute relations, that are important for the innovative performance of an actor or a population of actors.” (Granstrand and Holgersson, 2020 , p. 3). We here call for interdisciplinary research that develops and integrates the necessary insights in a thorough and, for companies, comprehensible manner.

Methodology

In this paper, we used a relatively new approach to a literature review: we combined an automated bibliometric analysis with qualitative content analysis to gain holistic insights into a multi-faceted research topic and to structure the available body of knowledge across three scientific domains. In doing so, we followed the advice in recent research that found classical, purely content-based literature reviews to be time-consuming, lacking rigor, and prone to being affected by the researchers’ biases (Caputo et al., 2018; Verma and Gustafsson, 2020). Overall, we can confirm that automating the literature analysis through VOSviewer turned out to be a time-saver regarding the actual search across (partly domain-specific) sources and the collection of scientific literature, and it allowed us to relatively quickly identify meaningful research clusters based on keywords in an enormous body of data (Verma, 2017; Van Eck and Waltman, 2014). However, we also found that several additional steps were necessary to assure the quality of the review: despite the careful selection of keywords, the initial literature list contained several irrelevant articles (i.e., articles not addressing VA-related topics, yet involving the keywords ‘echo’ and ‘home’).

Thus, manual cleaning of the literature lists was required before meaningful graphs could be generated with VOSviewer. The subsequent step of identifying research clusters from the graphs demanded broad topical expertise. We found this identification of clusters to be, as described by Krippendorff (2013), a necessarily iterative process, not only for continuously refining meaningful clusters but also for reaching a common understanding and interpretation in an interdisciplinary team. In a similar vein, deriving higher-level categories, i.e. the research streams, required iterative refinements.
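
To illustrate how this kind of cleaning could be partly supported programmatically, the short Python sketch below flags records in a Scopus CSV export whose title, abstract, and author keywords contain none of a list of VA-related terms, so that they can be singled out for manual checking before the file is loaded into VOSviewer. The file name, column names, and term list are illustrative assumptions; our own cleaning was done manually and is not reproduced by this code.

```python
# Hedged sketch: pre-screening a Scopus CSV export before importing it into
# VOSviewer. Records that matched broad search keywords (e.g. 'echo', 'home')
# but contain no VA-related term are flagged for manual review. The column
# names assume the standard Scopus export format.
import pandas as pd

VA_TERMS = ["voice assistant", "smart speaker", "alexa", "google assistant",
            "siri", "conversational agent", "voice user interface"]

def flag_candidates_for_removal(path: str) -> pd.DataFrame:
    """Return the export with an extra boolean column 'review_manually'."""
    df = pd.read_csv(path)
    # Combine the text fields that are also used for keyword-based clustering.
    text = (df["Title"].fillna("") + " " +
            df["Abstract"].fillna("") + " " +
            df["Author Keywords"].fillna("")).str.lower()
    # A record is suspicious if none of the VA-related terms appears in it.
    df["review_manually"] = ~text.apply(
        lambda t: any(term in t for term in VA_TERMS))
    return df

if __name__ == "__main__":
    screened = flag_candidates_for_removal("scopus_export.csv")  # hypothetical file name
    print(screened["review_manually"].sum(), "records flagged for manual check")
    screened.to_csv("scopus_export_screened.csv", index=False)
```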

Retrospectively, the quantitative bibliometric analysis helped us recognize both core topics and gaps in VA-related research with comprehensive coverage. The complementary content analysis yielded insights into intersections and overlaps between the research areas considered and enabled the identification of further promising avenues for interdisciplinary research.

Conclusions

From our study, we conclude that research into VA-based services is not taking advantage of the potential synergies across disciplines. Business opportunities can specifically be found in spaces that require the combination of research domains that are still disconnected. This should be taken into account when looking for information that can help predict the service value of smart accommodation (Papagiannidis and Davlembayeva, 2022 ) or characteristics of future technology use cases that can fit users’ needs (Nguyen et al., 2022 ). This can also support scholars and managers in strategizing about future business opportunities (Brem et al., 2019 ; Antonopoulou and Begkos, 2020 ).

In consequence, our framework and the propositions we developed highlight that more interdisciplinary research is needed, and they specify what type of research can advance the development and application of VA in private households and, by implication, inform companies about future business opportunities.

The study also provides concrete characteristics of future VA use cases: constant development in research on VAs, e.g., on novel devices and on complementary technologies like artificial intelligence and virtual reality, suggests that future VAs will no longer be limited to audio-only devices but will increasingly feature screens and built-in cameras and offer more advanced use cases. Accordingly, embodied VAs, for example in the form of social robots, require further technological advancement and integration as well as studies on user perception.

Implications for managers

Our research enabled us to identify and describe the most promising areas for business opportunities while highlighting the related technological, social, and business challenges. From this, it became obvious that managers need to take all three dimensions and the related types of challenges into account in order to successfully predict the characteristics of future technology use cases that fit users' needs, and to use this information in their strategy development processes (Brem et al., 2019; Antonopoulou and Begkos, 2020). This requires not just the design of new services and business models, but of complete business ecosystems, and the establishment of partnerships with the private sector. We moreover found that establishing trust in the safe and transparent treatment of privacy and data is key to getting users to buy and use services involving VA, while purely efficiency-based arguments are not enough to dispel the current worries of potential users, for example regarding the data security of technology used to improve the tracking and monitoring of patients or viruses (Abdel-Basset et al., 2021).

Although our study investigated VAs in private households, our findings also have implications for organizing work-from-home environments, given the growing acceptance of working from home, not least due to the experiences made during the COVID-19 pandemic. While, for example, the Alexa "daily check" and the Apple health check app can provide community-based AI technology that supports self-testing and virus-tracking efforts (Abdel-Basset et al., 2021), managers will need to ensure that company data is safe, and this will require them to consider how their employees use VA hardware at home.

Limitations

As with most research, this study has its limitations. While we see value in the combined approach taken in this research, as it yields insights into strategies for VA solutions that match the needs of private households, limitations can be seen in the qualitative part of our methodology, which is subject to a certain degree of author subjectivity. Limitations of our work also relate to the fact that we included only articles from the Scopus database in this review. Future research should therefore consider articles indexed in other databases such as EBSCO, Web of Science, or Google Scholar. Also, the study covered only three scientific domains and only articles published up to May 2020. This review therefore does not discuss the consequences of the ongoing changes triggered by the COVID-19 pandemic for the use of VA solutions in private households. The impact of this disruptive pandemic experience on the use of VA is not yet well understood. More research will be necessary to obtain a complete account of how COVID-19 has transformed the use of VA in private homes today and, using the same methodology, to understand the linkages and intersections between further research areas.

The combined bibliometric and qualitative content analysis provided an overview of connections and intersections, and an in-depth overview of current research streams. Future research could conduct co-citation and/or bibliographic coupling analyses of authors, institutions, countries, references, etc. to complement our research.
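
As an illustration of the latter, bibliographic coupling can be computed directly from the reference lists of the reviewed articles: two articles are coupled when they cite the same sources, and the coupling strength is the number of shared references. The Python sketch below demonstrates this basic measure on invented article identifiers and reference keys; a real analysis would read the reference lists from a Scopus or Web of Science export.

```python
# Hedged sketch of bibliographic coupling: count shared references for every
# pair of articles. Article ids and reference keys are invented examples.
from itertools import combinations

articles = {
    "A1": {"Bhat2005", "Chao2007", "Sestino2020"},
    "A2": {"Chao2007", "Sestino2020", "Granstrand2020"},
    "A3": {"Krippendorff2013"},
}

def bibliographic_coupling(refs: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Return the number of shared references for every coupled pair of articles."""
    return {
        (a, b): len(refs[a] & refs[b])
        for a, b in combinations(sorted(refs), 2)
        if refs[a] & refs[b]  # keep only pairs that actually share references
    }

print(bibliographic_coupling(articles))  # -> {('A1', 'A2'): 2}
```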

Data availability

Datasets were derived from public resources. Data sources for this article are provided in the Methods section of this article. Data analysis documents are not publicly available as researchers have moved on to other institutions.

Abdel-Basset M, Chang V, Nabeeh NA (2021) An intelligent framework using disruptive technologies for COVID-19 analysis. Technol Forecast Soc Change 163:120431. https://doi.org/10.1016/j.techfore.2020.120431

Agarwal A, Jain M, Kumar P, Patel S (2018) Opportunistic sensing with MIC arrays on smart speakers for distal interaction and exercise tracking. In: IEEE Press (ed), 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 6403–6407

Alrumayh AS, Lehman SM, Tan CC (2019) ABACUS: audio based access control utility for smarthomes. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. pp. 395–400

Amit S, Koshy AS, Samprita S, Joshi S, Ranjitha N (2019) Internet of Things (IoT) enabled sustainable home automation along with security using solar energy. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International Conference on Communication and Electronics Systems (ICCES). pp. 1026–1029

Ammari T, Kaye J, Tsai J, Bentley F (2019) Music, search, and IoT: how people (really) use voice assistants. ACM Trans Comput–Hum Interact 26:1–28. https://doi.org/10.1145/3311956

Antonopoulou K, Begkos C (2020) Strategizing for digital innovations: value propositions for transcending market boundaries. Technol Forecast Soc Change 156:120042

Aylett MP, Cowan BR, Clark L (2019) Siri, echo and performance: you have to suffer darling. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. pp. 1–10

Azmandian M, Arroyo-Palacios J, Osman S (2019) Guiding the behavior design of virtual assistants. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 19th ACM international conference on intelligent virtual agents. pp. 16–18

Batat W (2021) How augmented reality (AR) is transforming the restaurant sector: Investigating the impact of “Le Petit Chef” on customers’ dining experiences. Technol Forecast Soc Change 172:121013

Bhat JSA (2005) Concerns of new technology based industries—the case of nanotechnology. Technovation 25(5):457–462. https://doi.org/10.1016/j.technovation.2003.09.001

Berg Insight (2022) The number of smart homes in Europe and North America reached 105 million in 2021, Press Releases, 20 April 2022. https://www.berginsight.com/the-number-of-smart-homes-in-europe-and-north-america-reached-105-million-in-2021

Birgonul Z, Carrasco O (2021) The adoption of multidimensional exploration methodology to the design-driven innovation and production practices in AEC industry. J Constr Eng Manag Innov 4(2):92–105. https://doi.org/10.31462/jcemi.2021.02092105

Brandt M (2018) Wenig echo in Deutschland. Statista

Brasser F, Frassetto T, Riedhammer K, Sadeghi A-R, Schneider T, Weinert C (2018) VoiceGuard: secure and private speech processing. In: International Speech Communication Association (ISCA) (ed), Proceedings of the annual conference of the International Speech Communication Association, INTERSPEECH. pp. 1303–1307

Brause SR, Blank G (2020) Externalized domestication: smart speaker assistants, networks and domestication theory. Inf Commun Soc 23(5):751–763. https://doi.org/10.1080/1369118X.2020.1713845

Brem A, Bilgram V, Marchuk A (2019) How crowdfunding platforms change the nature of user innovation–from problem solving to entrepreneurship. Technol Forecast Soc Change 144:348–360

Brenner W, Giffen BV, Koehler J (2021) Management of artificial intelligence: feasibility, desirability and viability. In: Aier S et al. (eds), Engineering the transformation of the enterprise. pp. 15–36

Burns MB, Igou A (2019) “Alexa, write an audit opinion”: adopting intelligent virtual assistants in accounting workplaces. J Emerg Technol Account 16(1):81–92. https://doi.org/10.2308/jeta-52424

Bytes A, Adepu S, Zhou J (2019) Towards semantic sensitive feature profiling of IoT devices. IEEE Internet Things J 6(5):8056–8064. https://doi.org/10.1109/JIOT.2019.2903739

Calaça J, Nóbrega L, Baras K (2019) Smartly water: Interaction with a smart water network. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2019 5th Experiment International Conference (Exp. at’19). pp. 233–234

Callahan JL (2014) Writing literature reviews: a reprise and update. Hum Resour Dev Rev 13(3):271–275. https://doi.org/10.1177/1534484314536705

Caputo A, Ayoko OB, Amoo N (2018) The moderating role of cultural intelligence in the relationship between cultural orientations and conflict management styles. J Bus Res 89:10–20. https://doi.org/10.1016/j.jbusres.2018.03.042

Carayannis EG, Turner E (2006) Innovation diffusion and technology acceptance: the case of PKI technology. Technovation 26(7):847–855. https://doi.org/10.1016/j.technovation.2005.06.013

Celebre AMD, Dubouzet AZD, Medina IBA, Surposa ANM, Gustilo RC (2015) Home automation using raspberry Pi through Siri enabled mobile devices. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM). pp. 1–6

Chan ZY, Shum P (2018) Smart office: a voice-controlled workplace for everyone. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2nd international symposium on computer science and intelligent control. pp. 1–5

Chao C-C, Yang J-M, Jen W-Y (2007) Determining technology trends and forecasts of RFID by a historical review and bibliometric analysis from 1991 to 2005. Technovation 27(5):268–279. https://doi.org/10.1016/j.technovation.2006.09.003

Clark M, Newman MW, Dutta P (2022) ARticulate: one-shot interactions with intelligent assistants in unfamiliar smart spaces using augmented reality. Proc ACM Interact Mob Wearable Ubiquitous Technol 6(1):1–24

Clemente C, Greco E, Sciarretta E, Altieri L (2022) Alexa, how do i feel today? Smart speakers for healthcare and wellbeing: an analysis about uses and challenges. Sociol Soc Work Rev 6(1):6–24

Columbus, L (2020) What’s new in Gartner’s hype cycle for emerging technologies, 2020. Forbes. https://www.forbes.com/sites/louiscolumbus/2020/08/23/whats-new-in-gartners-hype-cycle-for-emerging-technologies-2020/?sh=6363286fa46a

Demidova E (2018) Can children teach AI? Towards expressive human–AI dialogs. In: Vrandečić D, Bontcheva K, Suárez-Figueroa MC, Presutti V, Celino I, Sabou M, Kaffee L-A, Simperl E (eds), International Semantic Web Conference Proceedings (P&D/Industry/BlueSky). p. 2180

Denzin NK (1989) Interpretive biography, vol. 17. SAGE

Dercole F, Dieckmann U, Obersteiner M, Rinaldi S (2008) Adaptive dynamics and technological change. Technovation 28(6):335–348. https://doi.org/10.1016/j.technovation.2007.11.004

Deshpande NG, Itole DA (2019) Personal assistant based home automation using Raspberry Pi. Int J Recent Technol Eng

Donaldson J, Evnin J, Saxena S (2005) ECHOES: encouraging companionship, home organization, and entertainment in seniors. In: Association for Computing Machinery (ACM) (ed), Proceedings of the CHI’05 extended abstracts on human factors in computing systems. pp. 2084–2088

Dong XL (2019) Building a broad knowledge graph for products. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019-April. pp. 25–25

Dorai G, Houshmand S, Baggili I (2018) I know what you did last summer: Your smart home internet of things and your iPhone forensically ratting you out. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 13th international conference on availability, reliability and security. Article 3232814

Dörner R (2017) Smart assistive user interfaces in private living environments. In: Gesellschaft für Informatik e.V. (GI) (ed), Lecture notes in informatics (LNI), proceedings—series of the gesellschaft fur informatik (GI). pp. 923–930

Drucker PF (1988) The coming of the new organization. Reprint Harvard Business Review, 88105. https://ams-forschungsnetzwerk.at/downloadpub/the_coming-of_the_new_organization.pdf . Accessed 10 Jul 2022

Druga S, Williams R, Breazeal C, Resnick M (2017) “Hey Google is it OK if I eat you?” Initial explorations in child-agent interaction. In: Blikstein P, Abrahamson D (eds), Proceedings of the 2017 conference on Interaction Design and Children (IDC ’17). pp. 595–600

Dunin-Underwood A (2020) Alexa, can you keep a secret? Applicability of the third-party doctrine to information collected in the home by virtual assistants. Inf Commun Technol Law 29(1):101–119. https://doi.org/10.1080/13600834.2020.1676956

Elahi H, Wang G, Peng T, Chen J (2019) On transparency and accountability of smart assistants in smart cities. Appl Sci 9(24):5344. https://doi.org/10.3390/app9245344

Fathalizadeh A, Moghtadaiee V, Alishahi M (2022) On the privacy protection of indoor location dataset using anonymization. Comput Secur 117:102665

Flick U (2009) An introduction to qualitative research, 4th edn. SAGE

Freed M, Burns B, Heller A, Sanchez D, Beaumont-Bowman S (2016) A virtual assistant to help dysphagia patients eat safely at home. IJCAI 2016:4244–4245

Fruchter N, Liccardi I (2018) Consumer attitudes towards privacy and security in home assistants. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2018 CHI conference on human factors in computing systems, 2018-April. pp. 1–6

Furey E, Blue J (2018) She knows too much—voice command devices and privacy. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of 2018 29th Irish Signals and Systems Conference (ISSC). pp. 1–6

Gartner (2019) Gartner predicts 25 percent of digital workers will use virtual employee assistants daily by 2021. Gartner https://www.gartner.com/en/newsroom/press-releases/2019-01-09-gartner-predicts-25-percent-of-digital-workers-will-u

Giorgi R, Bettin N, Ermini S, Montefoschi F, Rizzo A (2019) An iris+voice recognition system for a smart doorbell. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 8th Mediterranean Conference on Embedded Computing (MECO). pp. 1–4

Gnewuch U, Morana S, Heckmann C, Maedche A (2018) Designing conversational agents for energy feedback. In: Chatterjee S, Dutta K, Sundarraj RP (eds), Proceedings of the International conference on design science research in information systems and technology, vol 10844. pp. 18–33

Gong Y, Yatawatte H, Poellabauer C, Schneider S, Latham S (2018) Automatic autism spectrum disorder detection using everyday vocalization captured by smart devices. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics. pp. 465–473

Goud N, Sivakami A (2019) Spectate home appliances by internet of things using MQTT and IFTTT through Google Assistant. Int J Sci Technol Res 8(10):1852–1857

Granstrand O, Holgersson M (2020) Innovation ecosystems: a conceptual review and a new definition. Technovation 90:102098

Grossman GM, Helpman E (1991) Innovation and growth in the global economy. MIT Press

Grossman-Kahn B, Rosensweig R (2012) Skip the silver bullet: driving innovation through small bets and diverse practices. Lead Through Design 18:815

Hamill L (2006) Controlling smart devices in the home. Inf Soc 22(4):241–249. https://doi.org/10.1080/01972240600791382

Han J, Chung AJ, Sinha MK, Harishankar M, Pan S, Noh HY, Zhang P, Tague P (2018) Do you feel what I hear? Enabling autonomous IoT device pairing using different sensor types. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE symposium on Security and Privacy (SP). pp. 836–852

Harzing A-W, Alakangas S (2016) Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison. Scientometrics 106(2):787–804. https://doi.org/10.1007/s11192-015-1798-9

Hashemi SH, Williams K, El Kholy A, Zitouni I, Crook PA (2018) Measuring user satisfaction on smart speaker intelligent assistants using intent sensitive query embeddings. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 27th ACM international conference on information and knowledge management. pp. 1183–1192

Hern A (2017) Google Home smart speaker brings battle of living rooms to UK. The Guardian. https://www.theguardian.com/technology/2017/mar/28/google-home-smart-speaker-launch-uk

Herring H, Roy R (2007) Technological innovation, energy efficient design and the rebound effect. Technovation 27(4):194–203. https://doi.org/10.1016/j.technovation.2006.11.004

Hofmann C, Orr S (2005) Advanced manufacturing technology adoption—the German experience. Technovation 25(7):711–724. https://doi.org/10.1016/j.technovation.2003.12.002

Hoy MB (2018) Alexa, Siri, Cortana, and more: an introduction to voice assistants. Med Ref Serv Q 37(1):81–88. https://doi.org/10.1080/02763869.2018.1404391

Hu J, Tu X, Zhu G, Li Y, Zhou Z (2013) Coupling suppression in human target detection via impulse through wall radar. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2013 14th International Radar Symposium (IRS), vol 2. pp. 1008–1012

Huxohl T, Pohling M, Carlmeyer B, Wrede B, Hermann T (2019) Interaction guidelines for personal voice assistants in smart homes. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 international conference on Speech Technology and Human–Computer Dialogue (SpeD). pp. 1–10

IDEO.org (2009) Human-centred design toolkit. IDEO.org

Ichikawa J, Mitsukuni K, Hori Y, Ikeno Y, Alexandre L, Kawamoto T, Nishizaki Y, Oka N (2019) Analysis of how personality traits affect children’s conversational play with an utterance-output device. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). pp. 215–220

Ilievski A, Dojchinovski D, Ackovska N, Kirandziska V (2018) The application of an air pollution measuring system built for home living. In: Kalajdziski S, Ackovska N (eds) ICT innovations 2018. Engineering and life sciences. Springer, pp. 75–89

Ito A (2019) Muting machine speech using audio watermarking. In: Pan J-S, Ito A, Tsai P-W, Jain LC (eds) Recent advances in intelligent information hiding and multimedia signal processing. Springer, pp. 74–81

Jabbar WA, Kian TK, Ramli RM, Zubir SN, Zamrizaman NS, Balfaqih M, Shepelev V, Alharbi S (2019) Design and fabrication of smart home with internet of things enabled automation system. IEEE Access 7:144059–144074. https://doi.org/10.1109/ACCESS.2019.2942846

Jacques R, Følstad A, Gerber E, Grudin J, Luger E, Monroy-Hernández A, Wang D (2019) Conversational agents: acting on the wave of research and development. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–8

Javed Y, Rajabi N (2020) Multi-Layer perceptron artificial neural network based IoT botnet traffic classification. In: Arai K, Bhatia R, Kapoor S (eds) Proceedings of the Future Technologies Conference (FTC) 2019. Springer, pp. 973–984

Jones VK (2018) Voice-activated change: marketing in the age of artificial intelligence and virtual assistants. J Brand Strategy 7(3):233–245

Kandlhofer M, Steinbauer G, Hirschmugl-Gaisch S, Huber P (2016) Artificial intelligence and computer science in education: from kindergarten to university. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2016 IEEE Frontiers in Education Conference (FIE). pp. 1–9

Kerekešová V, Babič F, Gašpar V (2019) Using the virtual assistant Alexa as a communication channel for road traffic situation. In: Choroś K, Kopel M, Kukla E, Siemiński A (eds) Multimedia and network information systems, vol 833. Springer, pp. 35–44

Khattar S, Sachdeva A, Kumar R, Gupta R (2019) Smart home with virtual assistant using Raspberry Pi. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 9th International conference on cloud computing, data science & engineering (Confluence). pp. 576–579

King B, Chen I-F, Vaizman Y, Liu Y, Maas R, Parthasarathi SHK, Hoffmeister B (2017) Robust speech recognition via anchor word representations. In: International Speech Communication Association (ISCA) (ed), Proceedings of the Interspeech 2017. pp. 2471–2475

Kita T, Nagaoka C, Hiraoka N, Dougiamas M (2019) Implementation of voice user interfaces to enhance users’ activities on Moodle. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of 2019 4th international conference on information technology. pp. 104–107

Kodali RK, Rajanarayanan SC, Boppana L, Sharma S, Kumar A (2019) Low cost smart home automation system using smart phone. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE R10 Humanitarian Technology Conference (R10-HTC)(47129). pp. 120–125

Komatsu S, Sasayama M (2019) Speech error detection depending on linguistic units. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2019 3rd international conference on natural language processing and information retrieval. pp. 75–79

König A, Francis LE, Malhotra A, Hoey J (2016) Defining Affective Identities in elderly nursing home residents for the design of an emotionally intelligent cognitive assistant. In: Favela J, Matic A, Fitzpatrick G, Weibel N, Hoey J (eds) Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare. ICST, pp. 206–210

Kortum SS (1997) Research, patenting, and technological change. Econometrica 1389–1419. https://doi.org/10.2307/2171741

Kowalczuk P (2018) Consumer acceptance of smart speakers: a mixed methods approach. J Res Interact Mark 12(4):418–431. https://doi.org/10.1108/JRIM-01-2018-0022

Kowalski J, Jaskulska A, Skorupska K, Abramczuk K, Biele C, Kopeć W, Marasek K (2019) Older adults and voice interaction: a pilot study with Google Home. In: Extended abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–6

Krippendorff K (2013) Content analysis: an introduction to its methodology. SAGE

Krotov V (2017) The Internet of Things and new business opportunities. Gener Potential Emerg Technol 60(6):831–841. https://doi.org/10.1016/j.bushor.2017.07.009

Kumar A (2018) AlexaPi3—an economical smart speaker. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE Punecon. pp. 1–4

Kumar N, Lee SC (2022) Human–machine interface in smart factory: a systematic literature review. Technol Forecast Soc Change 174:121284

Kunath G, Hofstetter R, Jörg D, Demarchi D (2019) Voice first barometer Schweiz 2018. Universität Luzern, pp. 1–25

Kuruvilla R (2019) Between you, me, and Alexa: on the legality of virtual assistant devices in two-party consent states. Wash Law Rev 94(4):2029–2055

Lackes R, Siepermann M, Vetter G (2019). Can I help you?—the acceptance of intelligent personal assistants. In: Pańkowska M, Sandkuhl K (eds) Perspectives in business informatics research. Springer, pp. 204–218

Lau J, Zimmerman B, Schaub F (2018) Alexa, are you listening?: Privacy perceptions, concerns and privacy-seeking behaviors with smart speakers. Proc ACM Hum–Comput Interact 2:1–31. https://doi.org/10.1145/3274371 . (CSCW)

Leahey E, Beckman CM, Stanko TL (2017) Prominent but less productive: The impact of interdisciplinarity on scientists’ research. Adm Sci Q 62(1):105–139. https://doi.org/10.1177/0001839216665364

Lee I, Kinney CE, Lee B, Kalker AA (2009) Solving the acoustic echo cancellation problem in double-talk scenario using non-gaussianity of the near-end signal. In: Association for Computing Machinery (ACM) (ed), International conference on independent component analysis and signal separation. pp. 589–596

Lee S, Kim S, Lee S (2019) “What does your agent look like?” A drawing study to understand users’ perceived persona of conversational agent. In: Association for Computing Machinery (ACM) (ed), Extended abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–6

Li S, Garces E, Daim T (2019) Technology forecasting by analogy-based on social network analysis: the case of autonomous vehicles. Technol Forecast Soc Change 148:119731. https://doi.org/10.1016/j.techfore.2019.119731

Li W, Chen Y, Hu H, Tang C (2020) Using granule to search privacy preserving voice in home IoT systems. IEEE Access 8:31957–31969. https://doi.org/10.1109/ACCESS.2020.2972975

Liciotti D, Ferroni G, Frontoni E, Squartini S, Principi E, Bonfigli R, Zingaretti P, Piazza F (2014) Advanced integration of multimedia assistive technologies: a prospective outlook. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2014 IEEE/ASME 10th international conference on Mechatronic and Embedded Systems and Applications (MESA). pp. 1–6

Liu Z, Shin J, Xu Y, Winata GI, Xu P, Madotto A, Fung P (2020) Zero-shot cross-lingual dialogue systems with transferable latent variables. ArXiv. https://arxiv.org/pdf/1911.04081.pdf

Lopatovska I, Oropeza H (2018) User interactions with “Alexa” in public academic space. Proceedings of the Association for Information Science and Technology 55(1):309–318. https://doi.org/10.1002/pra2.2018.14505501034

Lopatovska I, Rink K, Knight I, Raines K, Cosenza K, Williams H, Sorsche P, Hirsch D, Li Q, Martinez A (2019) Talk to me: exploring user interactions with the Amazon Alexa. J Librariansh Inf Sci 51(4):984–997. https://doi.org/10.1177/0961000618759414

Lovato SB, Piper AM, Wartella EA (2019) Hey Google, do unicorns exist? Conversational agents as a path to answers to children’s questions. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 18th ACM international conference on interaction design and children. pp. 301–313

Miles MB, Huberman AM (1994) Qualitative data analysis. A source book of new methods, 2nd edn. Sage

Macdonald RJ, Jinliang W (1994) Time, timeliness of innovation, and the emergence of industries. Technovation 14(1):37–53. https://doi.org/10.1016/0166-4972(94)90069-8

Malik KM, Malik H, Baumann R (2019) Towards vulnerability analysis of voice-driven interfaces and countermeasures for replay attacks. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE conference on Multimedia Information Processing and Retrieval (MIPR). pp. 523–528

Martin EJ (2017) How Echo, Google Home, and other voice assistants can change the game for content creators. EContent. http://www.econtentmag.com/Articles/News/News-Feature/How-Echo-Google-Home-and-Other-Voice-Assistants-Can-Change-the-Game-for-Content--Creators-116564.htm

Masutani O, Nemoto S, Hideshima Y (2019) Toward a better IPA experience for a connected vehicle by means of usage prediction. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). pp. 681–686

Mavropoulos T, Meditskos G, Symeonidis S, Kamateri E, Rousi M, Tzimikas D, Papageorgiou L, Eleftheriadis C, Adamopoulos G, Vrochidis S, Kompatsiaris I (2019) A context-aware conversational agent in the rehabilitation domain. Futur Internet 11(11):231. https://doi.org/10.3390/fi11110231

McLean G, Osei-Frimpong K (2019) Hey Alexa… examine the variables influencing the use of artificial intelligent in-home voice assistants. Comput Hum Behav 99:28–37. https://doi.org/10.1016/j.chb.2019.05.009

McReynolds E, Hubbard S, Lau T, Saraf A, Cakmak M, Roesner F (2017) Toys that listen: a study of parents, children, and internet-connected toys. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2017 CHI conference on human factors in computing systems. pp. 5197–5207

Melkers J, Xiao F (2010) Boundary-spanning in emerging technology research: determinants of funding success for academic scientists. J Technol Transf 37(3):251–270. https://doi.org/10.1007/s10961-010-9173-8

Mirzamohammadi S, Chen JA, Sani AA, Mehrotra S, Tsudik G (2017) Ditio: trustworthy auditing of sensor activities in mobile and IoT devices. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 15th ACM conference on embedded network sensor systems

Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097

Mokhtari M, de Marassé A, Kodys M, Aloulou H (2019) Cities for all ages: Singapore use case. In: Stephanidis C, Antona M (eds) HCI international 2019—late breaking posters. Springer, pp. 251–258

Nguyen TH, Waizenegger L, Techatassanasoontorn AA (2022) “Don’t Neglect the User!”–Identifying Types of Human-Chatbot Interactions and their Associated Characteristics. Inf Syst Front 24(3):797–838

Oh S-R, Kim Y-G (2017) Security requirements analysis for the IoT. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 International conference on Platform Technology and Service (PlatCon). pp. 1–6

Olson C, Kemery K (2019) 2019 Voice report: from answers to action: Customer adoption of voice technology and digital assistants. Technical report. Microsoft

Omale G (2020) Customer service and support leaders can use this Gartner Hype Cycle to assess the maturity and risks of customer service and support technologies. Gartner. https://www.gartner.com/smarterwithgartner/5-trends-drive-the-gartner-hype-cycle-for-customer-service-and-support-technologies-2020/

Ong DT, De Jesus CR, Gilig LK, Alburo JB, Ong E (2018) A dialogue model for collaborative storytelling with children. In: Yang JC, Chang M, Wong L-H, Rodrigo MM (eds), 26th International conference on computers in education workshop on innovative technologies for enhancing interactions and learning. pp. 205–210

Palumbo F, Gallicchio C, Pucci R, Micheli A (2016) Human activity recognition using multisensor data fusion based on reservoir computing. J Ambient Intell Smart Environ 8(2):87–107. https://doi.org/10.3233/AIS-160372

Papagiannidis S, Davlembayeva D (2022) Bringing Smart Home Technology to Peer-to-Peer Accommodation: Exploring the Drivers of Intention to Stay in Smart Accommodation. Inf Syst Front 24(4):1189–1208

Parkin S, Patel T, Lopez-Neira I, Tanczer L (2019) Usability analysis of shared device ecosystem security: Informing support for survivors of IoT-facilitated tech-abuse. In: Association for Computing Machinery (ACM) (ed), Proceedings of the new security paradigms workshop. pp. 1–15

Patel D, Bhalodiya P (2019) 3D holographic and interactive artificial intelligence system. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT). pp. 657–662

Patnaik D, Becker R (1999) Needfinding: the why and how of uncovering people’s needs. Design Manag J (Former Ser) 10(2):37–43

Petticrew M, Roberts H (2006) Systematic reviews in the social sciences: a practical guide. John Wiley & Sons

Pfeifle A (2018) Alexa, what should we do about privacy: protecting privacy for users of voice-activated devices. Wash Law Rev 93:421

Porter ME (1990) Competitive advantage of nations. Competitive Intell Rev 1(1):14

Portillo CD, Lituchy TR (2018) An examination of online repurchasing behavior in an IoT environment. In: Simmers CA, Anandarajan M (eds) The Internet of People, Things and Services: workplace tranformations. Routledge, pp. 225–241

Pradhan A, Findlater L, Lazar A (2019) “Phantom friend” or “just a box with information”: personification and ontological categorization of smart speaker-based voice assistants by older adults. In: Association for Computing Machinery (ACM) (ed), Proceedings of the ACM on Human–Computer Interaction, 3(CSCW)

Pradhan A, Mehta K, Findlater L (2018) “Accessibility came by accident”: use of voice-controlled intelligent personal assistants by people with disabilities. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 CHI conference on human factors in computing systems. pp. 1–13

Pridmore J, Mols A (2020) Personal choices and situated data: privacy negotiations and the acceptance of household intelligent personal assistants. Big Data Soc 7(1):205395171989174. https://doi.org/10.1177/2053951719891748

Principi E, Squartini S, Piazza F, Fuselli D, Bonifazi M (2013) A distributed system for recognizing home automation commands and distress calls in the Italian language. INTERSPEECH, pp. 2049–2053

Purao S, Meng C (2019) Data capture and analyses from conversational devices in the homes of the elderly. In: Guizzardi G, Gailly F, Suzana R, Pitangueira Maciel (eds) Lecture notes in computer science, vol 11787. Springer, pp. 157–166

Purington A, Taft JG, Sannon S, Bazarova NN, Taylor SH (2017) “Alexa is my new BFF”: social roles, user satisfaction, and personification of the Amazon Echo. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems. pp. 2853–2859

Pyae A, Joelsson TN (2018) Investigating the usability and user experiences of voice user interface: a case of Google home smart speaker. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 20th international conference on human-computer interaction with mobile devices and services adjunct. pp. 127–131

Pyae A, Scifleet P (2019) Investigating the role of user’s English language proficiency in using a voice user interface: a case of Google Home smart speaker. In: Association for Computing Machinery (ACM) (ed), (Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems—CHI EA ’19. pp. 1–6

Rabassa V, Sabri O, Spaletta C (2022) Conversational commerce: do biased choices offered by voice assistants’ technology constrain its appropriation? Technol Forecast Soc Change 174:121292

Robinson S, Pearson J, Ahire S, Ahirwar R, Bhikne B, Maravi N, Jones M (2018) Revisiting “hole in the wall” computing: private smart speakers and public slum settings. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 CHI conference on human factors in computing systems. pp. 1–11

Robledo-Arnuncio E, Wada TS, Juang B-H (2007) On dealing with sampling rate mismatches in blind source separation and acoustic echo cancellation. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2007 IEEE workshop on applications of signal processing to audio and acoustics. pp. 34–37

Rzepka C, Berger B, Hess T (2022) Voice assistant vs. Chatbot–examining the fit between conversational agents’ interaction modalities and information search tasks. Inf Syst Front 24(3):839–856

Saadaoui FZ, Mahmoudi C, Maizate A, Ouzzif M (2019) Conferencing-Ng protocol for Internet of Things. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 Third international conference on Intelligent Computing in Data Sciences (ICDS). pp. 1–5

Samarasinghe N, Mannan M (2019a) Towards a global perspective on web tracking. Comput Secur 87:101569. https://doi.org/10.1016/j.cose.2019.101569

Samarasinghe N, Mannan M (2019b) Another look at TLS ecosystems in networked devices vs. web servers. Comput Secur 80:1–13. https://doi.org/10.1016/j.cose.2018.09.001

Sanders J, Martin-Hammond A (2019) Exploring autonomy in the design of an intelligent health assistant for older adults. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 24th International conference on intelligent user interfaces: companion. pp. 95–96

Sangal S, Bathla R (2019) Implementation of restrictions in smart home devices for safety of children. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International Conference on Information Systems and Computer Networks (ISCON). pp. 139–143

Santhanaraj K, Barkathunissa A (2020) A study on the factors affecting usage of voice assistants and the interface transition from touch to voice. Int J Adv Sci Technol 29(5):3084–3102

Santos-Pérez M, González-Parada E, Cano-García JM (2011) AVATAR: an open source architecture for embodied conversational agents in smart environments. In: Bravo J, Hervás R, Villarreal V (eds) Ambient Assisted living. Springer, pp. 109–115

Sestino A, Prete MI, Piper L, Guido G (2020) Internet of Things and Big Data as enablers for business digitalization strategies. Technovation 98:102173. https://doi.org/10.1016/j.technovation.2020.102173

Seymour W (2018) How loyal is your Alexa? Imagining a respectful smart assistant. In: Association for Computing Machinery (ACM) (ed), Extended abstracts of the 2018 CHI conference on human factors in computing systems. pp. 1–6

Shamekhi A, Bickmore T, Lestoquoy A, Gardiner P (2017) Augmenting group medical visits with conversational agents for stress management behavior change. In: de Vries PW, Oinas-Kukkonen H, Siemons L, Beerlage-de Jong N, van Gemert-Pijnen L (eds) Persuasive technology: development and implementation of personalized technologies to change attitudes and behaviors. Springer, pp. 55–67

Shank DB, Wright D, Nasrin S, White M (2022) Discontinuance and restricted acceptance to reduce worry after unwanted incidents with smart home technology. Int J Hum–Comput Interact 1–14. https://doi.org/10.1080/10447318.2022.2085406

Shin C, Chandok P, Liu R, Nielson SJ, Leschke TR (2018) Potential forensic analysis of IoT data: an overview of the state-of-the-art and future possibilities. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 IEEE International Conference on Internet of Things (IThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). pp. 705–710

Singh V, Verma S, Chaurasia SS (2020) Mapping the themes and intellectual structure of corporate university: co-citation and cluster analyses. Scientometrics 122(3):1275–1302. https://doi.org/10.1007/s11192-019-03328-0

Solorio JA, Garcia-Bravo JM, Newell BA (2018) Voice activated semi-autonomous vehicle using off the shelf home automation hardware. IEEE Internet Things J 5(6):5046–5054. https://doi.org/10.1109/JIOT.2018.2854591

Souden M, Liu Z (2009) Optimal joint linear acoustic echo cancelation and blind source separation in the presence of loudspeaker nonlinearity. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2009 IEEE international conference on multimedia and expo. pp. 117–120

Srikanth S, Saddamhussain SK, Siva Prasad P (2019) Home anti-theft powered by Alexa. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN). pp. 1–6

Stefanidi Z, Leonidis A, Antona M (2019) A multi-stage approach to facilitate interaction with intelligent environments via natural language. In: Stephanidis C, Antona M (eds) HCI International 2019—Late Breaking Posters, vol 1088. Springer, pp. 67–77

Struckell E, Ojha D, Patel PC, Dhir A (2021) Ecological determinants of smart home ecosystems: A coopetition framework. Technol Forecast Soc Change 173:121147. https://doi.org/10.1016/j.techfore.2021.121147

Sudharsan B, Corcoran P, Ali MI (2019) Smart speaker design and implementation with biometric authentication and advanced voice interaction capability. In: Curry E, Keane M, Ojo A, Salwala D (eds), Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, NUI Galway, vol 2563. pp. 305–316

Tao F, Liu G, Zhao Q (2018) An ensemble framework of voice-based emotion recognition system. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia). pp. 1–6

Thapliyal H, Ratajczak N, Wendroth O, Labrado C (2018) Amazon Echo enabled IoT home security system for smart home environment. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE International Symposium on Smart Electronic Systems (ISES) (Formerly INiS). pp. 31–36

Tielman ML, Neerincx MA, Bidarra R, Kybartas B, Brinkman W-P (2017) A therapy system for post-traumatic stress disorder using a virtual agent and virtual storytelling to reconstruct traumatic memories. Journal of Medical Systems 41(8):125. https://doi.org/10.1007/s10916-017-0771-y

Tironi A, Mainetti R, Pezzera M, Borghese AN (2019) An empathic virtual caregiver for assistance in exer-game-based rehabilitation therapies. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE 7th International Conference on Serious Games and Applications for Health (SeGAH). pp. 1–6

Trenholm R (2016) Amazon Echo (and Alexa) arrive in Europe, and Echo comes in white now too. CNET. https://www.cnet.com/news/amazon-echo-and-alexa-arrives-in-europe/

Tsiourti C, Weiss A, Wac K, Vincze M (2019) Multimodal integration of emotional signals from voice, body, and context: effects of (in)congruence on emotion recognition and attitudes towards robots. Int J Soc Robot 11(4):555–573. https://doi.org/10.1007/s12369-019-00524-z

Tsiourti C, Quintas J, Ben-Moussa M, Hanke S, Nijdam NA, Konstantas D (2018a) The CaMeLi framework—a multimodal virtual companion for older adults. In: Kapoor S, Bhatia R, Bi Y (eds) Studies in computational intelligence, vol 751. Springer, pp. 196–217

Tsiourti C, Ben-Moussa M, Quintas J, Loke B, Jochem I, Lopes JA, Konstantas D (2018b) A virtual assistive companion for older adults: design implications for a real-world application. In: Sharma H, Shrivastava V, Bharti KK, Wang L (eds), Lecture notes in networks and systems, vol 15. Springer, pp. 1014-1033

Tung L (2018) Amazon Echo, Google Home: how Europe fell in love with smart speakers. ZDnet. https://www.zdnet.com/article/amazon-echo-google-home-how-europe-fell-in-love-with-smart-speakers

Turner-Lee N (2019) Can emerging technologies buffer the cost of in-home care in rural America? Generations 43(2):88–93. http://web.a.ebscohost.com/ehost/pdfviewer/pdfviewer?vid=2&sid=0aaaf704-d3bd-42ab-ad26-ecd36c0a059b%40sdc-v-sessmgr02

Vaca K, Gajjar A, Yang X (2019) Real-time automatic music transcription (AMT) with Zync FPGA. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). pp. 378–384

Van Eck NJ, Waltman L (2014) Visualizing bibliometric networks. In: Ding Y, Roussea R, Wolfram D (eds) Measuring scholarly impact: methods and practice. Springer, pp. 285–320

Van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2):523–538

Verma S, Gustafsson A (2020) Investigating the emerging COVID-19 research trends in the field of business and management: a bibliometric analysis approach. J Bus Res 118:253–261

Verma S (2017) The adoption of big data services by manufacturing firms: an empirical investigation in India. J Inf Syst Technol Manag 14(1):39–68

Vishwakarma SK, Upadhyaya P, Kumari B, Mishra AK (2019) Smart energy efficient home automation system using IoT. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International Conference on Internet of Things: smart Innovation and Usages (IoT-SIU). pp. 1–4

Vora J, Tanwar S, Tyagi S, Kumar N, Rodrigues JJPC (2017) Home-based exercise system for patients using IoT enabled smart speaker. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom). pp. 1–6

Wakefield CC (2019) Achieving position 0: optimising your content to rank in Google’s answer box. J Brand Strategy 7(4):326–336

Wallace T, Morris J (2018) Identifying barriers to usability: smart speaker testing by military veterans with mild brain injury and PTSD. In: Langdon P, Lazar J, Heylighen A, Dong H (eds) Breaking down barriers. Springer, pp. 113–122

Xi N, Hamari J (2021) Shopping in virtual reality: a literature review and future agenda. J Bus Res 134:37–58. https://doi.org/10.1016/j.jbusres.2021.04.075

Yaghoubzadeh R, Pitsch K, Kopp S (2015) Adaptive grounding and dialogue management for autonomous conversational assistants for elderly users. In: Brinkman W-P, Broekens J, Heylen D (eds) Intelligent virtual agents, vol 9238. Springer, pp. 28–38

Yildirim İ, Bostancı E, Güzel MS (2019) Forensic analysis with anti-forensic case studies on Amazon Alexa and Google Assistant build-in smart home speakers. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International conference on computer science and engineering (UBMK). pp. 271–273

Yusri MM, Kasim S, Hassan R, Abdullah Z, Ruslai H, Jahidin K, Arshad MS (2017) Smart mirror for smart life. In: Institute of Electrical and Electronics Engineers (ed), 2017 6th ICT International Student Project Conference (ICT-ISPC) 2017 6th ICT International Student Project Conference (ICT-ISPC). pp. 1–5

Zschörnig T, Wehlitz R, Franczyk B (2019) A fog-enabled smart home analytics platform. In: Brodsky A, Hammoudi S, Filipe J, Smialek M (eds) Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), vol 1. SciTePress, pp. 604–610

Zuboff S (2019) The age of surveillance capitalism: the fight for a human future at the new frontier of power. Profile Books

Harwood S, Eaves S (2020) Conceptualising technology, its development and future: The six genres of technology. Technol Forecast Soc Change 160:120174

Stadler S, Riegler S, Hinterkörner S (2012) Bzzzt: When mobile phones feel at home. Conference on Human Factors in Computing Systems – Proceedings, 1297-1302. https://doi.org/10.1145/2212776.2212443

Acknowledgements

This research was funded by the Swiss National Science Foundation (SNSF) as part of the project “VA-People, Experiences, Practices and Routines” (VA-PEPR) (Grant Nr. CRSII5_189955). We are grateful for the support from the wider project team from Lucerne University of Applied Sciences and Arts, Eastern Switzerland University of Applied Sciences, and Northumbria University. We would also like to thank Bjørn S. Cience for his support while working on this paper.

Author information

Authors and affiliations

Lucerne School of Information Technology and Computer Sciences, Lucerne University of Applied Sciences and Arts, Lucerne, Switzerland

Bettina Minder

Department of Business & Management, University of Southern Denmark, Odense, Denmark

Patricia Wolf & Surabhi Verma

Department of Management, Lucerne University of Applied Sciences and Arts, Lucerne, Switzerland

Patricia Wolf

Institute for Information and Process Management, Eastern Switzerland University of Applied Sciences, St.Gallen, Switzerland

Matthias Baldauf

Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark

Surabhi Verma

Corresponding author

Correspondence to Patricia Wolf .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

The article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Minder, B., Wolf, P., Baldauf, M. et al. Voice assistants in private households: a conceptual framework for future research in an interdisciplinary field. Humanit Soc Sci Commun 10 , 173 (2023). https://doi.org/10.1057/s41599-023-01615-z

Received : 19 May 2022

Accepted : 14 March 2023

Published : 19 April 2023

DOI : https://doi.org/10.1057/s41599-023-01615-z



Mayer, S., Laput, G., Harrison, C.: Enhancing mobile voice assistants with worldgaze. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–10 (2020)

McLean, G., Osei-Frimpong, K.: Hey Alexa… examine the variables influencing the use of AI in-home voice assistants. Comput. Hum. Behav. 99 , 28–37 (2019)

Mitev, R., Miettinen, M., Sadeghi, A.R.: Alexa lied to me: skill-based man-in-the-middle attacks on virtual assistants. In: Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, pp. 465–478 (2019)

Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., Prisma Group: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int. J. Surg. 8 (5), 336–341 (2010)

Moriuchi, E.: Okay, Google!: An empirical study on voice assistants on consumer engagement and loyalty. Psychol. Mark. 36 (5), 489–501 (2019)

Moriuchi, E.: An empirical study on anthropomorphism and engagement with disembodied AIs and consumers’ re-use behavior. Psychol. Mark. 38 (1), 21–42 (2021)

Moussawi, S., Koufaris, M., Benbunan-Fich, R.: How perceptions of intelligence and anthropomorphism affect adoption of personal intelligent agents. Electron. Mark. 31 , 343–364 (2021)

Nasirian, F., Ahmadian, M., Lee, O.K.D.: AI-based voice assistant systems: evaluating from the interaction and trust perspectives (2017)

Ning, Y., et al.: Multi-task deep learning for user intention understanding in speech interaction systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1 (2017)

Norval, C., Singh, J.: Explaining automated environments: Interrogating scripts, logs, and provenance using voice-assistants. In: Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, pp. 332–335 (2019)

Pal, D., Arpnikanondt, C., Funilkul, S., Chutimaskul, W.: The adoption analysis of voice-based smart IoT products. IEEE Internet Things J. 7 (11), 10852–10867 (2020)

Parviainen, E., Søndergaard, M.L.J.: Experiential qualities of whispering with voice assistants. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2020)

Purington, A., Taft, J.G., Sannon, S., Bazarova, N.N., Taylor, S.H.: “Alexa is my new BFF” social roles, user satisfaction, and personification of the Amazon Echo. In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 2853–2859 (2017)

Raveh, E., Siegert, I., Steiner, I., Gessinger, I., Möbius, B.: Three’s a crowd? Effects of a second human on vocal accommodation with a voice assistant. In: INTERSPEECH, pp. 4005–4009 (2019)

Rokeach, M.: The Nature of Human Values. Free Press (1973)

Rongali, S., Soldaini, L., Monti, E., Hamza, W.: Don’t parse, generate! A sequence to sequence architecture for task-oriented semantic parsing. In: Proceedings of the Web Conference 2020, pp. 2962–2968 (2020)

Rzepka, C.: Examining the use of voice assistants: a value-focused thinking approach. In: Twenty-fifth Americas Conference on Information Systems, Cancun (2019)

Seymour, W.: Privacy therapy with aretha: what if your firewall could talk?. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–6 (2019)

Shen, S., Chen, D., Wei, Y.L., Yang, Z., Choudhury, R.R.: Voice localization using nearby wall reflections. In: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, pp. 1–14 (2020)

Skidmore, L., Moore, R.K.: Using Alexa for flashcard-based learning. In: Proceedings of Interspeech 2019, pp. 1846–1850. ISCA (2019)

Storer, K.M., Judge, T.K., Branham, S.M.: “All in the same boat”: tradeoffs of voice assistant ownership for mixed-visual-ability families. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2020)

Trajkova, M., Martin-Hammond, A.: “Alexa is a toy”: exploring older adults’ reasons for using, limiting, and abandoning echo. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2020)

Vaidya, T., Sherr, M.: You talk too much: limiting privacy exposure via voice input. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 84–91. IEEE (2019)

Venkatesh, V., Thong, J. Y., Xu, X.: Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS Q. 157–178 (2012)

Vessey, I., Ramesh, V., Glass, R.L.: Research in information systems: an empirical study of diversity in the discipline and its journals. J. Manage. Inf. Syst. 19 (2), 129–174 (2002)

Wagner, K., Schramm-Klein, H.: Alexa, Are You Human? Investigating Anthropomorphism of Digital Voice Assistants-A Qualitative Approach. In: ICIS (2019)

Wang, C., Anand, S.A., Liu, J., Walker, P., Chen, Y., Saxena, N.: Defeating hidden audio channel attacks on voice assistants via audio-induced surface vibrations. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 42–56 (2019)

Zhang, J., Zhang, B., Zhang, B.: Defending adversarial attacks on cloud-aided automatic speech recognition systems. In: Proceedings of the Seventh International Workshop on Security in Cloud Computing, pp. 23–31 (2019)

Zhang, N., Mi, X., Feng, X., Wang, X., Tian, Y., Qian, F.: Dangerous skills: understanding and mitigating security risks of voice-controlled third-party functions on virtual personal assistant systems. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 1381–1396. IEEE (2019)

Zhao, J., Rau, P.L.P.: Merging and synchronizing corporate and personal voice agents: comparison of voice agents acting as a secretary and a housekeeper. Comput. Hum. Behav. 108 , 106334 (2020)

Zhao, S., et al.: Raise to speak: an accurate, low-power detector for activating voice assistants on smartwatches. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2736–2744 (2019)

Zhou, S., Jia, J., Wang, Q., Dong, Y., Yin, Y., Lei, K.: Inferring emotion from conversational voice data: a semi-supervised multi-path generative neural network approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1 (2018)

Download references

Author information

Authors and affiliations.

Alliance Manchester Business School, The University of Manchester, Booth Street West, Manchester, M13 0PB, UK

Alaa Almirabi, Nikolay Mehandjiev & Panagiotis Sarantopoulos

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Alaa Almirabi .

Editor information

Editors and affiliations.

British University in Dubai, Dubai, United Arab Emirates

Maria Papadaki

University of Nicosia, Nicosia, Cyprus

Marinos Themistocleous

Khalid Al Marri

Dubai Blockchain Center, Dubai, United Arab Emirates

Marwan Al Zarouni

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Almirabi, A., Mehandjiev, N., Sarantopoulos, P. (2024). Voice Assistants - Research Landscape. In: Papadaki, M., Themistocleous, M., Al Marri, K., Al Zarouni, M. (eds) Information Systems. EMCIS 2023. Lecture Notes in Business Information Processing, vol 501. Springer, Cham. https://doi.org/10.1007/978-3-031-56478-9_2

Download citation

DOI : https://doi.org/10.1007/978-3-031-56478-9_2

Published : 30 March 2024

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-56477-2

Online ISBN : 978-3-031-56478-9

eBook Packages : Computer Science Computer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

ORIGINAL RESEARCH article

The trustworthiness of voice assistants in the context of healthcare: Investigating the effect of perceived expertise on the trustworthiness of voice assistants, providers, data receivers, and automatic speech recognition

Carolin Wienrich

  • Julius Maximilian University of Würzburg, Würzburg, Germany

As an emerging market for voice assistants (VA), the healthcare sector imposes increasing requirements on the users’ trust in the technological system. Encouraging patients to reveal sensitive data requires them to trust the technological counterpart. In an experimental laboratory study, participants were presented with a VA, which was introduced as either a “specialist” or a “generalist” tool for sexual health. In both conditions, the VA asked exactly the same health-related questions. Afterwards, participants assessed the trustworthiness of the tool and of further source layers (provider, platform provider, automatic speech recognition in general, data receiver) and reported individual characteristics (disposition to trust and to disclose sexual information). Results revealed that perceiving the VA as a specialist resulted in higher trustworthiness of the VA as well as of the provider, the platform provider, and automatic speech recognition in general. Furthermore, the provider’s trustworthiness affected the perceived trustworthiness of the VA. Presenting both a theoretical line of reasoning and empirical data, the study points out the importance of the users’ perspective on the assistant. In sum, this paper argues for further analyses of trustworthiness in voice-based systems, its effects on usage behavior, and its implications for the responsible design of future technology.

Introduction

Voice-based artificial intelligence systems serving as digital assistants have evolved dramatically within the last few years. Today, Amazon Echo and Google Home are the most popular representatives of the fastest-growing consumer technology (Hernandez, 2021; Meticulous Market Research, 2021). On the one hand, voice assistants (VAs) engage human users in direct conversation through a natural language interface, leading to promising applications for the healthcare sector, such as diagnosis and therapy. On the other hand, their ability to recognize, process, and produce human language makes interacting with this technology resemble human-human interaction. Attributing some kind of humanness to technology arouses (implicit) assumptions about the technological devices and affects the user’s perception and operation of the device. The media equation approach postulates that the social rules and dynamics guiding human-human interaction similarly apply to human-computer interaction (Reeves and Nass, 1996). Using voice assistants in application areas involving sensitive data, such as medical diagnoses, draws attention to the concept of trust: if patients are to reveal personal, sensitive information to voice-based systems, they need to trust them. Consequently, questions arise about which features of voice assistants might affect the patients’ willingness to trust them in a medical context. Studies investigating trust in human-human interactions revealed that ascribed expertise is a crucial cue of trust (Cacioppo and Petty, 1986; Chaiken, 1987; Chaiken and Maheswaran, 1994). Reeves and Nass (1996) transferred the analysis of expertise and trust to human-technology interactions. They showed that designating devices (here: a television program) as “specialized” results in more positive evaluations of the content they present. Many other studies replicated their approach and framed a technological device or a technological agent as a specialist, showing that users ascribed a certain level of expertise to it and evaluated it (implicitly) as more trustworthy (Koh and Sundar, 2010; Kim, 2014, Kim, 2016; Liew and Tan, 2018).

Voice assistants are gaining importance in healthcare contexts, offering promising contributions in areas such as medical diagnosis. However, both the analysis and the understanding of the psychological processes characterizing patient-voice assistant interaction are still in their early stages, as is research on how the assistant’s design affects the perception of expertise and the evaluation of trust. Thus, the present paper addressed the following research question: How does framing a voice assistant as a specialist affect the user’s perception of its expertise and its trustworthiness?

To gain first insights into the process of patients’ perception of the expertise of voice-based systems and their willingness to trust them, a laboratory study was conducted in which participants interacted with a voice assistant. The assistant was introduced as a diagnostic tool for sexual health, which asked a list of questions about sexual behavior, sexual health, and sexual orientation to determine a diagnosis. In a first step, and in accordance with the approach of Reeves and Nass (1996), we manipulated the level of expertise of the voice assistant, which introduced itself as either a “specialist” or a “generalist”. In line with established approaches to investigating the trustworthiness of technology (e.g., McKnight et al., 1998; Söllner et al., 2012), we compared participants’ perceived trustworthiness of the “specialist” vs. the “generalist” VA. Additionally, we compared the assessments of further source layers of trustworthiness, namely of the platform provider, the provider of the tool, the data receiver, and of automatic speech recognition in general. Moreover, to account for the additional explanatory value of interindividual differences in the trustworthiness ratings, we asked for participants’ dispositions and characteristics, such as their disposition to trust and their tendency to disclose sexual information about themselves. Finally, we analyzed the different source layers of trustworthiness to predict the trustworthiness of the VA based on the trustworthiness of the other source layers. In sum, the present paper shows for the first time that a short written introduction and a “spoken” introduction presented by the VA itself were sufficient to significantly affect the users’ perception of and trust in the system. It thereby takes a human-centered approach to voice assistants and shows that small design decisions determine users’ trust in VAs in a safety-critical application field.

Related Work

Voice Assistants in Healthcare

While voice-based artificial intelligence systems have increased in popularity over the last years, their spectrum of functions, their fields of application, and their technological sophistication have not been fully revealed but are still in their early stages. Today’s most popular systems, Amazon Echo (AI technology: Amazon Alexa) and Google Home (AI technology: Google Assistant), presage a variety of potential usage scenarios. However, according to usage statistics, in private environments voice assistants are predominantly used for relatively trivial activities such as collecting information, listening to music, or sending messages and making calls (idealo, 2020). Beyond private usage scenarios, voice assistants are applied in professional environments such as industrial production or technical service (e.g., Baumeister et al., 2019), voice marketing, or internal process optimization (Hörner, 2019). In particular, the healthcare sector has been referred to as an emerging market for voice-based technology. More and more use cases emerge in the context of medicine, diagnosis, and therapy (The Medical Futurist, 2020), with voice assistants offering promising features in the area of anamnesis. Particularly the possibility to assess data remotely gains in importance these days. Recently, chatbots were employed to collect patients’ data, their medical conditions, their symptoms, or a disease process (ePharmaINSIDER, 2018; The Medical Futurist, 2020). While some products provide only information (e.g., OneRemission), others track health data (e.g., Babylon Health) or check symptoms and make a diagnosis (e.g., Infermedica). To date, only a few solutions have integrated speech recognition or a direct connection to VAs, such as Alexa via skills (e.g., Sensely, Ada Health, GYANT). The German company ignimed UG ( https://ignimed.de/ ) takes a similar approach: based on artificial intelligence, the patient’s information is collected and transmitted to the attending physician, who can then work with the patient. Although these voice assistants are used for similar purposes, which all require users to trust the system, users’ perceived trustworthiness of voice assistants in healthcare has not been investigated yet.

The medical context imposes different requirements on the system than private usage scenarios do. Data revealed here are more personal and more sensitive, resulting in increasing requirements regarding the system’s security and trustworthiness. Consequently, besides focusing on technological improvements of the system, its security, or the corresponding algorithms, research needs to focus on the patients’ perception of the system and their willingness to interact with it in a health-related context. Beyond the question of which gestalt design impacts usability and user experience, the field of human-computer interaction needs to ask which features affect the patients’ perceived trustworthiness of the technological counterpart they interact with. One promising approach is analyzing and transferring findings from human-human interactions to human-computer or human-voice assistant interactions. Following the media equation approach of Nass and colleagues, this study postulates similarities between human and technological counterparts, which makes psychological research a fruitful source of knowledge and inspiration for the empirically based design of voice assistants in a medical context.

Interpersonal Trust: The Role of Expertise

Interpersonal interactions are characterized by uncertainty and risks since the behavior of the interaction partner is, at least to a certain extent, unpredictable. Trust defines the intention to take the risks of interaction by reducing the perceived uncertainty and facilitating the willingness to interact with each other (Endreß, 2010). In communication contexts, trust refers to the listener’s degree of confidence in, and level of acceptance of, the speaker and the message (Ohanian, 1990). Briefly put, trust in communication refers to the listener’s trust in the speaker (Giffin, 1967). According to different models of trust, characteristics of both the trustor (the person who gives the trust) and the trustee (the person who receives the trust) determine the level of trust (e.g., Mayer et al., 1995). The dimensions of competence, benevolence, and integrity describe the trustee’s main characteristics (see, for example, the meta-analysis of McKnight et al., 2002). The perceived trustworthiness of trustees increases with increasing perceived competence, benevolence, or integrity. In communication contexts, the term source credibility closely corresponds to the trustor’s perception of the trustee’s trustworthiness. It refers to the speaker’s positive characteristics that affect the listener’s acceptance of a message. The source-credibility model and the source-attractiveness model concluded that three factors, namely expertness, trustworthiness, and attractiveness, underlie the concept of source credibility (Hovland et al., 1953). In this context, expertise is also referred to as authoritativeness, competence, expertness, or qualification, or as being trained, informed, and educated (Ohanian, 1990). In experiments, the perceived expertise of speakers was manipulated by labeling them as “Dr.” vs. “Mr.” or as “specialist” vs. “generalist” (e.g., Crisci and Kassinove, 1973). Such labels serve as cues that can bias the perception of the competence, benevolence, or integrity of trustees or communicators and thereby the perception of trust.

When comprehending the underlying effects of information processing, well-established models of persuasion distinguish two routes of processing: the heuristic (peripheral) route and the systematic (central) route (e.g., the heuristic-systematic model, HSM, by Chaiken (1987), and the elaboration likelihood model, ELM, by Cacioppo and Petty (1986)). The heuristic (peripheral) route is based on judgment-relevant cues (e.g., the source’s expertise) and needs less cognitive ability and capacity than the systematic (central) route, which is based on judgment-relevant information (e.g., message content). Typically, individuals will prefer the heuristic route as the more parsimonious route of processing if they trust the source, particularly if cues activate one of the three trustworthiness dimensions (Koh and Sundar, 2010). For example, individuals will perceive more trustworthiness when a person is labeled as a “specialist” compared to a “generalist” since a specialist sends more cues of expertise and activates the dimension of competence (Chaiken, 1987; Chaiken and Maheswaran, 1994). Remarkably, the effect endures even if both the specialist and the generalist possess objectively the same level of competence or expertise. Consequently, individuals interacting with a specialist are more likely to engage in heuristic processing and implicitly trust the communicator (Koh and Sundar, 2010).

Regarding the resulting level of trust, the trustor’s characteristics were found to moderate the impact of the trustee’s characteristics. First, the perceived level of expertise depends on interindividual differences in the processing of information: the outlined indicators of trust need to be noticed and correctly interpreted to have an effect. Furthermore, the individual’s personality and experiences were shown to influence the perception of trustworthiness. Finally, an individual’s disposition to trust, i.e., the propensity to trust other people, has been shown to be a significant predictor (Mayer et al., 1995; McKnight et al., 2002).

To summarize, research in various areas revealed that perceived expertise affects the trustee’s trustworthiness (e.g., commercial contexts: Eisend, 2006; health contexts: Gore and Madhavan, 1993; Kareklas et al., 2015; for a general review, see Pornpitakpan, 2004). This perception is also affected by the trustor’s characteristics (e.g., the disposition to trust). With the digital revolution proceeding, technology has become increasingly interactive, resembling human-human interaction to an increasing extent. Today, an individual does not only interact with other human beings but also with technological devices. These new ways of human-technology interaction require the individual to trust in technological counterparts. Consequently, the question arises whether the outlined mechanisms of trust can be transferred to non-human technological counterparts.

Trust in Technology: The Role of Expertise

The media equation approach postulates that social rules and dynamics which guide human-human interaction apply similarly to human-computer interaction (Reeves and Nass, 1996; Nass and Moon, 2000). To investigate the media equation assumptions, Nass and Moon (2000) established the CASA paradigm (i.e., computers as social actors) and adopted well-established approaches from research on human-human interaction for the analysis of human-computer interactions. Many experimental studies applying the CASA paradigm demonstrated that individuals tend to transfer social norms to computer agents, for example, gender and ethnic stereotypes and rules of politeness and reciprocity (Nass and Moon, 2000). More specifically, in the context of trust, experimental studies and imaging-based approaches revealed that trust-related situations activate the same brain regions regardless of whether the counterpart is a human being or a technological agent (Venkatraman et al., 2015; Riedl et al., 2013). Consequently, researchers concluded that human and technological trustees elicit similar basic effects (Bär, 2014).

However, when interacting with a non-human partner, the nature of the trustee introduces several interwoven levels of trustworthiness, referred to as source layers in the following (Koh and Sundar, 2010). Trustors can trust the technical device or system (e.g., the VA) itself. Moreover, they can also refer to the provider, the domain, or the human being “behind” the system, such as the person who receives the information (Hoff and Bashir, 2015). Similar to interpersonal trust, three dimensions determine the perceived trustworthiness of technology: performance (analogous to human competence), clarity (analogous to human benevolence), and transparency (analogous to human integrity) (Backhaus, 2017). As known from interpersonal trust, credibility factors bias the perception of the expertise ascribed to the technology. For example, in the study by Reeves and Nass (1996), participants watched and evaluated a news or a comedy television program. They were assigned to one of two conditions: the “specialist television” or the “generalist television”. The conditions differed regarding the instruction presented by the experimenter, who referred to the television as either the “news TV” or the “entertainment TV” (specialist condition) or as the “usual TV” (generalist condition). Findings indicated that individuals evaluated the content presented by the specialist TV set more positively than the content of the generalist TV set, even though the content was completely identical. The results have been replicated in the context of specialist/generalist television channels (e.g., Leshner et al., 1998), smartphones (Kim, 2014), and embodied avatars (Liew and Tan, 2018). Additionally, in the context of e-health, the level of expertise was shown to affect the perception of trustworthiness (e.g., Bates et al., 2006). Koh and Sundar (2010) explored the psychological effects of expertise (here: specialization) of web-based media technology in the context of e-commerce. They distinguished multiple indicators or sources of trustworthiness (i.e., computer, website, web agent), which they referred to as “source layers of web-based mass communication”. In their study, they analyzed the effects on individuals’ perceptions of expertise and trust, distinguishing between these source layers. In their experiment, participants interacted with media technology (i.e., computer, website, web agent), which was labeled either as a specialist (“wine computer”, “wine shop”, or “wine agent”) or as a generalist (“computer”, “e-shop”, or “e-agent”). Again, only the label but not the content differed between the two experimental conditions. Findings supported the positive effects of the specialization label. Participants reported greater levels of trust in the specialist media technology compared to the generalist media technology, with the “specialized” web agent eliciting the strongest effects (compared to the “specialized” website or computer). Consequently, the present study focuses on the multiple indicators, or multiple source layers, contributing to the trustworthiness of a complex technological system. Depending on which source layer is manipulated, the users’ assessment of trustworthiness might differ fundamentally.

To summarize, research so far has focused mostly on the credibility of online sources (e.g., websites), neglecting other technological agents such as voice-based agents, which currently capture the market in the form of voice bots, voice virtual assistants, or smart speaker skills. Furthermore, research so far has focused on engineering progress, resulting in increasingly improved performance of the systems, but tends to neglect the human user who will interact with the system. As outlined above, in usage scenarios involving sensitive data, the human users’ trust in the technological system is a fundamental requirement and a necessary condition for users opening up to the system. Voice-based systems in a healthcare context need to be perceived as trustworthy agents to get a patient to disclose personal information. However, there is a lack of detailed, psychologically grounded analyses and empirical studies investigating the perceived trustworthiness of voice assistants. The present study aims for first insights into the users’ perception of the trustworthiness of voice assistants in the context of healthcare, raising the following research questions: 1) Does the introduction of a voice assistant as an expert increase its trustworthiness in the context of healthcare? 2) Do the users’ individual dispositions influence the perceived trustworthiness of the assistant? 3) How do the levels of perceived trustworthiness of the multiple source layers (e.g., assistant tool, provider, data receiver) interact with each other?

Outline of the Present Study

To answer the research questions, a laboratory study was conducted. Participants interacted with the Amazon Echo Dot, Amazon’s voice assistant, referred to as “the tool” in the following. The VA was introduced to the participants as an “anamnesis tool for sexual health and disorders”, which would ask questions about the participants’ sexual behavior, their sexual health, and their sexual orientation. Following the approach of Reeves and Nass (1996), participants were randomly assigned to one of two groups, which differed in one single aspect: the labeling of the VA. Participants received a written instruction in which the VA was referred to as either a “specialist” or a “generalist”. Furthermore, at the beginning of the interaction, the VA introduced itself as either a “specialist” or a “generalist”. In line with studies investigating trust in artificial agents (e.g., McKnight et al., 1998; Söllner et al., 2012) and studies including multiple source layers of trustworthiness (e.g., Koh and Sundar, 2010), we distinguished between different source layers of perceived trustworthiness in our setting: the perceived trustworthiness of the VA tool itself, the provider of the tool (i.e., a German company), the platform provider (i.e., Amazon), automatic speech recognition in general, and the receiver of the data (i.e., the attending physician). Furthermore, participants’ individual characteristics, i.e., the disposition to trust and the tendency towards sexual self-disclosure, were considered.

Participants

The 40 participants (28 females, 12 males; average age = 22.45 years; SD = 3.33) were recruited via personal contact or the university recruitment system offering course credit. All participants were German native speakers. Except for one, all participants were students. 80% of them reported having already interacted with a voice assistant. However, the sample’s experience was rather limited in terms of duration: 75% reported having interacted with a VA for less than 10 h and 45% for less than 2 h in total.

Task, Manipulation, Pre-test of Manipulation and Pre-test of Required Trust

During the experiment, participants interacted with a VA, an Amazon Echo Dot (3rd generation, black). While the VA asked them questions about their sexual behavior, sexual health, and sexual orientation, participants were instructed to answer these questions as honestly as possible using speech input. Participants were randomly assigned to one of two groups (n = 20 per group), which only differed regarding the label of the VA. In an introduction text, the VA was introduced as an anamnesis tool and labeled either as a “specialist” (using words such as “specialist”, “expert”) or as a “generalist” (e.g., “usual”, “common”). Additionally, the VA introduced itself in two ways: in the “specialist” condition, it referred to itself as a “special tool for sexual anamnesis” and in the “generalist” condition as a “general survey tool”.

A pre-study verified the effect of this manipulation and confirmed that trust is a prerequisite for answering the anamnesis questions. In an online survey, 30 participants read one of the two introduction texts and afterwards described the tool in their own words. A content analysis of their descriptions showed that participants followed the labeling of the text, using compatible keywords to describe the device (“specialist” condition: e.g., special, expert; “generalist” condition: normal, common). However, because only twelve participants used at least one predefined condition-related keyword, the experimental manipulation was strengthened by adding more keywords to the instruction text. The final manipulation text is attached to the additional material. Since the VA’s perceived trustworthiness was the main dependent variable, the second part of the pre-test ensured that answering the sexual health-related questions required trust. All anamnesis questions were presented to the participants, who rated how likely they would answer each question. The scale ranged from 100 (very likely) to 0 (no, too private), with lower scores indicating higher levels of trust required to answer the question. Questions were clustered into four categories: puberty, sexual orientation, diseases/hygiene, and sexual activity. Results showed that questions regarding puberty (average rating = 75.78) and sexual orientation (67.01) required less trust than diseases/hygiene (56.47) and sexual activity (50.75). To ensure a minimum standard of required trust, one question of the puberty category was removed. Furthermore, four conditional questions were added to the categories of diseases/hygiene and sexual activity, which would ask for more detailed information if the preceding questions were answered with “yes”.
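For illustration, the aggregation of these pre-test ratings could be computed along the following lines. This is a minimal sketch with hypothetical question identifiers, example values, and an illustrative threshold, not the study’s actual analysis script.

```python
# Minimal sketch of the pre-test aggregation (hypothetical data).
from collections import defaultdict
from statistics import mean

# Each entry: (question_id, category, rating from 0 = "no, too private"
# to 100 = "very likely"). Values here are placeholders.
ratings = [
    ("q01", "puberty", 80),
    ("q02", "sexual orientation", 65),
    ("q03", "diseases/hygiene", 55),
    ("q04", "sexual activity", 48),
    # ... one entry per participant and question in the actual pre-test
]

by_category = defaultdict(list)
for _, category, rating in ratings:
    by_category[category].append(rating)

# Average willingness to answer per category; lower means indicate that
# more trust is required to answer questions of that category.
category_means = {cat: mean(vals) for cat, vals in by_category.items()}

# Flag categories that require comparatively little trust, mirroring the
# removal of one puberty question described above (threshold illustrative).
TRUST_THRESHOLD = 70
low_trust_categories = {cat for cat, m in category_means.items() if m > TRUST_THRESHOLD}
print(category_means, low_trust_categories)
```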

To assess the perceived trustworthiness of the tool, different source layers were considered. First, the trustworthiness of the tool provider, the German company ignimed UG, had to be evaluated. Second, since the VA tool was connected to the Amazon Echo Dot, the trustworthiness of the platform provider (Amazon) was assessed. Third, the trustworthiness of the potential receiver of the data (gynecologist/urologist) was rated. Finally, we added automatic speech recognition as a proxy for the underlying technology, which the participants also rated in terms of trustworthiness. Note that the experimental manipulation of expertise referred only to the tool itself. Consequently, the VA tool represents the primary source layer, while the others are referred to as further source layers.

The Sexual Health Anamnesis Tool: Questions the VA Asked

After introducing itself, the VA started the anamnesis conversation, which involved 21 questions (e.g., Do you have venereal diseases? Which one?). Four categories of questions were presented: puberty (e.g., What have been the first signs of your puberty?), diseases/hygiene (e.g., Have you ever had one or more sexual diseases?), sexual orientation (e.g., What genders do you have sexual intercourse with?), and sexual activity of the past 4 weeks (e.g., How often have you had sexual intercourse in the past 4 weeks?). The complete list of final measurements follows below.
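To illustrate the structure of such a question flow, the sketch below models categorized questions with conditional follow-ups that are only triggered by a “yes” answer, as described for the pre-test. The data structure, the `run_anamnesis` helper, and the exact follow-up wording are illustrative assumptions; the study’s actual VA skill is not documented in this level of detail.

```python
# Illustrative sketch of an anamnesis question flow with conditional follow-ups.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Question:
    text: str
    category: str
    follow_up: Optional[str] = None  # asked only if the main question is answered "yes"

QUESTIONS = [
    Question("What have been the first signs of your puberty?", "puberty"),
    Question("Have you ever had one or more sexual diseases?",
             "diseases/hygiene", follow_up="Which one?"),
    Question("What genders do you have sexual intercourse with?", "sexual orientation"),
    Question("How often have you had sexual intercourse in the past 4 weeks?",
             "sexual activity"),
    # ... 21 questions in total in the study
]

def run_anamnesis(ask):
    """Ask each question in turn; trigger the follow-up only after a 'yes' answer."""
    answers = {}
    for q in QUESTIONS:
        answer = ask(q.text)
        answers[q.text] = answer
        if q.follow_up and answer.strip().lower() in ("yes", "ja"):
            answers[q.follow_up] = ask(q.follow_up)
    return answers

if __name__ == "__main__":
    # For a quick test, `input` stands in for the VA's speech interface.
    print(run_anamnesis(ask=input))
```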

Measurements

After finishing the conversation with the VA, participants answered a questionnaire presented via LimeSurvey on a 15.6″ laptop with an attached mouse. The measures of the questionnaire are presented below:

Perceived Trustworthiness of Source Layers

To measure the trustworthiness of the VA, three questions adapted from Corritore et al. (2003) were asked (e.g., I think the tool is trustworthy). Additionally, an overall item adapted from Casaló et al. (2007) was presented (Overall, I think that the tool is a safe place for sensitive information). All questions were answered on a 7-point Likert scale, ranging from not true at all to very true.

Questions concerning institutional trust from the SCOUT questionnaire (Bär et al., 2011) were adapted to assess the perceived trustworthiness of the tool provider, the platform provider, and automatic speech recognition. Items were answered on a 5-point Likert scale, ranging from not agree at all to agree totally. Five questions assessed the perceived trustworthiness of the tool provider (e.g., I believe in the honesty of the provider) and of the platform provider (same five questions). Four questions assessed the trustworthiness of automatic speech recognition (e.g., Automatic speech recognition is a trustworthy technology). Finally, to assess the perceived trustworthiness of the data receiver, namely the participant’s gynecologist/urologist, the KUSIV3 questionnaire was used (Beierlein et al., 2012). It includes three questions (e.g., I am convinced that my gynecologist/urologist has good aims) on a 5-point Likert scale, ranging from not agree at all to agree totally.

Individual Characteristics

The disposition to trust was measured with three statements (e.g., For me, it is easy to trust persons or things), assessed on a 5-point Likert scale (ranging from not agree at all to agree totally) taken from the SCOUT questionnaire (Bär et al., 2011). The tendency towards sexual self-disclosure was measured with the sexual self-disclosure scale (Clark, 1987). Participants rated four questions, two on a 5-point Likert scale (e.g., How often do you talk about sexuality?, ranging from never-rarely to very often) and two using a given set of answers (e.g., With whom do you talk about sexuality? mother | father | siblings | partner | friends (male) | friends (female) | doctors | nobody | other).

Manipulation Check

To measure how strongly the participants believed that the tool was a “specialist” or a “generalist”, two questions (e.g., The survey tool has high expertise in the topic) were answered using a 5-point Likert scale.

Procedure

The study took about 40 min, starting with COVID-19 hygiene routines (warm-up phase: washing and disinfecting hands, answering a questionnaire, and putting on a mouth-nose mask). Since the experimental supervisor left the room for the actual experiment, participants could remove their face masks during the interaction with the VA. In the warm-up phase, participants were instructed to do a short tutorial with the VA, which asked some trivial questions (e.g., How is the weather? or Do you like chocolate?). When the participants confirmed that they were ready to start the experiment, they were instructed to read the introduction text about the anamnesis tool and to start the interaction with the VA (experimental phase). After finishing, the participants answered the questionnaires and were briefly interviewed by the supervisor about their experience with the VA (cool-down phase). Figure 1 illustrates the procedure of the study and the three experimental phases.

FIGURE 1. Illustration of the study procedure and the three experimental phases.

Design and Hypotheses

In accordance with previous studies (e.g., Gore and Madhavan, 1993; Reeves and Nass, 1996; Kareklas et al., 2015), the experiment followed a between-subjects design with two conditions (“specialist” vs. “generalist” VA). In line with the first research question, referring to the effects of perceived expertise on perceived trustworthiness (differentiated by source layers), the first group of hypotheses postulated that the perceived trustworthiness (across the source layers) would be higher in the specialist condition than in the generalist condition. The second research question addressed the impact of individual dispositions on perceived trustworthiness. In line with Mayer et al. (1995) and McKnight et al. (2002), the second group of hypotheses assumed that higher individual trust-related dispositions result in higher trustworthiness ratings. Finally, the third research question exploratively asked whether the perceived trustworthiness of the multiple source layers (e.g., the assistant tool, the providers, the receiver) interact with each other (Koh and Sundar, 2010). Table 1 gives an overview of the hypotheses.

TABLE 1. Hypotheses and a short overview of corresponding results.

Data Analyses

Data were prepared as proposed by the corresponding references. To facilitate the comparability of the trustworthiness measures, items answered on a 7-point Likert scale were converted to a 5-point scale. Five t-tests for independent groups (specialist condition vs. generalist condition) were conducted to test the first group of hypotheses. The second group of hypotheses was tested by conducting five linear regression analyses with the source layers’ trustworthiness as the five criterion variables and the individual characteristics as the predictor variables. Finally, regarding the explorative research question, a linear regression analysis was conducted with the trustworthiness of the VA as the criterion and the other layers of trustworthiness as the predictors. The following sections report means (M) and standard deviations (SD) of the scales as well as test statistics such as the t-value (t) and the p-value (p).
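As an illustration of these analysis steps, the sketch below rescales 7-point Likert scores onto a 5-point range, runs an independent-groups t-test, and computes Cohen’s d. It uses NumPy/SciPy with randomly generated placeholder data; the study does not report which analysis software was used.

```python
# Minimal sketch of the scale conversion and group comparison (placeholder data).
import numpy as np
from scipy import stats

def to_five_point(scores):
    """Linearly rescale 7-point Likert scores (1-7) onto a 5-point range (1-5)."""
    return 1 + (np.asarray(scores, dtype=float) - 1) * (5 - 1) / (7 - 1)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                        / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd

# Hypothetical trustworthiness scores of the VA tool per condition (n = 20 each).
rng = np.random.default_rng(42)
specialist = to_five_point(rng.integers(1, 8, 20))
generalist = to_five_point(rng.integers(1, 8, 20))

t_stat, p_value = stats.ttest_ind(specialist, generalist)  # independent-groups t-test
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, d = {cohens_d(specialist, generalist):.3f}")
```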

Impact of Expert Condition on Source Indicators’ Perceived Trustworthiness

As expected, the perceived trustworthiness of the tool was higher in the specialist condition (M = 3.206, SD = 1.037) than in the generalist condition (M = 2.634, SD = 0.756). However, this difference narrowly missed significance (t(38) = 2.019, p = 0.051, d = 0.638). Contrary to our expectations, the perceived trustworthiness of the platform provider, the tool provider, the data receiver, and automatic speech recognition did not differ significantly between the conditions (see Figure 2 and Table 2 for descriptive results and t-test results of the unadjusted analyses).

FIGURE 2. Unadjusted analyses of the different trustworthiness indicators, distinguishing between the specialist condition (blue) and the generalist condition (red). The scale ranged from 1 to 5, with higher values indicating higher ratings of trustworthiness.

TABLE 2. Results of the unadjusted analyses.

Impact of Manipulation Check Success on Source Indicators’ Perceived Trustworthiness

Although a pre-test confirmed the manipulation of the two conditions, the manipulation check of the main study revealed a lack of effectiveness: two control questions showed that the specialist tool was not rated as significantly “more special” than the generalist tool (see Table 3 for descriptive and t-test results). Consequently, the assignment to the two groups did not result in significantly different levels of perceived expertise of the VA.

TABLE 3. Results of the manipulation check.

Consequently, we needed to adjust the statistical analyses of the group comparisons. We re-analyzed the ratings of the control questions (asking for the VA’s expertise). Based on the actually perceived expertise, participants were divided into two groups with ratings below and above the averaged scale’s median (MD = 3.333). Independently of the intended manipulation, 22 participants gave higher ratings of the VA’s expertise (> 3.5; group 1), indicating that they rather perceived a specialist tool. In contrast, 18 participants gave lower ratings of the expertise (< 3.0; group 2), indicating that they rather perceived a generalist tool. In sum, the re-analyses (referred to as the “adjusted analyses” below) compare the perceived trustworthiness of participants who actually perceived the VA as a specialist or a generalist, independently of the intended manipulation (see Figure 3).

FIGURE 3. Adjusted analyses of the different trustworthiness indicators, distinguishing between the specialist condition (blue) and the generalist condition (red). The scale ranged from 1 to 5, with higher values indicating higher ratings of trustworthiness.
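The post-hoc regrouping described above could be implemented roughly as follows. The participant ratings in this sketch are hypothetical, and the exact grouping rule (a median split with the reported cut-offs) may have differed in detail.

```python
# Sketch of the post-hoc grouping based on the manipulation-check ratings.
import numpy as np

# Averaged ratings of the two manipulation-check items per participant
# (placeholder values; the study reports a median of MD = 3.333 for n = 40).
expertise = np.array([4.0, 2.5, 3.5, 3.0, 4.5, 2.0, 3.5, 4.0])

median = np.median(expertise)
perceived_specialist = expertise > median   # group 1: rather perceived a specialist
perceived_generalist = expertise < median   # group 2: rather perceived a generalist

print(f"median = {median:.3f}, "
      f"specialist group n = {perceived_specialist.sum()}, "
      f"generalist group n = {perceived_generalist.sum()}")
```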

In line with the hypotheses, the perceived trustworthiness of the tool was significantly higher when the tool was actually perceived as a specialist (M = 3.182, SD = 0.938) compared to a generalist (M = 2.599, SD = 0.836; t(38) = 2.051, p = 0.045, d = 0.652). Also in line with expectations, the platform provider’s perceived trustworthiness was significantly higher when the VA was perceived as a specialist (M = 3.581, SD = 1.313) rather than a generalist (M = 2.822, SD = 1.170; t(38) = 2.318, p = 0.025, d = 0.737). The same applies to the provider’s perceived trustworthiness: the provider of the specialist tool was rated as significantly more trustworthy (M = 3.663, SD = 0.066) than the provider of the generalist tool (M = 2.878, SD = 0.984; t(38) = 3.331, p = 0.02, d = 1.503). Likewise, the perceived trustworthiness of automatic speech recognition was significantly higher for a perceived specialist (M = 3.057, SD = 0.994) than for a perceived generalist (M = 2.125, SD = 0.376; t(38) = 3.757, p = 0.01, d = 1.194). However, the data receiver’s perceived trustworthiness did not differ significantly between the specialist (M = 4.652, SD = 0.488) and the generalist perception (M = 4.241, SD = 0.982; t(38) = 1.722, p = 0.093, d = 0.547).

Additional Predictors of the Perceived Trustworthiness

Linear regression analyses were conducted with the five trustworthiness source layers as the criterion variables and the general disposition to trust and the tendency to disclose sexual health information as predictor variables. Only two regressions yielded significant predictions: those of the perceived trustworthiness of the VA and of automatic speech recognition in general. First, the prediction of the tool’s perceived trustworthiness was significant (β = 0.655, t(37) = 2.056, p = 0.047), with the tendency to disclose health information contributing significantly: the higher this tendency, the higher the perceived trustworthiness of the tool. Second, the prediction of the trustworthiness of automatic speech recognition was significant, with the participants’ disposition to trust contributing significantly to the prediction (β = 0.306, t(37) = 2.276, p = 0.029). Moreover, the tendency to disclose sexual health information contributed substantially, but only by trend (β = 0.370, t(37) = 1.727, p = 0.093).

Further Source Layers of Trustworthiness

The final regression analysis investigated whether the further source layers of trustworthiness (providers, speech recognition, receiver) predicted the tool’s trustworthiness. Results revealed that the trustworthiness of the provider of the tool (i.e., ignimed UG) significantly predicted the trustworthiness of the tool itself (β = 0.632, t(35) = 3.573, p = 0.001): the higher the provider’s trustworthiness, the higher the trustworthiness of the tool. Similarly, but not significantly, the trustworthiness of automatic speech recognition predicted the trustworthiness of the tool (β = 0.371, t(35) = 1.768, p = 0.086): the higher the trustworthiness of speech recognition, the higher the perceived trustworthiness of the tool. In contrast, the higher the platform provider’s (i.e., Amazon’s) scores, the lower the trustworthiness of the tool, although only by trend (β = −0.446, t(35) = −1.772, p = 0.085).
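A regression of this kind could be specified as follows, shown here as a minimal sketch using the statsmodels formula API with hypothetical column names and placeholder data rather than the original analysis script.

```python
# Sketch of the regression predicting the tool's trustworthiness from the
# further source layers (hypothetical DataFrame; the study had n = 40).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "trust_tool":     [3.2, 2.8, 3.9, 2.5, 3.4, 4.0, 2.9, 3.6],
    "trust_provider": [3.5, 2.9, 4.1, 2.7, 3.3, 4.2, 3.0, 3.8],
    "trust_platform": [3.0, 2.5, 3.8, 2.2, 3.1, 3.9, 2.6, 3.4],
    "trust_asr":      [2.8, 2.4, 3.5, 2.1, 3.0, 3.6, 2.5, 3.2],
    "trust_receiver": [4.5, 4.0, 4.8, 4.2, 4.6, 4.9, 4.1, 4.7],
})

# Ordinary least squares regression: tool trustworthiness as the criterion,
# the other source layers as predictors.
model = smf.ols(
    "trust_tool ~ trust_provider + trust_platform + trust_asr + trust_receiver",
    data=df,
).fit()
print(model.summary())
```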

Aim of the Present Study

Voice-based (artificial intelligence) systems serving as digital assistants have evolved dramatically within the last few years. The healthcare sector has been referred to as an emerging market for these systems, which imposes different requirements on them than private usage scenarios do. Data revealed here are more personal and more sensitive, resulting in increasing engineering requirements regarding data security, for example. To establish voice-based systems in such a sensitive context, the users’ perspective needs to be considered. In a healthcare context, users need to trust their technological counterpart in order to disclose personal information. However, the trustworthiness of most of the systems on the market and the users’ willingness to trust these applications have not been analyzed yet. The present study bridged this research gap. In an empirical study, the trustworthiness of a voice-based anamnesis tool was analyzed. Participants interacted with a VA that was introduced as either a “specialist” or a “generalist”. They then rated the trustworthiness of the tool, distinguishing between different source layers of trust (provider, platform provider, automatic speech recognition in general, data receiver). To ensure external reliability, participants interacted with an anamnesis tool for sexual health, which collected health data by asking questions regarding their puberty, sexual orientation, diseases/hygiene, and sexual activity. They were informed that the tool uses artificial intelligence to provide a diagnosis, which would be sent to the gynecologist/urologist.

Answering the Research Questions

The present study investigated three research questions: 1) Does the expert framing of a voice assistant increase its trustworthiness in the context of healthcare? Further, 2) Do individual dispositions influence the perceived trustworthiness? Finally, 3) Do the different trustworthiness source indicators (e.g., the assistant tool, the providers, the receiver) interact with each other?

In line with previous studies, the present results revealed that participants who perceived the VA tool as a specialist reported higher levels of trustworthiness across the different source layers, compared to participants who perceived the tool as a generalist. Considering that the tool acted completely identically in both conditions and that the conditions only differed in how the VA was introduced to the participant (a written introduction and an introduction presented by the VA itself), the present study highlights the manipulability of the users’ perception of the system and the effects this perception has on the evaluation of the system’s trustworthiness. The way a diagnostic tool is introduced to the patient seems to be of considerable importance when it comes to the patient’s perception of the tool and the willingness to interact with it. As the present study reveals, a few words can fundamentally change patients’ opinions of the tool, which might affect their willingness to cooperate. The ongoing worldwide pandemic has demonstrated the need for intelligent tools that support physicians with remote anamnesis and diagnosis and thereby unburden medical practices. Our study shows how important it is not only to consider engineering aspects and ensure that the system functions properly but also to consider the users’ perception of the tool and the resulting trustworthiness. Thus, our results offer promising first insights for developers and designers. However, our results also point to risks. Many health-related tools conquer the market without any quality checks. If these tools framed themselves as experts or specialists, users could be easily misled.

Following the basic assumption of the media equation approach and the significant research body confirming that social rules and dynamics which guide human-human interaction similarly apply to human-computer interaction (Reeves and Nass, 1996), voice-based systems can be regarded as a new generation of technological counterparts. Being able to recognize, process, and produce human language, VAs adopt features that until recently have been exclusively human. Consequently, VAs can verbally introduce themselves to users, resulting in a powerful manipulation of the users’ perception. The presented results show how easily and effectively the impression of a VA can be manipulated. Furthermore, our results point to an area of research that today’s HCI research tends to miss too often. While its primary focus is on the effect of gestalt design on usability and user experience, our results encourage attending to the users’ perspective on the system and to perceived trustworthiness as an essential aspect of responsible and serious design, which bears chances and risks for both high-quality and low-quality applications.

Referring to methodological challenges, the present study reveals limitations in the way we manipulated the impression of the VA. Participants read an introduction text, which referred to the VA as either a specialist or a generalist tool. Moreover, the VA introduced itself as a specialist or a generalist. Although a pre-test was conducted to ensure the manipulation, not all participants picked up on these cues: some participants in the “specialist” condition did not perceive the VA as a specialist, and, similarly, some participants in the “generalist” condition did not perceive it as a generalist. Future studies in this area need to conduct a manipulation check to ensure their manipulation or to adapt their analysis strategy (e.g., post-hoc assignment of groups). In our study, the unadjusted analyses, which strictly followed the intended manipulation, resulted in reduced effects compared to the newly composed groups following the participants’ actual perception of the tool. Additionally, future research should focus on manipulations that are more effective. Following the source-credibility model and the source-attractiveness model, the perspective on perceived competence is more complex. Besides the perception of expertise, authoritativeness, competence, qualification, or a system perceived as being trained, informed, and educated could contribute to the attribution of competence (Ohanian, 1990; McKnight et al., 2002). Future studies should use this variety of possibilities to manipulate the perceived competence of a VA.

From a theoretical perspective, competence is only one dimension describing a human trustee’s main characteristics; benevolence and integrity of the trustee are also relevant indicators (e.g., Ohanian, 1990 ; McKnight et al., 2002 ). For artificial trustees, performance (analogous to human competence), clarity (analogous to human benevolence), and transparency (analogous to integrity) are further dimensions that determine the impression ( Backhaus, 2017 ). Future studies should widen the perspective and address these multiple dimensions. Additionally, human information processing has been described as following two different routes, the systematic (central) or the heuristic (peripheral) route ( Cacioppo and Petty, 1986 ), with personally relevant topics increasing the probability of systematic processing. Sexual health, the topic of the anamnesis tool in the present study, has been shown to be personally relevant ( Kraft Foods, 2009 ), suggesting systematic processing of judgment-relevant cues (e.g., the source’s expertise) ( Cacioppo and Petty, 1986 ). This might explain the limited effect of the manipulation on the trustworthiness ratings: the personally relevant topic of sexual health may trigger the central route of processing, so that the quick labeling of the VA, acting as a heuristic (peripheral) cue, has only a limited effect. Moreover, the interaction with the tool might have further diminished the effect of the manipulation, which might also explain why only participants who explicitly evaluated the tool as a specialist showed higher trustworthiness. It should therefore be further investigated whether effects of the heuristic design of, for example, health-related websites ( Gore and Madhavan, 1993 ) can be transferred to voice-based anamnesis tools assessing highly personally relevant topics.

Regarding the second research question, the present results show only minor effects of the participants’ individual characteristics. Across the multiple source layers of trustworthiness, the participants’ general disposition to trust only affected the perceived trustworthiness of automatic speech recognition. Possibly, our sample was too homogeneous regarding the disposition to trust: its mean value was relatively high ( M = 3.567 on a 5-point scale) and its variability rather low ( SD = 1.03). Future studies could incorporate predictors that are more closely connected with the selected use case, such as the tendency to disclose sexual health information ( Mayer et al., 1995 ).

The third research question explored the relationship between the different source layers of trustworthiness. When predicting the trustworthiness of the tool, only the tool provider’s perceived trustworthiness was a significant predictor. The trustworthiness of the platform provider and of automatic speech recognition showed trend-level relationships, while the data receiver’s (gynecologist’s/urologist’s) trustworthiness was of minor importance. However, as our participants knew that their data would only be saved on the university’s server, the latter result might have been different if the data had been transferred to the attending physician (or if participants had assumed such a transfer). Nevertheless, the results can be interpreted as a careful first confirmation of Koh and Sundar (2010) , who postulate that the perception of an artificial counterpart is influenced not only by the characteristics of the tool itself but also by indicators related to the tool. With respect to today’s most popular VAs for private use, Amazon Echo and Google Home, both tools might be closely associated with the perceived image or trustworthiness of their companies. If such consumer products are used in healthcare contexts, reservations regarding the companies might have an impact. Furthermore, the general view of automatic speech recognition affected the perceived trustworthiness of the tool. Thus, current public debates about digitalization and artificial intelligence should also be considered when designing and using VAs for health-related applications.

Limitations and Future Work

To summarize the limitations and suggestions for future work presented above, the manipulation of competence and the additional indicators of trustworthiness need to be reconsidered. Future work might use a more fine-grained and in-depth operationalization of different expert levels (e.g., referring to the performance of the tool), include further manipulations of the competence dimension (e.g., referring to a trained system), or incorporate the dimensions of clarity (analogous to human benevolence) and transparency (analogous to integrity). The perceived trustworthiness might result from systematic (central) information processing due to the potentially high personal relevance of the topic; future work should investigate whether the effects of the heuristic design of, for example, health-related websites ( Gore and Madhavan, 1993 ; Kareklas et al., 2015 ) can be transferred to voice-based anamnesis tools assessing highly personally relevant topics. The data receiver (gynecologist/urologist) played only a subordinate role in the present study: as participants knew that their disclosed data would be stored on university servers, the data receiver was of minor importance. Future work should increase the external validity of the experiment by incorporating the data receiver more explicitly. Moreover, only the tool’s expert status, and not the additional source layers of trustworthiness, was manipulated, resulting in relatively simple analyses of interaction effects between the trustworthiness indicators; future studies might choose a more elaborate design. Finally, perceived trustworthiness is an essential topic for other application areas such as education ( Troussas et al., 2021 ), and another important field might be the perceived trustworthiness of multilingual voice assistants deployed in multilingual societies ( Mukherjee et al., 2021 ). A different approach to perceived trustworthiness would be to test the impact of different dialogue architectures (e.g., Fernández et al., 2005 ). Dialogue design strategies can vary considerably and affect users’ trust; future studies should investigate whether hardcoded intents and flexible, natural spoken interactions have a different impact.

Conclusion and Contribution

Voice assistants are gaining importance in healthcare contexts. With remote anamnesis and diagnosis becoming more relevant, voice-based systems offer promising contributions, for instance, in the area of medical diagnoses. Using voice assistants in data-sensitive contexts draws attention to the concept of trust: if patients are to reveal personal, sensitive information to a voice-based system, they need to trust it. However, the analysis and understanding of the psychological processes characterizing the patient-voice assistant interaction are still in their early stages. For human-human relationships, psychological research has characterized individuals who give trust (trustors) and those who receive trust (trustees), and it has established models of the characteristics that are processed and attributed (e.g., the source-credibility model , the source-attractiveness model , HSM , ELM ). Researchers in the field of human-computer interaction have transferred this knowledge to interactions with technological counterparts (e.g., television, web pages, web agents). However, little is known about voice-based tools, which have become increasingly popular and which involve more complex, more humanlike features (speech processing) than previous technologies. The present study contributes to closing this research gap by presenting design ideas for VAs derived from the literature and by providing empirical data on human users interacting with a device to disclose health-related information. Results showed that participants who perceived the VA as a specialist tool reported higher trustworthiness scores than participants who believed they were interacting with a generalist tool. In conclusion, the users’ perception significantly influences the trust users place in the VA, and influencing this perception proved rather easy: a short written introduction and a “spoken” introduction presented by the VA itself were sufficient to significantly affect the users’ perception and their trust in the system. In sum, we want to draw attention to the importance of the human user’s perspective when interacting with technology. Future studies need to address the trustworthiness of technology to contribute to more responsible and serious design processes, in order to seize the chances technology offers and to avoid the risks of low-quality applications.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics Statement

Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

CW: concept of study, theoretical background, argumentation line, data analyses, supervision; CR: skill implementation, study conductance, data analyses; AC: theoretical background, argumentation line, supervision.

Funding

This publication was supported by the Open Access Publication Fund of the University of Wuerzburg.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Backhaus, N. (2017). User’s Trust and User Experience in Technical Systems: Studies on Websites and Cloud Computing 1–281. doi:10.14279/depositonce-5706


Bär, N. (2014). Human-Computer Interaction and Online Users’ Trust. PhD dissertation. TU Chemnitz . Retrieved from: http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa149685

Bär, N., Hoffmann, A., and Krems, J. (2011). “Entwicklung von Testmaterial zur experimentellen Untersuchung des Einflusses von Usability auf Online-Trust,” in Reflexionen und Visionen der Mensch-Maschine-Interaktion – Aus der Vergangenheit lernen, Zukunft gestalten . Editors S. Schmid, M. Elepfandt, J. Adenauer, and A. Lichtenstein, 627–631.


Bates, B. R., Romina, S., Ahmed, R., and Hopson, D. (2006). The Effect of Source Credibility on Consumers’ Perceptions of the Quality of Health Information on the Internet. Med. Inform. Internet Med. 31, 45–52. doi:10.1080/14639230600552601


Baumeister, J., Sehne, V., and Wienrich, C. (2019). “A Systematic View on Speech Assistants for Service Technicians,” in LWDA 2019. Editors R. Jäschke and M. Weidlich (Berlin, Germany), 195–206.

Beierlein, C., Kemper, C. J., Kovaleva, A., and Rammstedt, B. (2012). “Kurzskala zur Messung des zwischenmenschlichen Vertrauens: Die Kurzskala Interpersonales Vertrauen (KUSIV3) [Short scale for assessing interpersonal trust: The short scale interpersonal trust (KUSIV3)],” in GESIS Working Papers 2012|22, Köln.

Cacioppo, J. T., and Petty, R. E. (1986). “The Elaboration Likelihood Model of Persuasion,” in Advances in Experimental Social Psychology. Editor L. Berkowitz (New York: Academic Press), 123–205.

Casaló, L. V., Flavián, C., and Guinalíu, M. (2007). The Role of Security, Privacy, Usability and Reputation in the Development of Online Banking. Online Inf. Rev. 31 (5), 583–603. doi:10.1108/14684520710832315

Chaiken, S., and Maheswaran, D. (1994). Heuristic Processing Can Bias Systematic Processing: Effects of Source Credibility, Argument Ambiguity, and Task Importance on Attitude Judgment. J. Personal. Soc. Psychol. 66, 460–473. doi:10.1037/0022-3514.66.3.460

Chaiken, S. (1987). “The Heuristic Model of Persuasion,” in Social Influence: The Ontario Symposium, Vol. 5. Editors M. P. Zanna, J. M. Olson, and C. P. Herman (Lawrence Erlbaum Associates), 3–39.

Clark, L. V. (1987). Sexual Self-Disclosure to Parents and Friends . Texas: University Press .

Corritore, C. L., Kracher, B., and Wiedenbeck, S. (2003). On-line Trust: Concepts, Evolving Themes, a Model. Int. J. Human-Computer Stud. 58, 737–758. doi:10.1016/s1071-5819(03)00041-7

Crisci, R., and Kassinove, H. (1973). Effect of Perceived Expertise, Strength of Advice, and Environmental Setting on Parental Compliance. J. Soc. Psychol. 89, 245–250. doi:10.1080/00224545.1973.9922597

Eisend, M. (2006). Source Credibility Dimensions in Marketing Communication–A Generalized Solution. J. Empir. Gen. Mark. Sci. 10.

Endreß, M. (2010). “Vertrauen–soziologische Perspektiven,” in Vertrauen–zwischen Sozialem Kitt Senkung Von Transaktionskosten , 91–113.

ePharmaINSIDER (2018). Das sind die Top 12 Gesundheits-Chatbots. (Accessed March 17, 2021). Available at: https://www.epharmainsider.com/die-top-12-gesundheits-chatbots/ .

Fernández, F., Ferreiros, J., Sama, V., Montero, J. M., Segundo, R. S., Macías-Guarasa, J., et al. (2005). “Speech Interface for Controlling an Hi-Fi Audio System Based on a Bayesian Belief Networks Approach for Dialog Modeling,” in Ninth European Conference on Speech Communication and Technology .

Giffin, K. (1967). The Contribution of Studies of Source Credibility to a Theory of Interpersonal Trust in the Communication Process. Psychol. Bull. 68, 104–120. doi:10.1037/h0024833

Gore, P., and Madhavan, S. S. (1993). Credibility of the Sources of Information for Non-prescription Medicines. J. Soc. Adm. Pharm. 10, 109–122.

Hernandez, A. (2021). The Best 7 Free and Open Source Speech Recognition Software Solutions. Goodfirms. Available at: https://www.goodfirms.co/blog/best-free-open-source-speech-recognition-software (Accessed March 17, 2021).

Hoff, K. A., and Bashir, M. (2015). Trust in Automation. Hum. Factors 57, 407–434. doi:10.1177/0018720814547570

Hörner, T. (2019). “Sprachassistenten im Marketing,” in Marketing mit Sprachassistenten . Springer, Gabler : Wiesbaden , 49–113. doi:10.1007/978-3-658-25650-0_3

Hovland, C. I., Janis, I. L., and Kelley, H. H. (1953). Communication and Persuasion . New Haven, CT: Yale University Press

idealo (2020). E-Commerce-Trends 2020: Millennials Treiben Innovationen Voran. Idealo. Available at: https://www.idealo.de/unternehmen/pressemitteilungen/ecommerce-trends-2020/ (Accessed March 17, 2021).

Kareklas, I., Muehling, D. D., and Weber, T. J. (2015). Reexamining Health Messages in the Digital Age: A Fresh Look at Source Credibility Effects. J. Advertising 44, 88–104. doi:10.1080/00913367.2015.1018461

Kim, K. J. (2014). Can Smartphones Be Specialists? Effects of Specialization in mobile Advertising. Telematics Inform. 31, 640–647. doi:10.1016/j.tele.2013.12.003

Kim, K. J. (2016). Interacting Socially with the Internet of Things (IoT): Effects of Source Attribution and Specialization in Human-IoT Interaction. J. Comput-mediat Comm. 21, 420–435. doi:10.1111/jcc4.12177

Koh, Y. J., and Sundar, S. S. (2010). Heuristic versus Systematic Processing of Specialist versus Generalist Sources in Online media. Hum. Commun. Res. 36, 103–124. doi:10.1111/j.1468-2958.2010.01370.x

Kraft Foods (2009). Tabuthemen. Statista. Available at: https://de.statista.com/statistik/daten/studie/4464/umfrage/themen-ueber-die-kaum-gesprochen-wird/ (Accessed March 17, 2021).

Leshner, G., Reeves, B., and Nass, C. (1998). Switching Channels: The Effects of Television Channels on the Mental Representations of Television News. J. Broadcasting Electron. Media 42, 21–33. doi:10.1080/08838159809364432

Liew, T. W., and Tan, S.-M. (2018). Exploring the Effects of Specialist versus Generalist Embodied Virtual Agents in a Multi-Product Category Online Store. Telematics Inform. 35, 122–135. doi:10.1016/j.tele.2017.10.005

Mayer, R. C., Davis, J. H., and Schoorman, F. D. (1995). An Integrative Model of Organizational Trust. Amr 20, 709–734. doi:10.5465/amr.1995.9508080335

McKnight, D. H., Choudhury, V., and Kacmar, C. (2002). Developing and Validating Trust Measures for E-Commerce: An Integrative Typology. Inf. Syst. Res. 13, 334–359. doi:10.1287/isre.13.3.334.81

McKnight, D. H., Cummings, L. L., and Chervany, N. L. (1998). Initial Trust Formation in New Organizational Relationships. Amr 23, 473–490. doi:10.5465/amr.1998.926622

Meticulous Market Research (2021). Healthcare Virtual Assistant Market By Product (Chatbot And Smart Speaker), Technology (Speech Recognition, Text-To-Speech, And Text Based), End User (Providers, Payers, And Other End User), And Geography - Global Forecast To 2025. Available at: http://www.meticulousresearch.com/ (Accessed March 17, 2021).

Mukherjee, S., Nediyanchath, A., Singh, A., Prasan, V., Gogoi, D. V., and Parmar, S. P. S. (2021). “Intent Classification from Code Mixed Input for Virtual Assistants,” in 2021 IEEE 15th International Conference on Semantic Computing (ICSC) ( IEEE ), 108–111.

Nass, C., and Moon, Y. (2000). Machines and Mindlessness: Social Responses to Computers. J. Soc. Isssues 56, 81–103. doi:10.1111/0022-4537.00153

Ohanian, R. (1990). Construction and Validation of a Scale to Measure Celebrity Endorsers’ Perceived Expertise, Trustworthiness, and Attractiveness. J. Advertising 19, 39–52. doi:10.1080/00913367.1990.10673191

Pornpitakpan, C. (2004). The Persuasiveness of Source Credibility: A Critical Review of Five Decades’ Evidence. J. Appl. Soc. Pyschol 34, 243–281. doi:10.1111/j.1559-1816.2004.tb02547.x

Reeves, B., and Nass, C. (1996). The media Equation: How People Treat Computers, Television, and New media like Real People and Places . Cambridge, UK: Cambridge University Press .

Riedl, B., Gallenkamp, J., and Hawaii, A. P. (2013). “The Moderating Role of Virtuality on Trust in Leaders and the Consequences on Performance,” in 2013 46th Hawaii International Conference on System Sciences IEEE , 373–385. Available at: https://ieeexplore.ieee.org/Xplore/home.jsp . doi:10.1109/hicss.2013.644

Söllner, M., Hoffmann, A., Hoffmann, H., and Leimeister, J. M. (2012). Vertrauensunterstützung für sozio-technische ubiquitäre Systeme. Z. Betriebswirtsch 82, 109–140. doi:10.1007/s11573-012-0584-x

The Medical Futurist (2020). The Top 12 Healthcare Chatbots. Med. Futur . Available at: https://medicalfuturist.com/top-12-health-chatbots (Accessed March 17, 2021).

Troussas, C., Krouska, A., Alepis, E., and Virvou, M. (2021). Intelligent and Adaptive Tutoring through a Social Network for Higher Education. New Rev. Hypermedia Multimedia , 1–30. doi:10.1080/13614568.2021.1908436

Venkatraman, V., Dimoka, A., Pavlou, P. A., Vo, K., Hampton, W., Bollinger, B., et al. (2015). Predicting Advertising success beyond Traditional Measures: New Insights from Neurophysiological Methods and Market Response Modeling. J. Marketing Res. 52, 436–452. doi:10.1509/jmr.13.0593


Keywords: voice assistant, trustworthiness, trust, anamnesis tool, expertise framing

Citation: Wienrich C, Reitelbach C and Carolus A (2021) The Trustworthiness of Voice Assistants in the Context of Healthcare: Investigating the Effect of Perceived Expertise on the Trustworthiness of Voice Assistants, Providers, Data Receivers, and Automatic Speech Recognition. Front. Comput. Sci. 3:685250. doi: 10.3389/fcomp.2021.685250

Received: 24 March 2021; Accepted: 19 May 2021; Published: 17 June 2021.


Copyright © 2021 Wienrich, Reitelbach and Carolus. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Carolin Wienrich, [email protected]


An Empirical Study of Older Adult’s Voice Assistant Use for Health Information Seeking


Published in: ACM Transactions on Interactive Intelligent Systems. Association for Computing Machinery, New York, NY, United States.

Author Tags

  • Voice assistants
  • interactive systems
  • older adults
  • Research-article


Voice Assistant Using Artificial Intelligence

Proceedings of the International Conference on Innovative Computing & Communication (ICICC) 2022

4 Pages Posted: 14 Mar 2023

Ritik Bhargava, Priyanshu Priya, Rupa Kumari, and Ajay Sahu (Contact Author)

Greater Noida Institute of Technology (GNIT)

Date Written: March 10, 2023

Artificial intelligence is crucial to day-to-day life. Computer science defines AI research as the study of intelligent agents: devices that sense their surroundings and take actions that maximize the likelihood of achieving a goal. Today, most people, intentionally or not, rely on some form of computerized information processing technology, and artificial intelligence (AI) is already changing our lifestyle. The input to such a system can be drawn from users and articles, and the output is a recommendation; input can be submitted by the user verbally or as text. This paper presents a new approach to intelligent search, describes the challenges of applying virtual assistant technology, and introduces applications of virtual assistants that can open up opportunities in various fields. Overall, many people around the world already use assistants. Voice control is an important and growing feature that will change people's lives. Voice assistants are available for laptops, desktops, and mobile phones, and assistants are now available on most electronic devices. A voice assistant is a software agent that can interpret human speech and respond via synthesized speech.

Keywords: Perception, Artificial Intelligence, Python, Chatbot
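The keyword list above mentions Python. Purely as an illustration of the kind of agent described in the abstract, and not the authors' actual implementation, a minimal voice-assistant loop might be sketched as follows, assuming the third-party SpeechRecognition and pyttsx3 packages and a working microphone:

```python
import speech_recognition as sr  # third-party package "SpeechRecognition" (assumed available)
import pyttsx3                   # third-party offline text-to-speech engine (assumed available)
from datetime import datetime

recognizer = sr.Recognizer()
tts_engine = pyttsx3.init()

def listen_once() -> str:
    """Record one utterance from the default microphone and return its transcript."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)    # cloud ASR; requires internet access
    except sr.UnknownValueError:                     # speech was unintelligible
        return ""

def speak(text: str) -> None:
    """Respond to the user with synthesized speech."""
    tts_engine.say(text)
    tts_engine.runAndWait()

if __name__ == "__main__":
    command = listen_once().lower()
    if "time" in command:
        speak("It is " + datetime.now().strftime("%H:%M"))
    elif command:
        speak("You said: " + command)
    else:
        speak("Sorry, I did not understand that.")
```

In practice such a loop would run continuously and dispatch recognized phrases to task-specific handlers; the cloud-based recognizer shown here could be swapped for an offline engine where privacy is a concern.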



Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants

January 2018, Medical Reference Services Quarterly 37(1):81-88

Matthew B. Hoy, Mayo Foundation for Medical Education and Research



J Med Internet Res. 2020 Sep;22(9)

Investigating the Accessibility of Voice Assistants With Impaired Users: Mixed Methods Study

Fabio Masina, Valeria Orso, Patrik Pluchino, Giulia Dainese, Stefania Volpato, Cristian Nelini, Daniela Mapelli, Anna Spagnolli, and Luciano Gamberini

Affiliations: Human Inspired Technologies Research Center, University of Padova, Padova, Italy; Department of General Psychology, University of Padova, Padova, Italy; L'Incontro Social Enterprise, Castelfranco Veneto, Italy; Day Center for Severe Acquired Brain Injury, Opere Pie—Istituto Pubblico di Assistenza e Beneficienza, Pederobba, Italy

Associated Data

Linear regression models considered in the analyses with their respective values of R2, adjusted R2, and Akaike information criterion.

Background

Voice assistants allow users to control appliances and functions of a smart home by simply uttering a few words. Such systems hold the potential to significantly help users with motor and cognitive disabilities who currently depend on their caregiver even for basic needs (eg, opening a door). The research on voice assistants is mainly dedicated to able-bodied users, and studies evaluating the accessibility of such systems are still sparse and fail to account for the participants’ actual motor, linguistic, and cognitive abilities.

Objective

The aim of this work is to investigate whether cognitive and/or linguistic functions could predict user performance in operating an off-the-shelf voice assistant (Google Home).

Methods

A group of users with disabilities (n=16) was invited to a living laboratory and asked to interact with the system. Besides collecting data on their performance and experience with the system, their cognitive and linguistic skills were assessed using standardized inventories. The identification of predictors (cognitive and/or linguistic) capable of accounting for an efficient interaction with the voice assistant was investigated by performing multiple linear regression models. The best model was identified by adopting a selection strategy based on the Akaike information criterion (AIC).

Results

For users with disabilities, the effectiveness of interacting with a voice assistant is predicted by the Mini-Mental State Examination (MMSE) and the Robertson Dysarthria Profile (specifically, the ability to repeat sentences), as the best model shows (AIC=130.11).

Conclusions

Users with motor, linguistic, and cognitive impairments can effectively interact with voice assistants, given specific levels of residual cognitive and linguistic skills. More specifically, our paper advances practical indicators to predict the level of accessibility of speech-based interactive systems. Finally, accessibility design guidelines are introduced based on the performance results observed in users with disabilities.

Introduction

Voice-activated technologies are becoming pervasive in our everyday life [ 1 , 2 ]. In 2017, 46% of Americans reported using voice-activated technologies [ 3 - 5 ]. One of the most prominent application domains is the domestic environment, where voice assistants, a branch of voice-activated technologies, allow the user to control and interact with several home appliances in a natural way by uttering voice commands [ 6 , 7 ]. When integrated into a smart house, voice assistants allow the user to perform numerous everyday actions without the need to move and reach the actual object. More specifically, the user can operate all the devices that are connected, ranging from switching the lights on and off to opening and closing the doors and windows, for instance.

Research on voice assistants is focused mainly on the general population. Indeed, the studies investigating user experience and usability of voice assistants mainly involved able-bodied users [ 3 , 8 - 11 ], thereby neglecting a broad community of users with disabilities. However, people suffering from motor and cognitive impairments would significantly benefit from the possibility of controlling home appliances and personal devices remotely. Voice assistants hold the potential to enable individuals with disabilities to govern their houses without the need to constantly depend on caregivers [ 3 , 12 ].

One of the obvious barriers that some users with disabilities can encounter when interacting with voice assistants is related to speech impairments [13], which are a frequent secondary consequence of motor disorders [14]. Although most voice assistants exploit machine learning algorithms to adapt to the user and increase their speech recognition accuracy over time [15, 16], these systems are still designed and developed for people with clear and intelligible speech. Thus, the difficulty of uttering sentences clearly and speaking with adequate vocal intensity may represent a relevant accessibility challenge for voice assistants. The accessibility of voice assistants has not been thoroughly investigated yet. In this study, we explored how users with motor, linguistic, and cognitive disabilities interact with a commercial voice assistant in a natural situation. More specifically, the aim was to investigate the role of cognitive and linguistic functions in predicting the performance of individuals affected by physical, linguistic, and cognitive difficulties in interacting with a voice assistant.

Voice Assistants for Users With Disabilities

Studies investigating the interaction between users with disabilities and voice assistants are still sparse, but some evidence is starting to shed light on this field. Recently, Pradhan and colleagues [7] investigated the opinions of users with disabilities who regularly use voice assistants by examining their reviews. Most comments (about 86%) were positive, highlighting how the device made it easier to accomplish specific tasks autonomously (eg, playing songs). The complaints mainly concerned the lack of desired features, yet users pointed out that the main challenges in interacting with the voice assistant were the need to speak aloud and to respect a precise timing when uttering the command. On the whole, these findings were confirmed by a subsequent interview-based study with users with disabilities [7].

Ballati and colleagues [17] investigated the extent to which people affected by speech impairments could be understood by three different off-the-shelf voice assistants. More specifically, accuracy in speech processing was tested using sentences extracted from the TORGO database, which includes the recordings of 8 English speakers with dysarthria [18]. The extracted sentences were spoken to the voice assistants, and accuracy was compared across systems. Each system processed the sentences one by one, while the experimenter scored the system’s ability to understand the sentence and the consistency of its answer. The results of this study revealed a general speech recognition accuracy of 50% to 60%, with all three systems performing similarly. These findings were partially confirmed with Italian patients with dysarthria [19], where the authors found different performance accuracy across the voice assistants.

While insightful, the studies reported above have limitations that might make it challenging to generalize the results. First, the actual speech abilities of the users were not assessed because they were either self-reported [ 7 ] or not reported at all [ 17 , 19 ]. This approach fails to provide clear indications for the design of voice assistants, as it does not highlight the users’ needs. In addition, previous studies focused on speech abilities, neglecting cognitive skills. Cognitive skills were proven to affect the ability to operate a voice-controlled device by Weiner and colleagues [ 20 ]. In this study, the voice-controlled system showed a decrease in accuracy of speech recognition when the speakers suffered from Alzheimer disease or age-related cognitive decline. Furthermore, patients with Alzheimer disease experience difficulties interacting with a voice-controlled robot because of the timing imposed by the device [ 21 ].

Some of the previous research [17, 19] did not even involve human participants, as it relied on prerecorded sentences. While this ensures high reliability in terms of assessing the robustness of the system, it fails to account for the variability of individual performances and for the motivation behind the actual use of the device. Likewise, Pradhan and colleagues [7] found that about a third of the reviews analyzed were written by caregivers, who may have reported their own viewpoint, potentially misrepresenting the perspective of the person they assisted. Finally, to the best of the writers’ knowledge, the research available so far was conducted in laboratory settings, where background noise, if any, is controlled and where there are none of the group interactions that are likely to occur in a household.

Speech and Cognitive Factors Accounting for User Performance

Speech and cognitive skills play a significant role in the ability to effectively control voice assistants [17, 19, 22]. To properly convey a voice command, users must adequately control the speed and rhythm of speech. As reported in a previous study, speech disfluency can represent an accessibility barrier to voice assistants; for instance, long hesitations or pauses can be misinterpreted by the system as a sentence delimiter [23], causing an alteration of speech segmentation. Moreover, users must be able to correctly articulate words, especially multisyllable words (eg, temperature) or specific words that may require more effort to articulate [24]. A further requirement for properly interacting with these devices is vocal intensity, which should be sufficiently loud for the voice assistant to detect and segment the sounds [7].

Along with these speech skills, cognitive abilities are required to utter a command. The user must remember specific keywords and specific sequences of words to operate the system. These abilities involve memory functions, specifically long-term memory and working memory, both crucial when interacting with voice interfaces [25]. In addition, the user must respect a specific timing when providing the commands, a capacity that relies on executive functions, namely the set of functions needed to plan and control actions [26]. Not least, to properly use a voice assistant, the user must also monitor the feedback of the system (which sometimes consists of simple lights) and correctly interpret it; such skills rely on underlying attention processes.

Study Design

This study was meant to assess the accessibility of a commercial voice assistant. In particular, we investigated whether specific cognitive and/or linguistic skills were related to the effectiveness of the interaction. To this end, the study consisted of two phases. In phase 1, participants were involved in group sessions, in which they were invited to interact with the voice assistant by performing several realistic tasks in a living laboratory (eg, switching on the light). Each group session involved 4 participants. This choice was motivated by our desire to build a friendly and informal setting that could facilitate interaction and prevent the feeling of being in a testing situation. Group sessions were video recorded to allow offline analysis of participant performances. In phase 2, participants received an evaluation of their neuropsychological and linguistic functions. The two phases of the study took place in different settings and on separate days and required different experimental materials. The study was approved by the Ethics Committee of the Human Inspired Technologies Research Center, University of Padova, Italy (reference number 2019_39).

Participants

A total of 16 participants (9 males, 7 females) took part in the study. The mean age of the sample was 38.3 (SD 8.6) years (range 22 to 51 years). On average, they had 11.8 (SD 2.7) years of education (range 8 to 18 years). To partake in the study, participants had to meet the following inclusion criteria: (1) suffering from ascertained motor impairments and related language difficulties and (2) needing daily assistance from at least one caregiver. The sample comprised 6 participants affected by congenital disorders, 2 participants with neurodegenerative disorders, 4 participants affected by traumatic brain injury, and 4 participants with nontraumatic brain injury (ie, tumor). The heterogeneity of the sample reflects the population typically found in daycare centers. Participants were indeed recruited from a daycare center for people with disabilities with which the research team collaborates. Before enrollment, all invited participants received an explanation of the activity. Upon agreement, they provided written informed consent (if necessary, the individual’s legal guardian was informed about the scope and unfolding of the activity and gave informed consent for the person they assisted to partake in the study). In any case, informed consent was given prior to enrollment. Participants received no compensation for taking part in the study.

Phase 1: Interaction With the Voice Assistant

The first phase took place in a living laboratory. The room was furnished to resemble a living room with a large table in the middle. The voice assistant was placed at the center of the table, around which participants and experimenters were sitting ( Figure 1 ). The laboratory was equipped with several devices that were connected to the voice assistant and could be controlled by prompting voice commands. All of the voice-controlled devices were placed so that users could easily see them. The room was also equipped with two camcorders to video record the sessions. One camera was placed above the table and enabled the observation of users’ interactions with the voice assistant. The other camera served to record the outcomes of the interaction ( Figure 1 ).

Figure 1. Representation of the experimental setting.

For this study, a commercial voice assistant was deployed. More specifically, we chose to use Google Home (Google LLC), given its growing popularity. Two lamps and a floor fan were connected to smart plugs, which were in turn connected to the voice assistant, thereby enabling control of switching on/off and of light color change (for the lamps only). A 50-inch television was connected to Chromecast (Google LLC), which was in turn connected to Google Home, making it possible to operate the TV using voice commands. For the video recordings, two video cameras were installed: a C920 Pro HD (Logitech) and a Handycam HDR-XR155E (Sony Europe BV).

Participants were invited to individually prompt some commands to the voice assistant, as indicated by the experimenter. The tasks comprised turning on/off the fan and the lights, changing the color of the light, interacting with the TV (activating YouTube, Spotify, and Netflix), and making specific requests to the voice assistant (eg, “set an alarm for 1 pm”). The full list of commands that participants were asked to speak can be seen in Textbox 1 .

The list of voice commands that participants were asked to speak during the first phase of the study.

  • Turning on/off
  • Changing colors
  • Changing light intensity

TV (YouTube)

  • Selecting videos
  • Increasing/decreasing volume

TV (Netflix)

  • Selecting movies
  • Pausing movies
  • Playing movies

TV (Spotify)

  • Selecting songs

Voice assistant

  • Asking for the latest news
  • Asking for the weather forecast
  • Setting an alarm

Participants were first welcomed in the living laboratory and invited to make themselves comfortable. They were reminded about the aim and the unfolding of the activity. In addition, they were shown the camcorders and after they all proved to be aware of them, the video recording started. At this point, the experimenter showed how the voice assistant worked by prompting some example commands and properly explained the correct sequence of words to convey the command. Next, participants were allowed to familiarize themselves with the voice assistant until they felt confident. When they considered themselves ready, the experimental session started. The experimenter asked each participant to individually perform the selected tasks ( Textbox 1 ). The tasks were not proposed in a strict order across participants. To keep the session lively and prevent boredom and fatigue, the tasks were alternated across participants. Should a participant fail to accomplish a requested task (eg, the voice assistant did not respond in the expected manner), the experimenter gently encouraged them to try again. A fixed number of attempts was not set a priori to prevent participants from feeling frustrated as a consequence of repeated failed attempts. Participants were allowed to try until they felt comfortable.

Once the task list was completed by all participants, the experimenter asked them their impressions about the voice assistant in a semistructured group interview. The questions regarded an overall evaluation of the pleasantness of the voice assistant (from 1 to 10), in which rooms it would be more helpful, if they would like to have it in their own houses, and which additional functions they would like to control. Phase 1 took about 2.5 hours.

Phase 2: Neuropsychological and Linguistic Assessment

Data Collection

All of the participants involved in phase 1 received an individual examination by a trained neuropsychologist and a speech therapist, who were both blind to the outcomes of the users’ performances with the voice assistant. Several assessment tools were selected and adopted. More specifically, the neuropsychological functions were assessed with the Addenbrooke’s Cognitive Examination–Revised (ACE-R) [ 27 ] and the Frontal Assessment Battery (FAB) [ 28 ]. The linguistic assessment was conducted by collecting several measures, namely participant vocal intensity, and other speech production indices gathered using the standardized Italian version of the Robertson Dysarthria Profile [ 29 ]. The evaluation sessions took place in a quiet room at the daycare center where participants were recruited and lasted about 1.5 hours for the neuropsychological evaluation and 2 hours for the linguistic evaluation.

Neuropsychological and Linguistic Tests

The ACE-R [ 27 ] is a screening test originally proposed as an extension of the Mini-Mental State Examination (MMSE) [ 30 ]. The ACE-R allows the evaluation of 5 cognitive domains, attention/orientation, memory, verbal/category fluency, language, and visuospatial ability, in addition to providing the MMSE score. Attention/orientation is assessed by asking the participant about the date, season, and current location where the evaluation is taking place, as well as repeating 3 single words and doing serial subtractions. Memory consists of items that evaluate episodic and semantic memory. Verbal and category fluency require the ability to list in 1 minute as many words as possible complying with a verbal criterion and a category criterion. Language includes several subtasks, requiring speech comprehension, naming figures, repeating words and sentences, reading regular and irregular words, and writing. Finally, visuospatial ability consists of copying and drawing specific pictures.

The MMSE score represents a general index of cognitive functioning ranging from 0 to 30; a score below 24 may indicate the presence of cognitive impairment [ 30 ].

Frontal Assessment Battery

The FAB [ 28 ] is a brief inventory for the evaluation of executive functions. It is composed of 6 subscales exploring the following domains: conceptualization (similarities test), mental flexibility (verbal fluency test), motor programming (Luria motor sequences), sensitivity to interference (conflicting instructions), inhibitory control (go/no-go test), and environmental autonomy (prehension behavior). Each domain consists of 3 items and is scored from 0 (unable to complete the requests) to 3 (fully able to fulfill the requests). The maximum overall score for the FAB is 18.

Vocal Intensity

Vocal intensity reflects the loudness of the voice. Physically, it represents the magnitude of the oscillations of the vocal folds, and it is measured in decibels (dB). In this study, vocal intensity was collected by using the PRAAT software [ 31 ], a tool for speech analysis. Participants were invited to repeat aloud a specific sentence (ie, “Turn off the light, turn on the TV” in their native language) for 5 minutes at a distance of 1.5 meters from the recording device.
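The study measured intensity with the PRAAT program itself. Purely as a hedged illustration of how such a measure could be scripted, the sketch below uses the praat-parselmouth Python bindings to Praat (an assumption on our part, not the authors' reported procedure) and a hypothetical file name:

```python
import parselmouth  # praat-parselmouth: Python interface to the Praat speech analysis engine

# Hypothetical recording of the participant repeating the target sentence.
sound = parselmouth.Sound("participant_recording.wav")

# Extract the intensity contour; values are expressed in dB.
intensity = sound.to_intensity()

# Average intensity over the recording as a rough index of vocal loudness.
mean_db = intensity.values.mean()
print(f"Mean vocal intensity: {mean_db:.1f} dB")
```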

Speech Production

An expert speech and language therapist assessed participant speech production. The protocol adopted for the evaluation was extracted from the Robertson Dysarthria Profile [ 29 ]. This test is divided into 8 subscales (ie, intelligibility, respiration, phonation, facial muscles, diadochokinesis, oral reflexes, prosody, articulation), each including several items. The therapist assigns a score on a 4-point scale (1 = severe, 2 = moderate, 3 = mild, 4 = normal) for each item of the test. In this study, the subscales considered were prosody and articulation. More specifically, for prosody (2 items) the items assessed the speed and rhythm of speech production. With regard to articulation (5 items), the items considered the ability to articulate single letters (consonants and vowels) and clustered letters (groups of consonants and multisyllable words), as well as the capacity to repeat sentences.

Data Analysis

The data analysis comprised an analysis of the video recordings to assess the extent to which users were capable of interacting effectively with the voice assistant. The outcomes of the analysis were summarized into a performance index. The index was then associated with the neuropsychological and linguistic measures collected in the second phase of this study. Since the main purpose of this study was the identification of predictors (cognitive and/or linguistic) capable of accounting for an effective interaction with the voice assistant, multiple linear regression models were run.

Video Analysis

The two video streams recorded during the sessions were synchronized into a single video file using video editing software. The resulting video was then imported into dedicated analysis software (The Observer XT 12, Noldus Information Technology Inc). The analysis was conducted in two passes. During the first pass, two of the authors watched the videos and selected the events of interest: the experimenter’s requests, participants’ actions, and the voice assistant’s responses. The two researchers then agreed on the events to code, defining objective triggers for the beginning and the end of each event. A trained coder was in charge of rating the videos.

For each participant, the number of attempts they made for each task request and the resulting outcome were coded. More specifically, the beginning of an attempt was coded when the experimenter prompted the participant to try to accomplish a given task. The attempt ended with either the actual activation of the intended function (successful outcome) or with a failure to observe the expected outcome (unsuccessful outcome). In particular, unsuccessful outcomes were further categorized based on the type of error made by the participants. Four categories of errors were identified:

  • Timing errors included all of the unsuccessful outcomes caused by the participant not respecting the timing imposed by the system (eg, the participant uttered the waking command “Hey Google” and did not wait for the system to reply before prompting the full command)
  • Phrasing errors comprised all the failed attempts that followed an incorrect sequence of words to prompt the command (eg, the participant saying “Hey Google...put the red the lamp” instead of “Hey Google...make the lamp red”)
  • Comprehension errors referred to all mistakes participants made because they could not understand the experimenter’s request (eg, changing the color of the lamp instead of turning it off)
  • Pronunciation errors included all of the failures that followed a wrong articulation of one or more words within the sentence (eg, participants struggling to pronounce words that were not in their native language, such as Netflix)

Participants’ attempts could also be coded as self-corrections (with successful or unsuccessful outcome) when the participant realized autonomously that the command was wrong and tried to amend it.

To understand whether participants were able to prompt commands to the voice assistant, an overall performance index was computed, expressing the percentage of successful attempts out of the total number of attempts. Importantly, self-corrections with successful outcomes were counted as successful attempts, whereas self-corrections with unsuccessful outcomes were counted as unsuccessful attempts.
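As an illustrative sketch of the index described above (the actual coding was performed in The Observer XT, not in code), the computation could look like this, using hypothetical attempt records:

```python
# Each coded attempt: outcome ("success"/"failure") and whether it was a self-correction.
attempts = [
    {"outcome": "success", "self_correction": False},
    {"outcome": "failure", "self_correction": False},
    {"outcome": "success", "self_correction": True},   # successful self-correction -> counts as success
    {"outcome": "failure", "self_correction": True},   # unsuccessful self-correction -> counts as failure
]

def performance_index(attempts):
    """Percentage of successful attempts out of all attempts for one participant."""
    successes = sum(1 for a in attempts if a["outcome"] == "success")
    return 100.0 * successes / len(attempts)

print(f"Performance index: {performance_index(attempts):.1f}%")  # 50.0% for the example above
```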

Neuropsychological and Linguistic Assessment

Regarding the neuropsychological measures, not all participants were able to complete all of the subscales of the ACE-R. More specifically, several participants could not fully complete some items of the ACE-R (eg, drawing a clock) because of their physical impairments (eg, dystonia). However, since all participants could complete at least the items of the MMSE, only the MMSE score was considered in the multiple linear regression models, in addition to the FAB score. With regard to the linguistic assessments, all the collected measures were considered in the regression models.

Multiple Linear Regression Models

Data were statistically analyzed using RStudio software version 1.2 (RStudio PBC). To identify the best predictors of the performance index (ie, participant performance when using the voice assistant), multiple linear regression models were adopted. Among several candidate models, we selected the one that best described the data by adopting a selection strategy based on the Akaike information criterion (AIC). The AIC estimates the relative quality of a model within a set of candidate models, balancing goodness of fit against model complexity; given a set of models, the one with the lowest AIC is considered the best [32].

The neuropsychological and linguistic predictors entered in the models were the MMSE score, FAB score, vocal intensity (dB), and scores obtained from the 2 items of the prosody subscale and 5 items of the articulation subscale of the Robertson Dysarthria Profile. More specifically, the linear regression models were performed entering the predictors grouped into four clusters: (1) neuropsychological cluster (ie, MMSE and FAB), (2) vocal intensity cluster (ie, dB), (3) prosody cluster (ie, speed and rhythm), and (4) articulation cluster (ie, initial consonants, vowels, groups of consonants, multisyllable words, and repetition of sentences). The latter two clusters consisted of the items in the Robertson Dysarthria Profile. Since the forced entry method was adopted, the order in which predictors were entered in the model did not affect the results.
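
As an illustration of this procedure, the following R sketch fits two of the possible cluster combinations and compares them by AIC; the data frame df and all column names are assumptions made for the example, not the authors' actual variables or script.

    # Illustrative sketch only: candidate models built from predictor clusters,
    # fit with lm() and compared by AIC (lower AIC indicates the preferred model).
    m_neuro       <- lm(performance_index ~ mmse + fab, data = df)
    m_neuro_artic <- lm(performance_index ~ mmse + fab + initial_consonants +
                          vowels + consonant_groups + multisyllable_words +
                          sentence_repetition, data = df)

    AIC(m_neuro, m_neuro_artic)   # compare the candidate models
    summary(m_neuro_artic)        # coefficients, t values, and R-squared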

Results

The performance index extracted from the video analysis shows that participant accuracy was on average 58.5% (SD 18.6%). The most frequent errors made by participants were phrasing errors (75/182, 41.2%). Participants mainly had problems uttering long commands, especially when they were required to respect a specific syntax. It should be noted that uttering the right sequence of words was not equally problematic for all participants: one participant never made this type of error, whereas another made it 21 times.

Timing errors were the second most frequent type of error (74/182, 40.7%); they can be clustered into anticipatory and delayed timing errors. As for the anticipatory timing errors, participants tended not to wait for the system to reply to the waking command before prompting the actual command. For one participant, respecting the timing seemed particularly difficult, as they made this type of error 30 times. To a lesser extent, in the delayed timing errors, participants waited too long after the system had replied to the waking command. In many cases, the actual command then overlapped with the system prompting the error message "Sorry, I don’t know how to help you."

Less frequent were comprehension errors (19/182, 10.4%) and pronunciation errors (14/182, 7.7%). Regarding the former, participants mainly tended to misunderstand the most complex commands (eg, playing a video on YouTube). Regarding the latter, users had some difficulty with English words, such as Netflix. Nevertheless, the system responded successfully even when participants spoke with a strong dialectal accent.

Overall, all participants enjoyed the interaction with the voice assistant. Indeed, the general evaluation of the system was extremely positive, with a mean score of 9.4 (SD 1.2). As for the rooms in which participants would like to install the voice assistant, 8 suggested the bedroom and 4 the kitchen. On the whole, all participants would like to have a voice assistant in their own home. Finally, with regard to the functions that participants would have liked to have in their own house, they mentioned playing music (n=5) and controlling home automation (n=5), such as opening and closing windows and doors.

Interestingly, during the interaction with the voice assistant, several participants provided their spontaneous opinions highlighting the benefits and drawbacks of the system. For instance, P3 stated: “Since my shoulder hurts, it is useful because it is easier when I have to open doors.” However, P3 claimed as well: “sometimes it does not understand me and I am afraid to crash the Google program.” Another participant mentioned some difficulties as well, especially concerning the general utility of having a voice assistant at home. P9 stated: “I cannot think as before [the accident], it is not so easy to have such a device at home, it might not be useful.”

Table 1 shows the raw scores from participants in the neuropsychological and linguistic assessments conducted in the second phase of the study. With regard to the neuropsychological scores, participants showed a mean MMSE score of 26.1 and a mean FAB score of 12.6. Concerning the linguistic assessment, participants had a mean vocal intensity of 61.6 dB. The mean scores of the Robertson Dysarthria Profile prosody items were 2.7 for speed of speech and 2.6 for rhythm of speech, indicating mild to moderate prosody difficulties. Finally, the mean scores of the items measuring articulation abilities showed mild issues regarding the pronunciation of initial consonants (mean 3.3), vowels (mean 3.3), groups of consonants (mean 3.2), multisyllable words (mean 3.3), and the repetition of sentences (mean 3.1).

Table 1. Summary of participant scores from the neuropsychological and linguistic assessments.

Measure                                        Mean score (SD)
Mini-Mental State Examination                  26.1 (2.9)
Frontal Assessment Battery                     12.6 (3.8)
Vocal intensity (dB)                           61.6 (4.2)
Robertson Dysarthria Profile: prosody
  Speed of speech production                   2.7 (0.7)
  Rhythm of speech production                  2.6 (0.7)
Robertson Dysarthria Profile: articulation
  Initial consonants                           3.3 (0.6)
  Vowels                                       3.3 (0.5)
  Groups of consonants                         3.2 (0.7)
  Multisyllable words                          3.3 (0.6)
  Repetition of sentences                      3.1 (0.6)

In order to identify the best model to predict participant accuracy (assessed as the performance index), several multiple linear regression models were considered. Multimedia Appendix 1 shows all estimated models with their respective AIC scores. Comparing the AICs of all the models, model ad (F6,9=4.91, P=.02, R2=.77), which included the neuropsychological and articulation clusters, was the best one (AIC 130.69; Multimedia Appendix 1).

When checking the coefficients of this model, 2 predictors were found to explain a significant amount of the variance in accuracy: the MMSE (β=6.16, t9=3.88, P=.004) and repetition of sentences (β=31.14, t9=2.71, P=.02). Of importance, among the nonsignificant predictors, 3 (ie, initial consonants, groups of consonants, and multisyllable words) had a variance inflation factor (VIF) >10 (tolerance statistic: 1/VIF<0.1), indicating multicollinearity [33,34]. As a consequence, a new model was fit after removing all the nonsignificant and collinear predictors, entering only the MMSE and repetition of sentences. The results confirmed the previous model: the MMSE (β=3.70, t13=3.26, P=.006) and repetition of sentences (β=22.06, t13=4.16, P=.001) were significant predictors of accuracy. The AIC value of this final model was 130.11, showing that it was the best model compared with the previous ones (Multimedia Appendix 1).
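
A sketch of this step, under the same assumed data frame and column names as in the example above (not the authors' actual script), could look as follows; car::vif() is used for the collinearity check.

    # Illustrative sketch: flag collinear predictors, then refit the reduced model.
    library(car)

    m_full    <- lm(performance_index ~ mmse + fab + initial_consonants + vowels +
                      consonant_groups + multisyllable_words + sentence_repetition,
                    data = df)
    vif(m_full)                # predictors with VIF > 10 (1/VIF < 0.1) are collinear

    m_reduced <- lm(performance_index ~ mmse + sentence_repetition, data = df)
    summary(m_reduced)         # both predictors expected to remain significant
    AIC(m_reduced)             # compared against the candidate models above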

To test the assumptions of the linear regression model, diagnostic statistics were computed. The model met the assumption of independence (Durbin-Watson 2.29, P=.68). The Q-Q plot of standardized residuals suggested that the residuals were normally distributed. Tolerance statistics (1/VIF) indicated that multicollinearity was not a concern (MMSE tolerance .92; repetition of sentences tolerance .92).
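
The corresponding diagnostic checks can be sketched as follows, again assuming the reduced model m_reduced from the example above.

    # Illustrative diagnostics for the reduced model (not the authors' script).
    library(car)

    durbinWatsonTest(m_reduced)                                 # independence of residuals
    qqnorm(rstandard(m_reduced)); qqline(rstandard(m_reduced))  # normality of residuals
    1 / vif(m_reduced)                                          # tolerance statistics (1/VIF)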

The standardized coefficients were .57 (MMSE) and .73 (repetition of sentences). The first value indicates that, holding repetition of sentences constant, an increase of 1 standard deviation in the MMSE (2.89 points) predicts an increase of .57 standard deviations in the performance index (about 10.6 percentage points). The second value indicates that, holding the MMSE constant, an improvement of 1 standard deviation in repetition of sentences (0.6 points) predicts an increase of .73 standard deviations in the performance index (about 13.6 percentage points).
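
One common way to obtain such standardized coefficients is to refit the reduced model on z-scored variables, as sketched below under the same assumptions; the values of .57 and .73 are those reported in this study, not outputs of this illustrative code.

    # Standardized betas via z-scored outcome and predictors (illustrative only).
    m_std <- lm(scale(performance_index) ~ scale(mmse) + scale(sentence_repetition),
                data = df)
    coef(m_std)   # slopes are the standardized betas (reported here as .57 and .73)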

Principal Findings

This work aimed to investigate whether cognitive and/or linguistic functions could predict the user’s performance in operating an off-the-shelf voice assistant. To this end, a group of users suffering from motor and cognitive difficulties was invited to a living laboratory. The lab was purposefully equipped with a voice assistant connected to several smart devices (ie, TV, lamps, floor fan), and participants were asked to perform specific tasks following the experimenter’s instructions. In order to assess user performances, interactions with the voice assistant were video recorded. Cognitive and linguistic functions were assessed with standardized inventories and subsequently related to the user performances with the voice assistant.

The performance index was found to be predicted by overall cognitive abilities, as assessed by the MMSE score, and by the ability to repeat sentences. In other words, a minimum level of residual cognitive functioning (ie, an MMSE score above the cutoff of ≥24) is recommended to operate a voice assistant effectively. Among the linguistic skills, the ability to repeat sentences emerged as the necessary one. These findings provide specific indications of how inclusive commercial voice assistants currently are.

More generally, the average accuracy was around 60%, extending previous findings that were limited to synthesized utterances [17]. Unlike previous studies, we arranged a living lab and involved actual users with disabilities in a realistic group situation. This approach allowed us to identify and categorize the most prominent types of errors emerging during the interaction. The most frequent mistakes were phrasing errors (41.2%), highlighting participants' difficulties in respecting the syntax of the voice command, especially when the command was long. The second most frequent error was the difficulty in respecting the timing imposed by the device (40.7%), as already reported by Pradhan and colleagues [7]. Specifically, participants uttered the command too quickly or too slowly, showing a tendency to ignore the feedback of the voice assistant. This was probably due to the lack of saliency of the feedback provided by Google Home after the waking command [35], which consists only of dim lights moving on top of the device. Additionally, two other types of errors emerged, relating to difficulty in comprehending the request (10.4%) and pronunciation issues (7.7%), the latter limited to English words. These findings suggest significant implications for the design of universal voice assistants. First, more salient feedback should be included to make it easier for users with disabilities to interact with the system. Additionally, the timing should be adjustable to better match the actual abilities of the user and adapt to their proficiency over time. Finally, to increase the likelihood of users remembering how to operate the voice assistant, commands should include familiar words.

These results are particularly relevant because they provide new implications for the design of voice assistants from an inclusive design perspective that also considers users with special needs. Moreover, these findings can give caregivers, both family members and health care professionals, an indication for choosing assistive technologies that are suitable for the people they assist. More specifically, the ability to interact with and use voice assistants does not depend exclusively on linguistic skills, as one might expect. In fact, aspects related to cognitive functions, in particular the global level of cognitive functioning, seem to play a crucial role. Hence, linguistic and cognitive abilities jointly predict performance with voice assistants. Users with severe cognitive impairment (MMSE score <18) [36] may still be able to use these systems effectively (performance index >50%) if their linguistic skills are normal (Robertson Dysarthria Profile = 4) [29], which partly compensates for the cognitive difficulties. Similarly, a user with severe difficulty in articulating sentences may successfully use voice assistants if they have a normal level of cognitive functioning that allows them to invoke compensatory strategies. Taken together, these findings may serve as promising indicators to foresee the degree of accessibility of voice assistants. Importantly, the predictors employed in this study are extracted from standardized inventories that are widespread and administered in many clinical environments.

Finally, despite the mistakes, participants received the system positively and enjoyed their experience, consistent with the findings of Pradhan and colleagues [7]. Users found the system useful and reported that they would like to have it in their own homes. In addition, they suggested that such a system would be helpful in compensating for their movement difficulties (eg, opening doors). The positive user opinions revealed a general acceptance of voice assistants, highlighting the importance of using these mainstream systems in the field of assistive technologies in order to help users with disabilities regain some independence and increase their quality of life.

This study suggests that, with specific and targeted adjustments, a commercial voice assistant can be turned into an assistive technology that effectively complements the individual's skills. Indeed, voice assistants could offer tremendous benefits. First of all, these systems are widespread and inexpensive compared with dedicated assistive technologies, which are often harder to find and costly. Furthermore, assistive technologies can be stigmatizing: the fear of feeling exposed and feelings of loss of autonomy and dignity are significant barriers to the adoption of assistive technology [37]. The popularity of voice assistants, as well as their appealing design, may instead make them a truly inclusive technology, helpful to individuals with or without disabilities.

Limitations

We acknowledge that this study has some limitations. First, the sample size was limited to 16 participants; further studies should therefore extend our findings with larger and even more heterogeneous samples. In addition, we explored a likely use scenario in which users interact with the voice assistant in a group situation, as happens in shared living environments. Nevertheless, future experiments should also investigate a use scenario in which the user operates the system individually, to examine more closely the interaction between the individual and the voice assistant.

Conclusions

In this work we reported on a group experiment involving users with motor, linguistic, and cognitive difficulties, aimed at predicting participant performance based on their level of cognitive and linguistic skills. Previous studies did not involve actual users or consider their capabilities. For the first time, we conducted an experiment in a living lab with individuals with disabilities and provided a detailed report of their performances and difficulties. More importantly, participant performance could be predicted by their residual level of cognitive and linguistic capabilities. In addition, these results contribute to the field of assistive technology by describing the different types of errors made by users and providing design implications.

The enthusiastic reaction of participants highlights the potential of voice assistants to provide or restore some autonomy in basic activities, like turning the light on and off while lying in bed. Further research effort should be devoted to fine-tuning voice assistants to better serve users' needs and to evaluating in the field to what extent these systems are actually helpful. To conclude, refining the existing, widespread voice assistants offers a concrete opportunity to increase the quality of life of people with disabilities by providing them with truly inclusive technology.

Acknowledgments

This paper was supported by the project “Sistema Domotico IoT Integrato ad elevata Sicurezza Informatica per Smart Building” (POR FESR 2014-2020).

Abbreviations

ACE-R: Addenbrooke’s Cognitive Examination–Revised
AIC: Akaike information criterion
dB: decibel
FAB: Frontal Assessment Battery
MMSE: Mini-Mental State Examination
VIF: variance inflation factor

Multimedia Appendix 1

Conflicts of Interest: None declared.
