Building a Conversational Voice Chatbot: Integrating OpenAI's Speech-to-Text & Text-to-Speech

Ved Vekhande


Welcome to an engaging tutorial where we'll develop a voice-responsive chatbot utilizing OpenAI's advanced speech-to-text and text-to-speech services, all integrated within a Streamlit web application. This project is not just about textual interactions; it's about enabling a natural, voice-based dialogue with a chatbot.

For those who might not be familiar with OpenAI's capabilities in handling speech, I recommend watching my detailed video ( watch here ). It provides an excellent introduction to the speech-to-text and text-to-speech functionalities that are central to our project.

In this blog, we will walk through the entire process of setting up the development environment, incorporating OpenAI services into our application, and crafting a chatbot that can seamlessly converse with users using voice inputs and outputs.

Setting Up the Environment

To begin building our voice-responsive OpenAI chatbot, it's essential to set up the right development environment. This involves installing necessary libraries and configuring API access. Here's how you can get started:

Your chatbot relies on several Python libraries, as listed in the requirements.txt file. These libraries include Streamlit for the web interface, OpenAI for accessing speech processing services, and others for specific functionalities like audio recording. Install them by running the following command in your project directory:
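```
pip install -r requirements.txt
```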

Here's a quick breakdown of the key libraries:

- streamlit: For building and running the web app.
- openai: To access OpenAI's API for speech-to-text and text-to-speech services.
- audio_recorder_streamlit: To record audio within the Streamlit app.
- streamlit-float: Provides floating elements in the Streamlit interface.

Sensitive information such as your OpenAI API key should be stored in a .env file. This approach keeps your credentials secure. Create a .env file in the root of your project and include your OpenAI API key like this:
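For example (OPENAI_API_KEY is the variable name the OpenAI client looks for by default):

```
OPENAI_API_KEY=sk-your-api-key-here
```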

Ensure that this file is not shared publicly, especially if you are pushing your code to a public repository.

Your project primarily consists of two Python files:

- app.py: This file contains the Streamlit web application logic. It's where you define the user interface and manage the flow of input/output for the chatbot.

- utils.py: This file includes functions for processing speech-to-text and text-to-speech, as well as generating chatbot responses.

With your environment set up and a basic understanding of your project's structure, you're now ready to start building the chatbot's functionalities.

Building the Chatbot: Streamlit Interface ( app.py )

In this section, we dive into the construction of our chatbot, focusing on how the Streamlit interface is set up and how voice inputs are handled and processed in app.py .

Streamlit is a powerful tool that allows us to quickly build interactive web applications for our chatbot. In app.py , the Streamlit application is initialized and configured to handle user interactions:
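A minimal sketch of that setup, assuming the helper names from utils.py described later (the original app.py may differ in details):

```python
# app.py
import streamlit as st
from streamlit_float import float_init
from utils import get_answer, speech_to_text, text_to_speech

st.title("Conversational Voice Chatbot 🎙️")

# Enable floating UI elements (used to pin the mic button to the bottom of the page)
float_init()

# Keep the running conversation in the session state
if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "assistant", "content": "Hi! How can I help you today?"}
    ]
```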

In this setup, we initialize the Streamlit app, import necessary functions from utils.py , and set up the session state to track and manage chat messages. The float_init() function from streamlit_float is used to create floating elements, enhancing the user interface.

The core functionality of our chatbot is its ability to handle voice inputs. This is achieved using the audio_recorder_streamlit library, which allows us to record audio directly in the Streamlit interface:
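Something along these lines, continuing in app.py (a sketch; the original likely places the recorder inside a floating container created with streamlit-float):

```python
from audio_recorder_streamlit import audio_recorder

# Render a microphone button and capture the recording as raw bytes
footer_container = st.container()
with footer_container:
    audio_bytes = audio_recorder()
```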

The audio_recorder() function captures audio input from the user. Once the audio is recorded, it's processed to extract the spoken text:
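Roughly (a sketch that reuses the speech_to_text helper from utils.py):

```python
if audio_bytes:
    with st.spinner("Transcribing..."):
        # Save the recording so it can be handed to the transcription API
        audio_file_path = "temp_audio.mp3"
        with open(audio_file_path, "wb") as f:
            f.write(audio_bytes)

        transcript = speech_to_text(audio_file_path)
        if transcript:
            st.session_state.messages.append({"role": "user", "content": transcript})
            with st.chat_message("user"):
                st.write(transcript)
```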

Here, we write the recorded audio to a file and then use the speech_to_text function from utils.py to convert it into text. The transcribed text is then added to the session state for the chatbot to process.

Once a user's voice input is converted to text, the chatbot processes this input to generate a response:
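A sketch of that step (the original may autoplay the audio via an HTML audio tag instead of st.audio):

```python
# If the last message came from the user, generate and voice a reply
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = get_answer(st.session_state.messages)
        st.write(response)

        # Convert the reply to speech and play it back
        audio_file = text_to_speech(response)
        st.audio(audio_file, format="audio/mp3")

    st.session_state.messages.append({"role": "assistant", "content": response})
```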

In this part of the code, the get_answer function is used to generate a text response based on the user's input. This response is then converted to speech using the text_to_speech function, and the audio is played back to the user.

Integrating OpenAI's Services ( utils.py )

In utils.py , we have defined key functions that integrate OpenAI's speech-to-text and text-to-speech services, along with the logic for generating chatbot responses. Let's explore these functions in detail.

The speech_to_text function is responsible for converting the audio input from the user into text. This is a critical step in enabling the chatbot to understand and process user queries:
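A sketch of how this can look with the current OpenAI Python SDK (the exact parameters in the original may differ):

```python
# utils.py
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment / .env file

def speech_to_text(audio_file_path):
    """Transcribe a recorded audio file with OpenAI's Whisper model."""
    with open(audio_file_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="text",
        )
    return transcript
```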

In this function, the audio file captured from the user is opened and sent to OpenAI's speech-to-text service. The service transcribes the audio into text using the Whisper model, which is known for its high accuracy in speech recognition. The transcribed text is then returned for further processing by the chatbot.

Conversely, the text_to_speech function takes the chatbot's textual response and converts it into an audio format, allowing the chatbot to 'speak' back to the user:
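Again as a sketch (the voice name and output path are placeholders):

```python
def text_to_speech(text):
    """Convert the chatbot's reply into an MP3 file and return its path."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="nova",  # any available OpenAI voice works here
        input=text,
    )
    output_path = "temp_audio_reply.mp3"
    with open(output_path, "wb") as f:
        f.write(response.content)
    return output_path
```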

Here, the chatbot's response text is converted into speech using OpenAI's text-to-speech service. The output is saved as an audio file, which is then played back to the user, creating an audio response.

The get_answer function generates the chatbot's responses to user inputs. It uses OpenAI's language models to create contextually appropriate and conversational replies:
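A sketch of such a function (the system prompt wording and model choice are assumptions):

```python
def get_answer(messages):
    """Generate the chatbot's reply from the conversation history."""
    system_message = [{
        "role": "system",
        "content": "You are a helpful AI voice assistant. Keep your answers short and conversational.",
    }]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=system_message + messages,
    )
    return response.choices[0].message.content
```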

In this function, the conversation history is combined with a system message defining the chatbot's role. This data is then sent to OpenAI's conversational AI model, which generates a response based on the input and context.

The interaction flow of the chatbot, as orchestrated in app.py , is a seamless integration of these functionalities. When a user speaks to the chatbot, the audio is recorded and converted to text using speech_to_text . The chatbot then processes this input with get_answer to generate a response. Finally, this response is converted back into speech using text_to_speech , allowing the chatbot to audibly communicate with the user. This flow creates a natural and interactive conversational experience, showcasing the potential of integrating advanced AI and speech processing technologies in a user-friendly application.

As we wrap up our exploration of building a voice-responsive OpenAI chatbot with Streamlit, let's reflect on what we've accomplished and the potential for further development.

Reflecting on the Project

This project demonstrates the power and versatility of integrating advanced AI services into a user-friendly application. By combining OpenAI's speech-to-text and text-to-speech capabilities with Streamlit, we've created a chatbot that can understand spoken language and respond in kind. The key functionalities we've implemented, such as handling voice inputs, generating intelligent responses, and speaking back to the user, exemplify how AI can be used to create more natural and engaging user interfaces.

For a detailed walkthrough of this project and a practical demonstration, make sure to watch my YouTube video . Also, you can access the complete code and documentation on my GitHub repository .

If you're curious about the latest in AI technology, I invite you to visit my project, AI Demos, at aidemos.com . It's a rich resource offering a wide array of video demos showcasing the most advanced AI tools. My goal with AI Demos is to educate and illuminate the diverse possibilities of AI.

For even more in-depth exploration, be sure to visit my YouTube channel at youtube.com/@aidemos.futuresmart . Here, you'll find a wealth of content that delves into the exciting future of AI and its various applications.

How to build a text and voice-powered ChatGPT bot with text-to-speech and speech-to-text capabilities

Norah Sakal

Chatbots are becoming increasingly popular for providing quick and efficient customer support, answering questions, and helping users navigate through complex tasks.

In this blog post, I'll walk you through the process of building an AI-powered chatbot using ReactJS, OpenAI API and AWS Lambda.

The chatbot is designed to engage in conversation based on the user's input.

Do you want the full source code? This tutorial is quite extensive, and I've prepared the entire source code. Visit this page to download it; you'll get instant access to the files, which you can use as a reference or as a starting point for your own voice-powered ChatGPT bot project.

Here's the simple chatbot interface we'll be building together:

Chatbot app overview

Here are some of its key features:

1. Text-based interaction: Users can type their questions.

2. Voice input and output: Users can send voice messages, and the chatbot can transcribe them and reply with both text and audio responses.

3. Context-aware conversations: The chatbot leverages the OpenAI ChatGPT API to maintain context during the conversation, which enables coherent interactions.

We'll be using the following technologies:

1. ReactJS: A popular JavaScript library for building user interfaces.

2. OpenAI API: Powered by GPT-3.5-turbo to generate human-like responses.

3. AWS Lambda: A serverless compute service where we can run our backend code without provisioning or managing servers. We'll use Lambda to handle audio transcription, text-to-speech, and calling the OpenAI API.

4. Material UI: A popular React UI framework with components and styling.

5. ElevenLabs API: A powerful API developed by ElevenLabs that offers state-of-the-art text-to-speech, voice cloning, and synthetic voice design capabilities.

In the upcoming sections, I'll guide you through the entire process of building the chatbot, from setting up the frontend and backend to deploying the chatbot.

Let's get started!

1. Create a new ReactJS app ​

To begin, create a parent folder for your new chatbot project; we'll create this folder structure in the next steps:

Folder structure

Navigate to the location you'd like to have your project and then run the following command in your terminal or command prompt:

Replace your-project-name with the desired name for your chatbot project. Then navigate to that new folder by running the following command:

Then, let's create a new ReactJS app using create-react-app .

This command-line tool helps us quickly set up a React project with the necessary build configurations and folder structure.

Run the following command in your terminal to create the app:

After the project is created, navigate into the folder and start the development server:

This command will launch a new browser window, showing the default React app starter template:

Now that our React app is up and running, let's install the required libraries.

2. Install libraries ​

We'll need several libraries for our chatbot project. First, we'll use Material UI (MUI) v5 for styling and UI components. MUI is a fully-loaded component library and design system with production-ready components.

To install MUI, run the following command in the frontend project folder that was created earlier:

Additionally, we'll install MUI's icon package, which provides a set of SVG icons exported as React components:

Next, we'll need a library to handle microphone recording and output the audio as an mp3 file. For this guide, we'll use the mic-recorder-to-mp3 library, but you can pick any library that will record your microphone and output an audio file in mp3 :

The mic-recorder-to-mp3 library also enables playback of recorded audio, which is a useful feature for our chatbot.

Finally, let's install aws-amplify . This library will help us send the recorded audio to our backend using AWS Amplify:

With all the necessary libraries installed, we're ready to start building the audio recording functionality for our chatbot app.

3. Create the chat interface components ​

In this section, we'll build the components needed for a simple chatbot interface that allows users to record audio, stop recording, playback the recorded audio, and upload the audio to the backend:

We'll create the following components for the chat interface:

1. ChatHeader - to display the chatbot title and header information
2. ChatMessages - to display chat messages exchanged between the user and the chatbot
3. AudioControls - to provide the user audio controls, including recording and uploading audio
4. MessageInput - to provide the user a text input option
5. ResponseFormatToggle - to provide the user the option to receive audio responses in addition to text responses

Let's start by changing the title of the app. Open up public/index.html and change the title tag to your desired name:

Create React App comes with hot reloading and ES6 support, so you should already see the changes in the browser tab:

New title

Let's now set up our App.js file.

Open App.js from your src folder and remove all the code within the return statement.

Also, delete the default logo and import React and the hook useState . Your App.js file should now look like this:

Now, let's import the necessary MUI components, such as Container and Grid .

Wrap your app with a Container component and add a maxWidth of sm to keep the window narrow for the chat interface. Additionally, add some padding to the top.

Your App.js should now look like this:

3.1. Create the ChatHeader component ​

The ChatHeader component will display the chatbot title and any relevant header information. This component will be positioned at the top of the chat interface.

Start by creating a ChatHeader component inside your App function, we'll use Typography , so import that component from MUI :

Then, define the ChatHeader component with a headline for the chatbot:

The Typography component from MUI is used to display text in a consistent and responsive manner. The variant prop sets the font size and style, while align adjusts the text alignment, and gutterBottom adds a bottom margin to create a space below the headline.

Next, include the ChatHeader component in the return statement of the App function:

By adding the ChatHeader component to the Container , it is now integrated into the overall layout of the application.

Your app should now look like this:

New title

3.2. Create the ChatMessages component ​

The ChatMessages component will display chat messages exchanged between the user and the chatbot. It should update dynamically as new messages are added:

ChatMessages component

First, create an initial greeting message from the chatbot inside your App function:

Then, import the useState hook and save the mockMessages to the state:

Each object in the messages array will have 3 keys:

  • role determines if it is the chatbot or the user talking
  • text key is the text shown in the app
  • content is the text we'll use to send to the backend to create a completion.

The text key will store both text and React components, which we'll get back to later.

Import necessary components from MUI , such as List , ListItem , ListItemText , Box , and Paper :

To style our components, import useTheme and styled from MUI :

Before creating the chat area, define three styles for the chat messages inside your App function: one for user messages, one for agent messages, and one for the MessageWrapper that wraps both.

The user messages style should use the audio prop to adjust the padding for the audio icon:

Then create the styling for the Agent messages:

Finally, let's create the styling for the MessageWrapper that wraps both the agent messages and the user messages:

Each message in the ChatMessages will have a play icon if any audio is available, so we'll import a fitting icon component:

Now, create a ChatMessages component that displays the messages from the messages array:

Then add a useTheme hook to access the MUI theme:

To improve user experience, we want the chat window to automatically scroll to the bottom whenever a new message is added to the conversation.

Start by importing useEffect and useRef hooks from React:

useEffect allows us to run side effects, such as scrolling the chat window, in response to changes in the component's state or properties. useRef is used to create a reference to a DOM element so that we can interact with it programmatically.

Continue with defining a local variable bottomRef in the ChatMessages component, to create a reference to the bottom of the chat window:

Then create the scrollToBottom function, which will be responsible for scrolling the chat window to the bottom:

This function first checks if bottomRef.current is defined. If it is, it then checks if the scrollIntoViewIfNeeded function is available. If available, it smoothly scrolls the chat window to the bottom using scrollIntoViewIfNeeded . scrollIntoViewIfNeeded is only supported by some browsers, for example not by Safari. So if it's not available, it uses the scrollIntoView function instead, which is more widely supported, to achieve the same effect.

Next, add a useEffect hook that triggers the scrollToBottom function whenever the messages prop changes:

This will ensure that the chat window always scrolls to the bottom when new messages are added to the conversation.

Then finally create the components where the chat messages will be displayed in the return statement of ChatMessages :

Lastly, add the bottomRef to your List component to make the auto-scrolling functionality work:

By adding the bottomRef to an empty <div> at the end of the List component, we can now programmatically scroll the chat window to the bottom whenever new messages are added to the conversation.

Let's break down what we're doing in the ChatMessages component in detail.

We start by defining the ChatMessages component, which takes the messages prop. We also use the useTheme hook to access the Material-UI theme:

We then wrap the chat area with a Container component. Inside the container, we use a Box component with specific styles for width, margin, maximum height, and overflow. This ensures that the chat area has a fixed height and scrolls if there are more messages than can fit in the available space.

We then use a Paper component with an elevation of 0 to remove the raised effect, so the chat area blends in with the background. We also add some padding to the Paper component.

Inside the Paper component, we use a List component to hold the chat messages:

We iterate over the messages array and create a ListItem component for each message, setting its padding to 0 and providing a unique key using the index. We then use the ListItemText component to display the message content.

We conditionally align the message based on the role using the MessageWrapper component. The MessageWrapper component uses the align prop to justify the content to either:

- flex-end for user messages, or

- flex-start for agent messages.

We conditionally apply the UserMessage or AgentMessage styling based on the role .

We pass the Material-UI theme and the audio prop, if available, to the UserMessage component. If the message has associated audio, we display an IconButton component with the VolumeUpIcon . The IconButton has an onClick event that plays the audio when clicked.

The same structure is applied to the AgentMessage component. The styling for the AgentMessage is slightly different, but the functionality remains the same.

In summary, the ChatMessages component is responsible for displaying chat messages in a styled, scrollable area. It takes an array of messages and iterates over them, creating a list of messages aligned based on the role , user or agent.

It also displays an audio icon for messages with associated audio, allowing users to play the audio by clicking the icon.

Now we're ready to include the ChatMessages component in our return statement of the App function, your return statement should look like this now:

Your app should now look like this with a greeting message:

App with agent greeting

Let's go ahead and create the audio controls in the next segment.

3.3 Create the AudioControls ​

The next step is to create the audio controls:

App with audio controls

Start by importing the MicRecorder library:

Then, go ahead and define the function outside the App function and create four new state variables:

AudioControls is placed outside of the App function to encapsulate its state and logic, making the component reusable and easier to maintain. This separation of concerns also helps prevent unnecessary re-renders of the App component when state changes occur within the AudioControls component. By defining the AudioControls component outside of the App function, you can more efficiently manage the state related to recording, playing, and uploading audio, making your application more modular and organized.

We'll have four buttons in the AudioControls component:

1. Start a recording
2. Stop the recording
3. Play the recording
4. Upload audio

App with audio controls parts

For the icon buttons, we'll need a microphone icon and a dot, import those icon components:

Also, import the Button component from MUI :

Let's create the function for starting an audio recording inside the AudioControls function:

Let's break down what we're doing in the function. We're declaring an asynchronous function using async :

This allows us to use the keyword await within the function to handle the Promise from MicRecorder .

The next step is to create a new instance of MicRecorder with a bitrate of 128 kbps. The bitrate option specifies the quality of the recorded audio. A higher bitrate means better quality but a larger file size:

Then we're calling the start() method on the newRecorder instance to start recording in a try block:

The await keyword is used with newRecorder.start() to pause the function's execution until the Promise resolves or rejects.

If the audio recording starts successfully, the Promise resolves and we proceed to update the React component's state:

The setIsRecording(true) call sets the isRecording state to true, indicating that the recording is in progress.

The setRecorder(newRecorder) call sets the recorder state to the newRecorder instance, so it can be used later to stop the recording.

If the start() method fails, which could be due to permission issues or the microphone being unavailable, then the catch block gets executed:

This block logs the error and shows an alert so you can troubleshoot the issue.

Let's also create the function for stopping the audio recording:

Here's a breakdown of what we did, starting with declaring the function as asynchronous with the async keyword to handle Promises:

Then we added the try block to attempt to stop the recording and get the MP3 data:

The await keyword is used with recorder.stop().getMp3() to pause the function's execution until the Promise is resolved or rejected.

If the Promise is resolved, the buffer and blob variables are assigned values returned by the getMp3() method.

Then we converted the recorded audio into an MP3 file:

In this code, the File constructor is used to create a new File object with the audio data, the name voice-message.mp3 , the appropriate file type and the last-modified timestamp.

The MP3 file is then used to create a new Audio object, which can be played back:

The URL.createObjectURL(file) method creates a URL representing the file, and the new Audio() constructor creates a new Audio object using that URL.

The setPlayer(newPlayer) call updates the React component's player state with the new Audio object.

In the next step, we update the React component's state:

The setIsRecording(false) call sets the isRecording state to false , indicating that the recording is no longer in progress.

If the stop().getMp3() method fails, which could be due to an issue with the recorder, the catch block is executed:

Let's also create the function for playing a recording:

Now that we have the audio control functions ready, we can create the AudioControl component:

We're ready to include the AudioControls component in the return statement of the App function, and your return statement should now look like this:

If you look at your app, you'll see that one button is missing: upload audio :

App with audio controls missing a button

We'll create it in the coming sections, but first, let's create the logic for switching between audio and text.

3.4 Create the audio response toggle ​

In this section, we'll build the ResponseFormatToggle , which allows users to decide if they want an audio response in addition to the text response:

App with the audio response format

Just like we did with AudioControls , we'll place the ResponseFormatToggle outside of the App function to encapsulate its state and logic, making the component reusable and easier to maintain.

First, add the isAudioResponse and setIsAudioResponse variables to your main state:

Next, create the ResponseFormatToggle component outside of the App function and pass the variables as props:

Define the function for handling the toggle change in the ResponseFormatToggle function:

We'll need to import two new MUI components; FormControlLabel and Switch :

Now, create the component for the toggle, and your ResponseFormatToggle should now look like this:

Finally, add the ResponseFormatToggle to the return statement of the App function:

Your app should now display a functioning toggle button:

App with toggle button

With the toggle button in place, we're ready to create the missing UploadButton :

3.5 Create the upload button ​

The SendButton is part of the AudioControls component and is responsible for uploading the audio file to the backend.

To keep the user informed while the audio is being sent and processed in the backend, we'll create a new component, ThinkingBubble , that pulses while the chatbot is "thinking".

Both ThinkingBubble and SendButton are placed outside of the App function to encapsulate their state and logic, making the components reusable and easier to maintain.

To create the pulse motion, we'll need to import keyframes from MUI :

Then define the pulse motion outside of your App function:

We'll use the MoreHorizIcon for the thinking bubble, so import it from MUI :

Now, create the ThinkBubbleStyled component with the pulse animation below the pulse definition:

Finally, create the ThinkingBubble component:

The ThinkingBubble is styled with MUI, so it needs access to the theme.

Now we're ready to create the SendButton component, begin by defining it with a useTheme hook:

Continue by creating a function in the SendButton for uploading the audio file to the backend, which starts by checking whether an audio file exists:

Before we add the backend API call function, let's create a helper function that will create the message objects needed for the ChatGPT prompt. Add this function in the main application, since we'll use it for components both within and outside of the App function:

Make sure to add filterMessageObjects as a prop; SendButton should now have two props:

This function maps the messages and creates a new array with only the role and content keys. For the backend call itself, we'll use Amplify, which we installed earlier; go ahead and import the library:

The next step is adding the async backend call and your uploadAudio function should now look like this:

Let's break down how the uploadAudio function is built and examine each step in detail:

1. Check if an audio file exists: The function starts by checking if an audioFile exists. If not, it logs a message and returns early to prevent further execution.

2. Create a FileReader instance: A new FileReader instance is created to read the audio file's content. The reader.onloadend event is used to handle the file reading completion. It's an async event to ensure that the reading process is complete before proceeding:

3. Convert the audio file to Base64: The reader.result contains the audio file's content in Base64 format. This is needed for further processing and transmitting the file to the backend:

4. Generate a unique message ID: To uniquely identify messages, generate a unique ID based on the current timestamp. We're doing this to keep track of a placeholder message (the pulsing ThinkingBubble ) while the backend is processing the audio file:

5. Create message objects: Use the filterMessageObjects helper function to create an array containing only the role and content keys for each message:

6. Add the user's audio message: Update the messages array with the new audio message, including its role, content, audio, text, and the unique ID:

The unique ID is used later to update the content key with the transcribed audio message from the backend.

7. Add the thinking bubble: Display the ThinkingBubble component to indicate that the chatbot is processing the user's input:

We'll add the key thinking so we can find and remove the object from the array later.

8. Make the backend call: Use the API.post method from Amplify to send the Base64 audio file, message objects, and the isAudioResponse flag to the backend for processing:

9. Remove the thinking bubble: Once the response is received, remove the ThinkingBubble component from the message array:

10. Read the audio file: Lastly, initiate the process of reading the audio file using the reader.readAsDataURL(audioFile) method:

Let's update the SendButton component to include the necessary isAudioResponse , messages and setMessages as props:

Let's also create the Button component. We'll need the CloudUploadIcon, so start by importing it and then add the Button component to the return statement of the SendButton:

Now that the SendButton component is complete, incorporate it into the AudioControls component created earlier:

Since SendButton requires the props isAudioResponse, filterMessageObjects, messages and setMessages, make sure to include them where SendButton is rendered in the return statement of AudioControls:

Also add isAudioResponse , filterMessageObjects , messages and setMessages as props for AudioControls :

With these updates, your SendButton component receives the necessary props and is now integrated into the AudioControls component.

Your app should now have an Upload Audio button:

App with audio controls with upload audio button

Now you have a functional SendButton component that uploads the audio file to the backend and displays a ThinkingBubble component while the chatbot processes the user's input. Once the response is received, the ThinkingBubble is removed, and the assistant's response is displayed.

3.6 Create the message input ​

For this guide, we're giving the users the option to send both audio and text messages. Let's create the last component, the MessageInput , which will allow users to type and send text messages.

Start by defining a message variable in the main App function:

Then continue with defining the component outside of the App function:

This component will need to send the isAudioResponse flag to the backend, so add it as props:

Also, add the variables message and setMessage as props:

Next, create a function to handle the text input change, and place this function inside the MessageInput function:

Now, add a function that sends the text message to the backend, and place it inside the App function:

The handleSendMessage function uses the theme so let's add a useTheme hook to access the MUI theme in the main App function:

Let's break down what we're doing in handleSendMessage and examine each step in detail:

1. Check if the message is not empty: The function starts by checking if the message is not an empty string (ignoring leading and trailing whitespace). If it's empty, the function will not process further:

2. Add the user's text message: Update the messages array with the new text message, including its role, content, text and audio:

3. Clear the input field: Clear the input field to allow the user to enter a new message after the response:

4. Add the thinking bubble: Display the ThinkingBubble component to indicate that the chatbot is processing the user's input.

5. Create message objects: Use the filterMessageObjects helper function to create an array containing only the role and content keys for each message. Then, push the new text message into the array:

6. Make the backend API call: Use the API.post method from Amplify to send the text message, message objects, and the isAudioResponse flag to the backend for processing:

7. Remove the thinking bubble: Once the response is received, remove the ThinkingBubble component from the messages array:

8. Catch any errors: If there are any errors while sending the text message to the backend, log the error message and show an alert:

The handleSendMessage function is now handling sending the text message, updating the UI with a thinking bubble, and making a backend API call to process the user's input.

To add functionality for listening to a key event within the MessageInput component, define the handleKeyPress function:

The handleKeyPress function checks if the Enter key is pressed. If so, it calls the handleSendMessage function, triggering the message-sending process.

Add the handleSendMessage as props in MessageInput , and it should now look like this:

We now just need to add a TextField so the user can use it to type and send a text message. Start by importing the TextField component from MUI :

And then import the SendIcon :

Then add the TextField and IconButton within the return statement of the MessageInput component:

Lastly, add the MessageInput component in the return statement above the ResponseFormatToggle in your App function:

If you check your app, you should now see a text input field where you can type a text message:

App with text input

3.7 Create the backend response handling ​

Before we can start to build the backend, there is one last function we'll need to build; handleBackendResponse . This function is responsible for transforming the backend response into the format required by the ChatMessages component and is placed inside the App function.

Start by defining the function:

We have two arguments: the backend response and id . The id is used to track the user message when it is an audio file and has been transcribed.

Whenever a user sends an audio message, the placeholder chat message is 🎤 Audio Message. So when the audio has been transcribed into text, we want to add it to the messages to be able to keep track of what the user said to the chatbot. That's why we're keeping track of the chat message id.

The backend response will have three keys:

- The generated text (the ChatGPT answer)
- The generated audio (if isAudioResponse is true)
- A transcription of the message

Create local variables of each response key:

Next, let's create an audio element if it is present:

Now, create an AudioMessage component. This chat message can be clicked on by the user if audio is present:

The final step is to add a conditional statement for updating the messages array, put it below the AudioMessage component:

Let's break down the conditional statement within the handleBackendResponse function and examine each step in detail:

1. Check if id is present: The conditional statement checks if the id argument is provided. If id is present, it means the message is an audio transcription, and we need to update the existing message with the transcribed text. If id is not present, we directly add the chatbot's response to the messages array:

2. Update the existing message with the transcription: If id is present, we iterate through the messages array using the map function. For each message, if the message's id matches the provided id , we create a new message object with the same properties and update its content with the transcription:

3. Add the chatbot's response to the updated messages array: Next, we add the chatbot's response, including the generated text, audio element, and AudioMessage component, to the updated messages array:

4. Set the updated messages array: The setMessages function is called with the updated messages array, which contains the transcribed message and the chatbot's response:

5. Directly add the chatbot's response when no id is involved: If the id is not present, we don't need to update any existing messages. Instead, we directly add the chatbot's response, including the generated text, audio element and AudioMessage component, to the messages array:

The entire process ensures that the messages array is updated correctly, whether the user input is a transcribed audio message or a simple text message.

Finally, you'll need to call the handleBackendResponse function in two locations within your code:

1. After removing the thinking bubble in the SendButton component

Add handleBackendResponse as a prop and call the function:

2. After removing the thinking bubble in the handleSendMessage function

Add a call to the handleBackendResponse function:

After adding handleSendMessage as a prop, update the AudioControls :

Also, update the return statement to this:

We're all set to start building our backend in Python.

4. Create an AWS account ​

In this guide, we'll use AWS Lambda for the Python backend, powered by AWS API Gateway to handle the REST calls. We'll create the Lambda with our Python code using the Serverless framework.

To begin, you'll need to create a new AWS account if you don't already have one.

1. Visit https://aws.amazon.com and click Sign In to the Console : ​

AWS website

2. Click Create a new AWS account : ​

AWS create a new AWS account

3. Complete the signup process: ​

Finalize signup process

Before proceeding, create a billing alarm to ensure you receive a notification if your bill increases unexpectedly. Follow these steps to set up a billing alarm: AWS docs

4. Next, create a user on your account. In the top menu, type IAM , then click on IAM from the dropdown: ​

AWS console

5. Click on Users in the left menu: ​

AWS IAM console

6. Click on Add users : ​

AWS IAM users

7. Choose a username for the new user and click Next : ​

IAM new user

8. Set up permissions for the new user. Click on Attach policies directly : ​

IAM attach policies

9. Scroll down and type admin in the search field, then select AdministratorAccess : ​

IAM admin policies

10. Scroll to the bottom of the page and click Next : ​

IAM admin policies

11. Review the policies, then scroll down to the bottom of the page and click Create user : ​

IAM review

12. Click on the user you just created: ​

IAM new user

13. Click on Security credentials : ​

IAM new user

14. In the Security credentials menu, scroll down to the Access keys section and click Create access key : ​

IAM user access key

15. Choose Command Line Interface and scroll down to the bottom of the page, then click Next : ​

IAM user cli

16. Optionally, add tags for the new user, then click Create access key : ​

IAM user tags

17. You've now reached the final step of creating a new IAM user. Be sure to save the access key and the secret access key : ​

IAM access key

Either copy the keys and store them in a secure location or download the CSV file. This is crucial since you won't be able to reveal the secret access key again after this step.

Make sure to save the secret access key somewhere safe, since you won't be able to reveal it again after this step.

We'll configure your AWS user in the next step, so make sure to have both the access key and the secret access key available.

5. Set up AWS CLI and configure your account ​

In this section, we'll guide you through installing the AWS Command Line Interface (CLI) and configuring it with your AWS account.

5.1 Install AWS CLI ​

First, you'll need to install the AWS CLI on your computer.

Follow the installation instructions for your operating system: AWS docs

After the installation is complete, you can verify that the AWS CLI is installed by running the following command in your command prompt:

You should see an output similar to this:

5.2 Configure your AWS CLI ​

Now that the AWS CLI is installed, you'll need to configure it with your AWS account. Make sure you have your access key and the secret access key from the previous section.

Run the following command in your terminal or command prompt:
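```
aws configure
```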

You'll be prompted to enter your AWS credentials:

- AWS Access Key ID [None]: Enter your access key ​

- AWS Secret Access Key [None]: Enter your secret access key.

Next, you'll need to specify a default region and output format. The region is where your AWS resources will be created. Choose the region that's closest to you or your target audience.

You can find a complete list of available regions and their codes in the AWS documentation: https://docs.aws.amazon.com/general/latest/gr/rande.html

For example:

- Default region name [None]: Enter your desired region code, such as us-east-1 ​

- Default output format [None]: Enter the output format you prefer, such as json.

Your AWS CLI is now configured, and you can start using it to interact with your AWS account.

In the next section, we'll create an AWS Lambda function and configure it with the necessary resources to power your chatbot's backend.

6. Set up a Serverless project with a handler.py file ​

This section will guide you through creating a new Serverless project with a handler.py file using the Serverless Framework. The handler.py file will contain the code for your AWS Lambda function, which will power your chatbot's backend.

6.1 Install the Serverless Framework ​

First, you need to install the Serverless Framework on your computer. Make sure you have Node.js installed and then run the following command in your terminal or command prompt:

After the installation is complete, verify that the Serverless Framework is installed by running the following command:

You should see output similar to this:

6.2 Create a new Serverless project ​

Now that the Serverless Framework is installed, you can create a new Serverless project. First, navigate to the folder we created for the project, my-chatbot-project:

Then run the following command in your terminal or command prompt:

We're using backend here to create a new project in a folder called backend.

Then navigate to the new backend folder by running:

Inside the folder, you'll find two files:

- handler.py: This is the file that contains your AWS Lambda function code.

- serverless.yml: This is the configuration file for your Serverless project, which defines the resources, functions, and events in your application.

6.3 Configure the serverless.yml file

In this section we'll walk through the serverless.yml file configuration, explaining the purpose of each part.

Open the serverless.yml file in your favorite text editor or IDE. You'll need to customize this file to define your chatbot's backend resources, function, and events.

Replace the current code with the following:

Let's break down and explain the purpose of each part.

1. Service name

This line defines the name of your service, which is used by the Serverless Framework to group related resources and functions. In this case, make sure to replace your-service-name with your own name.

2. Provider configuration

This section specifies the cloud provider, in our case AWS, and sets up some basic configurations:

- name: The cloud provider for your service (aws).

- runtime: The runtime for your Lambda function (python3.9).

- stage: The stage of your service deployment (dev). You can use different stages for different environments (e.g. development, staging, production).

- region: The AWS region where your service will be deployed (us-east-1). Make sure to select a region that supports the required services and is closest to your users for lower latency.

3. Plugins

This section lists any Serverless Framework plugins you want to use in your project. In this case, we're using the serverless-python-requirements plugin to automatically package and deploy any Python dependencies your Lambda function requires.

4. Functions

This section defines the Lambda functions in your service:

- chatgpt-audio-chatbot: The name of the Lambda function.

- handler: The reference to the function within your handler.py file (handler.handler). This tells the Serverless Framework to use the handler function defined in the handler.py file.

- timeout: The maximum time your Lambda function is allowed to run before it's terminated, in seconds. We've set it to 30 seconds.

- events: The events that trigger your Lambda function. In this case, we've set up an http event, which is triggered by a POST request to the /get-answer endpoint. The cors: true setting enables CORS (Cross-Origin Resource Sharing) for this endpoint, allowing requests from different origins (e.g. your frontend application).

Now that you have a better understanding of the serverless.yml file, you can customize it to suit the future needs of your chatbot's backend.

In the next section, we'll walk through implementing the Lambda function in the handler.py file.

7. Create the Python backend

7.1 Import necessary libraries

Open up the handler.py file, delete all the prewritten code and let's start by importing the necessary libraries:
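```python
# handler.py
import json
import base64
import io
import traceback  # used later to print stack traces in the error handlers

import openai
import requests
```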

We'll need json to parse and handle JSON data from incoming requests and to generate JSON responses.

base64 will be used to encode and decode the audio data sent and received in requests.

The io library is necessary for handling the in-memory file-like objects used in the audio transcription process.

The openai library enables us to interact with the OpenAI API for transcribing and generating text, while requests will be used to make HTTP requests to the Eleven Labs API for converting text to speech.

7.2 Add your OpenAI API key ​

Next, let's add our OpenAI API key. Here are the steps for getting your OpenAI API key if you don't already have one.

Go to https://beta.openai.com/ , log in and click on your avatar and View API keys :

Open AI API keys

Then create a new secret key and save it for the request:

Create OpenAI API key

Remember that you'll only be able to reveal the secret key once, so make sure to save it somewhere for the next step.

Then add your API key to the handler.py file below the library imports:

And initialize the OpenAI library with your API key:
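A sketch of both steps, using the newer OpenAI Python client that the rest of this section's calls (openai_client.audio..., openai_client.chat...) assume:

```python
OPENAI_API_KEY = "sk-your-api-key-here"  # replace with your own key

# Client object used by the functions below
openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)
```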

7.3 Create function to transcribe audio ​

For the user audio, we'll need to create a function that takes the audio data as input and returns the transcribed text using the OpenAI API. This function will be called transcribe_audio and will accept a single argument, audio_data .

Add the function to handler.py :
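(A sketch that follows the description below; details may differ from the original snippet.)

```python
def transcribe_audio(audio_data):
    """Transcribe the user's MP3 audio bytes with OpenAI Whisper."""
    audio_file = io.BytesIO(audio_data)
    audio_file.name = "audio.mp3"  # the API infers the format from the file name

    response = openai_client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
    )
    return response.text
```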

In this function, we first create a file-like object using io.BytesIO by passing in the audio_data . We then add a name attribute to the BytesIO object to indicate that it is an MP3 file.

Next, we call the openai_client.audio.transcriptions.create method, providing the whisper-1 model, the audio_file object, and specifying the language as en (English).

The API call returns a response containing the transcribed text, which we extract and return from the function.

7.4 Create function to generate a text reply ​

Once we have the audio transcribed, we'll need to create a function that calls the OpenAI API to generate a chat completion based on the user message.

Let's create the function generate_chat_completion to achieve this:
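(Again, a sketch following the breakdown below.)

```python
def generate_chat_completion(messages):
    """Generate the chatbot's text reply from the message history."""
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100,  # keep replies short; tune to taste
    )
    return response.choices[0].message.content
```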

Now, let's break down the function:

1. The generate_chat_completion function takes a single argument, messages , which is a list of message objects from the frontend.

2. We call the openai_client.chat.completions.create method to generate a chat completion using the gpt-3.5-turbo model. We pass the messages list as an argument to the method. The messages are formatted in the frontend and should be a list of dictionaries, each containing a role , either system , user , or assistant , and content , which is the message text. We've also added the max_tokens parameter and set it to 100 .

When generating chat completions using GPT, you might want to limit the length of the responses to prevent excessively long answers. You can do this by setting the max_tokens parameter when making the API call. In our generate_chat_completion function, we've added the max_tokens parameter and set it to 100. By setting max_tokens to 100, we limit the response to a maximum of 100 tokens. You can adjust this value according to your requirements. Keep in mind that if you set it too low, the generated text might be cut off and not make sense to users. Experiment with different values to find the best balance between response length and usability.

3. The API call returns a response that contains a list of choices, with each choice representing a possible chat completion. In our case, we simply select the first choice response.choices[0] .

4. Finally, we extract the content of the message from the first choice using response.choices[0].message.content .

With this function in place, we can now generate a text reply based on the transcribed user audio and any other messages provided in the messages list.

7.5 Create function to generate audio from text ​

Now that we have the text reply generated by our chatbot, we might want to convert it back to audio if the flag isAudioResponse is true . For this, we'll create a function called generate_audio that uses the ElevenLabs API to synthesize speech from the generated text.

ElevenLabs has a generous free tier with API access - just remember to add an attribution to elevenlabs.io when on the free tier:

ElevenLabs pricing

Start by creating a free ElevenLabs account, if you don't already have one. Visit https://beta.elevenlabs.io/ and click Sign up :

ElevenLabs sign up

Then click on the avatar in the upper right corner and click Profile :

ElevenLabs dashboard

Copy your API key and have it available for the next step when we're calling the ElevenLabs API:

ElevenLabs API key

The last step is to get a voice id for the API call. Go back to your dashboard, click Resources and then API :

ElevenLabs resources

Click to expand the documentation for text-to-speech :

ElevenLabs API overview

Here you'll find a voice id we'll use when synthesizing the audio in our backend, copy and save the voice id for the next steps:

ElevenLabs voice id

Let's finally create the function to synthesize speech from the generated text:
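A sketch along the lines described below (the voice_settings values are placeholders; this sketch authenticates with the xi-api-key header, while the article's original snippet appends the key to the URL instead):

```python
def generate_audio(text):
    """Synthesize speech for the reply text via the ElevenLabs API."""
    api_key = "YOUR_API_KEY"    # your ElevenLabs API key
    voice_id = "YOUR_VOICE_ID"  # the voice id copied from the ElevenLabs docs

    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {
        "accept": "audio/mpeg",
        "Content-Type": "application/json",
        "xi-api-key": api_key,
    }
    data = {
        "text": text,
        "voice_settings": {"stability": 0.75, "similarity_boost": 0.75},
    }

    try:
        response = requests.post(url, headers=headers, json=data, timeout=15)
    except requests.exceptions.Timeout:
        # Don't let a slow TTS request hang the Lambda / API Gateway
        return None

    # Raw bytes aren't JSON serializable, so return a Base64-encoded string
    return base64.b64encode(response.content).decode("utf-8")
```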

Let's break down this function step by step:

1. We define the API key and voice ID as variables. Replace YOUR_API_KEY with your actual ElevenLabs API key we just generated:

2. We create a dictionary called data that contains the generated text and voice settings. The text key contains the text that we want to convert to speech. The voice_settings key is a dictionary containing options for controlling the stability and similarity of the generated voice:

3. We define the API endpoint URL using the voice ID and the API key. The URL includes the base endpoint, https://api.elevenlabs.io/v1/text-to-speech/ followed by the voice ID and the API key as a query parameter:

4. We set up the HTTP headers for our API request. The accept header indicates that we expect the response to be in the audio/mpeg format, while the Content-Type header specifies that we will send JSON data in our request:

5. We then use the requests.post method to make a POST request to the API endpoint, passing the headers and JSON data as arguments. The API call returns a response containing the synthesized audio data:

Try block for timeout: In some cases, the ElevenLabs API request takes a long time, causing the API Gateway to time out while waiting for a response. To handle this, we've added a timeout of 15 seconds to the generate_audio function. This ensures that our application does not hang indefinitely while waiting for a response from the API and provides a more predictable user experience. If the API does not respond within 15 seconds, the request will be terminated and return None . We added a try block around the request and catch the requests.exceptions.Timeout exception.

6. Since the audio data is in bytes format, which is not JSON serializable, we need to convert it to a Base64 string. We use the base64.b64encode method to do this and then decode the result to a UTF-8 string using the decode method:

7. Finally, we return the Base64-encoded audio data as the output of the function.

With this generate_audio function, we can now convert the text reply generated by our chatbot back into an audio format that can be played by the user.

7.6 Create the handler function to tie everything together ​

Finally, we need to create the main handler function that will be triggered by the API Gateway event. This function will tie together all the other functions we've created, allowing us to process the incoming request, transcribe audio, generate chat completions, and create audio responses.

Add the handler function to your handler.py file:
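(A sketch that follows the step-by-step breakdown below; the messages key name for the chat history is an assumption, while audio, text, and isAudioResponse are the keys described in the steps.)

```python
def handler(event, context):
    try:
        body = json.loads(event["body"])

        if "audio" in body:
            # Voice message: decode the Base64 audio and transcribe it
            audio_data = base64.b64decode(body["audio"])
            transcription = transcribe_audio(audio_data)
            message_objects = body.get("messages", []) + [
                {"role": "user", "content": transcription}
            ]
        elif "text" in body:
            # Plain text message: the frontend already appended it to the history
            transcription = body["text"]
            message_objects = body.get("messages", [])
        else:
            raise ValueError("Invalid request: expected an 'audio' or 'text' key")

        generated_text = generate_chat_completion(message_objects)

        # Only synthesize audio when the frontend asks for it
        generated_audio = generate_audio(generated_text) if body.get("isAudioResponse") else None

        return {
            "statusCode": 200,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({
                "transcription": transcription,
                "generated_text": generated_text,
                "generated_audio": generated_audio,
            }),
        }

    except ValueError as error:
        traceback.print_exc()
        return {
            "statusCode": 400,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({"error": str(error)}),
        }
    except Exception:
        traceback.print_exc()
        return {
            "statusCode": 500,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({"error": "Something went wrong"}),
        }
```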

Let's break down this handler function step by step:

1. We start by defining the handler function with two arguments: event and context . The event object contains the data from the API Gateway event, and context contains runtime information:

2. We then extract the body from the event object by loading it as a JSON object:

3. We then check if the body contains an audio key. If it does, we decode the base64-encoded audio data and transcribe it using the transcribe_audio function. We create a message_objects list by combining the existing messages from the frontend data with the transcribed message:

4. If the body contains a text key instead, we simply use the text provided and create the message_objects list from the frontend data:

5. If neither audio nor text keys are present, we raise a ValueError to indicate that the request format is invalid:

6. We then call the generate_chat_completion function, passing the message_objects list as an argument. This returns the generated text response from our Chatbot:

7. We check if the body contains an isAudioResponse key and use its value to determine if we should generate an audio response from the generated text:

8. If an audio response is requested from the frontend, we call the generate_audio function to convert the generated text back to audio. If not, we set generated_audio to None :

9. We create a response dictionary with the following keys:

- statusCode: The HTTP status code for the response. We set it to 200, indicating a successful operation.

- headers: The HTTP headers to include in the response. We set the Access-Control-Allow-Origin header to * to enable cross-origin requests.

- body: The response body, which we serialize as a JSON object. It contains the following keys:
  - transcription: The transcribed text from the user's audio input
  - generated_text: The generated text response from the chatbot
  - generated_audio: The generated audio response if requested, encoded as a Base64 string

10. We return the response dictionary:

11. If a ValueError occurs, e.g., due to an invalid request format, we catch the exception, print the traceback, and return a 400 status code along with an error message:

12. If any other exception occurs, we catch the exception, print the traceback, and return a 500 status code along with a generic error message:

With the handler function complete, we now have a fully functional backend for our chatbot that can handle text and audio input, generate chat completions using OpenAI and return text or audio responses as needed.

8. Deploying the backend to AWS ​

Now that we have our chatbot backend implemented in the handler.py file, it's time to deploy it to AWS using the Serverless Framework. In this section, we'll go through the deployment process step by step.

8.1 Ensure AWS credentials are configured ​

Before deploying, ensure that you have properly set up your AWS credentials on your local machine. If you haven't done this yet, refer to section 5.2 for a detailed guide on setting up your AWS credentials.

8.2 Install dependencies ​

Before deploying the backend, we need to install the required Python packages. In your backend folder, create a requirements.txt file and add the following dependencies:
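Based on the imports used in handler.py, the file needs at least:

```
openai
requests
```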

8.3 Install and configure the serverless-python-requirements plugin ​

Before deploying the Serverless project, you need to ensure that you have the serverless-python-requirements plugin installed and configured. This plugin is essential for handling your Python dependencies and packaging them with your Lambda function.

To install the plugin, run the following command in your project directory:
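A common way to do this is through the Serverless CLI itself, which registers the plugin in serverless.yml and adds it to package.json:

    serverless plugin install -n serverless-python-requirements

(Running npm install --save-dev serverless-python-requirements and adding the plugin to the plugins section of serverless.yml by hand works as well.)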

This command will add the plugin to your project package.json file and install it in the node_modules folder.

With the plugin in place, the Serverless Framework will automatically package the Python dependencies listed in requirements.txt and include them in the deployment.

8.4 Deploy the backend

Now that we have our AWS credentials configured and our dependencies installed, it's time to deploy the backend. Open a terminal, navigate to your backend folder located in your project folder, and run the following command:
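The deployment is a single command (the same one referenced again in the redeployment section later on):

    serverless deploy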

This command will package and deploy your Serverless service to AWS Lambda. The deployment process might take a few minutes. Once the deployment is completed, you'll see output similar to this:

Take note of the endpoints section, as it contains the API Gateway URL for your deployed Lambda function. You'll need this URL in the next section when we make requests to your chatbot backend from the frontend.

8.5 Locating the deployed Lambda function in the AWS console

Once your backend is successfully deployed, you may want to explore and manage your Lambda function using the AWS Console. In this section, we'll guide you through the process of finding your deployed Lambda function in the AWS Console.

1. Sign in to your AWS Management Console: https://aws.amazon.com/console/

2. Under the "Services" menu, navigate to "Lambda" or use the search bar to find and select "Lambda" to open the AWS Lambda Console.

3. In the AWS Lambda Console, you'll see a list of all the Lambda functions deployed in the selected region. The default function name will be in the format service-stage-function, where service is the service name defined in your serverless.yml file, stage is the stage you deployed to (e.g., dev), and function is the function name you defined in the same file.

For example, if your serverless.yml has the following configurations:
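For instance, a trimmed-down serverless.yml along these lines (the handler path and runtime here are illustrative, not the article's exact file):

    service: chatgpt-audio-chatbot

    provider:
      name: aws
      runtime: python3.9

    functions:
      chatgpt-audio-chatbot:
        handler: handler.handler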

The Lambda function will have a name like chatgpt-audio-chatbot-dev-chatgpt-audio-chatbot .

4. Click on the Lambda function name in the list to view its details, configuration, and monitoring information. On the Lambda function details page, you can:

- Edit the function code in the inline code editor (for smaller functions), or download the deployment package to make changes offline.

- Modify environment variables, memory, timeout, and other settings.

- Add triggers, layers, or destinations.

- View monitoring data, such as invocation count, duration, and error rate, in the Monitoring tab.

- Access CloudWatch logs to view and search the function's logs from the Monitoring tab by clicking View logs in CloudWatch.

5. Additionally, you can navigate to the API Gateway console to view and manage the API Gateway that's integrated with your Lambda function:

- In the AWS Management Console, search for API Gateway under the Services menu or use the search bar.

- Select the API Gateway that corresponds to your serverless.yml configuration (e.g., chatgpt-audio-chatbot-dev if your service name is chatgpt-audio-chatbot and the stage is dev).

- In the API Gateway console, you can view and manage resources, methods, stages, and other settings for your API. You can also test the API endpoints directly from the console.

By following these steps, you can locate, manage, and monitor your deployed Lambda function and other AWS resources from the AWS Management Console. This allows you to better understand your application's performance, troubleshoot issues, and make updates to the backend as needed.

8.6 Test the deployed backend

To ensure that your backend is working correctly, you can use a tool like Postman or curl to send a test request to the API Gateway URL. Replace https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com with the API Gateway URL you received when you deployed the backend:

For a text-based request:
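For example, with curl (the /dev path suffix and the messages key are illustrative; use the full endpoint printed by serverless deploy and the request shape your frontend sends):

    curl -X POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello! What can you do?", "messages": [], "isAudioResponse": false}'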

For an audio-based request, replace your_base64_encoded_audio_string with an actual Base64 encoded audio string:
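    curl -X POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev \
      -H "Content-Type: application/json" \
      -d '{"audio": "your_base64_encoded_audio_string", "messages": [], "isAudioResponse": true}'

As before, adjust the path and field names to match your own deployment.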

You should receive a response containing the transcription of the user's input, the generated text from the chatbot, and (optionally) the generated audio if isAudioResponse is set to true.

If you receive an error, double-check your request payload and ensure that your Lambda function has the correct permissions and environment variables set.

9. Update the frontend

Now that your backend is deployed and working correctly, let's update the frontend application to use the API Gateway URL. We'll leverage AWS Amplify to configure the API call and make it easy to interact with our backend.

First, open the App.js file in your frontend project. Import Amplify from aws-amplify :

Just before the function App() , add the Amplify configuration, including the API endpoint you received when you deployed the backend:

Make sure to replace xxxxxxxxxx with the actual endpoint from the backend deploy.

With your backend deployed and your frontend updated, your ChatGPT Audio Chatbot is now ready to use!

Let's try it out. Here's how it works:

10. Redeploying after changes

If you make any changes to your backend code or serverless.yml configuration, you can redeploy your service by running serverless deploy again. The Serverless Framework will update your AWS resources accordingly.

Remember to test your backend after each deployment to ensure everything is working as expected.

That's it! You have successfully created and deployed a ChatGPT Audio Chatbot using OpenAI, AWS Lambda, and the Serverless Framework. Your chatbot is now ready to receive and respond to both text and audio-based requests.

The source code

Do you want the full source code? This tutorial is quite extensive, and following along step-by-step may be time-consuming.

Visit this page to download the entire source code. You'll get instant access to the files, which you can use as a reference or as a starting point for your own voice-powered ChatGPT bot project.

Improvements

Protecting the Lambda API Endpoint

Currently, our Lambda function is openly accessible, which can lead to potential misuse or abuse. To secure the API endpoint, you can use Amazon API Gateway's built-in authentication and authorization mechanisms. One such mechanism is Amazon Cognito, which provides user sign-up and sign-in functionality, as well as identity management.

By integrating Amazon Cognito with your API Gateway, you can ensure that only authenticated users have access to your chatbot API. This not only secures your API but also enables you to track and manage user access, providing a more robust and secure experience.

In summary, leveraging Amazon Cognito for authentication is an excellent way to protect your Lambda API Endpoint and enhance the security of your chatbot application.

Error Handling

The chatbot application could benefit from more comprehensive error handling. This would involve checking for error responses from the text-to-speech API, the speech-to-text API, and the Lambda function, and gracefully displaying relevant error messages to the user. This would help users understand any issues encountered during their interaction with the chatbot.

Saving Chat History to a Database

Currently, the chat history between the user and the chatbot is stored in the application's state, which means that the messages disappear when the page is refreshed. To preserve the chat history, you can save the conversation to a database. This can be achieved using a variety of database solutions, such as Amazon DynamoDB or MongoDB.

Storing chat history in a database provides additional benefits, such as the ability to analyze user interactions for further improvements, track user satisfaction, and monitor the chatbot's performance.

By implementing these improvements, you can enhance the security, user experience, and functionality of your chatbot application, making it more robust and reliable for real-world use.

Questions

Get in Touch for Assistance or Questions

Do you need help implementing the ChatGPT chatbot, or have any other questions related to this tutorial? I'm more than happy to help. Don't hesitate to reach out by sending an email to [email protected]

Alternatively, feel free to shoot me a DM on Twitter @norahsakal .

I look forward to hearing from you and assisting with your ChatGPT chatbot journey!


September 25, 2023

ChatGPT can now see, hear, and speak


We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.

Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.

We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.

Speak with ChatGPT and have it talk back

You can now use voice to engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate.

Use voice to engage in a back-and-forth conversation with your assistant.

To get started with voice, head to Settings → New Features on the mobile app and opt into voice conversations. Then, tap the headphone button located in the top-right corner of the home screen and choose your preferred voice out of five different voices.

The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech. We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.

Chat about images

You can now show ChatGPT one or more images. Troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data. To focus on a specific part of the image, you can use the drawing tool in our mobile app.

Show ChatGPT one or more images.

To get started, tap the photo button to capture or choose an image. If you’re on iOS or Android, tap the plus button first. You can also discuss multiple images or use our drawing tool to guide your assistant.

Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.

We are deploying image and voice capabilities gradually

OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.

The new voice technology—capable of crafting realistic synthetic voices from just a few seconds of real speech—opens doors to many creative and accessibility-focused applications. However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.

This is why we are using this technology to power a specific use case—voice chat. Voice chat was created with voice actors we have directly worked with. We’re also collaborating in a similar way with others. For example, Spotify is using the power of this technology for the pilot of their Voice Translation (opens in a new window) feature, which helps podcasters expand the reach of their storytelling by translating podcasts into additional languages in the podcasters’ own voices.

Image input

Vision-based models also present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains. Prior to broader deployment, we tested the model with red teamers for risk in domains such as extremism and scientific proficiency, and a diverse set of alpha testers. Our research enabled us to align on a few key details for responsible usage.

Making vision both useful and safe

Like other ChatGPT features, vision is about assisting you with your daily life. It does that best when it can see what you see. 

This approach has been informed directly by our work with Be My Eyes, a free mobile app for blind and low-vision people, to understand uses and limitations. Users have told us they find it valuable to have general conversations about images that happen to contain people in the background, like if someone appears on TV while you’re trying to figure out your remote control settings.

We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.

Real world usage and feedback will help us make these safeguards even better while keeping the tool useful.

Transparency about model limitations

Users might depend on ChatGPT for specialized topics, for example in fields like research. We are transparent about the model's limitations and discourage higher risk use cases without proper verification. Furthermore, the model is proficient at transcribing English text but performs poorly with some other languages, especially those with non-roman script. We advise our non-English users against using ChatGPT for this purpose.

You can read more about our approach to safety and our work with Be My Eyes in the system card for image input .

We will be expanding access

Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.


Acknowledgments

Voice mode core research

Alec Radford, Tao Xu, Jong Wook Kim

Vision deployment core research

Raul Puri, Jamie Kiros, Hyeonwoo Noh, Long Ouyang, Sandhini Agarwal

View GPT-4V(ision) technical work and authors



Building Your Own Conversational Voice AI With Dialogflow & Speech to Text in Web Apps. (Part I)

This is the first blog in the series:

A best practice for streaming audio from a browser microphone to Dialogflow & Google Cloud Speech To Text.

In this first blog, I will address why customers would integrate their own conversational AI instead of building for the Google Assistant. I will introduce all the conversational AI components in Google Cloud and explain what you would use each component for.

Later in this blog series, I will show you how to integrate an HTML5 microphone in your web application. How to stream audio streams to a (Node.js) back-end. How to use the Dialogflow API for audio streaming. How to use the Speech API. And, lastly, how to return audio (Text to Speech) to a client to play this in a browser.

These blogs contain simple code snippets and a demo application, the Airport Self Service Kiosk, which will be used as a reference architecture.

Google Assistant vs. a custom conversational AI

I often speak with customers about their wish to include the Google Assistant in their business web apps. Unless you are a manufacturer of TV set-top boxes or headphones, I always answer:

“Is this really what you want? Or do you mean you want to extend your own app with a conversational AI?”

If you have one or more of the below requirements, you probably want to make direct use of the Google Cloud Speech and Dialogflow APIs, instead of packing your voice AI as an action in the Google Assistant or wrapping the Google Assistant in your app .

  • This application shouldn't be publicly available.
  • This application doesn’t need to be available on the Google Assistant / Nest Home.
  • You don’t want to start your app with the wake words: “Hey Google, talk to my app”.
  • The application doesn’t need to answer native Google Assistant questions, such as: “what’s the weather in Amsterdam”.
  • The application can only make use of the Google Cloud terms & conditions, instead of combining it with the consumer terms & conditions of the Google Assistant.

Convinced that you want to extend your own (mobile) web app by integrating voice AI capabilities? Here’s the ultimate developer guide, on implementing voice streaming from a web application to Google Cloud Speech and Dialogflow.

Dialogflow versus Text-to-Speech API versus Speech-to-Text API

Dialogflow is an AI-powered tool for building text and voice-based conversational interfaces such as chatbots and voice apps. It uses Machine Learning models such as Natural Language Understanding to detect the intentions of a conversation.

The way Dialogflow intent detection works is that it first tries to understand the user utterance. Then it checks the Dialogflow agent, which contains intents (or chat flows) built from training phrases. The intent with the best match (highest confidence score) returns the answer, which could be a text response or a response from a system through a fulfillment.
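This series implements everything in Node.js, but as a rough Python-flavored sketch of a text-based detect intent call (the project and session IDs here are placeholders):

    # pip install google-cloud-dialogflow
    from google.cloud import dialogflow

    def detect_intent_text(project_id, session_id, text, language_code="en-US"):
        session_client = dialogflow.SessionsClient()
        session = session_client.session_path(project_id, session_id)

        query_input = dialogflow.QueryInput(
            text=dialogflow.TextInput(text=text, language_code=language_code)
        )
        response = session_client.detect_intent(
            request={"session": session, "query_input": query_input}
        )

        result = response.query_result
        # Best-matching intent, its confidence score, and the configured answer
        return (
            result.intent.display_name,
            result.intent_detection_confidence,
            result.fulfillment_text,
        )

    print(detect_intent_text("my-gcp-project", "kiosk-session-123", "What time does boarding start?"))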

Although many of us will use Dialogflow with text input, for web or social media chatbots, it is also possible to do intent matching with your voice as audio input, and it can even return spoken text (TTS) as an audio result.

Dialogflow speech detection & output has some overlap with the Cloud Speech to Text API (STT) and Cloud Text to Speech (TTS). Even the API calls look similar! However, those services are different, and they are used in separate use cases.

Speech to Text (STT) transcribes spoken words to written text. This is great for when you want to generate subtitles in a video, generate text transcripts from meetings, etc. You could also combine it with Dialogflow chatbots (detect intent from text transcripts) to synthesize the chatbot answers, however STT doesn’t do intent detection like Dialogflow does. STT is very powerful, as the API call response will return the written transcript with the highest confidence score, and it will return an array with alternative transcript options.
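As a minimal illustration of the STT API (again in Python, with a local WAV file standing in for the microphone stream):

    # pip install google-cloud-speech
    from google.cloud import speech

    client = speech.SpeechClient()

    with open("question.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        # Top transcript plus a confidence score; alternatives are also available
        print(result.alternatives[0].transcript, result.alternatives[0].confidence)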

With Text to Speech (TTS) , you can send text or SSML (text with voice markup) input and it will return audio bytes, which you can use to create an mp3 file or directly stream to an audio player (in your browser).
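And the TTS counterpart, which returns audio bytes you can write to an mp3 file or stream to the browser (the voice and encoding choices here are just examples):

    # pip install google-cloud-texttospeech
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text="Boarding starts at 2:40 PM at gate D7."),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )

    # response.audio_content holds the MP3 bytes
    with open("answer.mp3", "wb") as f:
        f.write(response.audio_content)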

Compared to the Google Assistant route, when you extend your apps with a conversational AI manually using the above tools, you are no longer part of the Google Assistant ecosystem. That ecosystem is nice if you are building consumer or campaign apps (voice actions) that everyone can find by invoking them through the Hey Google, talk to my app wake phrase. But when you are an enterprise, that whole ecosystem might be overkill.

Actions on Google ecosystem

For an enterprise that wants to integrate a voice AI in its own apps, the full Google Assistant ecosystem might be overkill.

Google Cloud Contact Center AI

There's another Google solution, which is called Google Cloud Contact Center AI (CCAI).

This solution is for enterprises that want to deploy a voice AI in their existing telephone contact center (IVR). Dialogflow and the Cloud Speech APIs are the key pieces in that architecture, OEMed by a telephony partner (such as Genesys, Avaya, or Cisco). Since Contact Center AI is an out-of-the-box solution, you don't need to implement these APIs yourself.

About the demo application; Airport Self Service Kiosk

Now that you know the differences between all the conversational GCP components, let’s see how we can implement these in an end-to-end web application. For this guide, I will make use of a demo app, which is a self service kiosk for an airport. (Self Service Kiosks are also common in retail or the finance sectors.)

You can ask the Self Service Kiosk if it's okay to bring a lighter in your handbag, or what time boarding starts. The results will be presented on the screen, and they will also be spoken out:

Here’s a screenshot from my demo app: The Airport Self Service Kiosk

Let me show you the best practice for streaming audio from your microphone through the browser to Dialogflow and then out through the speaker.

All the code is available on Github: https://github.com/dialogflow/selfservicekiosk-audio-streaming

And the final solution has been deployed with App Engine Flex: http://selfservicedesk.appspot.com

Building the demo application requires the following tools:

  • Dialogflow client SDK
  • STT Node.js client SDK
  • TTS Node.js client SDK
  • Socket.io & Socket.io-Stream
  • AppEngine Flexible Environment (with websockets support & HTTPS)

Architecture

Here’s the architecture that I have been using:

The architecture I’ve used.

  • Client website / app. For demo purposes I will show you two versions. A simple HTML page, and an example of a full web application in Angular, such as the self service kiosk demo . It contains the getUserMedia() WebRTC call wrapped by the RecordRTC library, to record the audio streams from the browser microphone.
  • A NodeJS server which will serve the static content (such as the HTML page) and connect to the GCP libraries, like Dialogflow, STT and TTS.
  • You could use any other programming language as well. All GCP services have various client SDKs (such as Node.js, Java, Python, Go etc), and also Rest and GRPC libraries.
  • The Dialogflow Agent, which contains intents, entities, and FAQ Knowledge bases.

The client app talks to the backend server via websockets. This is a common approach when building chatbots or chat applications because they can respond in real-time, without any page refreshes.

I am using the socket.io framework with the socket.io-stream plugin, since it’s easy to use and I need to make use of bi-directional streaming.

I've seen solutions online where the microphone is directly streamed to Dialogflow, without a server in between. The REST calls were made directly in the web client with JavaScript. I would consider this an anti-pattern. You will likely expose your service account / private key in your client-side code. Anyone who is handy with Chrome Dev tools could steal your key and make (paid) API calls via your account. It's a better approach to always let a server handle the Google Cloud authentication. This way the service account won't be exposed to the public.

Short utterance vs. Streaming

There are typically 2 approaches on how to integrate voice in your application.

  • Short utterances / detect intent. This means your end-user presses a record button, speaks, and when they press stop, we collect the audio stream to return results. In your code, this means once the client web app collects the full audio recording, it sends it to the server, so the server can do a call to Dialogflow or the Speech to Text API.
  • Streaming of long utterances / detect intents in a stream. This means your end-user presses the record button, speaks, and will see the results on the fly. When detecting intents, it could mean that it will detect better matches once you have spoken more, or it could collect multiple results. In your code, this means the client starts making a bi-directional stream and streams chunks to the server so the server can make a call with event listeners on incoming data and thus it’s real-time.
  • When there is an intent match, we can either show the results on screen by presenting the text, or we can synthesize (read out) the results by streaming an audio buffer back to the client, which will be played via the WebRTC AudioBufferSourceNode (or audio player).

Stay tuned for my next blog, in which I will make a start by building a client-side web application that uses an HTML5 microphone with WebRTC, streaming the audio bytes to a Node.js backend.

Integrate OpenAI's ChatGPT with Twilio Programmable Voice and Functions

Time to read: 12 minutes


Why talk to one robot when you can talk to three?

Using ChatGPT to power an interactive voice chatbot is not just a novelty; it can be a way to get useful business intelligence information while reserving dedicated, expensive, and single-threaded human agents for conversations that only humans can help with. These days people talk to, listen to, and collaborate with robots all the time, but you know what's cooler than interacting with one robot? Interacting with three!

In this post, we'll show you how to use Twilio's native speech recognition and Amazon Polly Neural text-to-speech capabilities with ChatGPT to create a voice-activated chatbot, all hosted entirely on Twilio's serverless Functions environment. You'll also use the Call Event API to parse what callers are asking about and view the responses from the bot, which unlocks the rich first-party data captured in these interactions and lets you send it to customer engagement platforms like Segment, where you can use it to build customer profiles, understand customer preferences, and create the personalized experiences customers expect.

Want to give this demo a whirl before diving in? Call 1-989-4OPENAI (467-3624) to test it out!

Robot #1: Decoding the human voice using Speech Recognition

Twilio's speech recognition using the TwiML <Gather> verb is a powerful tool that turns words spoken on a phone call into text. It offers excellent accuracy, low latency, and support for numerous languages and dialects. Historically, Twilio developers have used speech recognition as a way to navigate interactive voice response (IVRs) and other self-service automation workflows, but with the release of new experimental speech models , the only limit is ✨ your imagination ✨.

Robot #2: Giving your robot a voice with Amazon Polly Neural Voices

With the TwiML <Say> verb , Twilio provides a text-to-speech (TTS) function that uses Amazon Polly voices which leverage deep learning to synthesize human-like speech. Polly's neural voices offer a more natural and lifelike sound, providing an engaging listening experience for users. With support for multiple languages, a wide range of voices, and SSML support, Twilio text-to-speech allows you to customize your chatbot's voice to match your brand's identity.

Robot #3: OpenAI's ChatGPT Conversational Companion

ChatGPT is an advanced language model developed by OpenAI , capable of generating human-like text based on given input. It can understand context, provide relevant responses, and even engage in creative tasks like writing stories or poems. By leveraging the OpenAI API , developers can integrate this AI directly into their applications, offering users a more interactive and engaging experience.

Secret Sauce: Twilio Functions

How will you get these three robots to talk to each other, and to your callers? By using Twilio Functions . Beyond merely giving you the ability to get a proof of concept up and running without needing to spin up a server of your own, Functions provide auto scaling capabilities, enhanced security, and reduced latency by running your code inside Twilio. Of course, if you've got your own server rattling around someplace you can make a couple small edits to the Javascript and it will run in your node.js environment, no sweat.

Now that you've got the ingredients, let's check out the recipe in two flavors: CLI and GUI .

Pre-req yourself before you wreck yourself

Before diving into the integration process, you'll need the following:

A Twilio account - Sign up for free using this link and receive $10 in credit when you upgrade your account

A Twilio phone number - Click here to see how to get your first Twilio number with your trial account

An OpenAI API key on a premium plan or with available credits

Twilio CLI with the Serverless Toolkit installed

Next Day's Function

First let's get our backend in order. You'll start by creating a new Serverless project. Since you installed the awesome open-source Serverless Toolkit , you can do this in one line by passing the command into your terminal:
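The command is the Serverless Toolkit's init command:

    twilio serverless:init <project-name>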

Replace <project-name> with a name of your liking; I'm naming mine three-robot-rhumba .

Terminal window showing the initialization of a Twilio Serverless project

Go ahead and cd into the directory for your project and let's update the .env file to provide your Twilio auth token as AUTH_TOKEN , and your OpenAI API Key as OPENAI_API_KEY . Your Twilio account SID should be auto populated. Ensure your .env file looks like the following (with the XXXXX placeholders replaced with its respective keys):
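Something along these lines (the SID prefix shown here is just a placeholder):

    ACCOUNT_SID=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    AUTH_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX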

Since Twilio Serverless Functions are just Node.js apps, you can add dependencies using any package manager that writes to package.json ; I'm using npm because I'm basic. Navigate back to your terminal and enter the following to install the OpenAI NPM package:
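    npm install openai@3.3.0

(Pinning 3.3.0 matches the version used in the Console flow later in this post.)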

With your environment variables set and your dependencies added you can get down to business. You're going to create two Functions: a /transcribe Function that uses Twilio speech recognition to turn your spoken words into text that ChatGPT can understand using the TwiML <Gather> verb , and a /respond Function that takes the text generated by the speech recognition, sends it over to the OpenAI API, and passes the response to Twilio's Amazon Polly Neural-powered text-to-speech engine using the TwiML <Say> verb .

To create a new Function, open up the functions folder in your project directory and create a JavaScript file. Create the /transcribe and /respond Functions by creating a transcribe.js and respond.js file in the folder.

Lost in Transcription

Now let's open up transcribe.js and add the following code:

If you're new to Functions let me walk you through what's happening here. The /transcribe Function creates a TwiML generating voice response based on Twilio's Node.js helper library, starts a conversation if none exists, listens for user input, and passes that input along with the conversation history to the /respond endpoint for further processing.

In line 6 the application checks to see if a cookie called convo exists. If it doesn't, or if it does but it's empty, you can take that to mean the conversation hasn't started yet, so you will kick off an initial greeting using the TwiML <Say> verb.

Next, the twiml.gather method is used to capture user input. The parameters for gather are:

  • speechTimeout: 'auto' : Automatically determines when the user has stopped speaking, can be set to a positive integer, but auto is best for this use case
  • speechModel: "experimental_conversations": Uses a speech recognition model optimized for conversational use cases
  • input: 'speech' : Sets the input type to speech and ignores any keypresses (DTMF)
  • action: '/respond' : Pass the user's speech input along with the conversation history to the /respond endpoint

Now, you need a way to create the convo cookie so the /respond Function has a place to store the conversation history that gets passed between the OpenAI API and the Polly Neural voices. That means the app needs to initialize a Twilio.Response(); object, and it does so on line 25.

You can't pass both the Twilio.twiml.VoiceResponse(); and the Twilio.Response(); back to the handler, so you'll need to use the response you just created to append a header to the request and set the TwiML you generated via <Gather> as the body, on lines 28 and 31 respectively.

Once that's done you can set the cookie using response.setCookie(); on line 35 before you pass the response back to the handler for execution by our Serverless infrastructure on line 39. Go ahead and save this file and close it.

Call and Response

Next, let's open up respond.js and add the following code:

As with above, here's a guided tour as to what exactly is going down in this code. It starts by importing the required modules (line 2) and defining the main function to handle requests (line 5).

Lines 7-8 set up the OpenAI API with the API key, while line 11 creates the Twilio Voice Response object that will generate TwiML for turning the ChatGPT responses into speech for callers. Line 14 initiates the Twilio Response object to update the conversation history cookie and set the headers and body so the TwiML is passed in the response.

Lines 17-20 parse the cookie value if it exists, and line 23 retrieves the user's voice input from the SpeechResult event received from the /transcribe Function. Lines 26-27 create a conversation variable to store the dialog and add the user's input to the conversation history.

Line 30 generates the AI's response based on the conversation history, and line 33 cleans the AI response by removing any unnecessary role names (assistant, Joanna, user). The cleaned AI response is then added to the conversation history.

Lines 39-41 limit the conversation history to the last 10 messages to improve performance and keep the cookie a reasonable size while also giving the chatbot enough context to deliver useful responses. You can increase (or decrease) this if you'd like, but remember the stored history is getting passed to the OpenAI API with every request, so the larger this gets, the more tokens your application is chewing through. Lines 44-48 generate the <Say> TwiML using the cleaned AI response, and lines 51-55 redirect the call to the /transcribe function where the <Gather> is capturing the caller's speech.

As with /transcribe we need to use response to deliver the TwiML and lines 58-59 set the appropriate header and return the TwiML in the body of the response. Lines 62-67 update the conversation history cookie with the response from the OpenAI API, and line 70 returns the response to the handler.

The generateAIResponse function (lines 73-76) formats the conversation and creates a chat completion using the OpenAI API. The createChatCompletion function (lines 81-88) sends a request to the OpenAI API to generate a response using the GPT-3.5-turbo model and specified parameters. If we get a 500 from the OpenAI API we don't want to just drop the conversation on the floor, so we handle an error from the API with <Say> and by redirecting the conversation back to the /transcribe function on lines 90-107.

It's also possible that the request to the OpenAI just plain old fashioned times out, so a catch has been added between lines 109-131 to handle timeouts gracefully, and redirect to /transcribe to try again.

Finally, the formatConversation function (lines 136-158) formats the conversation history into a format that the OpenAI API can understand by alternating between assistant and user roles.

Now that your code is updated, your dependencies are set, and your environment variables are configured, you're ready to deploy. With Twilio Serverless, it couldn't be easier… it's just a single command!
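That command is:

    twilio serverless:deploy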

Once the deployment wraps up you can now use the Functions you created to capture spoken input from a caller, convert it to text which is sent to the ChatGPT API, and play the response back to the caller in the form of AI-generated speech. Three robots, working together, just for you!

The example above uses the gpt-3.5-turbo model from OpenAI. This is a good (and cheap!) model for development purposes and proof-of-concepts, but you may find that other models are better for specific use cases. GPT-4 was just released in limited beta, and even here at Twilio we haven't had a chance to really kick the tires on it yet, but based on the developer livestream published with the announcement, it looks to be a significant upgrade over the 3.5 version that has been blowing people's minds for the past couple months.

One additional consideration: GPT-3 is superseded by 3.5 (and naturally 4), but the models in GPT-3 expose fine-tuning capabilities that the more advanced models lack at the time of writing. For example, even though the training data is older, you can get quicker responses by using curie, and can employ things like sentiment analysis and response styles . Choose your own adventure.

Rikki Don't Lose That Number

Now that you have your Function deployed and ready to party you can test it by making a call, but first, you'll need to configure a phone number to use the Functions you just created, and the CLI provides a quick and easy way to do so. Type the following into your terminal to list the phone numbers on your account (we are assuming you followed the pre-req and acquired a number beforehand).
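The Twilio CLI command for that is:

    twilio phone-numbers:list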

You'll get back a list of the phone number SIDs, phone numbers, and friendly names on your account. You can use either the SID or the full E.164-formatted phone number for your request:
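For example (the Function URL here is illustrative; paste the /transcribe URL from your own deploy output):

    twilio phone-numbers:update +15555551234 \
      --voice-url "https://three-robot-rhumba-1234-dev.twil.io/transcribe"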

If CLIs don't pass the vibe check, you can do all of the things we describe above directly in Twilio Console. First, in the left hand nav on the Develop tab go to the Functions and Assets section and click on Services . Click Create Service .

Twilio Voice ChatGPT Demo Create Service

Give your Service a name; I'll call mine voice-chatgpt-demo . Then click Next.

Twilio Voice ChatGPT Demo Name Service

You'll now see the Console view for your Service, with Functions, Assets, Environment Variables, and Dependencies in the left hand nav, and a text editor and console for editing your code and monitoring the logs. First thing you'll want to do is get your environment variables configured, so click on Environment Variables in the bottom right corner.

Twilio Voice ChatGPT Demo Service Config

Your Twilio account SID and auth token are pre-populated as env variables, so all you need to do is add your OpenAI API key; the sample code will refer to it as OPENAI_API_KEY , so if you're looking for a zero-edit copy/paste experience, make sure you name it the same. Click Add when you're done.

Twilio Voice ChatGPT Demo Add Environment Variables

Next, you will need to update your dependencies to include the OpenAI npm module so you can make requests to the OpenAI API. Click Dependencies , and enter openai in the Module textbox and 3.3.0 in the Version textbox. Don't forget to click Add .

Now you can get cracking with creating the Functions. You're going to create two: /transcribe which will do all the heavy lifting from a speech recognition perspective, and /respond which will pass the transcribed text to the ChatGPT API and read the response to the caller using an Amazon Polly Neural text-to-speech voice.

Click the Add button and select Add Function from the dropdown to create a new Function and name it /transcribe .

Twilio Voice ChatGPT Demo Add Function

Replace the contents of the new function with the code snippet here and hit Save . Your factory fresh Function should look like this when complete:

Twilio Voice ChatGPT Demo Transcribe Function

Next create another Function and call this one /respond . Replace the contents of this new Function with the code snippet here and hit Save again. If you'd like a guided tour of what precisely is going on in these examples, check out the CLI section of this post where we walk through the code in greater detail.

Next, click the Deploy button, and your saved Functions will be deployed and can now be used in your incoming phone number configuration. Click the three vertical dots next to /transcribe and select Copy URL . You'll need this in a second.

Twilio Voice ChatGPT Demo Phone Number Setup

On the Develop tab navigate to the Phone Numbers section, then Manage , and choose Active Numbers . Once you've found a number you want to use, scroll down to the Voice & Fax section and under A call comes in select Function . For Service , select the Function service you just created which we named voice-chatgpt-demo . Then choose ui for Environment and lastly choose /transcribe for the Function Path since this is where your phone call should route to first.

Now give your newly configured phone number a call to test everything out!

No disassemble

One particularly great thing about this integration is that both the inputs and the responses are available to you as a developer; the speech recognition text from the person in the form of the SpeechResult parameter that gets passed to the /respond Function, and the ChatGPT-derived responses in the form of the <Say> TwiML that gets executed on the calls. This means these conversations aren't business intelligence closed boxes, and even though these Functions are executing in the Twilio Serverless environment, you can still get your hands on the content of the conversation using the Call Events API . Here's how to grab the details using the Twilio CLI:

Using this API you can retrieve the requests, responses, and associated parameters and pump them directly into your internal systems to do things like provide agents a heads-up about what the caller had been asking about before they were connected, or using the data to decorate your customer profiles in a customer data platform like Segment .

Publications like McKinsey and Forbes are already opining on how generative AI technologies like ChatGPT can be used to solve business problems, so now that you have three robots working for you , what can you actually do with an integration like this? How about a frontline IT desktop support agent? Make ChatGPT search Google so your expensive IT department doesn't have to, and in the event your caller and ChatGPT can't figure it out, connect the call to your agents. Have long hold times for your healthcare specialists? Instead of playing callers smooth jazz, offer them the wisdom of ChatGPT for common non-critical ailments while they wait.

So there you have it: with the help of Twilio's speech recognition, Twilio Functions, Amazon Polly Neural voices, and OpenAI's API, you've now created your very own interactive voice chatbot. Definitely keep an eye on this space and be on the lookout for advances in conversational AI and chatbot capabilities that you can leverage using Twilio.

Michael Carpenter (aka MC) is a telecom API lifer who has been making phones ring with software since 2001. As a Product Manager for Programmable Voice at Twilio, the Venn Diagram of his interests is the intersection of APIs, SIP, WebRTC, and mobile SDKs. He also knows a lot about Depeche Mode. Hit him up at mc (at) twilio.com or LinkedIn .

A huge debt of gratitude to Dhruv Patel's positively prescient post on How to Call an AI Friend Using GPT-3 with Twilio Voice and Functions . Dhruv also provided a technical review of the code in this post. Dhruv Patel is a Developer on Twilio’s Developer Voices team and you can find him working in a coffee shop with a glass of cold brew, or he can be reached at dhrpatel [at] twilio.com or LinkedIn .



Speaker.bot

Supercharged Text to Speech (TTS) for your live stream!

Supported Speech Engines

Use your favorite TTS engine with Speaker.bot

Google Cloud

Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.

Engage global audiences by using 400 neural voices across 140 languages and variants available with Azure TTS

Amazon Polly

Deploy high-quality, natural-sounding human voices in dozens of languages

IBM Watson Text to Speech API

Microsoft Speech API (SAPI), the native speech API for Windows.

The Open Source Voice AI Community

TTS Monster

Custom AI Text to Speech for Streamers

Text to Speech solutions by Acapela Group

Text to Speech by CereProc

Eleven Labs

Text to Speech by ElevenLabs.io

Do you restrict access to the service and platform for any specific countries?

  • Updated September 06, 2024 16:49

We are required to restrict access from the following countries:

  • North Korea
  • The Crimea, Donetsk, and Luhansk regions of Ukraine

If you are connecting from one of these sanctioned countries, your access to our service will be blocked. If you believe you have been incorrectly blocked, you can contact us via https://help.elevenlabs.io/hc/en-us/requests/new .


Chatbot text to speech software: an introduction


A chatbot text to speech tool (the expression "text to speech" is often abbreviated to the acronym TTS) is nothing more than software that converts a textual chat into audio, using technologies like optical character recognition, natural language processing, and artificial intelligence.

The difference with voice bots is that chatbot text to speech tools are downloaded, configured, and utilized by clients: in other words, they are client-side applications. However, some chatbot services, like Ideta software, have text to speech software already integrated. In this way, no extra setup is necessary.

Why use a chatbot text to speech tool?

The benefits of using a chatbot TTS are multiple, as with any kind of assistive technology. Here are the most common reasons users download this kind of software:

  • People with literacy issues can find listening to a chatbot conversation much more comfortable than reading a text chat
  • Users that speak a foreign language can benefit from the machine translator that is usually incorporated in this kind of software
  • Users with visual impairments can finally interact with your bot and enjoy conversing with it
  • Listening to a bot can be more comfortable than reading a textual chat if users access your content through mobile or in other environmental contexts where reading is difficult for some reason.

A promising application for TTS tools is primary education . Reading can be an issue for many children attending their first year of school. This technology can help them decode and understand letters and printed words.

Finally, a specific field of application for chatbot text to speech tools is dyslexia support, as TTS software offers significant help with reading and digesting text on the computer screen.


How does TTS work?

The TTS software works with almost any kind of personal device, from the traditional personal computers with their big screens to the tiniest mobiles.

The reading voice is computer generated. It can sound human or non-human, male or female. Most tools allow users to choose and set up the type of voice of their TTS apps, and the increasing use of natural language processing has made bot voices sound more and more human.

For example, if you are a parent who needs to set up a chatbot text to speech tool for your children, you could select a child's voice, or even record and use your own voice.

Some TTS tools have optical character recognition technology (OCR) that can read the textual strings that are incorporated into an image. This specific feature allows the software to read a street sign.

The speed of the reading voice can be sped up or slowed down. This last feature is particularly interesting, as it lets users slow down a conversation for a better experience, for example when the pronunciation is hard to understand.

The common functionalities of text to speech tools

Chatbot text to speech apps can have further functionalities. Let’s have a quick look at the most common ones.

  • Multi-language options : the user can select a language among a list of different nations and voices. This feature could be particularly helpful for students of foreign languages.
  • Multichannel and multiplatform deployments : the chatbot text to speech software can be installed across several environments and data sets. This feature is particularly helpful when we wish to integrate a TTS service with other bots.
  • Multiformat : the software can scan and read text in multiple document formats, from Word to PDF (without necessarily needing a word to pdf converter ).
  • Text-underlining : the bot underlines the text as it speaks. This feature is particularly recommended for anyone who needs to learn to read.

These functionalities complete the user experience and make chatbot text to speech tools richer and more comfortable for users.

Common concerns about chatbot text to speech tools

Nothing comes without issues in this world. Even a useful instrument like a chatbot text to speech app has its dark side.

According to Statista , users’ main concerns regard privacy and the security of personal information and data. A common fear is that the software could register the conversation and store it somewhere.

These concerns are common to any digital assistant and can be easily solved through the correct application of the privacy law.

However, according to the law, bots may listen to chats and profile users on the condition that it happens anonymously. In conclusion, these privacy concerns are partially founded.

A problem that is hard to solve is the lack of naturalness of bot voices. Today, the experience with a chatbot text to speech app is hardly similar to the one with a human speaker, despite the many efforts and advancements in the last few years.

Other issues can be subtler and regard the design of the app itself. If you wish to know more, please read this article on the technological pitfalls of vocal interfaces .

Some TTS interesting numbers and statistics

The global text to speech market was valued at 2.0 billion USD in 2020, with estimated growth to 5.6 billion USD, which means a growth rate of 14.6%. These numbers give an idea of the increasing importance of this industry.

Its most important growth drivers are the rise of education and of services for people with disabilities around the world. Other factors that contribute to boosting the TTS industry are connected to technological improvements that make the artificial voice sound more and more similar to the natural human voice, and to users' preference for handheld devices.

In conclusion, the chatbot text to speech market segment should not be neglected, as it has the full potential to create great value for chatbot users. This is one of the reasons that have pushed Ideta to add this feature to its bots.


Marco La Rosa

Marco La Rosa is a blogger, writer, content creator, and UX designer. Very interested in new technologies, he wrote Neurocopywriting, a book about neurosciences and their applications to writing and communication.



Tutorial: Voice-enable your bot


You can use Azure AI Speech to voice-enable a chat bot.

In this tutorial, you use the Microsoft Bot Framework to create a bot that responds to what you say. You deploy your bot to Azure and register it with the Bot Framework Direct Line Speech channel. Then, you configure a sample client app for Windows that lets you speak to your bot and hear it speak back to you.

To complete the tutorial, you don't need extensive experience or familiarity with Azure, Bot Framework bots, or Direct Line Speech.

The voice-enabled chat bot that you make in this tutorial follows these steps:

  • The sample client application is configured to connect to the Direct Line Speech channel and the echo bot.
  • When the user presses a button, voice audio streams from the microphone. Or audio is continuously recorded when a custom keyword is used.
  • If a custom keyword is used, keyword detection happens on the local device, gating audio streaming to the cloud.
  • The sample client application uses the Speech SDK to connect to the Direct Line Speech channel and stream audio.
  • Optionally, higher-accuracy keyword verification happens on the service.
  • The audio is passed to the speech recognition service and transcribed to text.
  • The recognized text is passed to the echo bot as a Bot Framework activity.
  • The response text is turned into audio by the text to speech service, and streamed back to the client application for playback.

Diagram that illustrates the flow of the Direct Line Speech channel.

The steps in this tutorial don't require a paid service. As a new Azure user, you can use credits from your free Azure trial subscription and the free tier of the Speech service to complete this tutorial.

Here's what this tutorial covers:

  • Create new Azure resources.
  • Build, test, and deploy the echo bot sample to Azure App Service.
  • Register your bot with a Direct Line Speech channel.
  • Build and run the Windows Voice Assistant Client to interact with your echo bot.
  • Add custom keyword activation.
  • Learn to change the language of the recognized and spoken speech.

Prerequisites

Here's what you need to complete this tutorial:

  • A Windows 10 PC with a working microphone and speakers (or headphones).
  • Visual Studio 2017 or later, with the ASP.NET and web development workload installed.
  • .NET Framework Runtime 4.6.1 or later.
  • An Azure account. Sign up for free .
  • A GitHub account.
  • Git for Windows .

Create a resource group

The client app that you create in this tutorial uses a handful of Azure services. To reduce the round-trip time for responses from your bot, you want to make sure that these services are in the same Azure region.

This section walks you through creating a resource group in the West US region. You use this resource group when you're creating individual resources for the Bot Framework, the Direct Line Speech channel, and the Speech service.

  • Go to the Azure portal page for creating a resource group .
  • Set Subscription to Free Trial . (You can also use an existing subscription.)
  • Enter a name for Resource group . We recommend SpeechEchoBotTutorial-ResourceGroup .
  • From the Region dropdown menu, select West US .
  • Select Review and create . You should see a banner that reads Validation passed .
  • Select Create . It might take a few minutes to create the resource group.
  • As with the resources you create later in this tutorial, it's a good idea to pin this resource group to your dashboard for easy access. If you want to pin this resource group, select the pin icon next to the name.

Choose an Azure region

Ensure that you use a supported Azure region . The Direct Line Speech channel uses the text to speech service, which has neural and standard voices. Neural voices are used at these Azure regions , and standard voices (retiring) are used at these Azure regions .

For more information about regions, see Azure locations .

Create resources

Now that you have a resource group in a supported region, the next step is to create individual resources for each service that you'll use in this tutorial.

Create a Speech service resource

  • Go to the Azure portal page for creating a Speech service resource.
  • For Name, we recommend SpeechEchoBotTutorial-Speech as the name of your resource.
  • For Subscription, make sure that Free Trial is selected.
  • For Location, select West US.
  • For Pricing tier, select F0. This is the free tier.
  • For Resource group, select SpeechEchoBotTutorial-ResourceGroup.
  • After you enter all required information, select Create. It might take a few minutes to create the resource.
  • Later in this tutorial, you need subscription keys for this service. You can access these keys at any time from your resource's Overview area (under Manage keys) or the Keys area.

At this point, check that your resource group ( SpeechEchoBotTutorial-ResourceGroup ) has a Speech service resource:

Name Type Location
SpeechEchoBotTutorial-Speech Speech West US

Create an Azure App Service plan

An App Service plan defines a set of compute resources for a web app to run.

  • Go to the Azure portal page for creating an Azure App Service plan .
  • For Name , we recommend SpeechEchoBotTutorial-AppServicePlan as the name of your plan.
  • For Operating System , select Windows .
  • For Region , select West US .
  • For Pricing Tier , make sure that Standard S1 is selected. This should be the default value. If it isn't, set Operating System to Windows .
  • Select Create . It might take a few minutes to create the resource.

At this point, check that your resource group ( SpeechEchoBotTutorial-ResourceGroup ) has two resources:

Name Type Location
SpeechEchoBotTutorial-AppServicePlan App Service Plan West US
SpeechEchoBotTutorial-Speech Azure AI services West US

Build an echo bot

Now that you created resources, start with the echo bot sample, which echoes the text that you entered as its response. The sample code is already configured to work with the Direct Line Speech channel, which you connect after you deploy the bot to Azure.

The instructions that follow, along with more information about the echo bot, are available in the sample's README on GitHub .

Run the bot sample on your machine

Clone the samples repository:
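The clone command isn't reproduced here. The echo bot used in this tutorial ships in the Bot Framework samples repository (the same repository that contains samples\csharp_dotnetcore\02.echo-bot, referenced later in this tutorial), so cloning https://github.com/microsoft/BotBuilder-Samples with Git gets you the sample code.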

Open Visual Studio.

From the toolbar, select File > Open > Project/Solution . Then open the project solution. In the cloned repository, the echo bot sample lives under samples\csharp_dotnetcore\02.echo-bot .

After the project is loaded, select the F5 key to build and run the project.

In the browser that opens, you see a screen similar to this one:

Screenshot that shows the EchoBot page with the message that your bot is ready.

Test the bot sample with the Bot Framework Emulator

The Bot Framework Emulator is a desktop app that lets bot developers test and debug their bots locally (or remotely through a tunnel). The emulator accepts typed text as the input (not voice). The bot also responds with text.

Follow these steps to use the Bot Framework Emulator to test your echo bot running locally, with text input and text output. After you deploy the bot to Azure, you'll test it with voice input and voice output.

Install Bot Framework Emulator version 4.3.0 or later.

Open the Bot Framework Emulator, and then select File > Open Bot .

Enter the URL for your bot. For the echo bot running locally, this is typically http://localhost:3978/api/messages .

Select Connect .

The bot should greet you with a "Hello and welcome!" message. Type in any text message and confirm that you get a response from the bot.

Screenshot shows the Bot Framework Emulator.

Deploy your bot to Azure App Service

The next step is to deploy the echo bot to Azure. There are a few ways to deploy a bot, including the Azure CLI and deployment templates . This tutorial focuses on publishing directly from Visual Studio.

If Publish doesn't appear as you perform the following steps, use Visual Studio Installer to add the ASP.NET and web development workload.

From Visual Studio, open the echo bot that's been configured for use with the Direct Line Speech channel:

In Solution Explorer, right-click the EchoBot project and select Publish .

In the Publish window that opens:

  • Select Azure > Next .
  • Select Azure App Service (Windows) > Next .
  • Select the green plus sign to create a new Azure App Service.

When the App Service (Windows) window appears:

Select Add an account , and sign in with your Azure account credentials. If you're already signed in, select your account from the dropdown list.

For Name , enter a globally unique name for your bot. This name is used to create a unique bot URL.

A default name that includes the date and time appears in the box (for example, EchoBot20190805125647 ). You can use the default name for this tutorial.

For Subscription , select Free Trial .

For Resource Group , select SpeechEchoBotTutorial-ResourceGroup .

For Hosting Plan , select SpeechEchoBotTutorial-AppServicePlan .

Select Create . On the final wizard screen, select Finish .

Select Publish . Visual Studio deploys the bot to Azure.

You should see a success message in the Visual Studio output window that looks like this:

Your default browser should open and display a page that reads: "Your bot is ready!"

At this point, check your resource group ( SpeechEchoBotTutorial-ResourceGroup ) in the Azure portal. Confirm that it contains these three resources:

Name Type Location
EchoBot20190805125647 App Service West US
SpeechEchoBotTutorial-AppServicePlan App Service plan West US
SpeechEchoBotTutorial-Speech Azure AI services West US

Enable web sockets

You need to make a small configuration change so that your bot can communicate with the Direct Line Speech channel by using web sockets. Follow these steps to enable web sockets:

  • Go to the Azure portal , and select your App Service resource. The resource name should be similar to EchoBot20190805125647 (your unique app name).
  • On the left pane, under Settings , select Configuration .
  • Select the General settings tab.
  • Find the toggle for Web sockets and set it to On .
  • Select Save .

You can use the controls at the top of your Azure App Service page to stop or restart the service. This ability can come in handy when you're troubleshooting.

Create a channel registration

After you create an Azure App Service resource to host your bot, the next step is to create a channel registration. Creating a channel registration is a prerequisite for registering your bot with Bot Framework channels, including the Direct Line Speech channel. If you want to learn more about how bots use channels, see Connect a bot to channels .

  • Go to the Azure portal page for creating an Azure bot.
  • For Bot handle, enter SpeechEchoBotTutorial-BotRegistration-####. Replace #### with a number of your choice.

The bot handle must be globally unique. If you enter one and get the error message "The requested bot ID is not available," pick a different number. The following examples use 8726.

  • For Pricing tier, select F0.
  • Ignore Auto create App ID and password.
  • At the bottom of the Azure Bot pane, select Create.
  • After you create the resource, open your SpeechEchoBotTutorial-BotRegistration-#### resource in the Azure portal.
  • From the Settings area, select Configuration.
  • For Messaging endpoint, enter the URL for your web app with the /api/messages path appended. For example, if your globally unique app name was EchoBot20190805125647, your messaging endpoint would be https://EchoBot20190805125647.azurewebsites.net/api/messages/ .

At this point, check your resource group ( SpeechEchoBotTutorial-ResourceGroup ) in the Azure portal. It should now show at least four resources:

Name Type Location
EchoBot20190805125647 App Service West US
SpeechEchoBotTutorial-AppServicePlan App Service plan West US
SpeechEchoBotTutorial-BotRegistration-8726 Bot Service Global
SpeechEchoBotTutorial-Speech Azure AI services West US

The Azure AI Bot Service resource shows the Global region, even though you selected West US. This is expected.

Optional: Test in web chat

The Azure Bot page has a Test in Web Chat option under Settings . It doesn't work by default with your bot because the web chat needs to authenticate against your bot.

If you want to test your deployed bot with text input, use the following steps. These steps are optional and aren't required for you to continue with the tutorial.

In the Azure portal , find and open your SpeechEchoBotTutorial-BotRegistration-#### resource.

From the Settings area, select Configuration . Copy the value under Microsoft App ID .

Open the Visual Studio EchoBot solution. In Solution Explorer, find and double-select appsettings.json .

Replace the empty string next to MicrosoftAppId in the JSON file with the copied ID value.

Go back to the Azure portal. In the Settings area, select Configuration . Then select Manage next to Microsoft App ID .

Select New client secret . Add a description (for example, web chat ) and select Add . Copy the new secret.

Replace the empty string next to MicrosoftAppPassword in the JSON file with the copied secret value.

Save the JSON file. It should look something like this code:
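The file contents aren't reproduced here. Based on the two values described above, the saved appsettings.json should look roughly like the following, with the placeholders replaced by the App ID and client secret you copied (values shown here are illustrative, not real credentials):

```json
{
  "MicrosoftAppId": "<your-microsoft-app-id>",
  "MicrosoftAppPassword": "<your-client-secret>"
}
```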

Republish the app: right-click the EchoBot project in Visual Studio Solution Explorer, select Publish , and then select the Publish button.

Register the Direct Line Speech channel

Now it's time to register your bot with the Direct Line Speech channel. This channel creates a connection between your bot and a client app compiled with the Speech SDK.

In the Azure portal , find and open your SpeechEchoBotTutorial-BotRegistration-#### resource.

From the Settings area, select Channels and then take the following steps:

  • Under More channels , select Direct Line Speech .
  • Review the text on the Configure Direct line Speech page, and then expand the Cognitive service account dropdown menu.
  • Select the Speech service resource that you created earlier (for example, SpeechEchoBotTutorial-Speech ) from the menu to associate your bot with your subscription key.
  • Ignore the rest of the optional fields.

From the Settings area, select Configuration and then take the following steps:

  • Select the Enable Streaming Endpoint checkbox. This step is necessary for creating a communication protocol built on web sockets between your bot and the Direct Line Speech channel.

If you want to learn more, see Connect a bot to Direct Line Speech .

Run the Windows Voice Assistant Client

The Windows Voice Assistant Client is a Windows Presentation Foundation (WPF) app in C# that uses the Speech SDK to manage communication with your bot through the Direct Line Speech channel. Use it to interact with and test your bot before writing a custom client app. It's open source, so you can either download the executable file and run it, or build it yourself.

The Windows Voice Assistant Client has a simple UI that allows you to configure the connection to your bot, view the text conversation, view Bot Framework activities in JSON format, and display adaptive cards. It also supports the use of custom keywords. You use this client to speak with your bot and receive a voice response.

At this point, confirm that your microphone and speakers are enabled and working.

Go to the GitHub repository for the Windows Voice Assistant Client .

Follow the provided instructions to either:

  • Download a prebuilt executable file in a .zip package to run
  • Build the executable file yourself, by cloning the repository and building the project

Open the VoiceAssistantClient.exe client application and configure it to connect to your bot, by following the instructions in the GitHub repository.

Select Reconnect and make sure you see the message "New conversation started - type or press the microphone button."

Let's test it out. Select the microphone button, and speak a few words in English. The recognized text appears as you speak. When you're done speaking, the bot replies in its own voice, saying "echo" followed by the recognized words.

You can also use text to communicate with the bot. Just type in the text on the bottom bar.

Troubleshoot errors in the Windows Voice Assistant Client

If you get an error message in your main app window, use the following messages to identify and troubleshoot the problem:

Message: Error (AuthenticationFailure): WebSocket Upgrade failed with an authentication error (401). Check for correct resource key (or authorization token) and region name.
What to do: On the Settings page of the app, make sure that you entered the Speech resource key and its region correctly.

Message: Error (ConnectionFailure): Connection was closed by the remote host. Error code: 1011. Error details: We couldn't connect to the bot before sending a message.
What to do: Make sure that you selected the Enable Streaming Endpoint checkbox and turned on Web sockets for the App Service. Also make sure that Azure App Service is running. If it is, try restarting it.

Message: Error (ConnectionFailure): Connection was closed by the remote host. Error code: 1002. Error details: The server returned status code '503' when status code '101' was expected.
What to do: As above, check the Enable Streaming Endpoint and Web sockets settings, and make sure that Azure App Service is running. If it is, try restarting it.

Message: Error (ConnectionFailure): Connection was closed by the remote host. Error code: 1011. Error details: Response status code doesn't indicate success: 500 (InternalServerError).
What to do: Your bot specified a neural voice in the speak field of its output activity, but the Azure region associated with your resource key doesn't support neural voices. Use a standard voice, or use a region that supports neural voices.

If the actions in the table don't address your problem, see Voice assistants: Frequently asked questions . If you still can't resolve your problem after following all the steps in this tutorial, enter a new issue on the Voice Assistant GitHub page .

A note on connection timeout

If you're connected to a bot and no activity has happened in the last five minutes, the service automatically closes the web socket connection with the client and with the bot. This is by design. A message appears on the bottom bar: "Active connection timed out but ready to reconnect on demand."

You don't need to select the Reconnect button. Press the microphone button and start talking, enter a text message, or say the keyword (if one is enabled). The connection is automatically reestablished.

View bot activities

Every bot sends and receives activity messages. In the Activity Log window of the Windows Voice Assistant Client, timestamped logs show each activity that the client has received from the bot. You can also see the activities that the client sent to the bot by using the DialogServiceConnector.SendActivityAsync method. When you select a log item, it shows the details of the associated activity as JSON.

Here's sample JSON of an activity that the client received:
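The sample JSON isn't reproduced here. A Bot Framework message activity received from the echo bot looks roughly like the following sketch; the field values are illustrative, but the text and speak fields called out below are the ones to focus on:

```json
{
  "type": "message",
  "id": "<activity-id>",
  "from": { "id": "<bot-id>", "name": "EchoBot" },
  "recipient": { "id": "<user-id>" },
  "text": "Echo: Hello",
  "speak": "Echo: Hello",
  "inputHint": "acceptingInput"
}
```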

To learn more about what's returned in the JSON output, see the fields in the activity . For this tutorial, you can focus on the text and speak fields.

View client source code for calls to the Speech SDK

The Windows Voice Assistant Client uses the NuGet package Microsoft.CognitiveServices.Speech , which contains the Speech SDK. A good place to start reviewing the sample code is the method InitSpeechConnector() in the file VoiceAssistantClient\MainWindow.xaml.cs , which creates these two Speech SDK objects:

  • DialogServiceConfig : For configuration settings like resource key and its region.
  • DialogServiceConnector : To manage the channel connection and client subscription events for handling recognized speech and bot responses.
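The client's code isn't reproduced here, but a minimal sketch of how those two objects fit together, assuming the C# Speech SDK's Microsoft.CognitiveServices.Speech.Dialog namespace and placeholder key/region values, looks roughly like this (it's not the Windows Voice Assistant Client's actual implementation):

```csharp
using System;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;

// Configuration for the Direct Line Speech connection; key and region are placeholders.
var config = BotFrameworkConfig.FromSubscription("<your-speech-resource-key>", "westus");
config.Language = "en-US";

// Connector that streams microphone audio to the channel and raises events for results.
var connector = new DialogServiceConnector(config, AudioConfig.FromDefaultMicrophoneInput());

// Final speech-to-text result for the user's utterance.
connector.Recognized += (s, e) => Console.WriteLine($"Recognized: {e.Result.Text}");

// Each bot reply arrives as a Bot Framework activity serialized as JSON.
connector.ActivityReceived += (s, e) => Console.WriteLine($"Activity: {e.Activity}");

await connector.ConnectAsync();
await connector.ListenOnceAsync(); // stream one utterance from the microphone to the bot
```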

Add custom keyword activation

The Speech SDK supports custom keyword activation. Similar to "Hey Cortana" for a Microsoft assistant, you can write an app that continuously listens for a keyword of your choice. Keep in mind that a keyword can be a single word or a multiple-word phrase.

The term keyword is often used interchangeably with the term wake word . You might see both used in Microsoft documentation.

Keyword detection happens on the client app. If you're using a keyword, audio is streamed to the Direct Line Speech channel only if the keyword is detected. The Direct Line Speech channel includes a component called keyword verification , which does more complex processing in the cloud to verify that the keyword you chose is at the start of the audio stream. If keyword verification succeeds, then the channel will communicate with the bot.

Follow these steps to create a keyword model, configure the Windows Voice Assistant Client to use this model, and test it with your bot:

  • Create a custom keyword by using the Speech service .
  • Unzip the model file that you downloaded in the previous step. It should be named for your keyword. You're looking for a file named kws.table .
  • In the Windows Voice Assistant Client, find the Settings menu (the gear icon in the upper right). For Model file path , enter the full path name for the kws.table file from step 2.
  • Select the Enabled checkbox. You should see this message next to the checkbox: "Will listen for the keyword upon next connection." If you provided the wrong file or an invalid path, you should see an error message.
  • Enter the values for Subscription key and Subscription key region , and then select OK to close the Settings menu.
  • Select Reconnect . You should see a message that reads: "New conversation started - type, press the microphone button, or say the keyword." The app is now continuously listening.
  • Say a phrase that starts with your keyword. You see a transcription of what you spoke, and you hear the bot's response.
  • You can continue to interact with the bot by entering text on the bottom bar, pressing the microphone icon and speaking, or saying a phrase that starts with your keyword.

View the source code that enables keyword detection

In the source code of the Windows Voice Assistant Client, use these files to review the code that enables keyword detection:

  • VoiceAssistantClient\Models.cs includes a call to the Speech SDK method KeywordRecognitionModel.FromFile() . This method is used to instantiate the model from a local file on disk.
  • VoiceAssistantClient\MainWindow.xaml.cs includes a call to the Speech SDK method DialogServiceConnector.StartKeywordRecognitionAsync() . This method activates continuous keyword detection.
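Taken together, and assuming a DialogServiceConnector named connector like the one sketched earlier, the keyword flow amounts to roughly the following; the model file path is an example, not the client's real configuration:

```csharp
using Microsoft.CognitiveServices.Speech;

// Load the custom keyword model (the kws.table file downloaded from the Speech service).
var keywordModel = KeywordRecognitionModel.FromFile(@"C:\models\kws.table");

// Listen continuously; audio is streamed to the channel only after the keyword is detected.
await connector.StartKeywordRecognitionAsync(keywordModel);
```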

Optional: Change the language and bot voice

The bot that you created listens for and responds in English, with a default US English text to speech voice. However, you're not limited to English or a default voice.

In this section, you learn how to change the language that your bot listens for and responds in. You also learn how to select a different voice for that language.

Change the language

You can choose from any of the languages mentioned in the speech to text table. The following example changes the language to German.

Open the Windows Voice Assistant Client app, select the Settings button (upper-right gear icon), and enter de-de in the Language field. This is the locale value mentioned in the speech to text table.

This step sets the spoken language to be recognized, overriding the default en-us . It also instructs the Direct Line Speech channel to use a default German voice for the bot reply.

Close the Settings page, and then select the Reconnect button to establish a new connection to your echo bot.

Select the microphone button, and say a phrase in German. The recognized text appears, and the echo bot replies with the default German voice.

Change the default bot voice

You can select the text to speech voice and control pronunciation if the bot specifies the reply in the form of Speech Synthesis Markup Language (SSML) instead of simple text. The echo bot doesn't use SSML, but you can easily modify the code to do that.

The following example adds SSML to the echo bot reply so that the German voice de-DE-RalfNeural (a male voice) is used instead of the default female voice. See the list of standard voices and list of neural voices that are supported for your language.

Open samples\csharp_dotnetcore\02.echo-bot\echo-bot.cs .

Find these lines:

Replace them with this code:
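The two code blocks aren't reproduced here. In the echo bot sample, the reply is sent with MessageFactory.Text(replyText, replyText), so the spoken text equals the display text. A sketch of the SSML-based replacement inside the bot's message handler looks roughly like this; treat the exact SSML namespace and markup as illustrative and verify them against the current sample:

```csharp
var replyText = $"Echo: {turnContext.Activity.Text}";

// Wrap the spoken reply in SSML so the German neural voice de-DE-RalfNeural is used
// instead of the default voice; the display text stays plain.
var spokenReply =
    "<speak version=\"1.0\" xmlns=\"https://www.w3.org/2001/10/synthesis\" xml:lang=\"de-DE\">" +
    $"<voice name=\"de-DE-RalfNeural\">{replyText}</voice></speak>";

await turnContext.SendActivityAsync(MessageFactory.Text(replyText, spokenReply), cancellationToken);
```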

Build your solution in Visual Studio and fix any build errors.

The second argument of MessageFactory.Text sets the activity's speak field in the bot reply. With the preceding change, its value is changed from simple text to SSML in order to specify a non-default German voice.

Redeploy your bot

Now that you made the necessary change to the bot, the next step is to republish it to Azure App Service and try it out:

In the Solution Explorer window, right-click the EchoBot project and select Publish .

Your previous deployment configuration has already been loaded as the default. Select Publish next to EchoBot20190805125647 - Web Deploy .

The Publish Succeeded message appears in the Visual Studio output window, and a webpage opens with the message "Your bot is ready!"

Open the Windows Voice Assistant Client app. Select the Settings button (upper-right gear icon), and make sure that you still have de-de in the Language field.

Follow the instructions in Run the Windows Voice Assistant Client to reconnect with your newly deployed bot, speak in the new language, and hear your bot reply in that language with the new voice.

Clean up resources

If you're not going to continue using the echo bot deployed in this tutorial, you can remove it and all its associated Azure resources by deleting the Azure resource group:

  • In the Azure portal , select Resource Groups under Azure services .
  • Find the SpeechEchoBotTutorial-ResourceGroup resource group. Select the three dots (...).
  • Select Delete resource group .

Explore documentation

  • Deploy to an Azure region near you to see the improvement in bot response time.
  • Deploy to an Azure region that supports high-quality neural text to speech voices .
  • Bot Service pricing
  • Speech service
  • Build a Bot Framework bot . Then register it with the Direct Line Speech channel and customize your bot for voice .
  • Explore existing Bot Framework solutions : Build a virtual assistant and extend it to Direct Line Speech .

Build your own client app by using the Speech SDK


Additional resources

Text-to-Speech

Bring your chatbot to a whole new level: give it a voice


Chatbots that can speak, thanks to our free text-to-speech technology

Your chatbot should always have a personality, a style of speech that reflects its purpose. Not only because this is more engaging for the user, but also because there is a significant marketing message in the vocabulary and manner of speech of your chatbot. Just think about the difference between the kind of tone you want to strike if your bot is working for a bank (reliable, wise and sombre) or a thrash metal band (lively, youthful and energetic).

FREE CONVERSATIONAL SOFTWARE

Well, on the SnatchBot platform a whole new level of engagement experience is possible with the world’s first free talking chatbots. We have made text-to-speech available in over sixty languages and, in the English language, you can choose from ten voices: five male, five female. Each voice has a short sample for you to listen to as you create and edit your chatbot, so you can choose the most appropriate tone before you switch on text-to-speech.

Make Your Online Chatbot More Accessible

By giving your users the option of listening to the chatbot, rather than reading, you are achieving two important goals. Firstly, you are making it easier for them to access the conversation, and secondly, you are giving them a much more entertaining and engaging quality of experience.

This functionality is particularly valuable in terms of accessibility. Visually impaired users, for example, will welcome the option of listening to the chatbot’s responses, rather than having to read them. And there are always going to be situations where users, whether VI or not, will prefer to listen to a chatbot’s response than read it.

SnatchBot TTS voice

Text-to-Speech in Action

The Text to Speech service understands text and natural language to generate synthesized audio output complete with appropriate cadence and intonation. It is available in 60 languages .

Test our Text to Speech in action by replacing the content below with the text you wish to hear. The text language must match the selected voice language. The possibilities are endless.


THE BEST CHATBOT PLATFORM

On our roadmap is the opposite functionality, speech-to-text, or speech recognition. Our goal is to provide you with the most amazing chatbot experience. For now, this functionality, unique to SnatchBot, is incredibly easy to deploy on dozens of channels, including Skype, Telegram, LINE Messenger, Slack, Viber and more.

Here’s how you add a voice to your chatbot . Enjoy!


Text-to-Speech

The chat widget supports a Text-to-Speech (TTS) feature that enables chatbots to convert text into audio. This helps make the conversation between a user and the bot more interactive. TTS allows the chatbot to respond to users in a natural way, using a synthesized voice to provide a human-like conversation experience. The bot responds to the user's queries with a human voice and provides additional information and feedback.

TTS is a paid feature. You need to upgrade to an enterprise subscription to access it. To enable TTS for your chatbots, contact support .

  • TTS currently supports Arabic (Saudi), Arabic (UAE), Bengali, English (India, Australia, Canada, United Kingdom, Ireland, and United States), French, Gujarati, Hindi, Indonesia, Kannada, Marathi, Malayalam, Malay (Malaysia), Tamil, Telugu, Urdu (Pakistan), and Vietnamese.

Enable TTS in your chatbot

To enable TTS in your chatbot, follow these steps:

On the left navigation bar, click Extensions .


Click on Chat widget .


Navigate to the  Settings tab and expand the Speech & Dictation drop-down.


Enable  Text to Speech and configure the following options:

Accent : Select your preferred accent from the list of available options in the drop-down.

Gender : Select your preferred gender for the voice tone - Male or Female.

Speaking rate : Choose the speaking rate for bot responses, ranging from 0.5 to 2.0, based on the selected gender. The default rate is 1.0.

Pitch : Select the pitch within the range of 0.5 to 2.0 (here 0.5 is considered as a low pitch and 2.0 as a high pitch). By default, 1.0 is selected.

Language : Select your preferred languages.

Click Save changes .


Navigate to Deploy > Web > Experience on a Website .


Enter the text in the Input field. The chatbot converts text commands into spoken words.


  • If TTS is enabled, you will see the speaker icon on the Title bar. By default, it is disabled.

Text to Speech

Generate speech from text. Choose a voice to read your text aloud. You can use it to narrate your videos, create voice-overs, convert your documents into audio, and more.


Voice Generator

This web app allows you to generate voice audio from text - no login needed, and it's completely free! It uses your browser's built-in voice synthesis technology, and so the voices will differ depending on the browser that you're using. You can download the audio as a file, but note that the downloaded voices may be different to your browser's voices because they are downloaded from an external text-to-speech server. If you don't like the externally-downloaded voice, you can use a recording app on your device to record the "system" or "internal" sound while you're playing the generated voice audio.

Want more voices? You can download the generated audio and then use voicechanger.io to add effects to the voice. For example, you can make the voice sound more robotic, or like a giant ogre, or an evil demon. You can even use it to reverse the generated audio, randomly distort the speed of the voice throughout the audio, add a scary ghost effect, or add an "anonymous hacker" effect to it.

Note: If the list of available text-to-speech voices is small, or all the voices sound the same, then you may need to install text-to-speech voices on your device. Many operating systems (including some versions of Android, for example) only come with one voice by default, and the others need to be downloaded in your device's settings. If you don't know how to install more voices, and you can't find a tutorial online, you can try downloading the audio with the download button instead. As mentioned above, the downloaded audio uses external voices which may be different to your device's local ones.

You're free to use the generated voices for any purpose - no attribution needed. You could use this website as a free voice over generator for narrating your videos in cases where you don't want to use your real voice. You can also adjust the pitch of the voice to make it sound younger/older, and you can even adjust the rate/speed of the generated speech, so you can create a fast-talking high-pitched chipmunk voice if you want to.

Note: If you have offline-compatible voices installed on your device (check your system Text-To-Speech settings), then this web app works offline! Find the "add to homescreen" or "install" button in your browser to add a shortcut to this app in your home screen. And note that if you don't have an internet connection, or if for some reason the voice audio download isn't working for you, you can also use a recording app that records your devices "internal" or "system" sound.



Now you can chat with ChatGPT using your voice

The new feature is part of a round of updates for OpenAI’s app, including the ability to answer questions about images.

By Will Douglas Heaven


In one of the biggest updates to ChatGPT yet, OpenAI has launched two new ways to interact with its viral app.  

First, ChatGPT now has a voice. Choose from one of five lifelike synthetic voices and you can have a conversation with the chatbot as if you were making a call, getting responses to your spoken questions in real time.

ChatGPT also now answers questions about images. OpenAI teased this feature in March with its reveal of GPT-4 (the model that powers ChatGPT), but it has not been available to the wider public before. This means that you can now upload images to the app and quiz it about what they show.

These updates join the announcement last week that DALL-E 3, the latest version of OpenAI's image-making model , will be hooked up to ChatGPT so that you can get the chatbot to generate pictures.

The ability to talk to ChatGPT draws on two separate models. Whisper, OpenAI’s existing speech-to-text model, converts what you say into text, which is then fed to the chatbot. And a new text-to-speech model converts ChatGPT’s responses into spoken words.

In a demo the company gave me last week, Joanne Jang, a product manager, showed off ChatGPT’s range of synthetic voices. These were created by training the text-to-speech model on the voices of actors that OpenAI had hired. In the future it might even allow users to create their own voices. “In fashioning the voices, the number-one criterion was whether this is a voice you could listen to all day,” she says.

They are chatty and enthusiastic but won’t be to everyone’s taste. “I’ve got a really great feeling about us teaming up,” says one. “I just want to share how thrilled I am to work with you, and I can’t wait to get started,” says another. “What’s the game plan?”

OpenAI is sharing this text-to-speech model with a handful of other companies, including Spotify. Spotify revealed today that it is using the same synthetic voice technology to translate celebrity podcasts —including episodes of the Lex Fridman Podcast and Trevor Noah’s new show, which launches later this year—into multiple languages that will be spoken with synthetic versions of the podcasters’ own voices.

This grab bag of updates shows just how fast OpenAI is spinning its experimental models into desirable products. OpenAI has spent much of the time since its surprise hit with ChatGPT last November polishing its technology and selling it to both private consumers and commercial partners.

ChatGPT Plus, the company’s premium app, is now a slick one-stop shop for the best of OpenAI’s models, rolling GPT-4 and DALL-E into a single smartphone app that rivals Apple’s Siri, Google Assistant, and Amazon’s Alexa.

What was available only to certain software developers a year ago is now available to anyone for $20 a month. “We’re trying to make ChatGPT more useful and more helpful,” says Jang.

In last week’s demo, Raul Puri, a scientist who works on GPT-4, gave me a quick tour of the image recognition feature. He uploaded a photo of a kid’s math homework, circled a Sudoku-like puzzle on the screen, and asked ChatGPT how you were meant to solve it. ChatGPT replied with the correct steps.

Puri says he has also used the feature to help him fix his fiancée’s computer by uploading screenshots of error messages and asking ChatGPT what he should do. “This was a very painful experience that it helped me get through,” he says.

ChatGPT’s image recognition ability has already been trialed by a company called Be My Eyes, which makes an app for people with impaired vision. Users can upload a photo of what’s in front of them and ask human volunteers to tell them what it is. In a partnership with OpenAI, Be My Eyes gives its users the option of asking a chatbot instead.

“Sometimes my kitchen is a little messy, or it’s just very early Monday morning and I don’t want to talk to a human being,” Be My Eyes founder Hans Jørgen Wiberg, who uses the app himself, told me when I interviewed him at EmTech Digital in May. “Now you can ask the photo questions.” 

OpenAI is aware of the risk of releasing these updates to the public. Combining models brings whole new levels of complexity, says Puri. He says his team has spent months brainstorming possible misuses. You cannot ask questions about photos of private individuals, for example.

Jang gives another example: “Right now if you ask ChatGPT to make a bomb it will refuse,” she says. “But instead of saying, ‘Hey, tell me how to make a bomb,’ what if you showed it an image of a bomb and said, ‘Can you tell me how to make this?’”

“You have all the problems with computer vision; you have all the problems of large language models. Voice fraud is a big problem,” says Puri. “You have to consider not just our users, but also the people that aren’t using the product.”

The potential problems don’t stop there. Adding voice recognition to the app could make ChatGPT less accessible for people who do not speak with mainstream accents, says Joel Fischer, who studies human-computer interaction at the University of Nottingham in the UK.

Synthetic voices also come with social and cultural baggage that will shape users’ perceptions and expectations of the app, he says. This is an issue that still needs study .



ChatGPT: Everything you need to know about the AI-powered chatbot


ChatGPT, OpenAI’s text-generating AI chatbot, has taken the world by storm since its launch in November 2022. What started as a tool to hyper-charge productivity through writing essays and code with short text prompts has evolved into a behemoth used by more than 92% of Fortune 500 companies .

That growth has propelled OpenAI itself into becoming one of the most-hyped companies in recent memory. And its latest partnership with Apple for its upcoming generative AI offering, Apple Intelligence, has given the company another significant bump in the AI race.

2024 also saw the release of GPT-4o, OpenAI’s new flagship omni model for ChatGPT. GPT-4o is now the default free model, complete with voice and vision capabilities. But after demoing GPT-4o, OpenAI paused one of its voices , Sky, after allegations that it was mimicking Scarlett Johansson’s voice in “Her.”

OpenAI is facing internal drama, including the sizable exit of co-founder and longtime chief scientist Ilya Sutskever as the company dissolved its Superalignment team. OpenAI is also facing a lawsuit from Alden Global Capital-owned newspapers , including the New York Daily News and the Chicago Tribune, for alleged copyright infringement, following a similar suit filed by The New York Times last year.

Here’s a timeline of ChatGPT product updates and releases, starting with the latest, which we’ve been updating throughout the year. And if you have any other questions, check out our ChatGPT FAQ here.

Timeline of the most recent ChatGPT updates


OpenAI announces OpenAI o1, a new model that can fact-check itself

OpenAI unveiled a preview of OpenAI o1 , also known as "Strawberry." The collection of models is available in ChatGPT and via OpenAI's API: o1-preview and o1-mini. The company claims that o1 can more effectively reason through math and science and fact-check itself by spending more time considering all parts of a command or question.

Unlike ChatGPT, o1 can't browse the web or analyze files yet, and it is rate-limited and expensive compared to other models. OpenAI says it plans to bring o1-mini access to all free users of ChatGPT, but it hasn't set a release date.

OpenAI o1 codes a video game from a prompt. pic.twitter.com/aBEcehP0j8 — OpenAI (@OpenAI) September 12, 2024

A hacker was able to trick ChatGPT into giving instructions on how to make bombs

An artist and hacker found a way to jailbreak ChatGPT to produce instructions for making powerful explosives, a request that the chatbot normally refuses. An explosives expert who reviewed the chatbot's output told TechCrunch that the instructions could be used to make a detonatable product and were too sensitive to be released.

OpenAI reaches 1 million paid users of its corporate offerings

OpenAI announced it has surpassed 1 million paid users for its versions of ChatGPT intended for businesses, including ChatGPT Team, ChatGPT Enterprise and its educational offering, ChatGPT Edu. The company said that nearly half of OpenAI’s corporate users are based in the US.

Volkswagen rolls out its ChatGPT assistant to the US

Volkswagen is taking its ChatGPT voice assistant experiment to vehicles in the United States. Its ChatGPT-integrated Plus Speech voice assistant is an AI chatbot based on Cerence's Chat Pro product and an LLM from OpenAI, and it will begin rolling out on September 6 with the 2025 Jetta and Jetta GLI models.

OpenAI inks content deal with Condé Nast

As part of the new deal, OpenAI will surface stories from Condé Nast properties like The New Yorker, Vogue, Vanity Fair, Bon Appétit and Wired in ChatGPT and SearchGPT. Condé Nast CEO Roger Lynch implied that the “multi-year” deal will involve payment from OpenAI in some form and a Condé Nast spokesperson told TechCrunch that OpenAI will have permission to train on Condé Nast content.

We’re partnering with Condé Nast to deepen the integration of quality journalism into ChatGPT and our SearchGPT prototype. https://t.co/tiXqSOTNAl — OpenAI (@OpenAI) August 20, 2024

Our first impressions of ChatGPT’s Advanced Voice Mode

TechCrunch’s Maxwell Zeff has been playing around with OpenAI’s Advanced Voice Mode, in what he describes as “the most convincing taste I’ve had of an AI-powered future yet.” Compared to Siri or Alexa, Advanced Voice Mode stands out with faster response times, unique answers and the ability to answer complex questions. But the feature falls short as an effective replacement for virtual assistants.

OpenAI shuts down election influence operation that used ChatGPT

OpenAI has banned a cluster of ChatGPT accounts linked to an Iranian influence operation that was generating content about the U.S. presidential election. OpenAI identified five website fronts presenting as both progressive and conservative news outlets that used ChatGPT to draft several long-form articles, though it doesn’t seem that it reached much of an audience.

OpenAI finds that GPT-4o does some weird stuff sometimes

OpenAI has found that GPT-4o, which powers the recently launched alpha of Advanced Voice Mode in ChatGPT, can behave in strange ways. In a new “red teaming” report, OpenAI reveals some of GPT-4o’s weirder quirks, like mimicking the voice of the person speaking to it or randomly shouting in the middle of a conversation.

ChatGPT’s mobile app reports its biggest month yet

After a big jump following the release of OpenAI’s new GPT-4o “omni” model, the mobile version of ChatGPT has now seen its biggest month of revenue yet. The app pulled in $28 million in net revenue from the App Store and Google Play in July, according to data provided by app intelligence firm Appfigures.

OpenAI could potentially catch students who cheat with ChatGPT

OpenAI has built a watermarking tool that could potentially catch students who cheat by using ChatGPT — but The Wall Street Journal reports that the company is debating whether to actually release it. An OpenAI spokesperson confirmed to TechCrunch that the company is researching tools that can detect writing from ChatGPT, but said it’s taking a “deliberate approach” to releasing it.

ChatGPT’s advanced Voice Mode starts rolling out to some users

OpenAI is giving users their first access to GPT-4o’s updated realistic audio responses. The alpha version is now available to a small group of ChatGPT Plus users, and the company says the feature will gradually roll out to all Plus users in the fall of 2024. The release follows controversy surrounding the voice’s similarity to Scarlett Johansson, leading OpenAI to delay its release.

We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions. pic.twitter.com/64O94EhhXK — OpenAI (@OpenAI) July 30, 2024

OpenAI announces new search prototype, SearchGPT

OpenAI is testing SearchGPT, a new AI search experience to compete with Google. SearchGPT aims to elevate search queries with “timely answers” from across the internet, as well as the ability to ask follow-up questions. The temporary prototype is currently only available to a small group of users and its publisher partners, like The Atlantic, for testing and feedback.

We’re testing SearchGPT, a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources. We’re launching with a small group of users for feedback and plan to integrate the experience into ChatGPT. https://t.co/dRRnxXVlGh pic.twitter.com/iQpADXmllH — OpenAI (@OpenAI) July 25, 2024

OpenAI could lose $5 billion this year, report claims

A new report from The Information , based on undisclosed financial information, claims OpenAI could lose up to $5 billion due to how costly the business is to operate. The report also says the company could spend as much as $7 billion in 2024 to train and operate ChatGPT.

OpenAI unveils GPT-4o mini

OpenAI released its latest small AI model, GPT-4o mini . The company says GPT-4o mini, which is cheaper and faster than OpenAI’s current AI models, outperforms industry leading small AI models on reasoning tasks involving text and vision. GPT-4o mini will replace GPT-3.5 Turbo as the smallest model OpenAI offers. 

OpenAI partners with Los Alamos National Laboratory for bioscience research

OpenAI announced a partnership with the Los Alamos National Laboratory to study how AI can be employed by scientists in order to advance research in healthcare and bioscience. This follows other health-related research collaborations at OpenAI, including Moderna and Color Health.

OpenAI and Los Alamos National Laboratory announce partnership to study AI for bioscience research https://t.co/WV4XMZsHBA — OpenAI (@OpenAI) July 10, 2024

OpenAI makes CriticGPT to find mistakes in GPT-4

OpenAI announced it has trained a model off of GPT-4, dubbed CriticGPT , which aims to find errors in ChatGPT’s code output so they can make improvements and better help so-called human “AI trainers” rate the quality and accuracy of ChatGPT responses.

We’ve trained a model, CriticGPT, to catch bugs in GPT-4’s code. We’re starting to integrate such models into our RLHF alignment pipeline to help humans supervise AI on difficult tasks: https://t.co/5oQYfrpVBu — OpenAI (@OpenAI) June 27, 2024

OpenAI inks content deal with TIME

OpenAI and TIME announced a multi-year strategic partnership that brings the magazine’s content, both modern and archival, to ChatGPT. As part of the deal, TIME will also gain access to OpenAI’s technology in order to develop new audience-based products.

We’re partnering with TIME and its 101 years of archival content to enhance responses and provide links to stories on https://t.co/LgvmZUae9M : https://t.co/xHAYkYLxA9 — OpenAI (@OpenAI) June 27, 2024

OpenAI delays ChatGPT’s new Voice Mode

OpenAI planned to start rolling out its advanced Voice Mode feature to a small group of ChatGPT Plus users in late June, but it says lingering issues forced it to postpone the launch to July. OpenAI says Advanced Voice Mode might not launch for all ChatGPT Plus customers until the fall, depending on whether it meets certain internal safety and reliability checks.

ChatGPT releases app for Mac

ChatGPT for macOS is now available for all users . With the app, users can quickly call up ChatGPT by using the keyboard combination of Option + Space. The app allows users to upload files and other photos, as well as speak to ChatGPT from their desktop and search through their past conversations.

The ChatGPT desktop app for macOS is now available for all users. Get faster access to ChatGPT to chat about email, screenshots, and anything on your screen with the Option + Space shortcut: https://t.co/2rEx3PmMqg pic.twitter.com/x9sT8AnjDm — OpenAI (@OpenAI) June 25, 2024

Apple brings ChatGPT to its apps, including Siri

Apple announced at WWDC 2024 that it is bringing ChatGPT to Siri and other first-party apps and capabilities across its operating systems. The ChatGPT integrations, powered by GPT-4o, will arrive on iOS 18, iPadOS 18 and macOS Sequoia later this year, and will be free without the need to create a ChatGPT or OpenAI account. Features exclusive to paying ChatGPT users will also be available through Apple devices .

Apple is bringing ChatGPT to Siri and other first-party apps and capabilities across its operating systems #WWDC24 Read more: https://t.co/0NJipSNJoS pic.twitter.com/EjQdPBuyy4 — TechCrunch (@TechCrunch) June 10, 2024

House Oversight subcommittee invites Scarlett Johansson to testify about ‘Sky’ controversy

Scarlett Johansson has been invited to testify about the controversy surrounding OpenAI’s Sky voice at a hearing for the House Oversight Subcommittee on Cybersecurity, Information Technology, and Government Innovation. In a letter, Rep. Nancy Mace said Johansson’s testimony could “provide a platform” for concerns around deepfakes.

ChatGPT experiences two outages in a single day

ChatGPT was down twice in one day: one multi-hour outage in the early hours of the morning Tuesday and another outage later in the day that is still ongoing. Anthropic’s Claude and Perplexity also experienced some issues.

You're not alone, ChatGPT is down once again. pic.twitter.com/Ydk2vNOOK6 — TechCrunch (@TechCrunch) June 4, 2024

The Atlantic and Vox Media ink content deals with OpenAI

The Atlantic and Vox Media have announced licensing and product partnerships with OpenAI . Both agreements allow OpenAI to use the publishers’ current content to generate responses in ChatGPT, which will feature citations to relevant articles. Vox Media says it will use OpenAI’s technology to build “audience-facing and internal applications,” while The Atlantic will build a new experimental product called Atlantic Labs .

I am delighted that @theatlantic now has a strategic content & product partnership with @openai . Our stories will be discoverable in their new products and we'll be working with them to figure out new ways that AI can help serious, independent media : https://t.co/nfSVXW9KpB — nxthompson (@nxthompson) May 29, 2024

OpenAI signs 100K PwC workers to ChatGPT’s enterprise tier

OpenAI announced a new deal with management consulting giant PwC . The company will become OpenAI’s biggest customer to date, covering 100,000 users, and will become OpenAI’s first partner for selling its enterprise offerings to other businesses.

OpenAI says it is training its GPT-4 successor

OpenAI announced in a blog post that it has recently begun training its next flagship model to succeed GPT-4. The news came in an announcement of its new safety and security committee, which is responsible for informing safety and security decisions across OpenAI’s products.

Former OpenAI director claims the board found out about ChatGPT on Twitter

On The TED AI Show podcast, former OpenAI board member Helen Toner revealed that the board did not know about ChatGPT until its launch in November 2022. Toner also said that Sam Altman gave the board inaccurate information about the safety processes the company had in place and that he didn't disclose his involvement in the OpenAI Startup Fund.

Sharing this, recorded a few weeks ago. Most of the episode is about AI policy more broadly, but this was my first longform interview since the OpenAI investigation closed, so we also talked a bit about November. Thanks to @bilawalsidhu for a fun conversation! https://t.co/h0PtK06T0K — Helen Toner (@hlntnr) May 28, 2024

ChatGPT’s mobile app revenue saw biggest spike yet following GPT-4o launch

The launch of GPT-4o has driven the company’s biggest-ever spike in revenue on mobile , despite the model being freely available on the web. Mobile users are being pushed to upgrade to its $19.99 monthly subscription, ChatGPT Plus, if they want to experiment with OpenAI’s most recent launch.

OpenAI to remove ChatGPT’s Scarlett Johansson-like voice

After demoing its new GPT-4o model last week, OpenAI announced it is pausing one of its voices , Sky, after users found that it sounded similar to Scarlett Johansson in “Her.”

OpenAI explained in a blog post that Sky’s voice is “not an imitation” of the actress and that AI voices should not intentionally mimic the voice of a celebrity. The blog post went on to explain how the company chose its voices: Breeze, Cove, Ember, Juniper and Sky.

We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them. Read more about how we chose these voices: https://t.co/R8wwZjU36L — OpenAI (@OpenAI) May 20, 2024

ChatGPT lets you add files from Google Drive and Microsoft OneDrive

OpenAI announced new updates for easier data analysis within ChatGPT . Users can now upload files directly from Google Drive and Microsoft OneDrive, interact with tables and charts, and export customized charts for presentations. The company says these improvements will be added to GPT-4o in the coming weeks.

We're rolling out interactive tables and charts along with the ability to add files directly from Google Drive and Microsoft OneDrive into ChatGPT. Available to ChatGPT Plus, Team, and Enterprise users over the coming weeks. https://t.co/Fu2bgMChXt pic.twitter.com/M9AHLx5BKr — OpenAI (@OpenAI) May 16, 2024

OpenAI inks deal to train AI on Reddit data

OpenAI announced a partnership with Reddit that will give the company access to “real-time, structured and unique content” from the social network. Content from Reddit will be incorporated into ChatGPT, and the companies will work together to bring new AI-powered features to Reddit users and moderators.

We’re partnering with Reddit to bring its content to ChatGPT and new products: https://t.co/xHgBZ8ptOE — OpenAI (@OpenAI) May 16, 2024

OpenAI debuts GPT-4o “omni” model now powering ChatGPT

OpenAI’s spring update event saw the reveal of its new omni model, GPT-4o, which has a black hole-like interface , as well as voice and vision capabilities that feel eerily like something out of “Her.” GPT-4o is set to roll out “iteratively” across its developer and consumer-facing products over the next few weeks.

OpenAI demos real-time language translation with its latest GPT-4o model. pic.twitter.com/pXtHQ9mKGc — TechCrunch (@TechCrunch) May 13, 2024

OpenAI to build a tool that lets content creators opt out of AI training

The company announced it’s building a tool, Media Manager, that will allow creators to better control how their content is being used to train generative AI models — and give them an option to opt out. The goal is to have the new tool in place and ready to use by 2025.

OpenAI explores allowing AI porn

In a new peek behind the curtain of its AI’s secret instructions , OpenAI also released a new NSFW policy . Though it’s intended to start a conversation about how it might allow explicit images and text in its AI products, it raises questions about whether OpenAI — or any generative AI vendor — can be trusted to handle sensitive content ethically.

OpenAI and Stack Overflow announce partnership

In a new partnership, OpenAI will get access to developer platform Stack Overflow’s API and will get feedback from developers to improve the performance of their AI models. In return, OpenAI will include attributions to Stack Overflow in ChatGPT. However, the deal was not favorable to some Stack Overflow users — leading to some sabotaging their answer in protest .

U.S. newspapers file copyright lawsuit against OpenAI and Microsoft

Alden Global Capital-owned newspapers, including the New York Daily News, the Chicago Tribune, and the Denver Post, are suing OpenAI and Microsoft for copyright infringement. The lawsuit alleges that the companies stole millions of copyrighted articles “without permission and without payment” to bolster ChatGPT and Copilot.

OpenAI inks content licensing deal with Financial Times

OpenAI has partnered with another news publisher in Europe, London’s Financial Times, whose content the company will be paying to access. “Through the partnership, ChatGPT users will be able to see select attributed summaries, quotes and rich links to FT journalism in response to relevant queries,” the FT wrote in a press release.

OpenAI opens Tokyo hub, adds GPT-4 model optimized for Japanese

OpenAI is opening a new office in Tokyo and has plans for a GPT-4 model optimized specifically for the Japanese language. The move underscores how OpenAI will likely need to localize its technology to different languages as it expands.

Sam Altman pitches ChatGPT Enterprise to Fortune 500 companies

According to Reuters, OpenAI’s Sam Altman hosted hundreds of executives from Fortune 500 companies across several cities in April, pitching versions of its AI services intended for corporate use.

OpenAI releases “more direct, less verbose” version of GPT-4 Turbo

Premium ChatGPT users — customers paying for ChatGPT Plus, Team or Enterprise — can now use an updated and enhanced version of GPT-4 Turbo. The new model brings with it improvements in writing, math, logical reasoning and coding, OpenAI claims, as well as a more up-to-date knowledge base.

Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: https://t.co/fjoXDCOnPr pic.twitter.com/I4fg4aDq1T — OpenAI (@OpenAI) April 12, 2024

ChatGPT no longer requires an account — but there’s a catch

You can now use ChatGPT without signing up for an account, but it won’t be quite the same experience. You won’t be able to save or share chats, use custom instructions, or other features associated with a persistent account. This version of ChatGPT will have “slightly more restrictive content policies,” according to OpenAI. When TechCrunch asked for more details, however, the response was unclear:

“The signed out experience will benefit from the existing safety mitigations that are already built into the model, such as refusing to generate harmful content. In addition to these existing mitigations, we are also implementing additional safeguards specifically designed to address other forms of content that may be inappropriate for a signed out experience,” a spokesperson said.

OpenAI’s chatbot store is filling up with spam

TechCrunch found that OpenAI’s GPT Store is flooded with bizarre, potentially copyright-infringing GPTs. A cursory search pulls up GPTs that claim to generate art in the style of Disney and Marvel properties, but serve as little more than funnels to third-party paid services and advertise themselves as being able to bypass AI content detection tools.

The New York Times responds to OpenAI’s claims that it “hacked” ChatGPT for its copyright lawsuit

In a court filing opposing OpenAI’s motion to dismiss The New York Times’ lawsuit alleging copyright infringement, the newspaper asserted that “OpenAI’s attention-grabbing claim that The Times ‘hacked’ its products is as irrelevant as it is false.” The New York Times also claimed that some users of ChatGPT used the tool to bypass its paywalls.

OpenAI VP doesn’t say whether artists should be paid for training data

At a SXSW 2024 panel, Peter Deng, OpenAI’s VP of consumer product, dodged a question on whether artists whose work was used to train generative AI models should be compensated. While OpenAI lets artists “opt out” of and remove their work from the datasets that the company uses to train its image-generating models, some artists have described the tool as onerous.

A new report estimates that ChatGPT uses more than half a million kilowatt-hours of electricity per day

ChatGPT’s environmental impact appears to be massive. According to a report from The New Yorker, ChatGPT uses an estimated 17,000 times as much electricity as the average U.S. household to respond to roughly 200 million requests each day.

ChatGPT can now read its answers aloud

OpenAI released a new Read Aloud feature for the web version of ChatGPT as well as the iOS and Android apps. The feature allows ChatGPT to read its responses to queries in one of five voice options and can speak 37 languages, according to the company. Read Aloud is available on both GPT-4 and GPT-3.5 models.

ChatGPT can now read responses to you. On iOS or Android, tap and hold the message and then tap “Read Aloud”. We’ve also started rolling on web – click the "Read Aloud" button below the message. pic.twitter.com/KevIkgAFbG — OpenAI (@OpenAI) March 4, 2024
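
The Read Aloud voices live inside ChatGPT’s own apps, but a developer who wants similar spoken output in their own project can call OpenAI’s text-to-speech API directly. Below is a minimal sketch, not taken from this announcement: it assumes the openai v1.x Python SDK, an OPENAI_API_KEY environment variable, and the documented "tts-1" model and "nova" voice (the API’s voices are not necessarily the same five used by the ChatGPT feature).

```python
# Minimal sketch: turn a text reply into spoken audio with OpenAI's TTS API.
# Assumes `pip install openai` (v1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",          # documented OpenAI TTS model
    voice="nova",           # one of the documented API voices
    input="Hello! This is a chatbot reply being read aloud.",
)

# The response body is the audio itself; write it out as an MP3 file.
with open("reply.mp3", "wb") as f:
    f.write(speech.content)
```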

OpenAI partners with Dublin City Council to use GPT-4 for tourism

As part of a new partnership with OpenAI, the Dublin City Council will use GPT-4 to craft personalized itineraries for travelers, including recommendations of unique and cultural destinations, in an effort to support tourism across Europe.

A law firm used ChatGPT to justify a six-figure bill for legal services

New York-based law firm Cuddy Law was criticized by a judge for using ChatGPT to calculate their hourly billing rate. The firm submitted a $113,500 bill to the court, which was then halved by District Judge Paul Engelmayer, who called the figure “well above” reasonable demands.

ChatGPT experienced a bizarre bug for several hours

ChatGPT users found that ChatGPT was giving nonsensical answers for several hours, prompting OpenAI to investigate the issue. Incidents varied from repetitive phrases to confusing and incorrect answers to queries. The issue was resolved by OpenAI the following morning.

Match Group announced deal with OpenAI with a press release co-written by ChatGPT

The dating app giant home to Tinder, Match and OkCupid announced an enterprise agreement with OpenAI in an enthusiastic press release written with the help of ChatGPT. The AI tech will be used to help employees with work-related tasks and comes as part of Match’s $20 million-plus bet on AI in 2024.

ChatGPT will now remember — and forget — things you tell it to

As part of a test, OpenAI began rolling out new “memory” controls for a small portion of ChatGPT free and paid users, with a broader rollout to follow. The controls let you tell ChatGPT explicitly to remember something, see what it remembers or turn off its memory altogether. Note that deleting a chat from chat history won’t erase ChatGPT’s or a custom GPT’s memories — you must delete the memory itself.

We’re testing ChatGPT's ability to remember things you discuss to make future chats more helpful. This feature is being rolled out to a small portion of Free and Plus users, and it's easy to turn on or off. https://t.co/1Tv355oa7V pic.twitter.com/BsFinBSTbs — OpenAI (@OpenAI) February 13, 2024

OpenAI begins rolling out “Temporary Chat” feature

Initially limited to a small subset of free and subscription users, Temporary Chat lets you have a dialogue with a blank slate. With Temporary Chat, ChatGPT won’t be aware of previous conversations or access memories but will follow custom instructions if they’re enabled.

But, OpenAI says it may keep a copy of Temporary Chat conversations for up to 30 days for “safety reasons.”

Use temporary chat for conversations in which you don’t want to use memory or appear in history. pic.twitter.com/H1U82zoXyC — OpenAI (@OpenAI) February 13, 2024

ChatGPT users can now invoke GPTs directly in chats

Paid users of ChatGPT can now bring GPTs into a conversation by typing “@” and selecting a GPT from the list. The chosen GPT will have an understanding of the full conversation, and different GPTs can be “tagged in” for different use cases and needs.

You can now bring GPTs into any conversation in ChatGPT – simply type @ and select the GPT. This allows you to add relevant GPTs with the full context of the conversation. pic.twitter.com/Pjn5uIy9NF — OpenAI (@OpenAI) January 30, 2024

ChatGPT is reportedly leaking usernames and passwords from users’ private conversations

Screenshots provided to Ars Technica found that ChatGPT is potentially leaking unpublished research papers, login credentials and private information from its users. An OpenAI representative told Ars Technica that the company was investigating the report.

ChatGPT is violating Europe’s privacy laws, Italian DPA tells OpenAI

OpenAI has been told it’s suspected of violating European Union privacy law, following a multi-month investigation of ChatGPT by Italy’s data protection authority. Details of the draft findings haven’t been disclosed, but in a response, OpenAI said: “We want our AI to learn about the world, not about private individuals.”

OpenAI partners with Common Sense Media to collaborate on AI guidelines

In an effort to win the trust of parents and policymakers, OpenAI announced it’s partnering with Common Sense Media to collaborate on AI guidelines and education materials for parents, educators and young adults. The organization works to identify and minimize tech harms to young people and previously flagged ChatGPT as lacking in transparency and privacy.

OpenAI responds to Congressional Black Caucus about lack of diversity on its board

After a letter from the Congressional Black Caucus questioned the lack of diversity in OpenAI’s board, the company responded. The response, signed by CEO Sam Altman and Chairman of the Board Bret Taylor, said building a complete and diverse board was one of the company’s top priorities and that it was working with an executive search firm to assist it in finding talent.

OpenAI drops prices and fixes ‘lazy’ GPT-4 that refused to work

In a blog post, OpenAI announced price drops for GPT-3.5’s API, with input prices dropping by 50% to $0.0005 per thousand tokens and output prices by 25% to $0.0015 per thousand tokens. GPT-4 Turbo also got a new preview model for API use, which includes an interesting fix that aims to reduce the “laziness” that users have experienced.

Expanding the platform for @OpenAIDevs : new generation of embedding models, updated GPT-4 Turbo, and lower pricing on GPT-3.5 Turbo. https://t.co/7wzCLwB1ax — OpenAI (@OpenAI) January 25, 2024

OpenAI bans developer of a bot impersonating a presidential candidate

OpenAI has suspended AI startup Delphi, which developed a bot impersonating Rep. Dean Phillips (D-Minn.) to help bolster his presidential campaign. The ban comes just weeks after OpenAI published a plan to combat election misinformation, which listed “chatbots impersonating candidates” as against its policy.

OpenAI announces partnership with Arizona State University

Beginning in February, Arizona State University will have full access to ChatGPT’s Enterprise tier, which the university plans to use to build a personalized AI tutor, develop AI avatars, bolster its prompt engineering course and more. It marks OpenAI’s first partnership with a higher education institution.

Winner of a literary prize reveals around 5% of her novel was written by ChatGPT

After receiving the prestigious Akutagawa Prize for her novel The Tokyo Tower of Sympathy, author Rie Kudan admitted that around 5% of the book quoted ChatGPT-generated sentences “verbatim.” Interestingly enough, the novel revolves around a futuristic world with a pervasive presence of AI.

Sam Altman teases video capabilities for ChatGPT and the release of GPT-5

In a conversation with Bill Gates on the Unconfuse Me podcast, Sam Altman confirmed an upcoming release of GPT-5 that will be “fully multimodal with speech, image, code, and video support.” Altman said users can expect to see GPT-5 drop sometime in 2024.

OpenAI announces team to build ‘crowdsourced’ governance ideas into its models

OpenAI is forming a Collective Alignment team of researchers and engineers to create a system for collecting and “encoding” public input on its models’ behaviors into OpenAI products and services. This comes as a part of OpenAI’s public program to award grants to fund experiments in setting up a “democratic process” for determining the rules AI systems follow.

OpenAI unveils plan to combat election misinformation

In a blog post, OpenAI announced users will not be allowed to build applications for political campaigning and lobbying until the company works out how effective its tools are for “personalized persuasion.”

Users will also be banned from creating chatbots that impersonate candidates or government institutions, and from using OpenAI tools to misrepresent the voting process or otherwise discourage voting.

The company is also testing out a tool that detects DALL-E generated images and will incorporate access to real-time news, with attribution, in ChatGPT.

Snapshot of how we’re preparing for 2024’s worldwide elections: • Working to prevent abuse, including misleading deepfakes • Providing transparency on AI-generated content • Improving access to authoritative voting information https://t.co/qsysYy5l0L — OpenAI (@OpenAI) January 15, 2024

OpenAI changes policy to allow military applications

In an unannounced update to its usage policy, OpenAI removed language previously prohibiting the use of its products for the purposes of “military and warfare.” In an additional statement, OpenAI confirmed that the language was changed in order to accommodate military customers and projects that do not violate its ban on efforts to use its tools to “harm people, develop weapons, for communications surveillance, or to injure others or destroy property.”

ChatGPT subscription aimed at small teams debuts

Aptly called ChatGPT Team, the new plan provides a dedicated workspace for teams of up to 149 people using ChatGPT as well as admin tools for team management. In addition to gaining access to GPT-4, GPT-4 with Vision and DALL-E 3, ChatGPT Team lets teams build and share GPTs for their business needs.

OpenAI’s GPT store officially launches

After some back and forth over the last few months, OpenAI’s GPT Store is finally here. The feature lives in a new tab in the ChatGPT web client, and includes a range of GPTs developed both by OpenAI’s partners and the wider dev community.

To access the GPT Store, users must be subscribed to one of OpenAI’s premium ChatGPT plans — ChatGPT Plus, ChatGPT Enterprise or the newly launched ChatGPT Team.

the GPT store is live! https://t.co/AKg1mjlvo2 fun speculation last night about which GPTs will be doing the best by the end of today. — Sam Altman (@sama) January 10, 2024

Developing AI models would be “impossible” without copyrighted materials, OpenAI claims

Following a proposed ban on using news publications and books to train AI chatbots in the U.K., OpenAI submitted a plea to the House of Lords communications and digital committee. OpenAI argued that it would be “impossible” to train AI models without using copyrighted materials, and that they believe copyright law “does not forbid training.”

OpenAI claims The New York Times’ copyright lawsuit is without merit

OpenAI published a public response to The New York Times’s lawsuit against it and Microsoft for allegedly violating copyright law, claiming that the case is without merit.

In the response, OpenAI reiterates its view that training AI models using publicly available data from the web is fair use. It also makes the case that regurgitation is less likely to occur with training data from a single source and places the onus on users to “act responsibly.”

We build AI to empower people, including journalists. Our position on the @nytimes lawsuit: • Training is fair use, but we provide an opt-out • "Regurgitation" is a rare bug we're driving to zero • The New York Times is not telling the full story https://t.co/S6fSaDsfKb — OpenAI (@OpenAI) January 8, 2024

OpenAI’s app store for GPTs planned to launch next week

After being delayed in December, OpenAI plans to launch its GPT Store sometime in the coming week, according to an email viewed by TechCrunch. OpenAI says developers building GPTs will have to review the company’s updated usage policies and GPT brand guidelines to ensure their GPTs are compliant before they’re eligible for listing in the GPT Store. OpenAI’s update notably didn’t include any information on the expected monetization opportunities for developers listing their apps on the storefront.

GPT Store launching next week – OpenAI pic.twitter.com/I6mkZKtgZG — Manish Singh (@refsrc) January 4, 2024

OpenAI moves to shrink regulatory risk in EU around data privacy

In an email, OpenAI detailed an incoming update to its terms, including changing the OpenAI entity providing services to EEA and Swiss residents to OpenAI Ireland Limited. The move appears to be intended to shrink its regulatory risk in the European Union, where the company has been under scrutiny over ChatGPT’s impact on people’s privacy.

What is ChatGPT? How does it work?

ChatGPT is a general-purpose chatbot that uses artificial intelligence to generate text after a user enters a prompt, developed by tech startup OpenAI. The chatbot uses GPT-4, a large language model that uses deep learning to produce human-like text.

When did ChatGPT get released?

ChatGPT was released for public use on November 30, 2022.

What is the latest version of ChatGPT?

Both the free version of ChatGPT and the paid ChatGPT Plus are regularly updated with new GPT models. The most recent model is GPT-4o.

Can I use ChatGPT for free?

Yes. The free version of ChatGPT only requires a sign-in; a paid subscription, ChatGPT Plus, is also available.

Who uses ChatGPT?

Anyone can use ChatGPT! More and more tech companies and search engines are utilizing the chatbot to automate text or quickly answer user questions/concerns.

What companies use ChatGPT?

Multiple enterprises utilize ChatGPT, although others may limit the use of the AI-powered tool.

Most recently, Microsoft announced at its 2023 Build conference that it is integrating its ChatGPT-based Bing experience into Windows 11. Brooklyn-based 3D display startup Looking Glass utilizes ChatGPT to produce holograms you can communicate with, and nonprofit organization Solana officially integrated the chatbot into its network with a ChatGPT plug-in geared toward end users to help onboard into the web3 space.

What does GPT mean in ChatGPT?

GPT stands for Generative Pre-Trained Transformer.

What is the difference between ChatGPT and a chatbot?

A chatbot can be any software or system that holds a dialogue with a person, but it doesn’t necessarily have to be AI-powered. For example, there are chatbots that are rules-based in the sense that they’ll give canned responses to questions.

ChatGPT is AI-powered and utilizes LLM technology to generate text after a prompt.
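
To make the contrast concrete, here is a toy, purely illustrative sketch of a rules-based bot with canned responses (the keywords and replies are made up for this example); an LLM-powered bot like ChatGPT instead generates a fresh response for every prompt rather than looking one up.

```python
# Toy rules-based chatbot: canned replies keyed on keywords, no AI involved.
CANNED_REPLIES = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "refund": "Refunds are processed within 5 business days.",
}

def rules_based_bot(message: str) -> str:
    """Return a canned reply if a known keyword appears, else a fallback."""
    for keyword, reply in CANNED_REPLIES.items():
        if keyword in message.lower():
            return reply
    return "Sorry, I didn't understand that."

print(rules_based_bot("What are your opening hours?"))  # matches "hours"
print(rules_based_bot("Tell me a joke"))                # falls back
```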

Can ChatGPT write essays?

Yes. ChatGPT can draft essay-style text from a prompt, though, as noted below, it offers no guarantee that what it writes is accurate.

Can ChatGPT commit libel?

Due to the nature of how these models work, they don’t know or care whether something is true, only that it looks true. That’s a problem when you’re using it to do your homework, sure, but when it accuses you of a crime you didn’t commit, that may well at this point be libel.

We will see how the handling of troubling statements produced by ChatGPT plays out over the next few months as tech and legal experts attempt to tackle the fastest moving target in the industry.

Does ChatGPT have an app?

Yes, there is a free ChatGPT mobile app for iOS and Android users.

What is the ChatGPT character limit?

OpenAI doesn’t document a character limit for ChatGPT anywhere. However, users have noted that responses tend to run into limits at around 500 words.

Does ChatGPT have an API?

Yes, it was released March 1, 2023.
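
For developers, a minimal sketch of calling that API with the official Python SDK might look like the following; it assumes the openai v1.x package, an OPENAI_API_KEY environment variable, and the "gpt-4o" model name, so adjust to whichever model you actually use.

```python
# Minimal sketch: one question/answer round trip with the Chat Completions API.
# Assumes `pip install openai` (v1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "In one sentence, what is ChatGPT?"},
    ],
)

print(completion.choices[0].message.content)
```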

What are some sample everyday uses for ChatGPT?

Everyday examples include programming, scripts, email replies, listicles, blog ideas, summarization, etc.

What are some advanced uses for ChatGPT?

Advanced use examples include debugging code, explaining programming languages and scientific concepts, complex problem solving, etc.

How good is ChatGPT at writing code?

It depends on the nature of the program. While ChatGPT can write workable Python code, it can’t necessarily program an entire app’s worth of code. That’s because ChatGPT lacks context awareness — in other words, the generated code isn’t always appropriate for the specific context in which it’s being used.

Can you save a ChatGPT chat?

Yes. OpenAI allows users to save chats in the ChatGPT interface, stored in the sidebar of the screen. There are no built-in sharing features yet.

Are there alternatives to ChatGPT?

Yes. There are multiple AI-powered chatbot competitors such as Together, Google’s Gemini and Anthropic’s Claude, and developers are creating open source alternatives.

How does ChatGPT handle data privacy?

OpenAI has said that individuals in “certain jurisdictions” (such as the EU) can object to the processing of their personal information by its AI models by filling out this form. This includes the ability to request deletion of AI-generated references about you, although OpenAI notes it may not grant every request, since it must balance privacy requests against freedom of expression “in accordance with applicable laws.”

The web form for requesting deletion of data about you is titled “OpenAI Personal Data Removal Request.”

In its privacy policy, the ChatGPT maker makes a passing acknowledgement of the objection requirements attached to relying on “legitimate interest” (LI), pointing users toward more information about requesting an opt-out when it writes: “See here for instructions on how you can opt out of our use of your information to train our models.”

What controversies have surrounded ChatGPT?

Recently, Discord announced that it had integrated OpenAI’s technology into its bot named Clyde, and two users subsequently tricked Clyde into providing them with instructions for making the illegal drug methamphetamine (meth) and the incendiary mixture napalm.

An Australian mayor has publicly announced he may sue OpenAI for defamation due to ChatGPT’s false claims that he had served time in prison for bribery. This would be the first defamation lawsuit against the text-generating service.

CNET found itself in the midst of controversy after Futurism reported the publication was publishing articles under a mysterious byline completely generated by AI. The private equity company that owns CNET, Red Ventures, was accused of using ChatGPT for SEO farming, even if the information was incorrect.

Several major school systems and colleges, including New York City Public Schools , have banned ChatGPT from their networks and devices. They claim that the AI impedes the learning process by promoting plagiarism and misinformation, a claim that not every educator agrees with .

There have also been cases of ChatGPT falsely accusing individuals of crimes.

Where can I find examples of ChatGPT prompts?

Several marketplaces host and provide ChatGPT prompts, either for free or for a nominal fee. One is PromptBase. Another is ChatX. More launch every day.

Can ChatGPT be detected?

Poorly. Several tools claim to detect ChatGPT-generated text, but in our tests, they’re inconsistent at best.

Are ChatGPT chats public?

No. But OpenAI recently disclosed a bug, since fixed, that exposed the titles of some users’ conversations to other people on the service.

What lawsuits are there surrounding ChatGPT?

None specifically targeting ChatGPT. But OpenAI is involved in at least one lawsuit that has implications for AI systems trained on publicly available data, which would touch on ChatGPT.

Are there issues regarding plagiarism with ChatGPT?

Yes. Text-generating AI models like ChatGPT have a tendency to regurgitate content from their training data.


