CodeFatherTech

Learn to Code. Shape Your Future

Text to Speech in Python [With Code Examples]

In this article, you will learn how to create text-to-speech programs in Python. You will create a Python program that converts any text you provide into speech.

This is an interesting experiment to discover what can be created with Python and to show you the power of Python and its modules.

How can you make Python speak?

Python provides hundreds of thousands of packages that allow developers to write pretty much any type of program. Two cross-platform packages you can use to convert text into speech using Python are PyTTSx3 and gTTS.

Together we will create a simple program to convert text into speech. This program will show you how powerful Python is as a language. It allows us to do even complex things with very few lines of code.

The Libraries to Make Python Speak

In this guide, we will try two different text-to-speech libraries:

  • gTTS (Google text to Speech API)

They are both available on the Python Package Index (PyPI), the official repository for Python third-party software. Below you can see the page on PyPI for the two libraries:

  • PyTTSx3: https://pypi.org/project/pyttsx3/
  • gTTS: https://pypi.org/project/gTTS/

There are different ways to create a program in Python that converts text to speech and some of them are specific to the operating system.

The reason why we will be using PyTTSx3 and gTTS is to create a program that can run in the same way on Windows, Mac, and Linux (cross-platform).

Letā€™s see how PyTTSx3 works firstā€¦

Text-To-Speech With the PyTTSx3 Module

Before using this module remember to install it using pip:

If you are using Windows and you see one of the following error messages, you will also have to install the module pypiwin32 :

You can use pip for that module too:

If the pyttsx3 module is not installed you will see the following error when executing your Python program:

Thereā€™s also a module called PyTTSx (without the 3 at the end), but itā€™s not compatible with both Python 2 and Python 3.

We are using PyTTSx3 because is compatible with both Python versions.

Itā€™s great to see that to make your computer speak using Python you just need a few lines of code:

Run your program and you will hear the message coming from your computer.

With just four lines of code! (excluding comments)

Also, notice the difference that commas make in your phrase. Try to remove the comma before ā€œand you?ā€ and run the program again.

Can you see (hear) the difference?

Also, you can use multiple calls to the say() function , so:

could be written also as:

All the messages passed to the say() function are not said unless the Python interpreter sees a call to runAndWait() . You can confirm that by commenting the last line of the program.

Change Voice with PyTTSx3

What else can we do with PyTTSx?

Letā€™s see if we can change the voice starting from the previous program.

First of all, letā€™s look at the voices available. To do that we can use the following program:

You will see an output similar to the one below:

The voices available depend on your system and they might be different from the ones present on a different computer.

Considering that our message is in English we want to find all the voices that support English as a language. To do that we can add an if statement inside the previous for loop.

Also to make the output shorter we just print the id field for each Voice object in the voices list (you will understand why shortly):

Here are the voice IDs printed by the program:

Letā€™s choose a female voice, to do that we use the following:

I select the id com.apple.speech.synthesis.voice.samantha , so our program becomes:

How does it sound? šŸ™‚

You can also modify the standard rate (speed) and volume of the voice setting the value of the following properties for the engine before the calls to the say() function.

Below you can see some examples on how to do it:

Play with voice id, rate, and volume to find the settings you like the most!

Text to Speech with gTTS

Now, letā€™s create a program using the gTTS module instead.

Iā€™m curious to see which one is simpler to use and if there are benefits in gTTS over PyTTSx or vice versa.

As usual, we install gTTS using pip:

One difference between gTTS and PyTTSx is that gTTS also provides a CLI tool, gtts-cli .

Letā€™s get familiar with gtts-cli first, before writing a Python program.

To see all the language available you can use:

Thatā€™s an impressive list!

The first thing you can do with the CLI is to convert text into an mp3 file that you can then play using any suitable applications on your system.

We will convert the same message used in the previous section: ā€œI love Python for text to speech, and you?ā€

Iā€™m on a Mac and I will use afplay to play the MP3 file.

The thing I see immediately is that the comma and the question mark donā€™t make much difference. One point for PyTTSx that does a better job with this.

I can use the ā€“lang flag to specify a different language, you can see an example in Italianā€¦

ā€¦the message says: ā€œI like programming in Python, and you?ā€

Now we will write a Python program to do the same thing.

If you run the program you will hear the message.

Remember that Iā€™m using afplay because Iā€™m on a Mac. You can just replace it with any utilities that can play sounds on your system.

Looking at the gTTS documentation, I can also read the text more slowly passing the slow parameter to the gTTS() function.

Give it a try!

Change Voice with gTTS

How easy is it to change the voice with gTTS?

Is it even possible to customize the voice?

It wasnā€™t easy to find an answer to this, I have been playing a bit with the parameters passed to the gTTS() function and I noticed that the English voice changes if the value of the lang parameter is ā€˜en-USā€™ instead of ā€˜enā€™ .

The language parameter uses IETF language tags.

The voice seems to take into account the comma and the question mark better than before.

Also from another test it looks like ā€˜enā€™ (the default language) is the same as ā€˜en-GBā€™.

It looks to me like thereā€™s more variety in the voices available with PyTTSx3 compared to gTTS.

Before finishing this section I also want to show you a way to create a single MP3 file that contains multiple messages, in this case in different languages:

The write_to_fp () function writes bytes to a file-like object that we save as hello_ciao.mp3.

Makes sense?

Work With Text to Speech Offline

One last question about text-to-speech in Python.

Can you do it offline or do you need an Internet connection?

Letā€™s run the first one of the programs we created using PyTTSx3.

From my tests, everything works well, so I can convert text into audio even if Iā€™m offline.

This can be very handy for the creation of any voice-based software.

Letā€™s try gTTS nowā€¦

If I run the program using gTTS after disabling my connection, I see the following error:

So, gTTS doesnā€™t work without a connection because it requires access to translate.google.com.

If you want to make Python speak offline use PyTTSx3.

We have covered a lot!

You have seen how to use two cross-platform Python modules, PyTTSx3 and gTTS, to convert text into speech and to make your computer talk!

We also went through the customization of voice, rate, volume, and language that from what I can see with the programs we created here are more flexible with the PyTTSx3 module.

Are you planning to use this for a specific project?

Let me know in the comments below šŸ™‚

Claudio Sabato is an IT expert with over 15 years of professional experience in Python programming, Linux Systems Administration, Bash programming, and IT Systems Design. He isĀ a professional certified by the Linux Professional Institute .

With a Masterā€™s degree in Computer Science, he has a strong foundation in Software Engineering and a passion for robotics with Raspberry Pi.

Related posts:

  • Search for YouTube Videos Using Python [6 Lines of Code]
  • How to Draw with Python Turtle: Express Your Creativity
  • Create a Random Password Generator in Python
  • Image Edge Detection in Python using OpenCV

1 thought on ā€œText to Speech in Python [With Code Examples]ā€

Hi, Yes I was planning to develop a program which would read text in multiple voices. Iā€™m not a programmer and was looking to find the simplest way to achieve this. There are so many programming languages out there, would you say Python would be the best to for this purpose? kind regards Delton

Leave a Comment Cancel reply

Save my name, email, and website in this browser for the next time I comment.

  • Privacy Overview
  • Strictly Necessary Cookies

This website uses cookies so that we can provide you with the best user experience possible. Cookie information is stored in your browser and performs functions such as recognising you when you return to our website and helping our team to understand which sections of the website you find most interesting and useful.

Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.

If you disable this cookie, we will not be able to save your preferences. This means that every time you visit this website you will need to enable or disable cookies again.

How to Convert Text to Speech in Python

Welcome! Meet our Python Code Assistant , your new coding buddy. Why wait? Start exploring now!

Speech synthesis (or Text to Speech) is the computer-generated simulation of human speech. It converts human language text into human-like speech audio. In this tutorial, you will learn how to convert text to speech in Python.

Please note that I will use text-to-speech or speech synthesis interchangeably in this tutorial, as they're essentially the same thing.

In this tutorial, we won't be building neural networks and training the model from scratch to achieve results, as it is pretty complex and hard to do for regular developers. Instead, we will use some APIs, engines, and pre-trained models that offer it.

More specifically, we will use four different techniques to do text-to-speech:

  • gTTS : There are a lot of APIs out there that offer speech synthesis; one of the commonly used services is Google Text to Speech; we will play around with the gTTS library.
  • pyttsx3 : A library that looks for pre-installed speech synthesis engines on your operating system and, therefore, performs text-to-speech without needing an Internet connection.
  • openai : We'll be using the OpenAI Text to Speech API .
  • Huggingface Transformers : The famous transformer library that offers a wide range of pre-trained deep learning (transformer) models that are ready to use. We'll be using a model called SpeechT5 that does this.

To clarify, this tutorial is about converting text to speech and not vice versa. If you want to convert speech to text instead, check this tutorial .

Table of contents:

Online Text to Speech

Offline text to speech, speech synthesis using openai api, speech synthesis using šŸ¤— transformers.

To get started, let's install the required modules:

As you may guess, gTTS stands for Google Text To Speech; it is a Python library that interfaces with Google Translate's text-to-speech API. It requires an Internet connection, and it's pretty easy to use.

Open up a new Python file and import:

It's pretty straightforward to use this library; you just need to pass text to the gTTS object, which is an interface to Google Translate 's Text to Speech API:

Up to this point, we have sent the text and retrieved the actual audio speech from the API. Let's save this audio to a file:

Awesome, you'll see a new file appear in the current directory; let's play it using playsound module installed previously:

And that's it! You'll hear a robot talking about what you just told him to say!

It isn't available only in English; you can use other languages as well by passing the lang parameter:

If you don't want to save it to a file and just play it directly, then you should use tts.write_to_fp() which accepts io.BytesIO() object to write into; check this link for more information.

To get the list of available languages, use this:

Here are the supported languages:

Now you know how to use Google's API, but what if you want to use text-to-speech technologies offline?

Well, pyttsx3 library comes to the rescue. It is a text-to-speech conversion library in Python, and it looks for TTS engines pre-installed in your platform and uses them, here are the text-to-speech synthesizers that this library uses:

  • SAPI5 on Windows XP, Windows Vista, 8, 8.1, 10 and 11.
  • NSSpeechSynthesizer on Mac OS X.
  • espeak on Ubuntu Desktop Edition.

Here are the main features of the pyttsx3 library:

  • It works fully offline
  • You can choose among different voices that are installed on your system
  • Controlling the speed of speech
  • Tweaking volume
  • Saving the speech audio into a file

Note : If you're on a Linux system and the voice output is not working with this library, then you should install espeak, FFmpeg, and libespeak1:

To get started with this library, open up a new Python file and import it:

Now, we need to initialize the TTS engine:

To convert some text, we need to use say() and runAndWait() methods:

say() method adds an utterance to speak to the event queue, while the runAndWait() method runs the actual event loop until all commands are queued up. So you can call say() multiple times and run a single runAndWait() method in the end to hear the synthesis, try it out!

This library provides us with some properties we can tweak based on our needs. For instance, let's get the details of the speaking rate:

Alright, let's change this to 300 (make the speaking rate much faster):

Another useful property is voices, which allow us to get details of all voices available on your machine:

Here is the output in my case:

As you can see, my machine has three voice speakers. Let's use the second, for example:

You can also save the audio as a file using the save_to_file() method, instead of playing the sound using say() method:

A new MP3 file will appear in the current directory; check it out!

In this section, we'll be using the newly released OpenAI audio models. Before we get started, make sure to update openai library to the latest version:

Next, you must create an OpenAI account and navigate to the API key page to Create a new secret key . Make sure to save this somewhere safe and do not share it with anyone.

Next, let's open up a new Python file and initialize our OpenAI API client:

After that, we can simply use client.audio.speech.create() to perform text to speech:

This is a paid API, and at the time of writing this, there are two models: tts-1 for 0.015$ per 1,000 characters and tts-1-hd for 0.03$ per 1,000 characters. tts-1 is cheaper and faster, whereas tts-1-hd provides higher-quality audio.

There are currently 6 voices you can choose from. I've chosen nova , but you can use alloy , echo , fable , onyx , and shimmer .

You can also experiment with the speed parameter; the default is 1.0 , but if you set it lower than that, it'll generate a slow speech and a faster speech when above 1.0 .

There is another parameter that is response_format . The default is mp3 , but you can set it to opus , aac , and flac .

In this section, we will use the šŸ¤— Transformers library to load a pre-trained text-to-speech transformer model. More specifically, we will use the SpeechT5 model that is fine-tuned for speech synthesis on LibriTTS . You can learn more about the model in this paper .

To get started, let's install the required libraries (if you haven't already):

Open up a new Python file named tts_transformers.py and import the following:

Let's load everything:

The processor is the tokenizer of the input text, whereas the model is the actual model that converts text to speech.

The vocoder is the voice encoder that is used to convert human speech into electronic sounds or digital signals. It is responsible for the final production of the audio file.

In our case, the SpeechT5 model transforms the input text we provide into a sequence of mel-filterbank features (a type of representation of the sound). These features are acoustic features often used in speech and audio processing, derived from a Fourier transform of the signal.

The HiFi-GAN vocoder we're using takes these representations and synthesizes them into actual audible speech.

Finally, we load a dataset that will help us get the speaker's voice vectors to synthesize speech with various speakers. Here are the speakers:

Next, let's make our function that does all the speech synthesis for us:

The function takes the text , and the speaker (optional) as arguments and does the following:

  • It tokenizes the input text into a sequence of token IDs.
  • If the speaker is passed, then we use the speaker vector to mimic the sound of the passed speaker during synthesis.
  • If it's not passed, we simply make a random vector using torch.randn() . Although I do not think it's a reliable way of making a random voice.
  • Next, we use our model.generate_speech() method to generate the speech tensor, it takes the input IDs, speaker embeddings, and the vocoder.
  • Finally, we make our output filename and save it with a 16Khz sampling rate. (A funny thing you can do is when you reduce the sampling rate to 12Khz or 8Khz, you'll get a deeper and slower voice, and vice-versa: a higher-pitched and faster voice when you increase it to values like 22050 or 24000)

Let's use the function now:

This will generate a speech of the US female (as it's my favorite among all the speakers). This will generate a speech with a random voice:

Let's now call the function with all the speakers so you can compare speakers:

Listen to 6799-In-his-miracle-year,-he-published.mp3 :

Great, that's it for this tutorial; I hope that will help you build your application or maybe your own virtual assistant in Python!

To conclude, we have used four different methods for text-to-speech:

  • Online Text to speech using the gTTS library
  • Offline Text to speech using pyttsx3 library that uses an existing engine on your OS.
  • The convenient Audio OpenAI API.
  • Finally, we used šŸ¤— Transformers to perform text-to-speech (offline) using our computing resources.

So, to wrap it up, If you want to use a reliable synthesis, you can go for Audio OpenAI API, Google TTS API, or any other reliable API you choose. If you want a reliable but offline method, you can also use the SpeechT5 transformer. And if you just want to make it work quickly and without an Internet connection, you can use the pyttsx3 library.

You can get the complete code for all the methods used in the tutorial here .

Here is the documentation for used libraries:

  • gTTS (Google Text-to-Speech)
  • pyttsx3 - Text-to-speech x-platform
  • OpenAI Text to Speech
  • SpeechT5 (TTS task)

Related:Ā  How to Play and Record Audio in Python .

Happy Coding ā™„

Just finished the article? Now, boost your next project with our Python Code Generator . Discover a faster, smarter way to code.

How to Convert Speech to Text in Python

  • How to Convert Speech to Text in Python

Learning how to use Speech Recognition Python library for performing speech recognition to convert audio speech to text in Python.

How to Play and Record Audio in Python

How to Play and Record Audio in Python

Learn how to play and record sound files using different libraries such as playsound, Pydub and PyAudio in Python.

How to Translate Languages in Python

How to Translate Languages in Python

Learn how to make a language translator and detector using Googletrans library (Google Translation API) for translating more than 100 languages with Python.

Comment panel

Got a coding query or need some guidance before you comment? Check out this Python Code Assistant for expert advice and handy tips. It's like having a coding tutor right in your fingertips!

Mastering YOLO - Topic - Top

Join 50,000+ Python Programmers & Enthusiasts like you!

  • Ethical Hacking
  • Machine Learning
  • General Python Tutorials
  • Web Scraping
  • Computer Vision
  • Python Standard Library
  • Application Programming Interfaces
  • Game Development
  • Web Programming
  • Digital Forensics
  • Natural Language Processing
  • PDF File Handling
  • Python for Multimedia
  • GUI Programming
  • Cryptography
  • Packet Manipulation Using Scapy

New Tutorials

  • 3 Best Online AI Code Generators
  • How to Validate Credit Card Numbers in Python
  • How to Make a Clickjacking Vulnerability Scanner with Python
  • How to Perform Reverse DNS Lookups Using Python
  • How to Check Password Strength with Python

Popular Tutorials

  • How to Read Emails in Python
  • How to Extract Tables from PDF in Python
  • How to Make a Keylogger in Python
  • How to Encrypt and Decrypt Files in Python

Ethical Hacking with Python EBook - Topic - Bottom

Claim your Free Chapter!

text to speech python

  • EspaƱol ā€“ AmĆ©rica Latina
  • PortuguĆŖs ā€“ Brasil
  • TiĆŖĢng ViĆŖĢ£t

Using the Text-to-Speech API with Python

1. overview.

1215f38908082356.png

The Text-to-Speech API enables developers to generate human-like speech. The API converts text into audio formats such as WAV, MP3, or Ogg Opus. It also supports Speech Synthesis Markup Language (SSML) inputs to specify pauses, numbers, date and time formatting, and other pronunciation instructions.

In this tutorial, you will focus on using the Text-to-Speech API with Python.

What you'll learn

  • How to set up your environment
  • How to list supported languages
  • How to list available voices
  • How to synthesize audio from text

What you'll need

  • A Google Cloud project
  • A browser, such as Chrome or Firefox
  • Familiarity using Python

How will you use this tutorial?

How would you rate your experience with python, how would you rate your experience with google cloud services, 2. setup and requirements, self-paced environment setup.

  • Sign-in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one .

fbef9caa1602edd0.png

  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
  • The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID ). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
  • For your information, there is a third value, a Project Number , which some APIs use. Learn more about all three of these values in the documentation .
  • Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell , a command line environment running in the Cloud.

Activate Cloud Shell

853e55310c205094.png

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you were presented with an intermediate screen, click Continue .

9c92662c6a846a5c.png

It should only take a few moments to provision and connect to Cloud Shell.

9f0e51b578fecce5.png

This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

  • Run the following command in Cloud Shell to confirm that you are authenticated:

Command output

  • Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

If it is not, you can set it with this command:

3. Environment setup

Before you can begin using the Text-to-Speech API, run the following command in Cloud Shell to enable the API:

You should see something like this:

Now, you can use the Text-to-Speech API!

Navigate to your home directory:

Create a Python virtual environment to isolate the dependencies:

Activate the virtual environment:

Install IPython and the Text-to-Speech API client library:

Now, you're ready to use the Text-to-Speech API client library!

In the next steps, you'll use an interactive Python interpreter called IPython , which you installed in the previous step. Start a session by running ipython in Cloud Shell:

You're ready to make your first request and list the supported languages...

4. List supported languages

In this section, you will get the list of all supported languages.

Copy the following code into your IPython session:

Take a moment to study the code and see how it uses the list_voices client library method to build the list of supported languages.

Call the function:

You should get the following (or a larger) list:

The list shows 58 languages and variants such as:

  • Chinese and Taiwanese Mandarin,
  • Australian, British, Indian, and American English,
  • French from Canada and France,
  • Portuguese from Brazil and Portugal.

This list is not fixed and grows as new voices are available.

This step allowed you to list the supported languages.

5. List available voices

In this section, you will get the list of voices available in different languages.

Take a moment to study the code and see how it uses the client library method list_voices(language_code) to list voices available for a given language.

Now, get the list of available German voices:

Multiple female and male voices are available, as well as standard, WaveNet, Neural2, and Studio voices:

  • Standard voices are generated by signal processing algorithms.
  • WaveNet, Neural2, and Studio voices are higher quality voices synthesized by machine learning models and sounding more natural.

Now, get the list of available English voices:

You should get something like this:

In addition to a selection of multiple voices in different genders and qualities, multiple accents are available: Australian, British, Indian, and American English.

Take a moment to list the voices available for your preferred languages and variants (or even all of them):

This step allowed you to list the available voices. You can read more about the supported voices and languages .

6. Synthesize audio from text

You can use the Text-to-Speech API to convert a string into audio data. You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volume, speaking rate, and sample rate .

Take a moment to study the code and see how it uses the synthesize_speech client library method to generate the audio data and save it as a wav file.

Now, generate sentences in a few different accents:

To download all generated files at once, you can use this Cloud Shell command from your Python environment:

Validate and your browser will download the files:

44382e3b7a3314b0.png

Open each file and hear the result.

In this step, you were able to use Text-to-Speech API to convert sentences into audio wav files. Read more about creating voice audio files .

7. Congratulations!

You learned how to use the Text-to-Speech API using Python to generate human-like speech!

To clean up your development environment, from Cloud Shell:

  • If you're still in your IPython session, go back to the shell: exit
  • Stop using the Python virtual environment: deactivate
  • Delete your virtual environment folder: cd ~ ; rm -rf ./venv-texttospeech

To delete your Google Cloud project, from Cloud Shell:

  • Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
  • Make sure this is the project you want to delete: echo $PROJECT_ID
  • Delete the project: gcloud projects delete $PROJECT_ID
  • Test the demo in your browser: https://cloud.google.com/text-to-speech
  • Text-to-Speech documentation: https://cloud.google.com/text-to-speech/docs
  • Python on Google Cloud: https://cloud.google.com/python
  • Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

This work is licensed under a Creative Commons Attribution 2.0 Generic License.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.

Home

Text-to-speech in Python with pyttsx3

Advertisement

Introduction

Install the package, convert text to speech.

  • Change voice and Language

Reference links

This tutorials demonstrates how to use Python for text-to-speech using a cross-platform library, pyttsx3 . This lets you synthesize text in to audio you can hear. This package works in Windows, Mac, and Linux. It uses native speech drivers when available and works completely offline.

There are some other cool features that are not covered here, like the event system. You can hook in to the engine on certain events. You can use this to count how many words are said and cut it off if it has received input that is too long. You can inspect each word and cut it off if there are inappropriate words. The event hooks are not covered here but are worth a mention. Check the official examples to see how this is done.

Always refer to the official documentation for the most accurate, complete, and up-to-date information. This is only meant to serve as a primer.

The pyttsx3 module supports native Windows and Mac speech APIs but also supports espeak, making it the best available text-to-speech package in my opinion. If you are interested specifically and only in speak, you might be interested in my Python text-to-speech with espeak tutorial .

Use pip to install the package. If you are in Windows, you will need an additional package, pypiwin32 which it will need to access the native Windows speech API.

Change voice and language

The voices available will depend on what your system has installed. You can get a list of available voices on your machine by pulling the voices property from the engine. Note that the voices you have available on your computer might be different from someone else's machine. There is a default voice set so you are not required to pick a voice. This is only if you want to change it from the default.

In Windows, you can learn more about installing other languages with this Microsoft support article, How to download Text-to-Speech languages for Windows 10 . It also covers how to install espeak open source languages.

You can get a list of available voices like this:

Example output from my Windows 10 machine with three voices available.

Set the voice you want to use with the setProperty() method on the engine. For example, using voice IDs found earlier, this is how you would set the voice. This example shows how to set one voice to say soemthing, and then use a different voice from a different language to say something else.

After reading this you should feel comfortable using Python for basic text-to-speech applications on all major platforms. What uses can you think of?

  • pyttsx3 module onn Pypi
  • pypiwin32 module on Pypi
  • pyttsx3 source code
  • pyttsx3 documentation
  • Official examples
  • How to download Text-to-Speech languages for Windows 10
  • Python text-to-speech with espeak tutorial

View the discussion thread.

pyttsx3 - Text-to-speech x-platform Ā¶

This documentation describes the pyttsx3 Python package v 2.6 and was rendered on Jul 14, 2021.

Table of Contents

  • Supported synthesizers
  • The Engine factory
  • The Engine interface
  • The Voice metadata
  • The Driver interface
  • The DriverProxy interface

Project Links

  • Project home page at GitHub
  • Package listing in PyPI
  • Documentation at ReadTheDocs
  • Using pyttsx3
  • Implementing drivers

Related Topics

  • Next: Supported synthesizers

Quick search

Hack The Developer Logo

Text to Speech (TTS) in Python Using Pyttsx3 Read it later

In todayā€™s world, where automation is the need of the hour, the Text-to-Speech (TTS) technology is gaining popularity at an exponential rate. Text-to-Speech (TTS) allows users to convert written text into spoken words, which is useful in a wide range of applications such as automated customer service, accessibility for visually impaired users, and language learning. Python is a versatile programming language that is widely used for developing TTS applications. In this blog post, we will explore how to use Pyttsx3, a Python library for TTS, to build powerful text to speech (TTS) applications.

What is Pyttsx3?

Pyttsx3 is a Python library that allows developers to create text to speech (TTS) applications in a simple and easy manner. It is a cross-platform library that supports various operating systems, including Windows, Linux, and macOS.

Pyttsx3 in Python is a wrapper for the eSpeak and Microsoft Speech API (SAPI) text-to-speech engines , which provide high-quality speech synthesis capabilities. Pyttsx3 is easy to use and provides a simple interface for controlling speech output, including pitch, volume, and rate.

Install Pythonā€™s Pyttsx3 Library

Before we start, letā€™s install Pyttsx3 using pip, which is the most popular package manager for Python. Open a terminal or command prompt and type the following command:

Once installed, we can start using Pyttsx3 to build text to speech (TTS) applications in Python.

Text to Speech using Pyttsx3 in Python

Using Pyttsx3 is straightforward. We first need to import the pyttsx3 library in our Python code. We can do this by using the following command:

After importing the library, we need to create an object of the pyttsx3.init() class. This object will act as our text-to-speech engine. We can create the object using the following command:

Once we have created the engine object, we can use its say() method to convert our text into speech. The say() method takes a string as input, which is the text we want to convert into speech. We can use the following command to convert our text into speech:

In this example, we are converting the string ā€œHello Worldā€ into speech using the say() method.

After we have converted our text into speech, we need to play it. We can use the runAndWait() method of the engine object to play the speech. The runAndWait() method waits until the speech is complete before returning control to the program. We can use the following command to play our speech:

If using enumerate can make your Python code more efficient, why do so many developers overlook it? Are you one of them?  Python Enumerate

This command will play the speech generated by the say() method.

Customizing Voice Properties in Pyttsx3

Pyttsx3 provides the ability to customize various properties of the voice used for speech, such as the speaking rate, volume, and language. These properties can be set using the setProperty() method of the engine object.

Changing the Python Pyttsx3 TTS Voice

By default, Pyttsx3 uses the voice installed on our system. However, we can change the voice using the setProperty() method of the engine object.

The setProperty() method takes two arguments: the property we want to set and the value we want to set it to. We can use the following command to change the voice:

In this example, we are changing the voice to ā€œen-usā€.

Setting the Pyttsx3 Voice Speed

We can also change the speed of the speech using the setProperty() method.

In this example, we are setting the speed of the speech to 150 words per minute.

Change TTS Voice In Pyttsx3 Python

We can also change the voice of the engine, the default is the voice of a male named David.

To change the voice of the  pyttsx3  engine, first, we will have to get the list of objects of voices.

The getProperty() function of the  pyttsx3  package takes a string as a parameter and returns an object matching the string.

When we print the voices, we get a list that contains two objects.

Letā€™s Print Each Voices.

voices[0] ā€“ Microsoft David Desktop (Male Voice Default).

voices[1] ā€“ Female Voice named Microsoft Zira Desktop.

Now when we have understood the voice property. Also, stored the voices list from the  engine.getProperty('voices')  function. Letā€™s change the Voice of the engine to Female Voice Named Zira.

The naming convention of the pyttsx3 functions is very convenient.  getProperty()  function is used to get the property related to that string. So, the  setProperty()  function is used to set property based on a string.

The above code changes the voice to Zira, the setProperty() function takes the property name and the changing property id.

Change Pitch in Pyttsx3

Pyttsx3 provides the ability to change the pitch of the speech using the setProperty() method of the engine object. This can be useful in creating more natural-sounding speech or for adding emphasis to certain parts of the text.

To change the pitch of the speech, we can use the setProperty() method with the parameter 'pitch' and a value between 0 and 1. A value of 0 represents the lowest pitch and 1 represents the highest pitch. The default value for pitch is 0.5.

In this code, we are setting the pitch to 0.8, which represents a higher pitch than the default value.

Check Voice Pitch Support in Python Pyttsx3

It is important to note that not all voices support changing the pitch, so it may not have any effect depending on the voice that is being used. We can check if a voice supports pitch by using the getProperty() method of the engine object with the parameter 'voices' .

In this code, we are iterating through all the available voices and checking if the voice supports pitch by looking for the 'pitch' attribute in the voice object. If the voice supports pitch, we print a message indicating that the voice supports pitch.

Closing the Pyttsx3 Engine

After we are done with the TTS, we need to close the engine using the stop() method of the engine object.

This command will stop the engine and free up any resources used by it.

Convert a text file to speech using Python Pyttsx3

Pyttsx3 allows us to convert a text file to speech. Hereā€™s an example:

In this example, we read the contents of a text file named sample.txt and use the say and runAndWait methods to convert the text to speech and play the output.

Save Text to Speech into File using Pyttsx3 Python

Sometimes, we may want to save the speech generated by Pyttsx3 to a file for later use. Fortunately, Pyttsx3 provides a simple way to do this. We can use the save_to_file() method of the engine object to save the speech to a file. This method takes two arguments: the text to be spoken and the name of the file to which the speech should be saved.

In this code, we are using the save_to_file() method to save the text ā€œHello, world!ā€ to a file named ā€œoutput.mp3ā€. We then call the runAndWait() method to wait for the speech to finish before proceeding with the rest of the code.

By default, Pyttsx3 saves the speech in the WAV format. However, we can also save the speech in other formats such as MP3 and OGG by specifying the file extension in the file name.

In this code, we are saving the speech to a file named ā€œoutput.oggā€, which will be in the OGG format.

Itā€™s worth noting that Pyttsx3 uses the audio encoding format specified by the systemā€™s audio driver. This means that the audio format may vary depending on the system on which the code is executed. If you need to ensure a specific audio format, you may need to use an external audio library to convert the audio file to the desired format.

Python Text to Speech with Flask using Pyttsx3

Pyttsx3 can also be used with Flask, which is a popular Python web framework. Hereā€™s an example:

In this example, we define a Flask application that has two routes: the root route that displays a form to enter text, and the /speak route that converts the text to speech using Pyttsx3. We create an instance of the engine class and use the say and runAndWait methods in the /speak route to convert the text to speech and play the output.

Python Pyttsx3 Text to Speech Callbacks

Pyttsx3 provides a Engine class that allows us to define callbacks, which are functions that are called at specific points during the speech synthesis process. Hereā€™s an example:

In this example, we define two callback functions named onStart and onEnd , which are called when speech synthesis starts and ends, respectively.

We then create an instance of the engine class and use the connect method to register the callback functions. Finally, we use the say and runAndWait methods to convert the text to speech and play the output.

Utterance Event Handlers

Utterance event handlers are used to handle events related to the utterances, such as the completion of speech, the start of speech, and errors during speech. We can use these event handlers to perform specific actions when certain events occur during speech.

Pyttsx3 provides four types of utterance event handlers:

1. started-utterance

This event occurs when the engine starts speaking an utterance.

In this code, we are defining a function on_started_utterance() that will be called when the 'started-utterance' event occurs.

When the engine starts speaking the text, the utterance event handler on_started_utterance() will be called and will print the message ā€œStarted speakingā€¦ā€.

2. finished-utterance

This event occurs when the engine finishes speaking an utterance.

In this code, we are defining a function on_finished_utterance() that will be called when the 'finished-utterance' event occurs.

When the engine finishes speaking the text, the utterance event handler on_finished_utterance() will be called and will print the message ā€œFinished speakingā€¦ā€.

3. started-word

This event occurs when the engine starts speaking a word.

In this code, we are defining a function on_started_word() that will be called when the 'started-word' event occurs.

When the engine starts speaking each word in the text, the utterance event handler on_started_word() will be called and will print a message indicating the location of the word.

4. finished-word

This event occurs when the engine finishes speaking a word.

In this code, we are defining a function on_finished_word() that will be called when the 'finished-word' event occurs.

When the engine finishes speaking each word in the text, the utterance event handler on_finished_word() will be called and will print a message indicating the end of the word.

Error Event Handler

Pyttsx3 also provides an error event handler that can be used to handle errors that may occur during the text to speech process.

The error event handler is called whenever an error occurs during the speaking process.

The syntax for registering an error event handler is similar to registering other event handlers. We use the connect() method of the engine object to register the error event handler function, as shown below:

In the above code, we have defined a function on_error() which takes two arguments: the name of the event that occurred (in this case, 'error' ), and the error message itself.

If an error occurs during the speaking process, the error event handler on_error() will be called, and it will print a message indicating the error.

By using the error event handler, we can handle errors that may occur during the text-to-speech process and respond to them appropriately. This can help us to create more robust and reliable text-to-speech applications using pyttsx3 in Python.

Driver Event Loop

Pyttsx3 in Python also supports driver event handlers. These handlers allow you to register functions that will be called when certain events occur in the Text-to-Speech engineā€™s event loop.

To use driver event handlers in pyttsx3, we first need to run the engineā€™s event loop using the startLoop() method. This method starts a new thread that runs the event loop, allowing the main thread to continue executing other code.

In this code, we are defining a function on_start() and on_end() that will be called when the 'started-utterance' and ā€˜ finished-utterance' event occurs.

When we run this code, the engine will speak the text ā€œThis text will be spokenā€, and the on_start() function will be called when the engine starts speaking the utterance.

Once the engine finishes speaking the utterance, the on_end() function will be called, and the event loop will be stopped by calling endLoop() .

Difference between startLoop() and runAndWait()

When using Pyttsx3 for text to speech, there are two main methods that can be used to initiate speech: startLoop() and runAndWait() . While these methods might seem similar at first glance, they actually have some important differences in their behavior.

startLoop()

The startLoop() method is used to start the speech synthesis engine in a separate thread, which allows the application to continue running while the engine is speaking. This can be useful when the application needs to perform other tasks while the engine is speaking.

In this example, we initialize the Pyttsx3 engine and call startLoop() to start the engine in a separate thread. We then use the say() method to queue up two phrases to be spoken.

Finally, we enter an infinite loop to keep the application running, since the engine will continue speaking in the background.

runAndWait()

The runAndWait() method is used to start the speech synthesis engine and block the application until the engine has finished speaking. This can be useful when the application needs to wait for the engine to finish speaking before performing other tasks.

In this example, we initialize the Pyttsx3 engine and use the say() method to queue up two phrases to be spoken. We then call runAndWait() to start the engine and block the application until the engine has finished speaking both phrases.

The key difference between these methods is that startLoop() does not block the application, while runAndWait() does. This can impact the behavior of the application and the user experience, so it is important to choose the appropriate method based on the specific needs of the application.

Speech Synthesis Markup Language (SSML)

SSML stands for Speech Synthesis Markup Language and it is a markup language that is used to control various aspects of the Text to Speech output such as pronunciation, pitch, rate, volume, etc. It allows users to add additional information to the text that is being spoken to provide a better TTS experience.

SSML is divided into three types of tags:

1. Text Processing Tags

  • Input: <speak>My phone number is <say-as interpret-as="telephone">+1-123-456-7890</say-as>.</speak>
  • Output: ā€œMy phone number is plus one, one two three, four five six, seven eight nine zero.ā€
  • Input: <speak>The word "schedule" can be pronounced as <phoneme alphabet="ipa" ph="ĖˆŹƒÉ›djuĖl">shedyool</phoneme> or <phoneme alphabet="ipa" ph="ĖˆskɛdjuĖl">skedyool</phoneme>.</speak>
  • Output: ā€œThe word ā€˜scheduleā€™ can be pronounced as shedyool or skedyool.ā€
  • Input: <speak>She ate a <sub alias="big">large</sub> slice of pizza.</speak>
  • Output: ā€œShe ate a big slice of pizza.ā€
  • Input: <speak>This is a sentence.<break time="1000ms"/>This is another sentence.</speak>
  • Output: ā€œThis is a sentence. (1 second pause) This is another sentence.ā€

2. Prosody Tags

  • Input: <speak><prosody pitch="+30%">This is spoken with a higher pitch.</prosody></speak>
  • Output: ā€œThis is spoken with a higher pitch.ā€
  • Input: <speak>The <emphasis level="strong">cat</emphasis> ran quickly.</speak>
  • Output: ā€œThe CAT ran quickly.ā€
  • Input: <speak><say-as interpret-as="interjection">Wow!</say-as> That was amazing.</speak>
  • Output: ā€œWow! That was amazing.ā€
  • Input: <speak>The year is <say-as interpret-as="spell-out">2023</say-as>.</speak>
  • Output: ā€œThe year is two zero two three.ā€

3. Audio Tags

  • Input: <speak>Listen to this sound:<audio src="https://example.com/sound.mp3"/></speak>
  • Output: Plays the audio file located at .
  • Input: <speak>Listen to this sound:<audio src="https://example.com/sound.mp3"><desc>This is the sound of a bird singing.</desc></audio></speak>
  • Output: Plays the audio file located at and provides a description of the audio file for accessibility purposes.
  • Input: <speak>This is a sentence. <p><audio src="https://example.com/sound.mp3"/></p> This is another sentence.</speak>
  • Output: ā€œThis is a sentence. (audio file plays) This is another sentence.ā€

Python Pyttsx3 supports the use of SSML tags, allowing users to customize the Text-to-Speech output in a variety of ways. The say() method of the engine object can be used to convert text to speech using SSML tags. The text should be enclosed in a <speak> tag to indicate that it is SSML.

Pyttsx3 SSML Example

Here is an example of using SSML to add a pause in the TTS output:

In this code, the <break> tag is used to insert a pause of 500 milliseconds in the TTS output.

Other SSML tags can be used to customize the TTS output in various ways. For example, the <prosody> tag can be used to adjust the pitch, rate, and volume of the TTS output. The <emphasis> tag can be used to emphasize certain words in the TTS output.

Here is an example of using the <prosody> tag to adjust the pitch, rate, and volume of the TTS output:

In this code, we are using the <prosody> tag to adjust the pitch to ā€œhighā€, the rate to ā€œslowā€, and the volume to ā€œloudā€ for the text ā€œThis is a test of SSML tagsā€.

By using SSML tags in Text-to-Speech using Python Pyttsx3, we can create a more customized and natural-sounding TTS experience for our users.

Whatā€™s Next?

If youā€™re interested in exploring more about Text to Speech, you should definitely check out the blog post on using Google Cloud Text to Speech (TTS) API . It covers everything from project creation on Google Cloud Platform to REST API Endpoint implementation, and itā€™s a fantastic companion to this blog.

You can also learn about Speech Recognition in Python .

Wrapping Up

In conclusion, we have covered a lot of ground in this complete guide to using Pyttsx3 for text-to-speech in Python. We started by learning how to install Pyttsx3 and set up a basic TTS application, and then went on to explore some of the more advanced features such as changing engine properties, using callbacks and event handlers, and working with driver event loops.

We also learned about SSML and how to use it in Pyttsx3 to add more control over the prosody and audio output of our TTS application. By using SSML tags like <prosody> , <say-as> , and <audio> , we can create more dynamic and expressive speech output.

We can say that, Pyttsx3 is a powerful and flexible tool for adding text-to-speech functionality to Python applications. With its easy-to-use API and extensive documentation, it is a great choice for developers looking to add TTS to their projects. Whether you are building a chatbot, a virtual assistant, or any other type of application that requires speech output, Pyttsx3 is definitely worth checking out.

  • Pyttsx3 documentation
  • Pyttsx3 Engine
  • Flask documentation: https://flask.palletsprojects.com/en/2.1.x/

Related Posts

Motion detection opencv python, contours in opencv python: approximation, sorting, and hierarchy, edge detection opencv python.

I am using Pyttsx3 in a Python app. The text to speech works fine but then the app just closes after the speech. I’m using the same code as above in my app.

def speak(self,text):     engine = pyttsx3.init()     engine.say(text)     engine.runAndWait()     engine.stop()

Any suggestions?? Thanks.

try writing the code in if statement and end the code with break statement

Just remove the engine.stop()

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

Text to speech in python

  • machine-learning

Text to speech (TTS) is the conversion of written text into spoken voice.You can create TTS programs in python. The quality of the spoken voice depends on your speech engine.

In this article youā€™ll learn how to create your own TTS program.

Related course: Complete Python Programming Course & Exercises

Example with espeak

The program ā€˜espeakā€™ is a simple speech synthesizer which converst written text into spoken voice. The espeak program does sound a bit robotic, but its simple enough to build a basic program.


2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
subprocess

def execute_unix(inputcommand):
p = subprocess.Popen(inputcommand, stdout=subprocess.PIPE, shell=True)
(output, err) = p.communicate()
return output

a = "Say something in natural language."

# create wav file
# w = 'espeak -w temp.wav "%s" 2>>/dev/null' % a
# execute_unix(w)

# tts using espeak
c = 'espeak -ven+f3 -k5 -s150 --punct="<characters>" "%s" 2>>/dev/null' % a
execute_unix(c)

TTS with Google

Google has a very natural sounding voices. You can use their TTS engine with the code below. For this program you need the module gTTS installed as well as the program mpg123.


2
3
4
5
6
7
8
9
10
11
12
13
14
15

# pip install gTTS
# apt install mpg123

from gtts import gTTS
import os

# define variables
s = "escape with plane"
file = "file.mp3"

# initialize tts, create mp3 and play
tts = gTTS(s, 'en')
tts.save(file)
os.system("mpg123 " + file)

This will output spoken voice / an mp3 file.

Play sound in Python

Convert MP3 to WAV

  • Python Course
  • Python Basics
  • Interview Questions
  • Python Quiz
  • Popular Packages
  • Python Projects
  • Practice Python
  • AI With Python
  • Learn Python3
  • Python Automation
  • Python Web Dev
  • DSA with Python
  • Python OOPs
  • Dictionaries

Python Text To Speech | pyttsx module

pyttsx is a cross-platform text to speech library which is platform independent. The major advantage of using this library for text-to-speech conversion is that it works offline. However, pyttsx supports only Python 2.x. Hence, we will see pyttsx3 which is modified to work on both Python 2.x and Python 3.x with the same code.

Use this command for Installation:

  Usage – First we need to import the library and then initialise it using init() function. This function may take 2 arguments. init(driverName string, debug bool)

  • drivername : [Name of available driver] sapi5 on Windows | nsss on MacOS
  • debug: to enable or disable debug output

After initialisation, we will make the program speak the text using say() function. This method may also take 2 arguments. say(text unicode, name string)

  • text : Any text you wish to hear.
  • name : To set a name for this speech. (optional)

Finally, to run the speech we use runAndWait() All the say() texts won’t be said unless the interpreter encounters runAndWait() .

Code #1: Speaking Text

   

  Code #2: Listening for events

               

Why pyttsx? It works offline, unlike other text-to-speech libraries. Rather than saving the text as audio file, pyttsx actually speaks it there. This makes it more reliable to use for voice-based projects.

Please Login to comment...

Similar reads.

  • How to Get a Free SSL Certificate
  • Best SSL Certificates Provider in India
  • Elon Musk's xAI releases Grok-2 AI assistant
  • What is OpenAI SearchGPT? How it works and How to Get it?
  • Content Improvement League 2024: From Good To A Great Article

Improve your Coding Skills with Practice

 alt=

What kind of Experience do you want to share?

py3-tts 3.5

pip install py3-tts Copy PIP instructions

Released: Oct 2, 2023

Text to Speech (TTS) library for Python 3. Works without internet connection or delay. Supports multiple TTS engines, including Sapi5, nsss, and espeak.

Verified details

Maintainers.

Avatar for thevickypedia from gravatar.com

Unverified details

Project links.

  • License: Mozilla Public License 2.0 (MPL 2.0) (Mozilla Public License Version 2.0 ================================== 1. Definitions --------------...)
  • Author: Vignesh Sivanandha Rao
  • Tags pyttsx, ivona, pyttsx for python3, TTS for python3, py3-tts, text to speech for python, tts, text to speech, speech, speech synthesis, offline text to speech, offline tts, gtts
  • Requires: Python >=3

Classifiers

  • End Users/Desktop
  • Information Technology
  • System Administrators
  • OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
  • MacOS :: MacOS X
  • Microsoft :: Windows
  • Python :: 3
  • Python :: 3.5
  • Python :: 3.6
  • Python :: 3.7

Project description

text to speech python

Offline Text To Speech (TTS) converter for Python

text to speech python

py3-tts (originally pyttsx3 ) is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline .

Installation

If you get installation errors, make sure you first upgrade your wheel version using

Linux installation requirements

If you are on a linux system and if the voice output is not working,

Install espeak , ffmpeg and libespeak1 as shown below

  • āœØFully OFFLINE text to speech conversion
  • šŸŽˆ Choose among different voices installed in your system
  • šŸŽ› Control speed/rate of speech
  • šŸŽš Tweak Volume
  • šŸ“€ Save the speech audio as a file
  • ā¤ļø Simple, powerful, & intuitive API

Single line usage with speak function with default options

Changing Voice, Rate and Volume

Included TTS engines

Feel free to wrap another text-to-speech engine for use with pyttsx3 .

Project Links

  • PyPI ( https://pypi.org/project/py3-tts/ )
  • GitHub ( https://github.com/thevickypedia/py3-tts )
  • Full Documentation ( https://py3-tts.vigneshrao.com/ )

nateshmbhat for the original code pyttsx3

Project details

Release history release notifications | rss feed.

Oct 2, 2023

Sep 13, 2023

Sep 9, 2023

3.3b0 pre-release

3.3a0 pre-release

Sep 7, 2023

3.2rc0 pre-release

3.2b0 pre-release

3.2a0 pre-release

Aug 31, 2023

Apr 19, 2023

Apr 3, 2023

3.0b0 pre-release

3.0a0 pre-release

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages .

Source Distributions

Built distribution.

Uploaded Oct 2, 2023 Python 3

Hashes for py3_tts-3.5-py3-none-any.whl

Hashes for py3_tts-3.5-py3-none-any.whl
Algorithm Hash digest
SHA256
MD5
BLAKE2b-256
  • portuguĆŖs (Brasil)

Supported by

text to speech python

  • Stack Overflow for Teams Where developers & technologists share private knowledge with coworkers
  • Advertising & Talent Reach devs & technologists worldwide about your product, service or employer brand
  • OverflowAI GenAI features for Teams
  • OverflowAPI Train & fine-tune LLMs
  • Labs The future of collective knowledge sharing
  • About the company Visit the blog

Collectivesā„¢ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Get early access and see previews of new features.

How to make Python speak

How could I make Python say some text?

I could use Festival with subprocess but I won't be able to control it (or maybe in interactive mode, but it won't be clean).

Is there a Python TTS library? Like an API for Festival, eSpeak, ... ?

  • text-to-speech

Ninjakannon's user avatar

  • does "Festival" have a public API? –  jldupont Commented Oct 23, 2009 at 15:04
  • For text to speech I found this package called " gTTS " in Python. You can try this out. It does work with Python 3.5. The github repo for this package is gTTS-github . –  Harshdeep Sokhey Commented Jan 21, 2017 at 21:47

14 Answers 14

You should try using the PyTTSx package since PyTTS is outdated. PyTTSx works with Python 2. For Python 3, install the PyTTSx3 package.

http://pypi.python.org/pypi/pyttsx/

https://pypi.org/project/pyttsx3/

Al Sweigart's user avatar

  • 6 Does not work for python 3. This answer was up to date as of 2009 –  Jonathan Commented Feb 6, 2015 at 11:36
  • 5 Despite being available through pip, still does not work as of 2015 –  Programming by Permutation Commented Jun 7, 2015 at 15:57
  • I confirm it does not work with python3 and easy fixes (printf as a function, fixing exception handling syntax and fixing imports) don't make it work, it simply fails silently. Interfacing with espeak (what it does on Linux) is as simple as spawning a subprocess, so that's what I ended up doing. –  Léo Germond Commented Mar 13, 2016 at 10:47
  • 1 Just added a comment eat the top of the question to note this only works with Python 2.x –  Eligio Becerra Commented Mar 22, 2016 at 23:45
  • PYTTSX3 works in python 3 too. Its a cool module –  Pear Commented Apr 30, 2021 at 10:12

A bit cheesy, but if you use a mac you can pass a terminal command to the console from python.

Try typing the following in the terminal:

And there will be a voice from the mac that will speak that. From python such a thing is relatively easy:

cantdutchthis's user avatar

  • 9 I don't want the say command to block my Python code, so I add an ampersand like this: os.system("say 'hello world' &") –  VinceFior Commented May 17, 2016 at 22:51
  • On ubuntu, the terminal command to use is spd-say –  natka_m Commented Nov 26, 2019 at 10:10

install pip install pypiwin32

How to use the text to speech features of a Windows PC

Using google text-to-speech api to create an mp3 and hear it.

After you installed the gtts module in cmd: pip install gtts

PythonProgrammi's user avatar

  • 2 You can install required module in your system by running pip install pypiwin32 as administartor. –  Kamil Szot Commented Dec 21, 2016 at 12:35
  • 2 Google solution seems to be one of the best : allows to change of language, it is also really fast. –  snoob dogg Commented May 2, 2018 at 23:05
  • Strangely, the first code example works on some Windows 10 PCs but not others. Why is that? –  ColorCodin Commented Jul 9, 2018 at 1:06
  • 1 @ColorCodin I am not sure, but you should check in the control panel, the syntetized voice (I don't remember the exact name of this options) and see if it has been set... there is a button you can press to see if it works. If it works in the settings, should work with the code, because I think it uses the windows synthesized voice, I think. –  PythonProgrammi Commented Jul 9, 2018 at 17:24
  • It's been set, but when the command is run through CMD it says "Access is denied." –  ColorCodin Commented Jul 9, 2018 at 22:41

The python-espeak package is available in Debian, Ubuntu, Redhat, and other Linux distributions. It has recent updates, and works fine.

Jonathan Leaders notes that it also works on Windows, and you can install the mbrola voices as well. See the espeak website at http://espeak.sourceforge.net

nealmcb's user avatar

A simple Google led me to pyTTS , and a few documents about it . It looks unmaintained and specific to Microsoft's speech engine, however.

On at least Mac OS X, you can use subprocess to call out to the say command, which is quite fun for messing with your coworkers but might not be terribly useful for your needs.

It sounds like Festival has a few public APIs, too:

Festival offers a BSD socket-based interface. This allows Festival to run as a server and allow client programs to access it. Basically the server offers a new command interpreter for each client that attaches to it. The server is forked for each client but this is much faster than having to wait for a Festival process to start from scratch. Also the server can run on a bigger machine, offering much faster synthesis. linky

There's also a full-featured C++ API , which you might be able to make a Python module out of (it's fun!). Festival also offers a pared-down C API -- keep scrolling in that document -- which you might be able to throw ctypes at for a one-off.

Perhaps you've identified a hole in the market?

Jed Smith's user avatar

There are a number of ways to make Python speak in both Python3 and Python2, two great methods are:

If you are on mac you will have the os module built into your computer. You can import the os module using:

You can then use os to run terminal commands using the os.system command:

In terminal, the way you make your computer speak is using the "say" command, thus to make the computer speak you simply use:

If you want to use this to speak a variable you can use:

The second way to get python to speak is to use

  • The pyttsx module

You will have to install this using

or for Python3

You can then use the following code to get it to speak:

I hope this helps! :)

KetZoomer's user avatar

Pyttsx3 is a python module which is a modern clone of pyttsx, modified to work with the latest versions of Python 3!

  • GitHub: https://github.com/nateshmbhat/pyttsx3
  • Read the documentation : https://pyttsx3.readthedocs.org

It is multi-platform , works offline , and works with any python version .

It can be installed with pip install pyttsx3 and usage is the same as pyttsx:

Toby56's user avatar

  • Is there a recommended way to make saying async? –  Anatoly Alekseev Commented Nov 6, 2020 at 19:42
  • @AnatolyAlekseev No there doesn't seem to be one. Just use asyncio or however you do that in python I guess. –  Toby56 Commented Nov 7, 2020 at 23:14

You can use espeak using python for text to speech converter. Here is an example python code

P.S : if espeak isn't installed on your linux system then you need to install it first. Open terminal(using ctrl + alt + T) and type

alphaguy's user avatar

I prefer to use the Google Text To Speech library because it has a more natural voice.

There is one limitation. gTTS can only convert text to speech and save. So you will have to find another module or function to play that file. (Ex: playsound)

Playsound is a very simple module that has one function, which is to play sound.

You can call playsound.playsound() directly after saving the mp3 file.

thisisnotshort's user avatar

There may not be anything 'Python specific', but the KDE and GNOME desktops offer text-to-speech as a part of their accessibility support, and also offer python library bindings. It may be possible to use the python bindings to control the desktop libraries for text to speech.

If using the Jython implementation of Python on the JVM, the FreeTTS system may be usable.

Finally, OSX and Windows have native APIs for text to speech. It may be possible to use these from python via ctypes or other mechanisms such as COM.

Community's user avatar

If you are using python 3 and windows 10, the best solution that I found to be working is from Giovanni Gianni. This played for me in the male voice:

I also found this video on youtube so if you really want to, you can get someone you know and make your own DIY tts voice.

Elijah's user avatar

  • Is there a way to get this to work with other languages (Japanese or Chinese?) –  Moondra Commented May 7, 2018 at 21:02

This is what you are looking for. A complete TTS solution for the Mac. You can use this standalone or as a co-location Mac server for web apps:

http://wolfpaulus.com/jounal/mac/ttsserver/

Drew Gaynor's user avatar

Combining the following sources, the following code works on Windows, Linux and macOS using just the platform and os modules:

  • cantdutchthis' answer for the mac command
  • natka_m's comment for the Ubuntu command
  • BananaAcid's answer for the Windows command
  • Louis Brandy's answer for how to detect the OS
  • nc3b's answer for how to detect the Linux distribution

Note: This method is not secure and could be exploited by malicious text.

Minion Jim's user avatar

Just use this simple code in python.

Works only for windows OS.

I personally use this.

Dhruv Arne's user avatar

Your Answer

Reminder: Answers generated by artificial intelligence tools are not allowed on Stack Overflow. Learn more

Sign up or log in

Post as a guest.

Required, but never shown

By clicking ā€œPost Your Answerā€, you agree to our terms of service and acknowledge you have read our privacy policy .

Not the answer you're looking for? Browse other questions tagged python text-to-speech or ask your own question .

  • The Overflow Blog
  • From PHP to JavaScript to Kubernetes: how one backend engineer evolved over time
  • Where does Postgres fit in a world of GenAI and vector databases?
  • Featured on Meta
  • We've made changes to our Terms of Service & Privacy Policy - July 2024
  • Bringing clarity to status tag usage on meta sites
  • Feedback requested: How do you use tag hover descriptions for curating and do...
  • What does a new user need in a homepage experience on Stack Overflow?
  • Staging Ground Reviewer Motivation

Hot Network Questions

  • Can I retain the ordinal nature of a predictor while answering a question about it that is inherently binary?
  • What did the Ancient Greeks think the stars were?
  • Can a rope thrower act as a propulsion method for land based craft?
  • Experience related to The GA4
  • Origin of the phrase "I'm on it"
  • Why is Emacs recompiling some packages on every startup?
  • My enemy sent me this puzzle!
  • Deviation from the optimal solution for Solomon instances of CVRPTW
  • My school wants me to download an SSL certificate to connect to WiFi. Can I just avoid doing anything private while on the WiFi?
  • Crystal Oscillator Waveform
  • Inconsistent “unzip -l … | grep -q …” results with pipefail
  • Is it possible to do physics without mathematics?
  • Can light become a satellite of a black hole?
  • Weird definition of discrete random variable - How do you define a sum over an uncountable set?
  • Why does Russia strike electric power in Ukraine?
  • How do I alter a table by using AFTER in hook update?
  • How to attach a 4x8 plywood to a air hockey table
  • How can you trust a forensic scientist to have maintained the chain of custody?
  • What did Jesus mean by 'cold water'' in Mtt 10:42?
  • Can a 2-sphere be squashed flat?
  • How to justify our beliefs so that it is not circular?
  • In lattice, does converting a "bad" basis to a "good" basis constitute a hard problem?
  • On a 3D Gagliardo-Nirenberg inequality
  • What are the risks of a compromised top tube and of attempts to repair it?

text to speech python

Navigation Menu

Search code, repositories, users, issues, pull requests..., provide feedback.

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly.

To see all available qualifiers, see our documentation .

  • Notifications You must be signed in to change notification settings

Speech To Speech: an effort for an open-sourced and modular GPT4-o

huggingface/speech-to-speech

Folders and files.

NameName
88 Commits

Repository files navigation

text to speech python

šŸ“– Quick Index

  • Docker Server approach
  • Server/Client approach
  • Local approach
  • Model parameters
  • Generation parameters
  • Notable parameters

This repository implements a speech-to-speech cascaded pipeline with consecutive parts:

  • Voice Activity Detection (VAD) : silero VAD v5
  • Speech to Text (STT) : Whisper checkpoints (including distilled versions )
  • Language Model (LM) : Any instruct model available on the Hugging Face Hub ! šŸ¤—
  • Text to Speech (TTS) : Parler-TTS šŸ¤—

The pipeline aims to provide a fully open and modular approach, leveraging models available on the Transformers library via the Hugging Face hub. The level of modularity intended for each part is as follows:

  • VAD : Uses the implementation from Silero's repo .
  • STT : Uses Whisper models exclusively; however, any Whisper checkpoint can be used, enabling options like Distil-Whisper and French Distil-Whisper .
  • LM : This part is fully modular and can be changed by simply modifying the Hugging Face hub model ID. Users need to select an instruct model since the usage here involves interacting with it.
  • TTS : The mini architecture of Parler-TTS is standard, but different checkpoints, including fine-tuned multilingual checkpoints, can be used.

The code is designed to facilitate easy modification. Each component is implemented as a class and can be re-implemented to match specific needs.

Clone the repository:

Install the required dependencies using uv :

The pipeline can be run in two ways:

  • Server/Client approach : Models run on a server, and audio input/output are streamed from a client.
  • Local approach : Runs locally.

Docker Server

Install the nvidia container toolkit.

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Start the docker container

docker compose up

Server/Client Approach

To run the pipeline on the server:

Then run the client locally to handle sending microphone input and receiving generated audio:

Running on Mac

To run on mac, we recommend setting the flag --local_mac_optimal_settings :

You can also pass --device mps to have all the models set to device mps. The local mac optimal settings set the mode to be local as explained above and change the models to:

  • LightningWhisperMLX

Recommended usage with Cuda

Leverage Torch Compile for Whisper and Parler-TTS:

For the moment, modes capturing CUDA Graphs are not compatible with streaming Parler-TTS ( reduce-overhead , max-autotune ).

Command-line Usage

Model parameters.

model_name , torch_dtype , and device are exposed for each part leveraging the Transformers' implementations: Speech to Text, Language Model, and Text to Speech. Specify the targeted pipeline part with the corresponding prefix:

  • stt (Speech to Text)
  • lm (Language Model)
  • tts (Text to Speech)

For example:

Generation Parameters

Other generation parameters of the model's generate method can be set using the part's prefix + _gen_ , e.g., --stt_gen_max_new_tokens 128 . These parameters can be added to the pipeline part's arguments class if not already exposed (see LanguageModelHandlerArguments for example).

Notable Parameters

Vad parameters.

  • --thresh : Threshold value to trigger voice activity detection.
  • --min_speech_ms : Minimum duration of detected voice activity to be considered speech.
  • --min_silence_ms : Minimum length of silence intervals for segmenting speech, balancing sentence cutting and latency reduction.

Language Model

  • --init_chat_role : Defaults to None . Sets the initial role in the chat template, if applicable. Refer to the model's card to set this value (e.g. for Phi-3-mini-4k-instruct you have to set --init_chat_role system )
  • --init_chat_prompt : Defaults to "You are a helpful AI assistant." Required when setting --init_chat_role .

Speech to Text

--description : Sets the description for Parler-TTS generated voice. Defaults to: "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."

--play_steps_s : Specifies the duration of the first chunk sent during streaming output from Parler-TTS, impacting readiness and decoding steps.

Distil-Whisper

Contributors 7.

@andimarafioti

  • Python 99.6%
  • Dockerfile 0.4%

IMAGES

  1. Convert Text to Speech Using Python

    text to speech python

  2. Easy steps to Convert Text to Speech in Python || Using Pyttsx3 ||Text

    text to speech python

  3. Simple Text To Speech In Python

    text to speech python

  4. TEXT TO SPEECH IN PYTHON

    text to speech python

  5. Text to speech conversion using Python

    text to speech python

  6. Python Text to Speech Converter

    text to speech python

COMMENTS

  1. pyttsx3 Ā· PyPI

    pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3. Installation pip install pyttsx3 > If you get installation errors , make sure you first upgrade your wheel version using : pip install -upgrade wheel Linux installation requirements :

  2. Text to Speech in Python [With Code Examples]

    Learn how to create text-to-speech programs in Python using two cross-platform modules: PyTTSx3 and gTTS. See code examples, voice options, and how to install and use them.

  3. How to Convert Text to Speech in Python

    To get started with this library, open up a new Python file and import it: import pyttsx3. Now, we need to initialize the TTS engine: # initialize Text-to-speech engine. engine = pyttsx3.init() To convert some text, we need to use say() and runAndWait() methods: # convert this text to speech.

  4. Convert Text to Speech in Python

    Learn how to use the Google Text to Speech API (gTTS) to convert text to audio in Python. See examples of code, output and installation steps for different languages and speeds.

  5. pyttsx4 Ā· PyPI

    Text to Speech (TTS) library for Python 3. Works without internet connection or delay. Supports multiple TTS engines, including Sapi5, nsss, and espeak.

  6. voicebox-tts Ā· PyPI

    voicebox. Python text-to-speech library with built-in voice effects and support for multiple TTS engines. | GitHub | Documentation šŸ“˜ | Audio Samples šŸ”‰ | # Example: Use gTTS with a vocoder effect to speak in a robotic voice from voicebox import SimpleVoicebox from voicebox.tts import gTTS from voicebox.effects import Vocoder, Normalize voicebox = SimpleVoicebox (tts = gTTS (), effects ...

  7. An Essential Python Text-to-Speech Tutorial Using the pyttsx3 Library

    Learn how to use pyttsx3, a Python library that allows you to convert text to speech and customize the speed, volume, and voice. See examples of how to install, initialize, and save the TTS audio file.

  8. Using the Text-to-Speech API with Python

    In this step, you were able to use Text-to-Speech API to convert sentences into audio wav files. Read more about creating voice audio files. 7. Congratulations! You learned how to use the Text-to-Speech API using Python to generate human-like speech! Clean up. To clean up your development environment, from Cloud Shell:

  9. Text-to-speech in Python with pyttsx3

    Learn how to use Python for text-to-speech with a cross-platform library, pyttsx3. This tutorial covers how to install, convert text to speech, change voice and language, and use event hooks.

  10. pyttsx3

    pyttsx3 is a Python module that allows you to convert text to speech using various synthesizers and drivers. Learn how to use pyttsx3, its engine and voice metadata, and how to implement your own drivers.

  11. Python

    Learn how to convert text to speech using pyttsx3, a text-to-speech conversion library in Python. See the installation, code and output examples for Windows, Mac and other platforms.

  12. Text-to-Speech with Python: A Guide to Using pyttsx3

    In this script: We import the pyttsx3 library. Initialize the TTS engine using pyttsx3.init(). Set optional properties such as speech rate and volume. Specify the text to be converted to speech. Use the say method to perform the text-to-speech conversion. Finally, we wait for the speech to finish with runAndWait().

  13. Pyttsx3 Python Library: Create Text to Speech (TTS) Applications

    What is Pyttsx3? Pyttsx3 is a Python library that allows developers to create text to speech (TTS) applications in a simple and easy manner. It is a cross-platform library that supports various operating systems, including Windows, Linux, and macOS. Pyttsx3 in Python is a wrapper for the eSpeak and Microsoft Speech API (SAPI) text-to-speech engines, which provide high-quality speech synthesis ...

  14. Exploring Text-to-Speech in Python with pyttsx3

    Introduction. Text-to-speech (TTS) technology is a fascinating field that allows computers to convert written text into spoken words. In this blog post, we will delve into the world of text-to-speech synthesis using Python and the powerful pyttsx3 library. Whether you're interested in creating accessible applications, building interactive ...

  15. Text to speech in python

    Learn how to create your own text to speech (TTS) program in python using espeak and gTTS modules. Compare the quality and sound of different speech engines and voices.

  16. Exploring Text-to-Speech in Python with pyttsx3: A ...

    pyttsx3 is a Python library that enables text-to-speech conversion. It is a more advanced and feature-rich version of the older pyttsx library, making it an excellent choice for developing text-to ...

  17. TTS Ā· PyPI

    TTS is a Python package that allows you to synthesize speech with various models and vocoders. You can use command-line or API to run TTS with different languages, speakers, and voice conversion options.

  18. Text-to-Speech in Python Using pyttsx3

    Importing pyttsx3: Begin by importing the pyttsx3 library, which provides text-to-speech functionality. Initializing the Engine: Create an instance of the text-to-speech engine using pyttsx3.init ...

  19. Convert Text to Speech with Python: A Step-by-Step Guide for ...

    Step 4. Prepare Your Text: Assign the text you wish to convert into speech to a variable for now. Later, in the next chapter of this guide, you can use this variable to store input text from users ...

  20. Python Text To Speech

    pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3. An application invokes the pyttsx3.init() factory function to get a reference to a pyttsx3. Engine instance. it is a very easy to use tool which converts the entered text into speech. The pyttsx3 modul

  21. py3-tts Ā· PyPI

    py3-tts is a Python library that converts text to speech without internet connection or delay. It supports multiple TTS engines, such as Sapi5, nsss, and espeak, and allows you to control speed, volume, and voice.

  22. text to speech

    In terminal, the way you make your computer speak is using the "say" command, thus to make the computer speak you simply use: os.system("say 'some text'") If you want to use this to speak a variable you can use: os.system("say " + myVariable) The second way to get python to speak is to use. The pyttsx module.

  23. Text to Speech (TTS) in Python Using Pyttsx3

    Pyttsx3 in Python is a wrapper for the eSpeak and Microsoft Speech API (SAPI) text-to-speech engines, which provide high-quality speech synthesis capabilities. Pyttsx3 is easy to use and provides ...

  24. GitHub

    This repository implements a speech-to-speech cascaded pipeline with consecutive parts: Voice Activity Detection (VAD): silero VAD v5; Speech to Text (STT): Whisper checkpoints (including distilled versions) Language Model (LM): Any instruct model available on the Hugging Face Hub! šŸ¤—; Text to Speech (TTS): Parler-TTSšŸ¤—