text to speech whisper voice

Voice Generator

This web app allows you to generate voice audio from text - no login needed, and it's completely free! It uses your browser's built-in voice synthesis technology, and so the voices will differ depending on the browser that you're using. You can download the audio as a file, but note that the downloaded voices may be different to your browser's voices because they are downloaded from an external text-to-speech server. If you don't like the externally-downloaded voice, you can use a recording app on your device to record the "system" or "internal" sound while you're playing the generated voice audio.

Want more voices? You can download the generated audio and then use voicechanger.io to add effects to the voice. For example, you can make the voice sound more robotic, or like a giant ogre, or an evil demon. You can even use it to reverse the generated audio, randomly distort the speed of the voice throughout the audio, add a scary ghost effect, or add an "anonymous hacker" effect to it.

Note: If the list of available text-to-speech voices is small, or all the voices sound the same, then you may need to install text-to-speech voices on your device. Many operating systems (including some versions of Android, for example) only come with one voice by default, and the others need to be downloaded in your device's settings. If you don't know how to install more voices, and you can't find a tutorial online, you can try downloading the audio with the download button instead. As mentioned above, the downloaded audio uses external voices which may be different to your device's local ones.

You're free to use the generated voices for any purpose - no attribution needed. You could use this website as a free voice over generator for narrating your videos in cases where don't want to use your real voice. You can also adjust the pitch of the voice to make it sound younger/older, and you can even adjust the rate/speed of the generated speech, so you can create a fast-talking high-pitched chipmunk voice if you want to.

Note: If you have offline-compatible voices installed on your device (check your system Text-To-Speech settings), then this web app works offline! Find the "add to homescreen" or "install" button in your browser to add a shortcut to this app in your home screen. And note that if you don't have an internet connection, or if for some reason the voice audio download isn't working for you, you can also use a recording app that records your devices "internal" or "system" sound.

Got some feedback? You can share it with me here .

If you like this project check out these: AI Chat , AI Anime Generator , AI Image Generator , and AI Story Generator .

Don't have an account? Register

Two Factor Authentication

Forgot password.

Already have an account? Login

Pronunciation Editor

Access more product features by logging in.

Pause Settings

Question ? Seconds
Exclamation ! Seconds
At @ Seconds
Hash # Seconds
Between Paragraphs Seconds

Pronunciation Editor is available only with our all paid plans.

Voice Profile

Voice profile feature is available only with all our paid plans.

Voice Selection

Audio Setting

My projects, add project, edit project name, delete project, are you sure you want to delete this project, add to archive, volume ( 0db ), speed ( 0% ), pitch ( 0% ).

Voice Effects
Voice Settings

Voice Volume

Voice Speed

Voice Pitch

Audio Settings

Upload Background Music

File upload.

No voices here, Please add some

Delete Voice

Are you sure you want to delete this voice, full text view, export voice, trusted by 1000+ well-known brands, create audio files for your commercial use.

Voicemaker allows you to redistribute your generated audio files even after your subscription expires.

Audiobooks & Podcast

Youtube videos, e-learning material, sales & social media videos, public use and brodcasting, web & mobile application, call centers & ivr system, view plans >, share audio across multiple platforms.

The converted audio files can be shared on any platform worldwide.

Industry-leading features that help us grow fast

Every day, text characters are converted into voiceovers.

Registered users from over 120 countries worldwide.

Discover how voice-over transforms words into human-sounding voices.

Pro settings.

Voice Stability

Voice Similarity

WhisperSpeech / WhisperSpeech like 217 Follow WhisperSpeech 10

Whisperspeech.

An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch .

We want this model to be like Stable Diffusion but for speech – both powerful and easily customizable.

We are working only with properly licensed speech recordings and all the code is Open Source so the model will be always safe to use for commercial applications.

Currently the models are trained on the English LibreLight dataset. In the next release we want to target multiple languages (Whisper and EnCodec are both multilanguage).

Sample of the synthesized voice:

https://github.com/collabora/WhisperSpeech/assets/107984/aa5a1e7e-dc94-481f-8863-b022c7fd7434

Progress update [2024-01-29]

We successfully trained a tiny S2A model on an en+pl+fr dataset and it can do voice cloning in French:

https://github.com/collabora/WhisperSpeech/assets/107984/267f2602-7eec-4646-a43b-059ff91b574e

https://github.com/collabora/WhisperSpeech/assets/107984/fbf08e8e-0f9a-4b0d-ab5e-747ffba2ccb9

We were able to do this with frozen semantic tokens that were only trained on English and Polish. This supports the idea that we will be able to train a single semantic token model to support all the languages in the world. Quite likely even ones that are not currently well supported by the Whisper model. Stay tuned for more updates on this front. :)

Progress update [2024-01-18]

We spend the last week optimizing inference performance. We integrated torch.compile , added kv-caching and tuned some of the layers – we are now working over 12x faster than real-time on a consumer 4090!

We can mix languages in a single sentence (here the highlighted English project names are seamlessly mixed into Polish speech):

To jest pierwszy test wielojęzycznego Whisper Speech modelu zamieniającego tekst na mowę, który Collabora i Laion nauczyli na superkomputerze Jewels .

https://github.com/collabora/WhisperSpeech/assets/107984/d7092ef1-9df7-40e3-a07e-fdc7a090ae9e

We also added an easy way to test voice-cloning. Here is a sample voice cloned from a famous speech by Winston Churchill (the radio static is a feature, not a bug ;) – it is part of the reference recording):

https://github.com/collabora/WhisperSpeech/assets/107984/bd28110b-31fb-4d61-83f6-c997f560bc26

You can test all of these on Colab (we optimized the dependencies so now it takes less than 30 seconds to install). A Huggingface Space is coming soon.

Progress update [2024-01-10]

We’ve pushed a new SD S2A model that is a lot faster while still generating high-quality speech. We’ve also added an example of voice cloning based on a reference audio file.

As always, you can check out our Colab to try it yourself!

Progress update [2023-12-10]

Another trio of models, this time they support multiple languages (English and Polish). Here are two new samples for a sneak peek. You can check out our Colab to try it yourself!

English speech, female voice (transferred from a Polish language dataset):

A Polish sample, male voice:

https://github.com/collabora/WhisperSpeech/assets/107984/4da14b03-33f9-4e2d-be42-f0fcf1d4a6ec

Older progress updates are archived here

We encourage you to start with the Google Colab link above or run the provided notebook locally. If you want to download manually or train the models from scratch then both the WhisperSpeech pre-trained models as well as the converted datasets are available on HuggingFace.

Gather a bigger emotive speech dataset
Figure out a way to condition the generation on emotions and prosody
Create a community effort to gather freely licensed speech in multiple languages
Train final multi-language models

Architecture

The general architecture is similar to AudioLM , SPEAR TTS from Google and MusicGen from Meta. We avoided the NIH syndrome and built it on top of powerful Open Source models: Whisper from OpenAI to generate semantic tokens and perform transcription, EnCodec from Meta for acoustic modeling and Vocos from Charactr Inc as the high-quality vocoder.

We gave two presentation diving deeper into WhisperSpeech. The first one talks about the challenges of large scale training:

Tricks Learned from Scaling WhisperSpeech Models to 80k+ Hours of Speech - video recording by Jakub Cłapa, Collabora

The other one goes a bit more into the architectural choices we made:

Open Source Text-To-Speech Projects: WhisperSpeech - In Depth Discussion

Whisper for modeling semantic tokens

We utilize the OpenAI Whisper encoder block to generate embeddings which we then quantize to get semantic tokens.

If the language is already supported by Whisper then this process requires only audio files (without ground truth transcriptions).

EnCodec for modeling acoustic tokens

We use EnCodec to model the audio waveform. Out of the box it delivers reasonable quality at 1.5kbps and we can bring this to high-quality by using Vocos – a vocoder pretrained on EnCodec tokens.

Appreciation

This work would not be possible without the generous sponsorships from:

Collabora – code development and model training
LAION – community building and datasets (special thanks to
Jülich Supercomputing Centre - JUWELS Booster supercomputer

We gratefully acknowledge the Gauss Centre for Supercomputing e.V. ( www.gauss-centre.eu ) for funding part of this work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC), with access to compute provided via LAION cooperation on foundation models research.

We’d like to also thank individual contributors for their great help in building this model:

inevitable-2031 ( qwerty_qwer on Discord) for dataset curation

We rely on many amazing Open Source projects and research papers:

Spaces using WhisperSpeech/WhisperSpeech 17

Saved searches

Use saved searches to filter your results more quickly.

To see all available qualifiers, see our documentation .

Notifications You must be signed in to change notification settings

An Open Source text-to-speech system built by inverting Whisper.

collabora/WhisperSpeech

Folders and files, repository files navigation, whisperspeech.

An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch .

We want this model to be like Stable Diffusion but for speech – both powerful and easily customizable.

We are working only with properly licensed speech recordings and all the code is Open Source so the model will be always safe to use for commercial applications.

Currently the models are trained on the English LibreLight dataset. In the next release we want to target multiple languages (Whisper and EnCodec are both multilanguage).

Sample of the synthesized voice:

Progress update [2024-01-29]

We successfully trained a tiny S2A model on an en+pl+fr dataset and it can do voice cloning in French:

Progress update [2024-01-18]

We can mix languages in a single sentence (here the highlighted English project names are seamlessly mixed into Polish speech):

To jest pierwszy test wielojęzycznego Whisper Speech modelu zamieniającego tekst na mowę, który Collabora i Laion nauczyli na superkomputerze Jewels .

You can test all of these on Colab (we optimized the dependencies so now it takes less than 30 seconds to install). A Huggingface Space is coming soon.

Progress update [2024-01-10]

We’ve pushed a new SD S2A model that is a lot faster while still generating high-quality speech. We’ve also added an example of voice cloning based on a reference audio file.

As always, you can check out our Colab to try it yourself!

Progress update [2023-12-10]

Another trio of models, this time they support multiple languages (English and Polish). Here are two new samples for a sneak peek. You can check out our Colab to try it yourself!

English speech, female voice (transferred from a Polish language dataset):

A Polish sample, male voice:

Older progress updates are archived here

Gather a bigger emotive speech dataset
Figure out a way to condition the generation on emotions and prosody
Create a community effort to gather freely licensed speech in multiple languages
Train final multi-language models

Architecture

We gave two presentation diving deeper into WhisperSpeech. The first one talks about the challenges of large scale training:

Tricks Learned from Scaling WhisperSpeech Models to 80k+ Hours of Speech - video recording by Jakub Cłapa, Collabora

The other one goes a bit more into the architectural choices we made:

Open Source Text-To-Speech Projects: WhisperSpeech - In Depth Discussion

Whisper for modeling semantic tokens

We utilize the OpenAI Whisper encoder block to generate embeddings which we then quantize to get semantic tokens.

If the language is already supported by Whisper then this process requires only audio files (without ground truth transcriptions).

Using Whisper for semantic token extraction diagram

EnCodec for modeling acoustic tokens

We use EnCodec to model the audio waveform. Out of the box it delivers reasonable quality at 1.5kbps and we can bring this to high-quality by using Vocos – a vocoder pretrained on EnCodec tokens.

Appreciation

This work would not be possible without the generous sponsorships from:

Collabora – code development and model training
LAION – community building and datasets (special thanks to
Jülich Supercomputing Centre - JUWELS Booster supercomputer

We’d like to also thank individual contributors for their great help in building this model:

inevitable-2031 ( qwerty_qwer on Discord) for dataset curation

We rely on many amazing Open Source projects and research papers:

Contributors 10

Jupyter Notebook 98.5%
Python 1.5%

Text to Speech

Generate speech from text. choose a voice to read your text aloud. you can use it to narrate your videos, create voice-overs, convert your documents into audio, and more..

Please sign up or login with your details

Generation Overview

AI Generator calls

AI Video Generator calls

AI Chat messages

Genius Mode messages

Genius Mode images

Genius Mode videos

AD-free experience

Private images

Includes 500 AI images, 1750 chat messages, 30 videos, 60 Genius Mode messages, 60 Genius Mode images, and 5 Genius Mode videos per month. If you go over any of these limits, there is a $5 charge for each group. Extra Genius Mode videos cost $1 each.
Includes 100 AI images and 300 chat messages. Exceeding either limit requires reloading credits from $5 to $1000, paying only for what you use. Genius Mode videos are $1 each.

Out of credits

Refill your membership to continue using DeepAI

Share your generations with friends

Use On4t to Create Amazing, Professional Voiceovers That Sound Like Real Human Voices

We GUARANTEE no one will tell your voiceover is A.I. generated with On4t’s text-to-voice tool

[Create an account to unlock more features and create longer voiceovers for free!]

Turn Text into Amazing Speech: 500+ Natural Sounding AI Voices in 140+ Languages!

500+ AI natural Sounding Voices
Supports 140+ Different Languages
Voices with Real Emotions
Advanced Editor

On4t Voices Has Emotions That Make Them Sound Super Natural

Want to make your voices laugh, happy cry, excited? Check out the emotions that you can add to your voices.

Try Out On4t’s Unique AI Text to Speech Voices That Sound Like Real Human

Want cool voices for your videos? ON4T’s text-to-speech tool has lots of AI voices just for you. It's like having a voice actor, but faster and cheaper. You can make your own special voice-overs easily. It's great for lots of videos and saves you time and money. With our tool, making fun and interesting videos is easy and quick!

Christopher

Get Instant, Perfect Voiceovers in 140+ Languages.

Just a few taps and your text becomes speech in any language you want. Fast, easy, and just for you!

AZERBAIJANI

Over 500+ Human- Sounding Voice Overs for Everyone to read aloud

Multilingual support in English Text To Speech and 140+ other languages

Generate Speech from a Document to Excellent Quality audio version

Add Background Music to Enhance Clarity and Attractiveness with online Text to Speech MP3

Customize the Speed, Pronunciation, and Pitch of the selected natural sounding Voice as Per Your Preference

Undetectable Standard Sounding Voice Overs for Various Situations

Get Multiple Audio Files Against a Single Input Text

Explore and Choose the Perfect Voice Type , Tone, pitch, & Speed

Make Your Text to voice tone More Cheerful, Unfriendly, Whispering, Sad, and Friendly

Entirely Web-based Application that Can Be Accessed without Installation

Merge Multiple Text to Audio Files in One Larger File for Easy Storing and Sharing

On4t’s TTS is best for:

We have happy customers who are using our voices to create tutorial videos, sales pitches, webinars, and much more.

Turning Text To Speech Using On4t Is Easier Than Scrolling Through Your Social Media Feed.

No more need to pay freelancers or voiceover artists anymore

Type or paste your text into the textbox.

Select the voices and style by previewing the voices.

Click Create Voiceover, and once it is ready, download and use it!

With ON4T’s Text to Speech, You Can Make

realistic AI voices and utilize them to generate custom voiceovers for marketers, product developers, authors, and podcasters using our cutting-edge online Text-to-speech service!

Marketers, say goodbye to the hassle and high costs of hiring voiceover artists. With ON4T's Text to Speech, creating the perfect voice for your marketing projects is super easy. Whether you're a small business owner, a digital marketing agency, or a freelance content creator, our platform lets you effortlessly turn your written content into high-quality, natural-sounding AI voices. Compare different voice outputs to find the one that best fits your brand and boosts your engagement.

Check out how our text-to-speech tool can make learning way cooler. You can turn tough and technical topics into easy-to-understand audio. This means you can change written stuff into fun, spoken lessons. It's great for students who want to learn while they're on the bus or just chilling in bed. It's all about making learning easy to grasp and more enjoyable.

Who doesn't love listening to great stories? Now, you can turn your books into audiobooks and reach more listeners. Our free online text-to-speech tool makes it super easy. Just pick a voice you like, copy and paste your story, and click to create the audio. It's a simple way to bring your stories to life and share them with more people!

Want to make your customer service even better? Use our awesome AI voice generator to answer common questions about your products and services. It's so good your customers might just think they're talking to a real person! This means they get quick, clear answers, making everyone's day a bit brighter. Just record your FAQs with our text-to-speech tool, and you're all set to provide top-notch support.

Need to show off your cool new products? Use our text-to-speech reader to make awesome voiceovers for your website, sales pages, tutorials, and demo videos. It's a great way to make your products shine and explain how they work. Plus, it's quick and easy, so you can spend more time getting great feedback on what you've created.

Making cool voiceovers for your podcasts is super quick and easy with our tool. You can make your podcasts sound just right and get more people to listen. Our voices are here to help you get more views and make everything simpler. So, say goodbye to hard stuff and hello to awesome podcast voiceovers!

Text-to-Speech Generation vs. Human VoiceOvers

Why use our text-to-speech reader? It's all about making things easier, cheaper, and faster for you. Instead of finding and paying experts, our tool lets you make cool voiceovers on your own. It's a quick, budget-friendly way to get great-sounding voiceovers without the fuss.

Waste time hiring perfect voiceover artists
Turnaround time of average 1 week
Learn editing skills to use voices
Spend money and time to record again
Super easy to use
Takes less than a minute for an output
Beginner friendly interface
Lifetime updates and new features

Why is Text-to-Speech Super Cool? (Awesome Features)

Text-to-speech is packed with special features that make it the best choice for fast and natural-sounding voiceovers. It's like a magic tool for making amazing AI voices!

500+ AI Voices

Dive into our huge collection of top-quality voices. Choose from male, female, and even kids' voices, all ready for you to use.

Adjust Tone, Pitch & Speed

With our tool, changing how the voice sounds is easy. Make it whisper, shout, or speak at the speed you like. It's all about making the voice just right for your project.

Various Accents

Make your voiceovers feel real with different English accents. American, UK, Canadian, Australian, Indian, South African, Irish, and British accents are all there for you.

140+ Languages

Our awesome Text-to-Speech tool supports loads of languages. From English to Japanese, German to Arabic, and many more, we've got you covered.

Set Voice Emotions

Need your AI voiceover to sound serious, soft, angry, or happy? Our tool makes your voice sound just like a real person, no matter the emotion.

Create Voice Overs Like a Pro

Highlight important words and control how your voiceover sounds. Add pauses and change sentence lengths to get the perfect effect.

Support for All Video Editing Software

The audio files you make with our tool work with any video and audio editing software. It's super easy to add them to your projects.

Optimize Your Efficiency

Save time and boost your productivity with our natural-sounding text reader. It can read your emails and documents out loud in a clear AI voice, so you can listen to your text anytime and stay on top of your game.

For Individuals with Visual Impairments

On4t's text-to-speech generator is a game-changer for people with visual impairments, dyslexia, or other disabilities. It turns written words into spoken ones, making things easier to understand. If reading is tough or if you have trouble seeing, our Text to Speech online software is here to help. It changes your reading material into audio so you can just listen to it. Super convenient and helpful!

Common Questions Asked Related to ON4T Text-to-Speech

Can i use this text-to-speech without installing any application, can we use ai voices for youtube monetized content, how many voices are available in this tts tool, can i access this ai text to voice converter on mac and android, what if i get stuck at some point while converting text to speech, what if i don't like the quality of the voices, what is a text-to-speech generator, how does on4t’s text to audio converter tool work, on4t text to speech pricing.

We offer a 30-day free trial and a no-questions-asked money-back guarantee. If you're not happy with our Text-to-Speech tool, you'll get a full refund within 24 hours. Terms and conditions apply.

$0 / Monthly

Maximum 5k Characters
Access to 140+ Languages
Unlimited Downloads
License for Commercial Use
HD File Exports
Multi-Emotional Settings

$11 / First Month

Regular Price $22

120k Characters

$49 / Monthly

500k Characters

$229 / Monthly

2M Characters

$220 / Year

Regular Price $264

120k Characters/Monthly

$490 / Year

Regular Price $588

550k Characters/Monthly

$2290 / Yearly

Regular Price $2748

2M Characters/Monthly

Enterprise Plan

We also have the best plan for organizations, which requires a custom plan that fits their needs. Contact us ( [email protected] ) If you want volume-based discounts, preference features access, dedicated support, custom data, and team management controls. We will provide you with the best enterprise plan, including:

Flexible Character Limits
Custom Voice Models
Access to Preference Features
TTS Dashboard Installation on Custom Domain as Per Your Choice

Discover ON4T Premium with Zero Cost: Try it Free Today

$19 / Monthly

Unlimited Voiceovers
500,000 Characters
500+ Voices
Unlimited Projects
12k Characters Per Clip
Merge Unlimited Audios
Commercial License

$39 / Quarterly

1,500,000 Characters

$49 / Yearly

Agency Plan

4,000.000 Characters

Voice Generator

Two Factor Authentication

Pronunciation Editor

Pause Settings

Voice Profile

Audio Setting

Upload Background Music

Delete Voice

Audiobooks & Podcast

Industry-leading features that help us grow fast

Discover how voice-over transforms words into human-sounding voices.

WhisperSpeech / WhisperSpeech like 217 Follow WhisperSpeech 10

Progress update [2024-01-29]

Progress update [2024-01-18]

Progress update [2024-01-10]

Progress update [2023-12-10]

Architecture

Whisper for modeling semantic tokens

EnCodec for modeling acoustic tokens

Appreciation

Spaces using WhisperSpeech/WhisperSpeech 17

Navigation Menu

Saved searches

collabora/WhisperSpeech

Progress update [2024-01-29]

Progress update [2024-01-18]

Progress update [2024-01-10]

Progress update [2023-12-10]

Architecture

Whisper for modeling semantic tokens

EnCodec for modeling acoustic tokens

Appreciation

Contributors 10

Text to Speech

Out of credits

Use On4t to Create Amazing, Professional Voiceovers That Sound Like Real Human Voices

Turn Text into Amazing Speech: 500+ Natural Sounding AI Voices in 140+ Languages!

On4t Voices Has Emotions That Make Them Sound Super Natural

Try Out On4t’s Unique AI Text to Speech Voices That Sound Like Real Human

Christopher

Get Instant, Perfect Voiceovers in 140+ Languages.

AZERBAIJANI

On4t’s TTS is best for:

Turning Text To Speech Using On4t Is Easier Than Scrolling Through Your Social Media Feed.

With ON4T’s Text to Speech, You Can Make

Text-to-Speech Generation vs. Human VoiceOvers

Why is Text-to-Speech Super Cool? (Awesome Features)

Adjust Tone, Pitch & Speed

Various Accents

Set Voice Emotions

Create Voice Overs Like a Pro

Support for All Video Editing Software

Optimize Your Efficiency

For Individuals with Visual Impairments

Common Questions Asked Related to ON4T Text-to-Speech

Enterprise Plan