Easy Inferencing with 🐸 TTS ⚡¶
Do you want to quickly synthesize speech using a Coqui 🐸 TTS model?¶
💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡
🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server that you can open in your favorite web browser and 🗣️.
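For example, you can spin up the demo server with a released model (a minimal sketch; the tts-server command is installed together with the TTS package, and 5002 is its default port):

# Start a local demo server for a released model (assumes port 5002, the tts-server default).
!tts-server --model_name "tts_models/en/ljspeech/glow-tts" --port 5002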
In this notebook, we will:
1. List available pre-trained 🐸 TTS models
2. Run a 🐸 TTS model
3. Listen to the synthesized wave 📣
4. Run multispeaker 🐸 TTS model
So, let's jump right in!
Install 🐸 TTS ⬇️¶
! pip install -U pip
! pip install TTS
✅ List available pre-trained 🐸 TTS models¶
Coqui 🐸TTS comes with a list of pretrained models for different model types (ex: TTS, vocoder), languages, datasets used for training and architectures.
You can either use your own model or the release models under 🐸TTS.
Use tts --list_models to find out the available models.
! tts --list_models
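If you prefer to stay in Python, the same list is exposed through the TTS Python API (a minimal sketch, assuming a TTS version that ships TTS.api; the exact return format of list_models() varies across releases):

from TTS.api import TTS

# Print the names of all released 🐸 TTS models.
print(TTS().list_models())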
✅ Run a 🐸 TTS model¶
!tts --text "hello world" \
    --model_name "tts_models/en/ljspeech/glow-tts" \
    --out_path output.wav
📣 Listen to the synthesized wave 📣¶
import IPython
IPython.display.Audio("output.wav")
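The same synthesis can also be done from Python without the CLI (a minimal sketch, again assuming the TTS.api wrapper; the model is downloaded automatically on first use):

from TTS.api import TTS

# Load a released single-speaker model and write the synthesized speech to a file.
tts = TTS(model_name="tts_models/en/ljspeech/glow-tts")
tts.tts_to_file(text="hello world", file_path="output.wav")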
Second things second:¶
🔶 A TTS model can be trained either on a single speaker's voice or on multiple speaker voices. This training choice is directly reflected in the inference ability and in the speaker voices that are available for synthesizing speech.
🔶 If you want to run a multispeaker model from the released models list, you can first check the speaker IDs using the --list_speaker_idxs flag and then use one of these speaker voices to synthesize speech.
# list the possible speaker IDs.
!tts --model_name "tts_models/en/vctk/vits" \
    --list_speaker_idxs
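If you load the model through the Python API instead, the same IDs are available on the model object (a sketch; the speakers attribute assumes the TTS.api wrapper):

from TTS.api import TTS

# Multi-speaker models expose their available speaker IDs as a list.
tts = TTS(model_name="tts_models/en/vctk/vits")
print(tts.speakers)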
💬 Synthesize speech using speaker ID 💬¶
!tts --text "Trying out specific speaker voice" \
    --out_path spkr-out.wav \
    --model_name "tts_models/en/vctk/vits" \
    --speaker_idx "p341"
📣 Listen to the synthesized speaker specific wave 📣¶
import IPython
IPython.display.Audio("spkr-out.wav")
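The Python API equivalent passes the speaker ID through the speaker argument (a sketch under the same TTS.api assumption):

from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")
# Pick one of the IDs printed by tts.speakers, e.g. "p341".
tts.tts_to_file(text="Trying out specific speaker voice",
                speaker="p341",
                file_path="spkr-out.wav")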
🔶 If you want to use an external speaker voice to synthesize speech, you need to supply the --speaker_wav flag along with a speaker encoder path and config file, as follows:
First, we need to get the speaker encoder model, its config, and a reference speaker_wav:
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar
!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav
!tts --model_name tts_models/multilingual/multi-dataset/your_tts \
    --encoder_path model_se.pth.tar \
    --encoder_config config_se.json \
    --speaker_wav LJ001-0001.wav \
    --text "Are we not allowed to dim the lights so people can see that a bit better?" \
    --out_path spkr-out.wav \
    --language_idx "en"
📣 Listen to the synthesized speaker specific wave 📣¶
import IPython
IPython.display.Audio("spkr-out.wav")
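Voice cloning with your_tts also works from Python; in this sketch no explicit encoder paths are passed, assuming the TTS.api wrapper fetches what the model needs (the CLI's --language_idx maps to the language argument here):

from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")
# speaker_wav points at the reference recording downloaded above.
tts.tts_to_file(text="Are we not allowed to dim the lights so people can see that a bit better?",
                speaker_wav="LJ001-0001.wav",
                language="en",
                file_path="spkr-out.wav")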
🎉 Congratulations! 🎉 You now know how to use a TTS model to synthesize speech!¶
Follow up with the next tutorials to learn more advanced material.