docs: streamline readme and reuse content in other docs pages

[ci skip]
Enno Hermann 2024-12-12 17:34:00 +01:00
parent ae2f8d2354
commit e38dcbea7a
7 changed files with 242 additions and 404 deletions

README.md

@ -1,39 +1,34 @@
# <img src="https://raw.githubusercontent.com/idiap/coqui-ai-TTS/main/images/coqui-log-green-TTS.png" height="56"/>
**🐸 Coqui TTS is a library for advanced Text-to-Speech generation.**

🚀 Pretrained models in +1100 languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.
______________________________________________________________________
[![Discord](https://img.shields.io/discord/1037326658807533628?color=%239B59B6&label=chat%20on%20discord)](https://discord.gg/5eXr5seRrv)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/coqui-tts)](https://pypi.org/project/coqui-tts/)
[![License](<https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg>)](https://opensource.org/licenses/MPL-2.0)
[![PyPI version](https://badge.fury.io/py/coqui-tts.svg)](https://pypi.org/project/coqui-tts/)
[![Downloads](https://pepy.tech/badge/coqui-tts)](https://pepy.tech/project/coqui-tts)
[![DOI](https://zenodo.org/badge/265612440.svg)](https://zenodo.org/badge/latestdoi/265612440)
[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/tests.yml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/tests.yml)
[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/docker.yaml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/docker.yaml)
[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/style_check.yml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/style_check.yml)
[![Docs](<https://readthedocs.org/projects/coqui-tts/badge/?version=latest&style=plastic>)](https://coqui-tts.readthedocs.io/en/latest/)
</div>

## 📣 News
- **Fork of the [original, unmaintained repository](https://github.com/coqui-ai/TTS). New PyPI package: [coqui-tts](https://pypi.org/project/coqui-tts)**
- 0.25.0: [OpenVoice](https://github.com/myshell-ai/OpenVoice) models now available for voice conversion.
- 0.24.2: Prebuilt wheels are now also published for Mac and Windows (in addition to Linux as before) for easier installation across platforms.
- 0.20.0: XTTSv2 is here with 17 languages and better performance across the board. XTTS can stream with <200ms latency.
- 0.19.0: XTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech).
- 0.14.1: You can use [Fairseq models in ~1100 languages](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS.
## 💬 Where to ask questions
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.
@ -117,8 +112,10 @@ repository are also still a useful source of information.
You can also help us implement more models.
<!-- start installation -->
## Installation
🐸TTS is tested on Ubuntu 24.04 with **python >= 3.9, < 3.13**, but should also
work on Mac and Windows.
If you are only interested in [synthesizing speech](https://coqui-tts.readthedocs.io/en/latest/inference.html) with the pretrained 🐸TTS models, installing from PyPI is the easiest option.
@ -159,13 +156,15 @@ pip install -e .[server,ja]
### Platforms
If you are on Ubuntu (Debian), you can also run the following commands for installation.
```bash
make system-deps
make install
```
<!-- end installation -->
## Docker Image
You can also try out Coqui TTS without installation with the docker image.
Simply run the following command and you will be able to run TTS:
@ -182,10 +181,10 @@ More details about the docker images (like GPU support) can be found
## Synthesizing speech by 🐸TTS
<!-- start inference -->
### 🐍 Python API
#### Multi-speaker and multi-lingual model
```python
import torch
@ -197,47 +196,60 @@ device = "cuda" if torch.cuda.is_available() else "cpu"
# List available 🐸TTS models
print(TTS().list_models())

# Initialize TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# List speakers
print(tts.speakers)

# Run TTS
# ❗ XTTS supports both, but many models allow only one of the `speaker` and
# `speaker_wav` arguments

# TTS with list of amplitude values as output, clone the voice from `speaker_wav`
wav = tts.tts(
    text="Hello world!",
    speaker_wav="my/cloning/audio.wav",
    language="en"
)

# TTS to a file, use a preset speaker
tts.tts_to_file(
    text="Hello world!",
    speaker="Craig Gutsy",
    language="en",
    file_path="output.wav"
)
```
#### Single speaker model
```python
# Initialize TTS with the target model name
tts = TTS("tts_models/de/thorsten/tacotron2-DDC").to(device)

# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)
```
#### Voice conversion (VC)
Converting the voice in `source_wav` to the voice of `target_wav`
```python
tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to("cuda")
tts.voice_conversion_to_file(
    source_wav="my/source.wav",
    target_wav="my/target.wav",
    file_path="output.wav"
)
```
Other available voice conversion models:
- `voice_conversion_models/multilingual/multi-dataset/openvoice_v1`
- `voice_conversion_models/multilingual/multi-dataset/openvoice_v2`
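Either name should be a drop-in replacement for the FreeVC model in the snippet above. A minimal sketch, assuming the same `voice_conversion_to_file` call and placeholder audio paths:

```python
from TTS.api import TTS

# Same voice conversion API as the FreeVC example above; the model name is taken
# from the list of other available voice conversion models. Paths are placeholders.
tts = TTS("voice_conversion_models/multilingual/multi-dataset/openvoice_v2").to("cuda")
tts.voice_conversion_to_file(
    source_wav="my/source.wav",
    target_wav="my/target.wav",
    file_path="output.wav"
)
```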
#### Voice cloning by combining single speaker TTS model with the default VC model
This way, you can clone voices by using any model in 🐸TTS. The FreeVC model is
used for voice conversion after synthesizing speech.
@ -252,7 +264,7 @@ tts.tts_with_vc_to_file(
)
```
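Only the tail of this example survives in the hunk above. The full call, as it appears in the snippet this commit removes from `docs/source/inference.md`, is roughly:

```python
from TTS.api import TTS

# Any single speaker TTS model can be combined with the default FreeVC model;
# speaker_wav is a placeholder path to reference audio of the target voice.
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```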
#### TTS using Fairseq models in ~1100 languages 🤯
For Fairseq models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`.
You can find the language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
and learn about the Fairseq models [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).
@ -266,7 +278,7 @@ api.tts_to_file(
)
```
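Again, only the closing lines of the example are visible in this hunk. Reconstructed from the Fairseq example this commit removes from `docs/source/inference.md`, it is roughly:

```python
from TTS.api import TTS

# English Fairseq VITS model; swap "eng" for any other supported ISO code
api = TTS("tts_models/eng/fairseq/vits").to("cuda")
api.tts_to_file("This is a test.", file_path="output.wav")
```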
### Command-line interface `tts`
<!-- begin-tts-readme -->
@ -274,120 +286,118 @@ Synthesize speech on the command line.
You can either use your trained model or choose a model from the provided list.
- List provided models:
```sh
tts --list_models
```
- Get model information. Use the names obtained from `--list_models`.
```sh
tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
```
For example:
```sh
tts --model_info_by_name tts_models/tr/common-voice/glow-tts
tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
```
#### Single speaker models
- Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):
```sh
tts --text "Text for TTS" --out_path output/path/speech.wav
```
- Run TTS and pipe out the generated TTS wav file data:
```sh
tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
```
- Run a TTS model with its default vocoder model:
```sh
tts --text "Text for TTS" \
    --model_name "<model_type>/<language>/<dataset>/<model_name>" \
    --out_path output/path/speech.wav
```
For example:
```sh
tts --text "Text for TTS" \
    --model_name "tts_models/en/ljspeech/glow-tts" \
    --out_path output/path/speech.wav
```
- Run with specific TTS and vocoder models from the list. Note that not every vocoder is compatible with every TTS model.
```sh
tts --text "Text for TTS" \
    --model_name "<model_type>/<language>/<dataset>/<model_name>" \
    --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" \
    --out_path output/path/speech.wav
```
For example:
```sh
tts --text "Text for TTS" \
    --model_name "tts_models/en/ljspeech/glow-tts" \
    --vocoder_name "vocoder_models/en/ljspeech/univnet" \
    --out_path output/path/speech.wav
```
- Run your own TTS model (using Griffin-Lim Vocoder):
```sh
tts --text "Text for TTS" \
    --model_path path/to/model.pth \
    --config_path path/to/config.json \
    --out_path output/path/speech.wav
```
- Run your own TTS and Vocoder models:
```sh
tts --text "Text for TTS" \
    --model_path path/to/model.pth \
    --config_path path/to/config.json \
    --out_path output/path/speech.wav \
    --vocoder_path path/to/vocoder.pth \
    --vocoder_config_path path/to/vocoder_config.json
```
#### Multi-speaker models
- List the available speakers and choose a `<speaker_id>` among them:
```sh
tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
```
- Run the multi-speaker TTS model with the target speaker ID:
```sh
tts --text "Text for TTS." --out_path output/path/speech.wav \
    --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
```
- Run your own multi-speaker TTS model:
```sh
tts --text "Text for TTS" --out_path output/path/speech.wav \
    --model_path path/to/model.pth --config_path path/to/config.json \
    --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
```
#### Voice conversion models
```sh
tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \
    --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
```
<!-- end-tts-readme -->

TTS/bin/synthesize.py

@ -14,123 +14,122 @@ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger
logger = logging.getLogger(__name__)

description = """
Synthesize speech on the command line.
You can either use your trained model or choose a model from the provided list.
- List provided models:
```sh
tts --list_models
```
- Get model information. Use the names obtained from `--list_models`.
```sh
tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
```
For example:
```sh
tts --model_info_by_name tts_models/tr/common-voice/glow-tts
tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
```
#### Single Speaker Models
- Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):
```sh
tts --text "Text for TTS" --out_path output/path/speech.wav
```
- Run TTS and pipe out the generated TTS wav file data:
```sh
tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
```
- Run a TTS model with its default vocoder model:
```sh
tts --text "Text for TTS" \\
    --model_name "<model_type>/<language>/<dataset>/<model_name>" \\
    --out_path output/path/speech.wav
```
For example:
```sh
tts --text "Text for TTS" \\
    --model_name "tts_models/en/ljspeech/glow-tts" \\
    --out_path output/path/speech.wav
```
- Run with specific TTS and vocoder models from the list. Note that not every vocoder is compatible with every TTS model.
```sh
tts --text "Text for TTS" \\
    --model_name "<model_type>/<language>/<dataset>/<model_name>" \\
    --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" \\
    --out_path output/path/speech.wav
```
For example:
```sh
tts --text "Text for TTS" \\
    --model_name "tts_models/en/ljspeech/glow-tts" \\
    --vocoder_name "vocoder_models/en/ljspeech/univnet" \\
    --out_path output/path/speech.wav
```
- Run your own TTS model (using Griffin-Lim Vocoder):
```sh
tts --text "Text for TTS" \\
    --model_path path/to/model.pth \\
    --config_path path/to/config.json \\
    --out_path output/path/speech.wav
```
- Run your own TTS and Vocoder models:
```sh
tts --text "Text for TTS" \\
    --model_path path/to/model.pth \\
    --config_path path/to/config.json \\
    --out_path output/path/speech.wav \\
    --vocoder_path path/to/vocoder.pth \\
    --vocoder_config_path path/to/vocoder_config.json
```
#### Multi-speaker Models
- List the available speakers and choose a `<speaker_id>` among them:
```sh
tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
```
- Run the multi-speaker TTS model with the target speaker ID:
```sh
tts --text "Text for TTS." --out_path output/path/speech.wav \\
    --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
```
- Run your own multi-speaker TTS model:
```sh
tts --text "Text for TTS" --out_path output/path/speech.wav \\
    --model_path path/to/model.pth --config_path path/to/config.json \\
    --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
```
#### Voice Conversion Models
```sh
tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \\
    --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
```
"""

docs/source/index.md

@ -1,8 +1,11 @@
---
hide-toc: true
---
```{include} ../../README.md
:relative-images:
:end-before: <!-- start installation -->
```
```{toctree}
:maxdepth: 1

docs/source/inference.md

@ -1,199 +1,21 @@
(synthesizing_speech)=
# Synthesizing speech
## Overview
Coqui TTS provides three main methods for inference:
1. 🐍Python API
2. TTS command line interface (CLI)
3. [Local demo server](server.md)
```{include} ../../README.md
:start-after: <!-- start inference -->
```
```{toctree}
:hidden:
server
marytts
```

docs/source/installation.md

@ -1,36 +1,6 @@
# Installation
```{include} ../../README.md
:start-after: <!-- start installation -->
:end-before: <!-- end installation -->
```

docs/source/server.md Normal file

@ -0,0 +1,30 @@
# Demo server
![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)
You can boot up a demo 🐸TTS server to run inference with your models (make
sure to install the additional dependencies with `pip install coqui-tts[server]`).
Note that the server is not optimized for performance and does not support all
Coqui models yet.
The demo server provides pretty much the same interface as the CLI command.
```bash
tts-server -h # see the help
tts-server --list_models # list the available models.
```
Run a TTS model from the released models list with its default vocoder.
If the model you choose is a multi-speaker TTS model, you can select different
speakers on the web interface and synthesize speech.
```bash
tts-server --model_name "<type>/<language>/<dataset>/<model_name>"
```
Run a TTS and a vocoder model from the released model list. Note that not every vocoder is compatible with every TTS model.
```bash
tts-server --model_name "<type>/<language>/<dataset>/<model_name>" \
--vocoder_name "<type>/<language>/<dataset>/<model_name>"
```
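Once the server is running, you can also fetch audio programmatically. A minimal sketch, assuming the default port (5002) and the `/api/tts` endpoint of `TTS/server/server.py`; both are assumptions, so adjust them if your setup differs:

```python
import requests

# Query the locally running demo server and save the returned WAV data.
# Port 5002 and the /api/tts endpoint are assumptions based on the default
# server configuration; check TTS/server/server.py if the request fails.
response = requests.get(
    "http://localhost:5002/api/tts",
    params={"text": "Hello from the demo server!"},
    timeout=60,
)
response.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(response.content)
```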

scripts/sync_readme.py

@ -22,8 +22,12 @@ def sync_readme():
    new_content = replace_between_markers(orig_content, "tts-readme", description.strip())
    if args.check:
        if orig_content != new_content:
            print(
                "README.md is out of sync; please reconcile README.md and TTS/bin/synthesize.py and run scripts/sync_readme.py"
            )
            exit(42)
        print("All good, files in sync")
        exit(0)
    readme_path.write_text(new_content)
    print("Updated README.md")