mirror of https://github.com/coqui-ai/TTS.git
docs: use nested contents for easier overview
This commit is contained in:
parent e23766d501
commit ae2f8d2354
@@ -1,7 +1,9 @@
 (formatting_your_dataset)=
 # Formatting your dataset
 
-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.
 
 If you have a single audio file and you need to split it into clips, there are different open-source tools for you. We recommend Audacity. It is an open-source and free audio editing software.
 
@@ -0,0 +1,12 @@
+# Datasets
+
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. See the following pages for more information on:
+
+```{toctree}
+:maxdepth: 1
+
+formatting_your_dataset
+what_makes_a_good_dataset
+tts_datasets
+```
@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets
 
-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:
 
 - [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
 - [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)
@@ -36,7 +36,8 @@
 There is also the `callback` interface by which you can manipulate both the model and the `Trainer` states. Callbacks give you
 an infinite flexibility to add custom behaviours for your model and training routines.
 
-For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model)
+and `TTS.utils.callbacks`.
 
 6. Optionally, define `MyModelArgs`.
 
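The `callback` interface described in the hunk above can be pictured with a toy sketch. The hook names and trainer fields below are illustrative assumptions, not the exact `TTS.utils.callbacks` API:

```python
class LoggingCallback:
    """Hypothetical callback: hooks run at epoch boundaries and may
    read or mutate the trainer's state."""

    def on_epoch_start(self, trainer):
        print(f"starting epoch {trainer.epoch}")

    def on_epoch_end(self, trainer):
        # Callbacks can also change training behaviour, e.g. decay the LR.
        trainer.lr *= 0.9


class ToyTrainer:
    """Minimal stand-in for a Trainer that dispatches to callbacks."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.epoch = 0
        self.lr = 0.1

    def fit(self, epochs=2):
        for self.epoch in range(epochs):
            for cb in self.callbacks:
                cb.on_epoch_start(self)
            # ... the actual training step would run here ...
            for cb in self.callbacks:
                cb.on_epoch_end(self)


trainer = ToyTrainer([LoggingCallback()])
trainer.fit()
```

The real `Trainer` exposes many more hooks, but the dispatch pattern is the same: each registered callback is invoked with the trainer so it can inspect or modify both model and training state.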
@@ -0,0 +1,14 @@
+# Adding models or languages
+
+You can extend Coqui by implementing new model architectures or adding front
+ends for new languages. See the pages below for more details. The [project
+structure](../project_structure.md) and [contribution
+guidelines](../contributing.md) may also be helpful. Please open a pull request
+with your changes to share back the improvements with the community.
+
+```{toctree}
+:maxdepth: 1
+
+implementing_a_new_model
+implementing_a_new_language_frontend
+```
@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.
 
 ## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)
 
 ## How should I choose the right model?
 - First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 ## How can I train my own `tts` model?
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
 
-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
 A `formatter` parses the metadata file and converts a list of training samples.
 
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
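The `formatter` mentioned in the hunk above turns a metadata file into training samples. A minimal sketch of the idea, assuming an LJSpeech-style pipe-separated metadata layout; the function name and sample keys are illustrative, not the actual coqui formatter signature:

```python
def ljspeech_like_formatter(metadata: str, root_path: str = "wavs"):
    """Parse pipe-separated metadata lines ("<file_id>|<raw>|<normalized>")
    into a list of sample dicts, one per audio clip."""
    samples = []
    for line in metadata.strip().splitlines():
        cols = line.split("|")
        file_id, text = cols[0], cols[-1]  # last column: normalized text
        samples.append({
            "audio_file": f"{root_path}/{file_id}.wav",
            "text": text,
        })
    return samples


metadata = "LJ001-0001|Printing, in the only sense|Printing, in the only sense"
samples = ljspeech_like_formatter(metadata)
print(samples)
```

A real formatter would read the file from disk and resolve absolute audio paths, but the core job is exactly this mapping from metadata rows to sample records.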
@@ -4,10 +4,10 @@
 ```
 ----
 
-# Documentation Content
 ```{toctree}
 :maxdepth: 1
 :caption: Get started
+:hidden:
 
 tutorial_for_nervous_beginners
 installation
|
@ -20,22 +20,19 @@ contributing
|
||||||
```{toctree}
|
```{toctree}
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
:caption: Using Coqui
|
:caption: Using Coqui
|
||||||
|
:hidden:
|
||||||
|
|
||||||
inference
|
inference
|
||||||
training_a_model
|
training/index
|
||||||
finetuning
|
extension/index
|
||||||
implementing_a_new_model
|
datasets/index
|
||||||
implementing_a_new_language_frontend
|
|
||||||
formatting_your_dataset
|
|
||||||
what_makes_a_good_dataset
|
|
||||||
tts_datasets
|
|
||||||
marytts
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
```{toctree}
|
```{toctree}
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
:caption: Main Classes
|
:caption: Main Classes
|
||||||
|
:hidden:
|
||||||
|
|
||||||
configuration
|
configuration
|
||||||
main_classes/trainer_api
|
main_classes/trainer_api
|
||||||
|
@@ -50,6 +47,7 @@ main_classes/speaker_manager
 ```{toctree}
 :maxdepth: 1
 :caption: TTS Models
+:hidden:
 
 models/glow_tts.md
 models/vits.md
@@ -86,8 +86,8 @@ tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"
 
 You can boot up a demo 🐸TTS server to run an inference with your models (make
 sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.
 
 The demo server provides pretty much the same interface as the CLI command.
@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
 file_path="ouptut.wav"
 )
 ```
+
+```{toctree}
+:hidden:
+marytts
+```
@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 speech dataset and achieve reasonable results with only a couple of hours of data.
 
 However, note that, fine-tuning does not ensure great results. The model
-performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
 and the hyper-parameters you choose for fine-tuning. Therefore,
 it still takes a bit of tinkering.
 
@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 1. Setup your dataset.
 
 You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
-training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.
 
 2. Choose the model you want to fine-tune.
 
@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
 One lazy way to test a model is running the model on the hardware you want to use and see how it works. For
 simple testing, you can use the `tts` command on the terminal. For more info
-see [here](inference.md).
+see [here](../inference.md).
 
 3. Download the model.
 
@@ -0,0 +1,10 @@
+# Training and fine-tuning
+
+The following pages show you how to train and fine-tune Coqui models:
+
+```{toctree}
+:maxdepth: 1
+
+training_a_model
+finetuning
+```
@@ -11,11 +11,10 @@
 
 3. Check the recipes.
 
-Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-`Nervous Beginners`.
+Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point.
 A recipe for `GlowTTS` using `LJSpeech` dataset looks like below. Let's be creative and call this `train_glowtts.py`.
 
-```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
 ```
 
 You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
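The recipe pattern mentioned above boils down to pointing a dataset config at your data and passing it to the model config. A runnable sketch of that shape, using hypothetical stand-in dataclasses whose field names only mirror the style of `BaseDatasetConfig` and `GlowTTSConfig`, not the exact coqui API:

```python
from dataclasses import dataclass, field

# Stand-ins that mimic the config pattern; field names are illustrative.
@dataclass
class DatasetConfigSketch:
    formatter: str = "ljspeech"          # which built-in formatter to use
    meta_file_train: str = "metadata.csv"
    path: str = "/data/LJSpeech-1.1/"    # root of the dataset


@dataclass
class ModelConfigSketch:
    batch_size: int = 32
    run_eval: bool = True
    datasets: list = field(default_factory=list)


# Adapt the dataset config to your own data, then hand it to the model config.
dataset = DatasetConfigSketch(path="/data/MyDataset/")
config = ModelConfigSketch(datasets=[dataset])
print(config.datasets[0].path)
```

In a real recipe you would import the actual config classes from the TTS package and tune many more fields (audio parameters, text cleaners, output paths), but the flow is the same: dataset config in, model config out, trainer runs on the result.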
@@ -113,7 +112,7 @@
 
 Note that different models have different metrics, visuals and outputs.
 
-You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+You should also check the [FAQ page](../faq.md) for common problems and solutions
 that occur in a training.
 
 7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using th
 
 The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.
 
-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
 ```
@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
 ```
 
+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.
+
 ## Training a `tts` Model
 
-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJspeech
+dataset. For a more in-depth guide to training and fine-tuning also see [this
+page](training/index.md).
 
 ### Pure Python Way
 