docs: use nested contents for easier overview

Enno Hermann 2024-12-12 15:52:55 +01:00
parent e23766d501
commit ae2f8d2354
14 changed files with 71 additions and 26 deletions

View File

@@ -1,7 +1,9 @@
 (formatting_your_dataset)=
 # Formatting your dataset
 
-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.
 
 If you have a single audio file and you need to split it into clips, there are different open-source tools for you. We recommend Audacity. It is an open-source and free audio editing software.
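For splitting a long recording programmatically instead, a minimal sketch using pydub (the library choice, thresholds, and file names here are assumptions, not part of these docs):

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Load one long recording and cut it into clips at pauses (hypothetical paths).
audio = AudioSegment.from_file("long_recording.wav")
chunks = split_on_silence(
    audio,
    min_silence_len=700,             # a pause of >= 700 ms marks a clip boundary
    silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
    keep_silence=200,                # keep 200 ms of padding around each clip
)
for i, chunk in enumerate(chunks):
    chunk.export(f"wavs/clip_{i:04d}.wav", format="wav")
```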

View File

@@ -0,0 +1,12 @@
+# Datasets
+
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. See the following pages for more information on:
+
+```{toctree}
+:maxdepth: 1
+
+formatting_your_dataset
+what_makes_a_good_dataset
+tts_datasets
+```

View File

@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets
 
-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:
 
 - [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
 - [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)

View File

@@ -36,7 +36,8 @@
 There is also the `callback` interface by which you can manipulate both the model and the `Trainer` states. Callbacks give you
 infinite flexibility to add custom behaviours for your model and training routines.
-For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model)
+and `TTS.utils.callbacks`.
 
 6. Optionally, define `MyModelArgs`.
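The callback interface mentioned in the hunk above can be as small as a hook method defined on the model; a minimal sketch (the hook name and the annealed attribute are assumptions for illustration, not the library's documented API):

```python
from TTS.tts.models.base_tts import BaseTTS

class MyModel(BaseTTS):
    """Sketch only: the Trainer picks up hook methods defined on the model."""

    def on_epoch_start(self, trainer):
        # Hypothetical: anneal a custom loss weight once per epoch.
        self.loss_alpha = max(0.0, 1.0 - 0.01 * trainer.epochs_done)
```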

View File

@@ -0,0 +1,14 @@
+# Adding models or languages
+
+You can extend Coqui by implementing new model architectures or adding
+frontends for new languages. See the pages below for more details. The [project
+structure](../project_structure.md) and [contribution
+guidelines](../contributing.md) may also be helpful. Please open a pull request
+with your changes to share your improvements with the community.
+
+```{toctree}
+:maxdepth: 1
+
+implementing_a_new_model
+implementing_a_new_language_frontend
+```

View File

@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.
 
 ## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)
 
 ## How should I choose the right model?
 - First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 ## How can I train my own `tts` model?
 0. Check your dataset with notebooks in the [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in better audio synthesis.
-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
    A `formatter` parses the metadata file and converts it to a list of training samples (a sketch follows below).
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
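A minimal custom formatter might look like the sketch below; the `wav_name|transcription` metadata layout, file names, and speaker name are assumptions, but the returned sample dicts follow the built-in formatters in `datasets/formatters.py`:

```python
import os

def my_dataset(root_path, meta_file, **kwargs):
    """Parse a hypothetical `wav_name|transcription` metadata file into sample dicts."""
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            wav_name, text = line.strip().split("|")
            items.append({
                "text": text,
                "audio_file": os.path.join(root_path, "wavs", wav_name + ".wav"),
                "speaker_name": "my_speaker",
                "root_path": root_path,
            })
    return items
```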

View File

@@ -4,10 +4,10 @@
 ```
 
 ----
 
 # Documentation Content
 
 ```{toctree}
 :maxdepth: 1
 :caption: Get started
 :hidden:
 
 tutorial_for_nervous_beginners
 installation
@@ -20,22 +20,19 @@ contributing
 ```{toctree}
 :maxdepth: 1
 :caption: Using Coqui
 :hidden:
 
 inference
-training_a_model
-finetuning
-implementing_a_new_model
-implementing_a_new_language_frontend
-formatting_your_dataset
-what_makes_a_good_dataset
-tts_datasets
-marytts
+training/index
+extension/index
+datasets/index
 ```
 
 ```{toctree}
 :maxdepth: 1
 :caption: Main Classes
 :hidden:
 
 configuration
 main_classes/trainer_api
@@ -50,6 +47,7 @@ main_classes/speaker_manager
 ```{toctree}
 :maxdepth: 1
 :caption: TTS Models
+:hidden:
 
 models/glow_tts.md
 models/vits.md

View File

@@ -86,8 +86,8 @@ tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"
 You can boot up a demo 🐸TTS server to run inference with your models (make
 sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.
 
 The demo server provides pretty much the same interface as the CLI command.
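Once the server is running, it can also be queried from Python; a sketch assuming the default port 5002 and the `/api/tts` endpoint (check the server output for the exact address):

```python
import requests

# Request synthesized speech from a locally running demo server.
resp = requests.get(
    "http://localhost:5002/api/tts",
    params={"text": "Hello from the demo server!"},
)
resp.raise_for_status()
with open("demo_output.wav", "wb") as f:
    f.write(resp.content)
```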
@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
     file_path="output.wav"
 )
 ```
+```{toctree}
+:hidden:
+
+marytts
+```
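For reference, the `api.tts_with_vc_to_file(...)` call shown truncated above fits into a script roughly like this (model and file names are placeholders, not recommendations):

```python
from TTS.api import TTS

api = TTS("tts_models/en/ljspeech/tacotron2-DDC")
# Synthesize the text, then convert it to the voice in `speaker_wav`.
api.tts_with_vc_to_file(
    "This text is spoken in the target speaker's voice.",
    speaker_wav="target_speaker.wav",
    file_path="output.wav",
)
```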

View File

@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
    speech dataset and achieve reasonable results with only a couple of hours of data.
 
 However, note that fine-tuning does not ensure great results. The model
-performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
 and the hyper-parameters you choose for fine-tuning. Therefore,
 it still takes a bit of tinkering.
@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 1. Setup your dataset.
 
    You need to format your target dataset in a certain way so that the 🐸TTS data loader will be able to load it for
-   training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+   training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.
 
 2. Choose the model you want to fine-tune.
@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
    You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
   One lazy way to test a model is running the model on the hardware you want to use and see how it works. For
   simple testing, you can use the `tts` command on the terminal. For more info
-   see [here](inference.md).
+   see [here](../inference.md).
 
 3. Download the model.
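Models can also be fetched programmatically; a sketch using the model manager (the model name is one example from `tts --list_models`, and return values may differ slightly between versions):

```python
from TTS.utils.manage import ModelManager

manager = ModelManager()
# Download a pretrained checkpoint plus its config to fine-tune from.
model_path, config_path, _ = manager.download_model("tts_models/en/ljspeech/glow-tts")
print(model_path, config_path)
```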

View File

@@ -0,0 +1,10 @@
+# Training and fine-tuning
+
+The following pages show you how to train and fine-tune Coqui models:
+
+```{toctree}
+:maxdepth: 1
+
+training_a_model
+finetuning
+```

View File

@@ -11,11 +11,10 @@
 3. Check the recipes.
 
-   Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-   `Nervous Beginners`.
+   Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good starting point.
 
    A recipe for `GlowTTS` using the `LJSpeech` dataset looks like below. Let's be creative and call this `train_glowtts.py`.
 
-   ```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+   ```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
    ```
 
 You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
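For orientation, the two config objects referenced here look roughly like this in a recipe (paths and values are placeholders to adapt):

```python
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig

# Point the dataset config at your own data.
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path="/data/LJSpeech-1.1/",
)

# Then adjust the model config to match your setup.
config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    run_eval=True,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    datasets=[dataset_config],
    output_path="output/",
)
```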
@@ -113,7 +112,7 @@
 Note that different models have different metrics, visuals and outputs.
 
-You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+You should also check the [FAQ page](../faq.md) for common problems and solutions
 that occur during training.
 
 7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using th
 The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.
 
-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
 ```

View File

@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
 ```
 
 ![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)
 
+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.
+
 ## Training a `tts` Model
 
-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJspeech
+dataset. For a more in-depth guide to training and fine-tuning, also see [this
+page](training/index.md).
 
 ### Pure Python Way
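Condensed, the pure-Python flow of such a recipe wires configs, model, and data into the `Trainer`; a sketch with placeholder paths (the full, authoritative script is the `train_glowtts.py` recipe referenced above):

```python
from trainer import Trainer, TrainerArgs

from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

dataset_config = BaseDatasetConfig(
    formatter="ljspeech", meta_file_train="metadata.csv", path="/data/LJSpeech-1.1/"
)
config = GlowTTSConfig(batch_size=32, epochs=1000, datasets=[dataset_config], output_path="output/")

# Audio processor and tokenizer are derived from the config.
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

# Load samples through the formatter named in the dataset config.
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

model = GlowTTS(config, ap, tokenizer, speaker_manager=None)
trainer = Trainer(
    TrainerArgs(), config, config.output_path,
    model=model, train_samples=train_samples, eval_samples=eval_samples,
)
trainer.fit()
```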