diff --git a/docs/source/formatting_your_dataset.md b/docs/source/datasets/formatting_your_dataset.md
similarity index 98%
rename from docs/source/formatting_your_dataset.md
rename to docs/source/datasets/formatting_your_dataset.md
index 7376ff66..e9226333 100644
--- a/docs/source/formatting_your_dataset.md
+++ b/docs/source/datasets/formatting_your_dataset.md
@@ -1,7 +1,9 @@
 (formatting_your_dataset)=
 # Formatting your dataset
 
-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.
 
 If you have a single audio file and you need to split it into clips, there are different open-source tools for you. We recommend Audacity. It is an open-source and free audio editing software.

diff --git a/docs/source/datasets/index.md b/docs/source/datasets/index.md
new file mode 100644
index 00000000..6b040fc4
--- /dev/null
+++ b/docs/source/datasets/index.md
@@ -0,0 +1,12 @@
+# Datasets
+
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. See the following pages for more information on:
+
+```{toctree}
+:maxdepth: 1
+
+formatting_your_dataset
+what_makes_a_good_dataset
+tts_datasets
+```
diff --git a/docs/source/tts_datasets.md b/docs/source/datasets/tts_datasets.md
similarity index 90%
rename from docs/source/tts_datasets.md
rename to docs/source/datasets/tts_datasets.md
index 3a0bcf11..df8d2f2a 100644
--- a/docs/source/tts_datasets.md
+++ b/docs/source/datasets/tts_datasets.md
@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets
 
-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:
 
 - [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
 - [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)
diff --git a/docs/source/what_makes_a_good_dataset.md b/docs/source/datasets/what_makes_a_good_dataset.md
similarity index 100%
rename from docs/source/what_makes_a_good_dataset.md
rename to docs/source/datasets/what_makes_a_good_dataset.md
diff --git a/docs/source/implementing_a_new_language_frontend.md b/docs/source/extension/implementing_a_new_language_frontend.md
similarity index 100%
rename from docs/source/implementing_a_new_language_frontend.md
rename to docs/source/extension/implementing_a_new_language_frontend.md
diff --git a/docs/source/implementing_a_new_model.md b/docs/source/extension/implementing_a_new_model.md
similarity index 98%
rename from docs/source/implementing_a_new_model.md
rename to docs/source/extension/implementing_a_new_model.md
index a2721a1c..25217897 100644
--- a/docs/source/implementing_a_new_model.md
+++ b/docs/source/extension/implementing_a_new_model.md
@@ -36,7 +36,8 @@
    There is also the `callback` interface by which you can manipulate both the
    model and the `Trainer` states. Callbacks give you an infinite flexibility to
    add custom behaviours for your model and training routines.
 
-   For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+   For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model)
+   and `TTS.utils.callbacks`.
 
 6. Optionally, define `MyModelArgs`.
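To make the formatter contract behind the dataset pages concrete, here is a minimal sketch of a custom formatter for an LJSpeech-style layout. The function name `my_dataset`, the speaker label, and the `wavs/` directory are illustrative assumptions; the built-in formatters in `TTS/tts/datasets/formatters.py` are the authoritative reference for the expected dict keys.

```python
import os


def my_dataset(root_path, meta_file, **kwargs):  # hypothetical formatter name
    """Parse a pipe-separated metadata file into Coqui training samples.

    Assumes rows of the form ``audio_id|text`` with clips stored under
    ``<root_path>/wavs/<audio_id>.wav``, as in an LJSpeech-style layout.
    """
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip empty lines
            audio_id, text = line.strip().split("|", maxsplit=1)
            items.append(
                {
                    "text": text,
                    "audio_file": os.path.join(root_path, "wavs", audio_id + ".wav"),
                    "speaker_name": "my_speaker",  # single-speaker placeholder
                    "root_path": root_path,
                }
            )
    return items
```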
diff --git a/docs/source/extension/index.md b/docs/source/extension/index.md
new file mode 100644
index 00000000..39c36b63
--- /dev/null
+++ b/docs/source/extension/index.md
@@ -0,0 +1,14 @@
+# Adding models or languages
+
+You can extend Coqui by implementing new model architectures or adding front
+ends for new languages. See the pages below for more details. The [project
+structure](../project_structure.md) and [contribution
+guidelines](../contributing.md) may also be helpful. Please open a pull request
+to share your improvements with the community.
+
+```{toctree}
+:maxdepth: 1
+
+implementing_a_new_model
+implementing_a_new_language_frontend
+```
diff --git a/docs/source/faq.md b/docs/source/faq.md
index e0197cf7..1dd5c184 100644
--- a/docs/source/faq.md
+++ b/docs/source/faq.md
@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.
 
 ## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)
 
 ## How should I choose the right model?
 - First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 
 ## How can I train my own `tts` model?
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset.md) your dataset as one of the supported datasets, like LJSpeech.
    A `formatter` parses the metadata file and converts a list of training samples.
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
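Following on from the FAQ steps above, a sketch of how the dataset and a custom character list might be wired into a config. The dataset path and the character values are placeholders, and the exact `CharactersConfig` fields should be verified against `TTS.tts.configs.shared_configs` in your installed version.

```python
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.configs.shared_configs import CharactersConfig

# Point the loader at the dataset; "ljspeech" can be swapped for the name of
# a custom formatter added to TTS/tts/datasets/formatters.py.
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path="/data/my_dataset/",  # placeholder path
)

# For a non-English alphabet, list the characters explicitly
# (illustrative values only -- adapt them to your language).
characters_config = CharactersConfig(
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    blank="<BLNK>",
    characters="abcdefghijklmnopqrstuvwxyzäöü",
    punctuations="!'(),-.:;? ",
)
```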
diff --git a/docs/source/index.md b/docs/source/index.md
index cb835d47..ae34771c 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -4,10 +4,10 @@
 ```
 
 ----
-# Documentation Content
 ```{toctree}
 :maxdepth: 1
 :caption: Get started
+:hidden:
 
 tutorial_for_nervous_beginners
 installation
@@ -20,22 +20,19 @@ contributing
 
 ```{toctree}
 :maxdepth: 1
 :caption: Using Coqui
+:hidden:
 
 inference
-training_a_model
-finetuning
-implementing_a_new_model
-implementing_a_new_language_frontend
-formatting_your_dataset
-what_makes_a_good_dataset
-tts_datasets
-marytts
+training/index
+extension/index
+datasets/index
 ```
 
 ```{toctree}
 :maxdepth: 1
 :caption: Main Classes
+:hidden:
 
 configuration
 main_classes/trainer_api
@@ -50,6 +47,7 @@ main_classes/speaker_manager
 ```{toctree}
 :maxdepth: 1
 :caption: TTS Models
+:hidden:
 
 models/glow_tts.md
 models/vits.md
diff --git a/docs/source/inference.md b/docs/source/inference.md
index 4556643c..ccce84b0 100644
--- a/docs/source/inference.md
+++ b/docs/source/inference.md
@@ -86,8 +86,8 @@ tts --model_name "voice_conversion///"
 
 You can boot up a demo 🐸TTS server to run an inference with your models (make
 sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.
 
 The demo server provides pretty much the same interface as the CLI command.
 
@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
     file_path="ouptut.wav"
 )
 ```
+
+```{toctree}
+:hidden:
+marytts
+```
diff --git a/docs/source/finetuning.md b/docs/source/training/finetuning.md
similarity index 95%
rename from docs/source/finetuning.md
rename to docs/source/training/finetuning.md
index 9c9f2c8d..1fe54fbc 100644
--- a/docs/source/finetuning.md
+++ b/docs/source/training/finetuning.md
@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
     speech dataset and achieve reasonable results with only a couple of hours of
     data.
 
     However, note that, fine-tuning does not ensure great results. The model
-    performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+    performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
     and the hyper-parameters you choose for fine-tuning. Therefore, it still
     takes a bit of tinkering.
@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 
 1. Setup your dataset. You need to format your target dataset in a certain way
    so that 🐸TTS data loader will be able to load it for the
-   training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+   training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.
 
 2. Choose the model you want to fine-tune.
@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
     You should choose the model based on your requirements. Some models are
     fast and some are better in speech quality. One lazy way to test a model
     is running the model on the hardware you want to use and see how it works.
     For simple testing, you can use the `tts` command on the terminal. For more info
-    see [here](inference.md).
+    see [here](../inference.md).
 
 3. Download the model.
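As a complement to the server and CLI workflows touched on above, a minimal sketch of synthesis through the Python API; the model name is just one example entry from `tts --list_models`.

```python
from TTS.api import TTS

# Load a released model by name (see `tts --list_models` for options).
tts = TTS(model_name="tts_models/en/ljspeech/glow-tts")

# Synthesize straight to a file.
tts.tts_to_file(text="Hello from Coqui TTS!", file_path="output.wav")
```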
diff --git a/docs/source/training/index.md b/docs/source/training/index.md
new file mode 100644
index 00000000..bb76a705
--- /dev/null
+++ b/docs/source/training/index.md
@@ -0,0 +1,10 @@
+# Training and fine-tuning
+
+The following pages show you how to train and fine-tune Coqui models:
+
+```{toctree}
+:maxdepth: 1
+
+training_a_model
+finetuning
+```
diff --git a/docs/source/training_a_model.md b/docs/source/training/training_a_model.md
similarity index 93%
rename from docs/source/training_a_model.md
rename to docs/source/training/training_a_model.md
index 6f612dc0..22505ccb 100644
--- a/docs/source/training_a_model.md
+++ b/docs/source/training/training_a_model.md
@@ -11,11 +11,10 @@
 3. Check the recipes.
 
-   Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-   `Nervous Beginners`.
+   Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good starting point.
 
    A recipe for `GlowTTS` using `LJSpeech` dataset looks like below. Let's be creative and call this `train_glowtts.py`.
 
-   ```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+   ```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
    ```
 
    You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
@@ -113,7 +112,7 @@
 
    Note that different models have different metrics, visuals and outputs.
 
-   You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+   You should also check the [FAQ page](../faq.md) for common problems and solutions
    that occur in a training.
 
 7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using th
 
 The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.
 
-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
 ```
diff --git a/docs/source/tutorial_for_nervous_beginners.md b/docs/source/tutorial_for_nervous_beginners.md
index 5df56fc6..a8a64410 100644
--- a/docs/source/tutorial_for_nervous_beginners.md
+++ b/docs/source/tutorial_for_nervous_beginners.md
@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
 ```
 
 ![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)
+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.
 
 ## Training a `tts` Model
 
-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJSpeech
+dataset. For a more in-depth guide to training and fine-tuning, see [this
+page](training/index.md).
 
 ### Pure Python Way
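For orientation, a condensed sketch of what the GlowTTS recipe referenced above boils down to. Config values and the dataset path are placeholders; the full `recipes/ljspeech/glow_tts/train_glowtts.py` remains the authoritative version.

```python
import os

from trainer import Trainer, TrainerArgs

from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))

# Update these fields to match your dataset (placeholder path).
dataset_config = BaseDatasetConfig(
    formatter="ljspeech", meta_file_train="metadata.csv", path="/data/LJSpeech-1.1/"
)

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    run_eval=True,
    epochs=1000,
    print_step=25,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    output_path=output_path,
    datasets=[dataset_config],
)

# Audio processor and tokenizer are built from the config.
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

# Load training and evaluation samples via the formatter named above.
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```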