Mirror of https://github.com/coqui-ai/TTS.git, commit 6b2ba527fa

In the example above, ```ModelConfig()``` is the final configuration that the model receives, and it has all the fields necessary for the model.

We host pre-defined model configurations under ```TTS/<model_class>/configs/```. Although we recommend a unified config class, you can decompose it as you like for your custom models, as long as all the fields for the trainer, model, and inference APIs are provided.
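
As a rough illustration, a unified config for a custom model could be a single Coqpit data class that bundles the model, trainer, and inference fields in one place. The class and field names below are made up for the example:

```python
from dataclasses import dataclass, field

from coqpit import Coqpit  # the config base class used throughout 🐸TTS


@dataclass
class MyModelArgs(Coqpit):
    """Model-specific fields (illustrative names and defaults)."""

    hidden_channels: int = 256
    num_layers: int = 4


@dataclass
class MyModelConfig(Coqpit):
    """Unified config: model args plus trainer and inference fields."""

    model: str = "my_model"
    model_args: MyModelArgs = field(default_factory=MyModelArgs)
    # trainer fields
    batch_size: int = 32
    lr: float = 1e-3
    epochs: int = 1000
    # inference fields
    max_decoder_steps: int = 500
```

In practice you would start from the pre-defined config classes mentioned above and only add your model-specific fields.
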
Fine-tuning comes to the rescue in this case. You can take one of our pre-trained models and fine-tune it on your own speech dataset and achieve reasonable results with only a couple of hours of data.

However, note that fine-tuning does not ensure great results. The model performance still depends on the {ref}`dataset quality <what_makes_a_good_dataset>` and the hyper-parameters you choose for fine-tuning. Therefore, it still takes a bit of tinkering.

```
tts --list_models
```

The command above lists the models in the ```<model_type>/<language>/<dataset>/<model_name>``` naming format.

Or you can manually check the `.models.json` file in the project directory.
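
If you go the manual route, the catalogue is plain nested JSON, so a few lines of Python can print every model in the same naming format. The file path below is an assumption about a local checkout, and the nesting (model type, then language, dataset, and model name) is inferred from the naming format above:

```python
import json
from pathlib import Path

# Assumed location of the model catalogue inside a local clone of the repository.
models_file = Path("TTS/.models.json")
catalogue = json.loads(models_file.read_text(encoding="utf-8"))

# Walk the assumed nesting: model_type -> language -> dataset -> model_name.
for model_type, languages in catalogue.items():
    for language, datasets in languages.items():
        for dataset, models in datasets.items():
            for model_name in models:
                print(f"{model_type}/{language}/{dataset}/{model_name}")
```
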
It is also important to use a lossless audio file format to prevent compression artifacts. We recommend using the `wav` file format.

Let's assume you created the audio clips and their transcriptions. You can collect all your clips in a folder. Let's call this folder `wavs`.

```
/wavs
  - audio1.wav
  - audio2.wav
  ...
```

You can either create separate transcription files for each clip or create a text file that maps each audio clip to its transcription. In this file, each column must be delimited by a special character separating the audio file name, the transcription, and the normalized transcription. Make sure that the delimiter is not used in the transcription text.

We recommend the following format delimited by `|`. In the following example, `audio1` and `audio2` refer to the files `audio1.wav`, `audio2.wav`, etc.
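
For instance, with made-up sentences as placeholders, such a metadata file (e.g. a `metadata.csv`) could look like this:

```
audio1|This is my sentence.|This is my sentence.
audio2|It has 26 letters.|It has twenty-six letters.
...
```
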
After you collect and format your dataset, you need to check two things: whether you need a `formatter` and whether you need a `text_cleaner`. The `formatter` loads the text file (created above) as a list, and the `text_cleaner` performs a sequence of text normalization operations that convert the raw text into the spoken representation (e.g. converting numbers, acronyms, and symbols to their spoken forms).

If you use a dataset format different from LJSpeech or the other public datasets that 🐸TTS supports, then you need to write your own `formatter`.

If your dataset is in a new language or it needs special normalization steps, then you need a new `text_cleaner`.
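
As a sketch of what a custom `formatter` can look like, the function below parses the pipe-delimited metadata file described above into a list of samples. The keys in each sample dict (`"text"`, `"audio_file"`, `"speaker_name"`) follow the convention of the built-in formatters, but double-check them against the 🐸TTS version you are using:

```python
import os


def my_formatter(root_path, meta_file, **kwargs):
    """Load the pipe-delimited metadata file into a list of sample dicts."""
    items = []
    speaker_name = "my_speaker"  # assuming a single-speaker dataset
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append(
                {"text": cols[1], "audio_file": wav_file, "speaker_name": speaker_name}
            )
    return items
```

In recent versions such a function is typically passed to the sample loading utilities (e.g. `load_tts_samples`) in place of a built-in formatter name, but check the API of the version you use.
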
- Language frontends are located under `TTS.tts.utils.text`.
- Each special language has a separate folder.
- Each folder contains all the utilities for processing the text input.
- `TTS.tts.utils.text.phonemizers` contains the main phonemizer for a language. This is the class that uses the utilities from the previous step and is used to convert the text to phonemes or graphemes for the model.
- After you implement your phonemizer, you need to add it to the `TTS/tts/utils/text/phonemizers/__init__.py` to be able to …

```python
class MyModel(BaseTTS):
    ...
        Args:
            ap (AudioProcessor): audio processor used at training.
            batch (Dict): Model inputs used at the previous training step.
            outputs (Dict): Model outputs generated at the previous training step.

        Returns:
            Tuple[Dict, np.ndarray]: training plots and output waveform.
```
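
To make that return contract concrete, here is a hypothetical helper with the same shape: a dict of plot figures plus an output waveform to log as audio. The dictionary keys used to read `outputs` are made up for the example and will differ in a real model:

```python
from typing import Dict, Tuple

import matplotlib.pyplot as plt
import numpy as np


def make_train_plots(batch: Dict, outputs: Dict) -> Tuple[Dict, np.ndarray]:
    # Plot the spectrogram predicted at the previous training step. `batch` could
    # be used the same way to plot the ground-truth target next to it.
    fig, ax = plt.subplots()
    ax.imshow(np.asarray(outputs["model_outputs"][0]).T, aspect="auto", origin="lower")
    ax.set_title("predicted spectrogram")
    figures = {"prediction": fig}
    # Return the figures together with the output waveform to be logged as audio.
    wav = np.asarray(outputs["waveform"][0])
    return figures, wav
```
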
```python
class MyModel(BaseTTS):
    ...

    def get_optimizer(self) -> Union["Optimizer", List["Optimizer"]]:
        """Setup and return optimizer or optimizers."""
        pass

    def get_lr(self) -> Union[float, List[float]]:
        ...
```
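
For a self-contained illustration of this interface, here is a toy PyTorch module implementing the same two methods. In a real 🐸TTS model they live on your `BaseTTS` subclass, and the optimizer choice and learning-rate handling below are illustrative assumptions:

```python
from typing import List, Union

import torch
from torch import nn


class ToyModel(nn.Module):
    """Stand-in model used only to demonstrate the get_optimizer()/get_lr() contract."""

    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.layer = nn.Linear(80, 80)
        self.lr = lr

    def get_optimizer(self) -> Union[torch.optim.Optimizer, List[torch.optim.Optimizer]]:
        # Single-optimizer case; return a list instead when the model trains with
        # several optimizers (e.g. a generator/discriminator pair).
        return torch.optim.Adam(self.parameters(), lr=self.lr)

    def get_lr(self) -> Union[float, List[float]]:
        # Mirror get_optimizer(): one learning rate per returned optimizer.
        return self.lr


model = ToyModel()
optimizer = model.get_optimizer()
```
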
## What is Mary-TTS?
[Mary (Modular Architecture for Research in sYnthesis) Text-to-Speech](http://mary.dfki.de/) is an open-source (GNU LGPL license), multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of [DFKI’s](http://www.dfki.de/web) Language Technology Lab and the [Institute of Phonetics](http://www.coli.uni-saarland.de/groups/WB/Phonetics/) at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [Cluster of Excellence MMCI](https://www.mmci.uni-saarland.de/) and DFKI.

MaryTTS has been around for a very long time. Version 3.0 even dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016.
You can check out this OpenVoice-Tech page to learn more: https://openvoice-tech.net/index.php/MaryTTS

## Why Mary-TTS compatibility is relevant
Due to its open-source nature, relatively high-quality voices, and fast synthesis speed, Mary-TTS was a popular choice in the past, and many tools implemented API support for it over the years, such as screen readers (NVDA + SpeechHub), smart-home hubs (openHAB, Home Assistant), and voice assistants (Rhasspy, Mycroft, SEPIA). A compatibility layer for Coqui-TTS will ensure that these tools can use Coqui as a drop-in replacement and get even better voices right away.

## API and code examples
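
As a hedged sketch of how a client could talk to a Mary-TTS style server: the example below assumes a server running locally on port 5002 (the default of Coqui's `tts-server`) and the classic Mary-TTS `/voices` and `/process` routes; check which endpoints and query parameters your server version actually exposes.

```python
# Illustrative only: the endpoint names follow the classic Mary-TTS HTTP API,
# and the host/port are assumptions about a locally running server.
import requests

BASE_URL = "http://localhost:5002"

# List the voices the server offers (Mary-TTS convention).
voices = requests.get(f"{BASE_URL}/voices", timeout=30)
print(voices.text)

# Synthesize a sentence and store the returned WAV file. A full Mary-TTS server
# usually also expects parameters such as INPUT_TYPE, OUTPUT_TYPE and LOCALE.
audio = requests.get(
    f"{BASE_URL}/process",
    params={"INPUT_TEXT": "Hello from a Mary-TTS compatible server!"},
    timeout=120,
)
audio.raise_for_status()
with open("hello.wav", "wb") as f:
    f.write(audio.content)
```
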