From c9e2df14510a6413f2cc8dec3294f4f01f4a7b8a Mon Sep 17 00:00:00 2001
From: erogol
Date: Wed, 15 Jul 2020 11:08:35 +0200
Subject: [PATCH] README.md update

---
 README.md | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ce496707..4903d4b0 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,10 @@

+
+[![Discourse](https://img.shields.io/badge/discourse-online-green.svg)](https://discourse.mozilla.org/c/tts)
+
 This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims a deep learning based Text2Speech engine, low in cost and high in quality.
@@ -38,25 +41,26 @@ Vocoders:
 You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers).
 ## Features
-- High performance Deep Learning models for Text2Speech related tasks.
-  - Text2Speech models (Tacotron, Tacotron2).
+- High performance Deep Learning models for Text2Speech tasks.
+  - Text2Spec models (Tacotron, Tacotron2).
   - Speaker Encoder to compute speaker embeddings efficiently.
-  - Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS)
-- Support for multi-speaker TTS training.
-- Support for Multi-GPUs training.
-- Ability to convert Torch models to Tensorflow 2.0 for inference.
-- Released pre-trained models.
+  - Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN)
 - Fast and efficient model training.
 - Detailed training logs on console and Tensorboard.
+- Support for multi-speaker TTS.
+- Efficient Multi-GPUs training.
+- Ability to convert PyTorch models to Tensorflow 2.0 and TFLite for inference.
+- Released models in PyTorch, Tensorflow and TFLite.
 - Tools to curate Text2Speech datasets under```dataset_analysis```.
 - Demo server for model testing.
 - Notebooks for extensive model benchmarking.
 - Modular (but not too much) code base enabling easy testing for new ideas.
-## Requirements and Installation
+## Main Requirements and Installation
 Highly recommended to use [miniconda](https://conda.io/miniconda.html) for easier installation.
  * python>=3.6
- * pytorch>=0.4.1
+ * pytorch>=1.4.1
+ * tensorflow>=2.2
  * librosa
  * tensorboard
  * tensorboardX
@@ -107,21 +111,7 @@ Audio examples: [soundcloud](https://soundcloud.com/user-565970875/pocket-articl
 example_output
-## Runtime
-The most time-consuming part is the vocoder algorithm (Griffin-Lim) which runs on CPU. By setting its number of iterations lower, you might have faster execution with a small loss of quality. Some of the experimental values are below.
-
-Sentence: "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent."
-
-Audio length is approximately 6 secs.
-
-| Time (secs) | System | # GL iters | Model
-| ---- |:-------|:-----------| ---- |
-|2.00|GTX1080Ti|30|Tacotron|
-|3.01|GTX1080Ti|60|Tacotron|
-|3.57|CPU|60|Tacotron|
-|5.27|GTX1080Ti|60|Tacotron2|
-|6.50|CPU|60|Tacotron2|
-
+## [Mozilla TTS Tutorials and Notebooks](https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials)
 ## Datasets and Data-Loading
 TTS provides a generic dataloader easy to use for new datasets. You need to write an preprocessor function to integrate your own dataset.Check ```datasets/preprocess.py``` to see some examples. After the function, you need to set ```dataset``` field in ```config.json```. Do not forget other data related fields too.
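
Note on the "Datasets and Data-Loading" context above: the README asks for a preprocessor function per dataset but does not show one. The following is a minimal sketch of what such a function might look like, not the project's actual code. The function name `my_dataset`, its `(root_path, meta_file)` signature, the `file_id|transcript` metadata layout, the `wavs/` folder, and the `[text, wav_path, speaker_name]` return shape are all assumptions made for illustration; `datasets/preprocess.py` remains the authoritative reference.

```python
import os


def my_dataset(root_path, meta_file):
    """Hypothetical preprocessor: turn a metadata file into a list of samples.

    Assumed layout (not confirmed by the README):
      <root_path>/<meta_file>         one "file_id|transcript" pair per line
      <root_path>/wavs/<file_id>.wav  the matching audio clip
    """
    speaker_name = "my_speaker"  # placeholder label for a single-speaker set
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first "|" so transcripts may contain the character.
            file_id, text = line.split("|", 1)
            wav_path = os.path.join(root_path, "wavs", file_id + ".wav")
            items.append([text, wav_path, speaker_name])
    return items
```

With a function like this in place, the ```dataset``` field in ```config.json``` would presumably name it; the exact value the loader expects is not stated in this patch, so compare against the existing entries in ```datasets/preprocess.py``` before relying on it.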