coqui-tts/speaker_encoder

Directory contents: notebooks/, README.md, __init__.py, compute_embeddings.py, config.json, dataset.py, generic_utils.py, loss.py, model.py, tests.py, train.py, umap.png, visual.py

README.md

Speaker embedding (Experimental)

This is an implementation of https://arxiv.org/abs/1710.10467 (Generalized End-to-End Loss for Speaker Verification). The model computes speaker embeddings, so you can generate d-vectors for multi-speaker TTS or prune bad samples from your TTS dataset. Below is an example showing the embedding results of various speakers; you can generate the same plot with the provided notebook.
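
The provided notebook is the reference way to produce that plot. As a rough illustration only, the sketch below shows the general idea of projecting saved d-vectors with UMAP; the output directory, the .npy file layout, the speaker-per-folder assumption, and the UMAP parameters are all assumptions, not part of this repo.

```python
# Hypothetical sketch: visualize saved d-vectors with UMAP.
# Assumes compute_embeddings.py wrote one .npy embedding per utterance
# under "output_path/", mirroring a speaker-per-folder dataset layout.
import glob
import os

import numpy as np
import umap  # from the umap-learn package
import matplotlib.pyplot as plt

embedding_files = sorted(glob.glob("output_path/**/*.npy", recursive=True))
embeddings = np.stack([np.load(f) for f in embedding_files])
# Use the parent folder name as a stand-in speaker label (assumption).
speakers = [os.path.basename(os.path.dirname(f)) for f in embedding_files]

# Project the d-vectors to 2D for visual inspection.
projection = umap.UMAP(n_neighbors=10, min_dist=0.1).fit_transform(embeddings)

speaker_ids = {s: i for i, s in enumerate(sorted(set(speakers)))}
colors = [speaker_ids[s] for s in speakers]
plt.scatter(projection[:, 0], projection[:, 1], c=colors, cmap="tab20", s=8)
plt.title("UMAP projection of speaker embeddings")
plt.savefig("umap_projection.png")
```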

Download a pretrained model from the Released Models page.

To run the code, you need to follow the same flow as in TTS.

  • Define 'config.json' for your needs. Note that the audio parameters should match those of your TTS model.
  • Example training call: python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360
  • Generate embedding vectors: python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth.tar model/config/path/config.json dataset/path/ output_path. This code parses all .wav files at the given dataset path and generates the same folder structure under the output path with the generated embedding files (one way to consume them is sketched after this list).
  • Watch training on TensorBoard as in TTS.
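
Once the embedding files are generated, one possible use mentioned above is pruning bad samples. The sketch below is not part of the repo: it assumes the same output_path layout and .npy files as in the previous example, and the 0.6 similarity threshold is an arbitrary illustrative value. It flags utterances whose d-vector has low cosine similarity to their speaker's centroid.

```python
# Hypothetical sketch: flag possible bad samples via cosine similarity
# between each utterance's d-vector and its speaker's centroid embedding.
import glob
import os
from collections import defaultdict

import numpy as np

by_speaker = defaultdict(list)
for path in glob.glob("output_path/**/*.npy", recursive=True):
    speaker = os.path.basename(os.path.dirname(path))  # assumes speaker-per-folder layout
    by_speaker[speaker].append((path, np.load(path)))

for speaker, items in by_speaker.items():
    vectors = np.stack([vec for _, vec in items])
    centroid = vectors.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    for path, vec in items:
        similarity = float(np.dot(vec / np.linalg.norm(vec), centroid))
        if similarity < 0.6:  # example threshold, tune for your dataset
            print(f"possible bad sample: {path} (cos={similarity:.2f})")
```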