coqui-tts

erogol 6a46339a43 rebase fixes		2020-08-05 20:19:23 +02:00
README.md	mass linter fix	2020-08-04 14:07:47 +02:00
__init__.py	rebranding !!!!!!!!!!!!!!!!!!!!!!!!!!!!	2020-08-04 10:32:04 +02:00
compute_embeddings.py	rebranding and replacing import statements	2020-08-04 10:51:19 +02:00
config.json	add support for CorentinJ Speaker encoder and add notebook for extract embeddings	2020-08-05 19:35:11 +02:00
dataset.py	update deprecated functions call from speaker encoder	2020-08-05 19:33:35 +02:00
generic_utils.py	add suport for AngleProto loss	2020-08-05 19:32:41 +02:00
losses.py	rebase fixes	2020-08-05 20:19:23 +02:00
model.py	add support for CorentinJ Speaker encoder and add notebook for extract embeddings	2020-08-05 19:35:11 +02:00
requirements.txt	rebranding !!!!!!!!!!!!!!!!!!!!!!!!!!!!	2020-08-04 10:32:04 +02:00
umap.png	rebranding !!!!!!!!!!!!!!!!!!!!!!!!!!!!	2020-08-04 10:32:04 +02:00
visual.py	rebranding !!!!!!!!!!!!!!!!!!!!!!!!!!!!	2020-08-04 10:32:04 +02:00

README.md

Speaker Encoder

This is an implementation of the GE2E (Generalized End-to-End) loss for speaker verification, described in https://arxiv.org/abs/1710.10467. The model can be used to compute voice and speaker embeddings.
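For reference, the softmax variant of the loss from that paper can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code in losses.py: the function name `ge2e_softmax_loss` and the helper names are hypothetical, the defaults `w=10` and `b=-5` follow the initial values reported in the paper, and the loss is averaged over utterances here rather than summed.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cosine(a, b):
    # Small epsilon guards against division by zero for all-zero vectors.
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)) + 1e-8)


def _mean(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Hypothetical helper: embeddings[j][i] is utterance i of speaker j.

    Every speaker needs at least two utterances, because an utterance is
    excluded from its own speaker's centroid.
    """
    centroids = [_mean(utterances) for utterances in embeddings]
    total, count = 0.0, 0
    for j, utterances in enumerate(embeddings):
        for i, e in enumerate(utterances):
            sims = []
            for k, centroid in enumerate(centroids):
                if k == j:
                    # Own-speaker centroid is computed without e itself.
                    centroid = _mean(utterances[:i] + utterances[i + 1:])
                sims.append(w * _cosine(e, centroid) + b)
            # Numerically stable log-sum-exp over all speaker similarities.
            top = max(sims)
            log_sum_exp = top + math.log(sum(math.exp(s - top) for s in sims))
            total += log_sum_exp - sims[j]
            count += 1
    return total / count
```

The loss is small when each utterance is closest to its own speaker's centroid and grows when utterances sit nearer to other speakers' centroids.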

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.
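One simple way to explore generated d-vectors without a plot is to compare them by cosine similarity. The sketch below assumes the embeddings have already been loaded into plain Python lists grouped by speaker; the function names and the data layout are hypothetical and not part of this repository.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def speaker_centroids(dvectors):
    """dvectors maps a speaker name to a list of that speaker's d-vectors."""
    centroids = {}
    for speaker, vectors in dvectors.items():
        n = len(vectors)
        centroids[speaker] = [sum(column) / n for column in zip(*vectors)]
    return centroids


def nearest_speaker(query, centroids):
    """Return the speaker whose centroid is most similar to the query d-vector."""
    return max(centroids, key=lambda name: cosine_similarity(query, centroids[name]))
```

For example, with centroids built from a few utterances per speaker, `nearest_speaker(new_dvector, centroids)` returns the name of the closest speaker.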

Below is an example plot of the embeddings of various speakers. You can generate the same plot with the provided notebook, as demonstrated in this video.

Download a pretrained model from the Released Models page.

To run the code, follow the same flow as in mozilla_voice_tts:

Define 'config.json' for your needs. Note that the audio parameters should match those of your TTS model.
Example training call: `python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360`
Generate embedding vectors: `python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth.tar model/config/path/config.json dataset/path/ output_path`. This script parses all .wav files under the given dataset path and writes the generated embedding files to the same folder structure under the output path.
Watch training on Tensorboard, as in TTS.
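The folder mirroring done in the embedding step can be illustrated with a short sketch. It only shows how each .wav path under a dataset root could map to an output path with the same relative structure. It is not the code in compute_embeddings.py, and the function name `mirrored_output_paths` and the `.npy` suffix are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def mirrored_output_paths(dataset_root, output_root, suffix=".npy"):
    """Pair every .wav under dataset_root with a mirrored path under output_root.

    The suffix of the embedding files is an assumption, not taken from the script.
    """
    dataset_root = Path(dataset_root)
    output_root = Path(output_root)
    pairs = []
    for wav_path in sorted(dataset_root.rglob("*.wav")):
        relative = wav_path.relative_to(dataset_root)
        pairs.append((wav_path, (output_root / relative).with_suffix(suffix)))
    return pairs


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    out = Path(tmp) / "out"
    (root / "spk1" / "chapter1").mkdir(parents=True)
    (root / "spk1" / "chapter1" / "utt1.wav").touch()
    (root / "notes.txt").touch()  # non-.wav files are ignored
    demo = [
        (src.relative_to(root).as_posix(), dst.relative_to(out).as_posix())
        for src, dst in mirrored_output_paths(root, out)
    ]
```

A real script would additionally create each destination's parent directory (for example with `Path.mkdir(parents=True, exist_ok=True)`) before writing the embedding file.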