coqui-tts/TTS/encoder
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argument to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Activate evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit test fixes

* Remove useless tests to speed up the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unit tests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
configs
models
utils
README.md
__init__.py
dataset.py
losses.py
requirements.txt

README.md

Speaker Encoder

This is an implementation of the generalized end-to-end (GE2E) loss for speaker verification described in https://arxiv.org/abs/1710.10467. The model can be used for voice and speaker embedding.
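To give a feel for what the referenced paper optimises, here is a minimal pure-Python sketch of its softmax loss variant. It is an illustration only and not the code in losses.py: the function names are hypothetical, the scale and offset `w` and `b` are fixed to the paper's initial values instead of being learned, and every speaker is assumed to have at least two utterances.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def mean(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """embeddings[j][i] is the d-vector of utterance i from speaker j.

    Hypothetical helper: w and b are constants here, whereas the paper
    treats them as learnable parameters.
    """
    centroids = [mean(utts) for utts in embeddings]
    total = 0.0
    for j, utts in enumerate(embeddings):
        for i, e in enumerate(utts):
            sims = []
            for k, centroid in enumerate(centroids):
                if k == j:
                    # Leave the utterance itself out of its own centroid.
                    centroid = mean(utts[:i] + utts[i + 1:])
                sims.append(w * cosine(e, centroid) + b)
            # Softmax loss: pull towards own centroid, push from the others.
            total += math.log(sum(math.exp(s) for s in sims)) - sims[j]
    return total
```

The loss is small when each utterance sits close to the centroid of its own speaker and far from the other centroids, which is what makes the resulting embeddings usable as speaker representations.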

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.

Below is an example showing embedding results of various speakers. You can generate the same plot with the provided notebook as demonstrated in this video.

Download a pretrained model from the Released Models page.

To run the code, you need to follow the same flow as in TTS.

  • Define 'config.json' for your needs. Note that the audio parameters should match those of your TTS model.
  • Example training call: python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360
  • Generate embedding vectors. The following command parses all .wav files at the given dataset path and recreates the same folder structure under the output path, holding the generated embedding files: python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth.tar model/config/path/config.json dataset/path/ output_path
  • Watch training on TensorBoard, as in TTS.
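The folder mirroring described in the embedding step can be pictured with the short sketch below. It only plans where each output file would go and does not compute any embedding; the function name and the `.npy` suffix are assumptions made for illustration, not taken from compute_embeddings.py.

```python
from pathlib import Path


def plan_embedding_outputs(dataset_dir, output_dir, suffix=".npy"):
    """Pair every .wav under dataset_dir with a mirrored path in output_dir.

    Hypothetical helper: the real script may use a different file
    extension and layout details.
    """
    dataset_dir = Path(dataset_dir)
    output_dir = Path(output_dir)
    pairs = []
    for wav_path in sorted(dataset_dir.rglob("*.wav")):
        # Keep the path relative to the dataset root so the sub-folder
        # structure is reproduced under the output directory.
        relative = wav_path.relative_to(dataset_dir)
        pairs.append((wav_path, (output_dir / relative).with_suffix(suffix)))
    return pairs
```

Each returned pair holds the source audio file and the location where its embedding would be written, so a caller could create `target.parent` before saving.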