Speaker Encoder

This is an implementation of the speaker encoder described in "Generalized End-to-End Loss for Speaker Verification" (https://arxiv.org/abs/1710.10467). The model maps variable-length speech to a fixed-size speaker embedding (d-vector).
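
For reference, below is a minimal, self-contained sketch in plain PyTorch of the GE2E softmax loss from the cited paper. The function name is illustrative; the project's own implementation lives in losses.py and additionally makes the scale w and bias b learnable parameters.

    # Sketch of the GE2E softmax loss (https://arxiv.org/abs/1710.10467).
    # Illustrative only; see losses.py for the project's implementation.
    import torch
    import torch.nn.functional as F

    def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
        """embeddings: (num_speakers, utterances_per_speaker, embedding_dim)."""
        n_spk, n_utt, _ = embeddings.shape
        embeddings = F.normalize(embeddings, dim=-1)
        # Per-speaker centroid over all of that speaker's utterances.
        centroids = F.normalize(embeddings.mean(dim=1), dim=-1)   # (n_spk, dim)
        # Leave-one-out centroid: a speaker's centroid computed without the
        # utterance being scored, used for the positive similarity (eq. 8).
        loo = (embeddings.sum(dim=1, keepdim=True) - embeddings) / (n_utt - 1)
        # Cosine similarity of every utterance to every speaker centroid.
        sim = torch.einsum("jid,kd->jik", embeddings, centroids)  # (n_spk, n_utt, n_spk)
        idx = torch.arange(n_spk)
        sim[idx, :, idx] = F.cosine_similarity(embeddings, loo, dim=-1)
        sim = w * sim + b
        # Each utterance should be most similar to its own speaker's centroid.
        targets = idx.repeat_interleave(n_utt)
        return F.cross_entropy(sim.reshape(n_spk * n_utt, n_spk), targets)

For a training batch of 4 speakers with 5 utterances each and 256-dimensional embeddings, the input has shape (4, 5, 256); the loss pulls each utterance toward its own speaker's centroid and pushes it away from the others.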

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.

Below is an example (umap.png in this directory) showing the embeddings of various speakers. You can generate the same plot with the provided notebook, as demonstrated in this video.
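
As a rough sketch of the projection behind umap.png (assuming the umap-learn and matplotlib packages are installed; the .npy file names below are hypothetical stand-ins for the outputs of compute_embeddings.py):

    # Project d-vectors to 2-D with UMAP and colour them by speaker.
    import matplotlib.pyplot as plt
    import numpy as np
    import umap

    embeddings = np.load("embeddings.npy")     # (n_utterances, embedding_dim)
    speakers = np.load("speaker_labels.npy")   # integer speaker id per utterance

    projection = umap.UMAP(n_neighbors=10, min_dist=0.1).fit_transform(embeddings)
    plt.scatter(projection[:, 0], projection[:, 1], c=speakers, s=4, cmap="tab20")
    plt.title("Speaker embeddings projected with UMAP")
    plt.show()

The provided notebook goes further and links each point to its audio file for interactive exploration.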

Download a pretrained model from the Released Models page.

To run the code, you need to follow the same workflow as in TTS:

  • Define config.json for your needs. Note that the audio parameters should match those of your TTS model (see the sketch after this list).
  • Example training call: python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360
  • Generate embedding vectors: python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth.tar model/config/path/config.json dataset/path/ output_path. This script parses all .wav files under the given dataset path and recreates the same folder structure under the output path, populated with the generated embedding files.
  • Watch training on TensorBoard as in TTS.
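
As a hedged starting point for the first step, the snippet below writes a minimal config.json from Python. The field names mirror common Coqui TTS audio settings but are assumptions here; check speaker_encoder_config.py for the authoritative set, and make the audio section match the TTS model the d-vectors will be used with.

    # Write a minimal, hypothetical config.json; values are placeholders.
    import json

    config = {
        "audio": {
            "sample_rate": 16000,   # must match your TTS dataset/model
            "num_mels": 80,
            "fft_size": 1024,
            "hop_length": 256,
            "win_length": 1024,
        },
        "batch_size": 32,           # placeholder training settings
        "lr": 1e-4,
    }

    with open("speaker_encoder/config.json", "w") as f:
        json.dump(config, f, indent=2)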