Speaker Encoder

This is an implementation of the speaker encoder described in https://arxiv.org/abs/1710.10467 (Generalized End-to-End Loss for Speaker Verification). The model can be used to compute voice and speaker embeddings.
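The core of that paper is the GE2E objective: utterance embeddings of the same speaker are pulled toward their speaker centroid and pushed away from other speakers' centroids. Below is a minimal NumPy sketch of the simplified idea, omitting the learnable scale/bias and the self-exclusion trick used in the actual loss; the toy shapes and random data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, n_utts, dim = 3, 4, 8

# Toy L2-normalised utterance embeddings: [speaker, utterance, dim]
emb = rng.normal(size=(n_speakers, n_utts, dim))
emb /= np.linalg.norm(emb, axis=-1, keepdims=True)

# Per-speaker centroids, also L2-normalised
centroids = emb.mean(axis=1)
centroids /= np.linalg.norm(centroids, axis=-1, keepdims=True)

# Cosine similarity of every utterance to every centroid:
# shape [speaker, utterance, speaker]
sim = np.einsum("sud,cd->suc", emb, centroids)

# Softmax cross-entropy: each utterance should be most similar
# to its own speaker's centroid
logits = sim.reshape(-1, n_speakers)
labels = np.repeat(np.arange(n_speakers), n_utts)
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_softmax[np.arange(len(labels)), labels].mean()
```

Training drives this loss down, so embeddings of the same speaker cluster together, which is what makes the resulting d-vectors useful for multi-speaker TTS.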

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.

Below is an example showing embedding results of various speakers. You can generate the same plot with the provided notebook as demonstrated in this video.
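The interactive chart is essentially a scatter plot of a 2D projection of the d-vectors, one point per utterance, coloured by speaker. As a rough stand-in for the provided notebook, here is a minimal PCA projection of toy embeddings; the two-speaker toy data and dimensions are illustrative assumptions, not the notebook's actual code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy d-vectors: 2 speakers x 50 utterances, 256-dim, shifted apart
emb = np.concatenate([
    rng.normal(loc=0.0, size=(50, 256)),
    rng.normal(loc=1.0, size=(50, 256)),
])

# PCA via SVD of the mean-centred matrix
centred = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)

# [100, 2] points, ready to scatter-plot per speaker
coords = centred @ vt[:2].T
```

If the encoder is well trained, utterances from the same speaker form tight clusters in such a projection.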

Download a pretrained model from the Released Models page.

To run the code, you need to follow the same flow as in TTS.

  • Define config.json for your needs. Note that the audio parameters should match those of your TTS model.
  • Example training call: python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360
  • Generate embedding vectors: python speaker_encoder/compute_embeddings.py --use_cuda /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path. This script parses all .wav files under the given dataset path and recreates the same folder structure under the output path, populated with the generated embedding files.
  • Watch training progress on TensorBoard as in TTS.
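The folder-mirroring behaviour of the embedding step can be sketched as follows; the helper name and the .npy output extension are assumptions for illustration, not the script's actual API:

```python
from pathlib import Path

def mirror_embedding_paths(dataset_path, output_path):
    """Map every .wav under dataset_path to an embedding file under
    output_path, preserving the relative folder structure."""
    dataset_path, output_path = Path(dataset_path), Path(output_path)
    pairs = []
    for wav in sorted(dataset_path.rglob("*.wav")):
        rel = wav.relative_to(dataset_path)
        # Assumed .npy extension for the stored embedding
        pairs.append((wav, (output_path / rel).with_suffix(".npy")))
    return pairs
```

For example, dataset/path/speaker1/utt1.wav would map to output_path/speaker1/utt1.npy, so embeddings can later be looked up by the same relative path as their source audio.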