mirror of https://github.com/coqui-ai/TTS.git
* Fix checkpointing GAN models (#1641)
* checkpoint save step crash fix
* Update gan.py
applied requested changes
* crash fix
* Fix: the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fix Publish CI (#1597)
* Try out manylinux
* temporary removal of useless pipeline
* remove check and use only manylinux
* Try --plat-name
* Add install requirements
* Add back other actions
* Add PR trigger
* Remove conditions
* Fix syntax
* Roll back some changes
* Add other python versions
* Add test pypi upload
* Add username
* Add back __token__ as username
* Modify name of entry to testpypi
* Set it to release only
* Fix version checking
* Fix tokenizer for punc only (#1717)
* Remove redundant config field
* Fix SSIM loss
* Separate loss tests
* Fix BCELoss, addressing #1192
* Make style
* Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS
* Make style
* Fix tts_tests
* Fix test_get_aux_input
* Make lint
* feat: updated recipes and lr fix (#1718)
- updated the recipes, activating more losses for more stable training
- re-enabled guided attention loss
- fixed a bug where the wrong learning rate was fetched for logging
* Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig
* Update VITS LJSpeech recipe
* Update VITS VCTK recipe
* Make style
* Add missing decorator
* Add missing param
* Make style
* Update recipes
* Fix test
* Bug fix
* Exclude tests folder
* Make linter
* Make style
* Fix device allocation
* Fix SSIM loss correction
* Fix aux tests (#1753)
* Set n_jobs to 1 for resample script
* Delete resample test
* Set n_jobs 1 in vad test
* delete vad test
* Revert "Delete resample test"
This reverts commit
Directory contents:
- configs
- models
- utils
- README.md
- __init__.py
- dataset.py
- losses.py
- requirements.txt
Speaker Encoder
This is an implementation of https://arxiv.org/abs/1710.10467. This model can be used for voice and speaker embedding.
With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.
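For intuition, here is a minimal PyTorch sketch of the d-vector idea from the paper (illustrative sizes and layer choices, not this repo's actual architecture): an LSTM runs over mel frames, and its final output is projected and L2-normalized into a fixed-size embedding.

```python
import torch
import torch.nn as nn

class ToySpeakerEncoder(nn.Module):
    """GE2E-style d-vector sketch: LSTM over mel frames, last output
    projected and L2-normalized. All sizes here are illustrative."""

    def __init__(self, n_mels=80, hidden_size=256, emb_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden_size, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden_size, emb_dim)

    def forward(self, mels):  # mels: (batch, frames, n_mels)
        out, _ = self.lstm(mels)
        emb = self.proj(out[:, -1])  # use the last frame's output
        return emb / emb.norm(dim=1, keepdim=True)  # unit-length d-vector

# One utterance of 200 mel frames -> a 128-dim d-vector.
d_vector = ToySpeakerEncoder()(torch.randn(1, 200, 80))
```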
Below is an example showing embedding results of various speakers. You can generate the same plot with the provided notebook as demonstrated in this video.
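If you want to reproduce that kind of plot by hand rather than through the notebook, one common approach is to project the d-vectors to 2-D with t-SNE and color by speaker. This is a sketch under assumed inputs: `embeddings.npy` holding one embedding row per utterance and `speaker_ids.npy` holding parallel labels; neither file name comes from this repo.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumed inputs: one embedding row per utterance and a parallel label array.
embeddings = np.load("embeddings.npy")                    # (n_utts, emb_dim)
speaker_ids = np.load("speaker_ids.npy", allow_pickle=True)

# Project to 2-D; utterances from the same speaker should form clusters.
points = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
for speaker in np.unique(speaker_ids):
    mask = speaker_ids == speaker
    plt.scatter(points[mask, 0], points[mask, 1], s=8, label=str(speaker))
plt.legend(fontsize=6)
plt.title("Speaker embeddings (t-SNE)")
plt.show()
```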
Download a pretrained model from the Released Models page.
To run the code, you need to follow the same flow as in TTS.
- Define 'config.json' for your needs. Note that the audio parameters should match your TTS model.
- Example training call
python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360
- Generate embedding vectors
python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path
This command parses all .wav files under the given dataset path and generates the same folder structure under the output path with the generated embedding files (see the sketch after this list).
- Watch training on Tensorboard as in TTS
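As a rough Python sketch of that traversal (not this repo's compute_embeddings.py; `encoder` and `load_wav_as_mel` are hypothetical stand-ins for the trained model and its feature pipeline):

```python
import os

import numpy as np

def export_embeddings(dataset_path, output_path, encoder, load_wav_as_mel):
    """Mirror the dataset's folder layout under output_path, saving one
    .npy embedding per .wav file. `encoder` and `load_wav_as_mel` are
    hypothetical stand-ins for the trained model and feature extraction."""
    for root, _dirs, files in os.walk(dataset_path):
        for name in files:
            if not name.endswith(".wav"):
                continue
            wav_path = os.path.join(root, name)
            # Keep the path relative to the dataset root so the output
            # tree matches the input tree.
            rel = os.path.relpath(wav_path, dataset_path)
            out_file = os.path.join(output_path, os.path.splitext(rel)[0] + ".npy")
            os.makedirs(os.path.dirname(out_file), exist_ok=True)
            mel = load_wav_as_mel(wav_path)          # (1, frames, n_mels)
            embedding = encoder(mel).detach().cpu().numpy()
            np.save(out_file, embedding)
```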