* Use packaging.version for version comparisons
The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.
With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:
> TypeError: '<' not supported between instances of 'str' and 'int'
This is fixed by this migration.
[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/
* Improve espeak version detection robustness
On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.
This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.
This improves the version detection by simply looking for the version
after the "text-to-speech:" token.
* Replace distuils.copy_tree with shutil.copytree
The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
* Fixed bug related to yourtts speaker embeddings issue
* Reverted code for base_tts
* Bug fix on VITS d_vector_file type
* Ignore the test speakers on YourTTS recipe
* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference
* Update YourTTS config file
* Update ModelManager._update_path to deal with list attributes
* Fix lint checks
* Remove unused code
* Fix unit tests
* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files
* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
* Adding encoder
* currently modifying hmm
* Adding hmm
* Adding overflow
* Adding overflow setting up flat start
* Removing runs
* adding normalization parameters
* Fixing models on same device
* Training overflow and plotting evaluations
* Adding inference
* At the end of epoch the test sentences are coming on cpu instead of gpu
* Adding figures from model during training to monitor
* reverting tacotron2 training recipe
* fixing inference on gpu for test sentences on config
* moving helpers and texts within overflows source code
* renaming to overflow
* moving loss to the model file
* Fixing the rename
* Model training but not plotting the test config sentences's audios
* Formatting logs
* Changing model name to camelcase
* Fixing test log
* Fixing plotting bug
* Adding some tests
* Adding more tests to overflow
* Adding all tests for overflow
* making changes to camel case in config
* Adding information about parameters and docstring
* removing compute_mel_statistics moved statistic computation to the model instead
* Added overflow in readme
* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
* Update BaseDatasetConfig
- Add dataset_name
- Chane name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* Update requirements.txt
install jamo for korean
* Update formatters.py
add KSS formatter
KSS is a korean single speech dataset (12hours)
* Add files via upload
add phonemizer for korean
* Add files via upload
add korean phonemizer
* Update requirements.txt
* change code style with `black` and `pylint`
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation-2
* isort
* edit about separator
write test case and add 'nltk' for requirements.txt
* add korean g2p (g2pkk)
* isort
* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)
TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)
* black
* Fix checkpointing GAN models (#1641)
* checkpoint sae step crash fix
* checkpoint save step crash fix
* Update gan.py
updated requested changes
* crash fix
* Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fix Publish CI (#1597)
* Try out manylinux
* temporary removal of useless pipeline
* remove check and use only manylinux
* Try --plat-name
* Add install requirements
* Add back other actions
* Add PR trigger
* Remove conditions
* Fix sythax
* Roll back some changes
* Add other python versions
* Add test pypi upload
* Add username
* Add back __token__ as username
* Modify name of entry to testpypi
* Set it to release only
* Fix version checking
* Fix tokenizer for punc only (#1717)
* Remove redundant config field
* Fix SSIM loss
* Separate loss tests
* Fix BCELoss adressing #1192
* Make style
* Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS
* Make style
* Fix tts_tests
* Fix test_get_aux_input
* Make lint
* feat: updated recipes and lr fix (#1718)
- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug about not the correct lr fetched for logging
* Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig
* Update VITS LJSpeech recipe
* Update VITS VCTK recipe
* Make style
* Add missing decorator
* Add missing param
* Make style
* Update recipes
* Fix test
* Bug fix
* Exclude tests folder
* Make linter
* Make style
* Fix device allocation
* Fix SSIM loss correction
* Fix aux tests (#1753)
* Set n_jobs to 1 for resample script
* Delete resample test
* Set n_jobs 1 in vad test
* delete vad test
* Revert "Delete resample test"
This reverts commit bb7c8466af.
* Remove tests with resample
* Fix for FloorDiv Function Warning (#1760)
* Fix for Floor Function Warning
Fix for Floor Function Warning
* Adding double quotes to fix formatting
Adding double quotes to fix formatting
* Update glow_tts.py
* Update glow_tts.py
* Fix type in download_vctk.sh (#1739)
typo in comment
* Update decoder.py (#1792)
Minor comment correction.
* Update requirements.txt (#1791)
Support for #1775
* Update README.md (#1776)
Fix typo in different and code sample
* Fix & update WaveRNN vocoder model (#1749)
* Fixes KeyError bug. Adding logging to dashboard.
* Make pep8 compliant
* Make style compliant
* Still fixing style
* Fix rand_segment edge case (input_len == seg_len - 1)
* Update requirements.txt; inflect==5.6 (#1809)
New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808
Force inflect==5.6 (pydantic free) install to solve dependency issue.
* Update README.md; download progress bar in CLI. (#1797)
* Update README.md
- minor PR
- added model_info usage guide based on #1623 in README.md .
* "added tqdm bar for model download"
* Update manage.py
* fixed style
* fixed style
* sort imports
* Update wavenet.py (#1796)
* Update wavenet.py
Current version does not use "in_channels" argument.
In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor.
However, since it is a generic implementation, I believe it is better to update it for a more general use.
* "in_channels -> hidden_channels"
* Adjust default to be able to process longer sentences (#1835)
Running `tts --text "$text" --out_path …` with a somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.
This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particular frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].
This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.
[1] https://github.com/mozilla/TTS/issues/734
* Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer
* Style
* Updated language flag regex to consider all language codes alike
* fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding
function typo leads to training crash, no such function
* fix typo
get_random_embedding
* Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions
* Add tests for numpy transforms
* Fix imports
* Fix imports2
* Implement bucketed weighted sampling for VITS (#1871)
* Update capacitron_layers.py (#1664)
crashing because of dimension miss match at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
* updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853)
* Fix BCE loss issue (#1872)
* Fix BCE loss issue
* Remove import
* Remove deprecated files (#1873)
- samplers.py is moved
- distribute.py is replaces by the 👟Trainer
* Handle when no batch sampler (#1882)
* Fix tune wavegrad (#1844)
* fix imports in tune_wavegrad
* load_config returns Coqpit object instead None
* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program
* fix var order in the result of batch collating
* make style
* make style with black and isort
* Bump up to v0.8.0
* Add new DE Thorsten models (#1898)
- Tacotron2-DDC
- HifiGAN vocoder
Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com>
Co-authored-by: camillem <camillem@users.noreply.github.com>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: a-froghyar <adamfroghyar@gmail.com>
Co-authored-by: ivan provalov <iprovalo@yahoo.com>
Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com>
Co-authored-by: p0p4k <rajiv.punmiya@gmail.com>
Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com>
Co-authored-by: vanIvan <alfa1211@gmail.com>
Co-authored-by: Lars Kiesow <lkiesow@uos.de>
Co-authored-by: rbaraglia <baraglia.r@live.fr>
Co-authored-by: jchai.me <jreus@users.noreply.github.com>
Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>
* Use fsspec and torch for embedding file
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore:rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom
methods in taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: fix train tts t2 capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overides by args and tokenizer
* Fix style
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argumnet to disable storage
* Add Perfect Sampler and remove storage
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Active evaluation in speaker encoder test and use multispeaker dataset for this test
* Unit tests fixs
* Remove useless tests for speedup the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unitests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the Samplers weights to float for save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config