* Cache fsspec downloaded files
* Use different paths for tests
* Make fsspec caching optional
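A minimal sketch of what optional caching can look like with fsspec's `filecache` protocol chaining (the wrapper name and `cache_dir` parameter are illustrative, not the repo's actual API):

```python
import fsspec

def open_remote(url, cache_dir=None):
    """Open a remote file, optionally caching the download locally."""
    if cache_dir is not None:
        # Chaining "filecache::" in front of the URL makes fsspec store the
        # download under cache_dir and reuse it on subsequent opens.
        return fsspec.open(f"filecache::{url}", filecache={"cache_storage": cache_dir})
    return fsspec.open(url)
```

Passing `cache_dir=None` falls through to a plain download, which is what makes the caching optional.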
* Decommission GPU Docker tests
* Make progress bar optional for better CI log
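The usual way to make a bar optional is tqdm's `disable` flag; a sketch (the `progress_bar` parameter name is an assumption):

```python
from tqdm import tqdm

def download_chunks(chunks, progress_bar=True):
    # disable=True suppresses the bar entirely, keeping CI logs free of
    # carriage-return redraw noise while leaving local runs unchanged.
    for chunk in tqdm(chunks, disable=not progress_bar):
        ...
```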
* Check if the path is local
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Change name to formatter_name (see the sketch after this list)
* Update compute_embedding
- Allow passing the dataset via args
- Use released model by default
- Use the new key format
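Roughly, the two items above amount to a config like the following, which compute_embedding can then build from CLI args instead of a full training config (field names beyond formatter_name/dataset_name are assumptions):

```python
from dataclasses import dataclass

@dataclass
class BaseDatasetConfig:
    formatter_name: str = ""  # renamed from `name`; selects the metadata formatter
    dataset_name: str = ""    # new field identifying the dataset itself
    path: str = ""            # dataset root on disk
    meta_file_train: str = ""
    meta_file_val: str = ""
```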
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
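Loading several embedding files reduces to merging their dicts; a sketch assuming each file maps utterance ids to `{"name": ..., "embedding": ...}` entries:

```python
import torch

def load_embedding_files(paths):
    """Merge several embedding files (e.g. speakers.pth) into one mapping.

    Later files win on duplicate keys.
    """
    merged = {}
    for path in paths:
        merged.update(torch.load(path, map_location="cpu"))
    return merged
```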
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* fix imports in tune_wavegrad
* load_config returns a Coqpit object instead of None
* set action "store_true" for the "--use_cuda" flag; start tuning when the module runs as the main program
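A sketch of that pattern: a boolean flag plus a `__main__` guard so importing the module never kicks off tuning (the `tune` stub stands in for the wavegrad tuning loop):

```python
import argparse

def tune(use_cuda):
    # Stand-in for the wavegrad tuning loop.
    print(f"tuning (use_cuda={use_cuda})")

def main():
    parser = argparse.ArgumentParser()
    # action="store_true" turns --use_cuda into a boolean flag, False by default.
    parser.add_argument("--use_cuda", action="store_true")
    args = parser.parse_args()
    tune(args.use_cuda)

# Tuning starts only when the module runs as the main program, so importing
# it for its helpers has no side effects.
if __name__ == "__main__":
    main()
```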
* fix variable order in the batch collation result
* make style
* make style with black and isort
* Use fsspec and torch for embedding file
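In the spirit of that change, saving and loading through fsspec keeps the code storage-agnostic; a sketch (function names are illustrative):

```python
import fsspec
import torch

def save_file(obj, path):
    # fsspec resolves the path's protocol (local, s3://, gs://, ...), so the
    # same call writes embeddings to local disk or remote storage.
    with fsspec.open(path, "wb") as f:
        torch.save(obj, f)

def load_file(path):
    with fsspec.open(path, "rb") as f:
        return torch.load(f, map_location="cpu")
```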
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
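One way to express a stepwise gradual schedule with stock PyTorch is a LambdaLR over (step, factor) boundaries; a sketch of the idea (the actual scheduler in the repo may set absolute learning rates rather than factors):

```python
from torch.optim.lr_scheduler import LambdaLR

def stepwise_gradual_lr(optimizer, boundaries):
    """Piecewise-constant LR: `boundaries` is a list of (step, factor) pairs,
    sorted by step; the factor of the last boundary passed applies."""
    def factor_at(step):
        factor = 1.0
        for boundary, f in boundaries:
            if step >= boundary:
                factor = f
        return factor
    return LambdaLR(optimizer, lr_lambda=factor_at)

# e.g. scheduler = stepwise_gradual_lr(opt, [(0, 1.0), (10_000, 0.5), (50_000, 0.1)])
```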
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak can now be used for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore: rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
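The idea, roughly: wrap both optimizers behind the single-optimizer interface the trainer expects. A sketch (class and attribute names are illustrative; the actual class is CapacitronOptimizer):

```python
class TwoInOneOptimizer:
    """Drive two optimizers (e.g. main model + Capacitron params) as one."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def zero_grad(self):
        self.primary.zero_grad()
        self.secondary.zero_grad()

    def step(self):
        self.primary.step()
        self.secondary.step()

    # Checkpointing hooks; cf. the later "implement state_dict" fix that
    # made continued training work.
    def state_dict(self):
        return {"primary": self.primary.state_dict(),
                "secondary": self.secondary.state_dict()}

    def load_state_dict(self, state):
        self.primary.load_state_dict(state["primary"])
        self.secondary.load_state_dict(state["secondary"])
```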
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted Capacitron-specific training operations from the trainer into custom methods in the taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: train tts T2 Capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
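Details of the routine aside, the core of per-group clipping with PyTorch looks like this (the group split and shared `max_norm` are assumptions):

```python
import torch

def clip_and_step(optimizer, model_params, capacitron_params, max_norm=1.0):
    # Clip each parameter group separately before stepping, so a gradient
    # spike in one group cannot distort the norm seen by the other.
    torch.nn.utils.clip_grad_norm_(model_params, max_norm)
    torch.nn.utils.clip_grad_norm_(capacitron_params, max_norm)
    optimizer.step()
```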
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overrides by args and tokenizer
* Fix style
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argument to disable storage
* Add Perfect Sampler and remove storage
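The invariant a "perfect" sampler guarantees: every batch holds exactly `num_utter_per_class` utterances from each of `num_classes_in_batch` classes, which is what GE2E-style encoder losses expect. A simplified sketch of the idea (the real PerfectBatchSampler is a proper torch sampler class):

```python
import random

def perfect_batches(indices_by_class, num_classes_in_batch, num_utter_per_class):
    classes = list(indices_by_class)
    while True:
        batch = []
        # Draw a fixed number of classes, then a fixed number of samples each.
        for cls in random.sample(classes, num_classes_in_batch):
            batch.extend(random.sample(indices_by_class[cls], num_utter_per_class))
        yield batch
```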
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Activate evaluation in the speaker encoder test and use a multi-speaker dataset for this test
* Unit test fixes
* Remove useless tests to speed up the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unit tests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
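Grouping and averaging once per speaker means a large speakers.json is scanned a single time rather than on every synthesis call. A sketch assuming the file maps utterance ids to `{"name": speaker, "embedding": [...]}` entries:

```python
import numpy as np

def d_vectors_by_speaker(embeddings):
    """Group utterance-level d-vectors by speaker and precompute the mean."""
    grouped = {}
    for entry in embeddings.values():
        grouped.setdefault(entry["name"], []).append(np.asarray(entry["embedding"]))
    return {spk: np.stack(vecs).mean(axis=0) for spk, vecs in grouped.items()}
```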
* Fix the bug in find unique chars script
* Add OpenBible formatter
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fix the bug in split_dataset
* Make eval_split_size configurable
* Change test_loader to use load_tts_samples function
* Change eval_split_portion to eval_split_size and allow setting the absolute number of eval samples
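A sketch of the dual semantics (exact thresholds and shuffling in the repo may differ): values below 1 read as a fraction of the dataset, values of 1 or more as an absolute eval-set size.

```python
import random

def split_dataset(samples, eval_split_size=0.01, seed=0):
    samples = list(samples)  # avoid mutating the caller's list
    n_eval = (int(eval_split_size) if eval_split_size >= 1
              else int(len(samples) * eval_split_size))
    random.Random(seed).shuffle(samples)
    return samples[:n_eval], samples[n_eval:]  # (eval, train)
```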
* Fix samplers unit test
* Add data unit test on GitHub workflow