* Draft ONNX export for VITS
Could not get it work to output variable length sequence
* Fixup for onnx constant output
* Make style
* Remove commented code
* initial commit
* Tortoise inference
* revert path change
* style fix
* remove accidental remove
* style fixes
* style fixes
* removed unwanted assests and deps
* remove changes
* remove cvvp
* style fix black
* added tortoise config and updated config and args, refactoring the code
* added tortoise to api
* Pull mel_norm from url
* Use TTS cleaners
* Let download model files
* add ability to pass tortoise presets through coqui api
* fix tests
* fix style and tests
* fix tts commandline for tortoise
* Add config.json to tortoise
* Use kwargs
* Use regular model api for loading tortoise
* Add load from dir to synthesizer
* Fix Tortoise floats
* Use model_dir when there are multiple urls
* Use `synthesize` when exists
* lint fixes and resolve preset bug
* resolve a download bug and update model link
* fix json
* do tortoise inference from voice dir
* fix
* fix test
* fix speaker id and remove assests
* update inference_tests.yml
* replace inference_test.yml
* fix extra dir as None
* fix tests
* remove space
* Reformat docstring
* Add docs
* Update docs
* lint fixes
---------
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
Torch set default value for `return_complex=True` for `torch.stft` method
This turned warning into error:-
```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
self._fit()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
self.train_epoch()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
return model.train_step(*input_args)
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
mel_slice_hat = wav_to_mel(
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
spec = torch.stft(
File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
* Fix typo in function definiton
* Swap hasattr out
hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow\s decoder parameters from the config
* Update the Trainer requirement version for a compatible one (#2276)
* Bump up to v0.10.2
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow\s decoder parameters from the config
* fixing documentation
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fixed bug related to yourtts speaker embeddings issue
* Reverted code for base_tts
* Bug fix on VITS d_vector_file type
* Ignore the test speakers on YourTTS recipe
* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference
* Update YourTTS config file
* Update ModelManager._update_path to deal with list attributes
* Fix lint checks
* Remove unused code
* Fix unit tests
* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files
* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
* Adding pretrained Overflow model
* Stabilize HMM
* Fixup model manager
* Return `audio_unique_name` by default
* Distribute max split size over datasets
* Fixup eval_split_size
* Make style
* Adding encoder
* currently modifying hmm
* Adding hmm
* Adding overflow
* Adding overflow setting up flat start
* Removing runs
* adding normalization parameters
* Fixing models on same device
* Training overflow and plotting evaluations
* Adding inference
* At the end of epoch the test sentences are coming on cpu instead of gpu
* Adding figures from model during training to monitor
* reverting tacotron2 training recipe
* fixing inference on gpu for test sentences on config
* moving helpers and texts within overflows source code
* renaming to overflow
* moving loss to the model file
* Fixing the rename
* Model training but not plotting the test config sentences's audios
* Formatting logs
* Changing model name to camelcase
* Fixing test log
* Fixing plotting bug
* Adding some tests
* Adding more tests to overflow
* Adding all tests for overflow
* making changes to camel case in config
* Adding information about parameters and docstring
* removing compute_mel_statistics moved statistic computation to the model instead
* Added overflow in readme
* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
* Cache fsspec downloaded files
* Use diff paths for test
* Make fsspec caching optional
* Decom GPU docker tests
* Make progress bar optional for better CI log
* Check path local
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Chane name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter