* Draft ONNX export for VITS
Could not get it to output variable-length sequences
* Fixup for ONNX constant output
* Make style
* Remove commented code
PyTorch now requires the `return_complex` argument of `torch.stft` to be given explicitly for real inputs.
This turned the previous warning into an error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
    return model.train_step(*input_args)
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
    mel_slice_hat = wav_to_mel(
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
    spec = torch.stft(
  File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
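For context, a minimal sketch of calling `torch.stft` in a way that avoids this error. It is not the actual change made to `wav_to_mel`; the helper name `stft_magnitude` and the parameter values are assumptions chosen for illustration. It passes `return_complex=True` explicitly and takes the magnitude of the complex result.

```python
import torch


def stft_magnitude(wav, n_fft=512, hop_length=128, win_length=512):
    """Hypothetical helper: magnitude spectrogram of a real waveform batch."""
    # Build the analysis window on the same device as the input.
    window = torch.hann_window(win_length, device=wav.device)
    # Passing return_complex explicitly avoids the RuntimeError for real inputs.
    spec = torch.stft(
        wav,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        return_complex=True,
    )
    # The result is a complex tensor; abs() gives a real-valued magnitude.
    return spec.abs()


wav = torch.randn(1, 2048)  # random stand-in for a batch of audio
mag = stft_magnitude(wav)
print(mag.shape)
```

Code that expects the older real-valued layout with a trailing dimension of size 2 can apply `torch.view_as_real` to the complex output instead of `abs()`.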
* Fix bug in YourTTS speaker embeddings
* Reverted code for base_tts
* Bug fix on VITS d_vector_file type
* Ignore the test speakers on YourTTS recipe
* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference
* Update YourTTS config file
* Update ModelManager._update_path to deal with list attributes
* Fix lint checks
* Remove unused code
* Fix unit tests
* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files
* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
* Cache fsspec downloaded files
* Use different paths for tests
* Make fsspec caching optional
* Decommission GPU Docker tests
* Make progress bar optional for better CI log
* Check path local
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Change name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Bug fix in speaker encoder loading at inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the sampler weights to float to save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config
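
The language and speaker balancing entries above can be illustrated with a small sketch. It is not the code from these commits: the function names, the `language_alpha` and `speaker_alpha` parameters, and the sample dictionaries are all hypothetical. It assumes each sample gets a weight inversely proportional to how often its language or speaker occurs, and that the alphas scale how much each balancer contributes.

```python
from collections import Counter


def balancer_weights(labels):
    """Per-sample weights inversely proportional to label frequency (sum to 1)."""
    counts = Counter(labels)
    raw = [1.0 / counts[label] for label in labels]
    total = sum(raw)
    return [w / total for w in raw]


def combined_weights(samples, language_alpha=1.0, speaker_alpha=1.0):
    """Hypothetical combination: alpha-scaled sum of both balancers."""
    lang_w = balancer_weights([s["language"] for s in samples])
    spk_w = balancer_weights([s["speaker"] for s in samples])
    return [language_alpha * lw + speaker_alpha * sw for lw, sw in zip(lang_w, spk_w)]


# Toy dataset: three English samples and one Portuguese sample.
samples = [
    {"language": "en", "speaker": "a"},
    {"language": "en", "speaker": "a"},
    {"language": "en", "speaker": "b"},
    {"language": "pt", "speaker": "c"},
]
weights = combined_weights(samples, language_alpha=1.0, speaker_alpha=0.5)
print(weights)
```

Here the single Portuguese sample receives the largest weight, so a weighted sampler such as `torch.utils.data.WeightedRandomSampler` would draw it more often than any individual English sample.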