* Don't install MeCab by default
* Add optional [ja] deps, like [dev] etc
* Add JA requirements file
* Add JA requirements to requirements_all
This should help the tests run.
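A minimal sketch of how an optional `[ja]` group could sit next to `[dev]` in a `setup.py`-style extras mapping. The helper name `build_extras` and the package lists are placeholders for illustration, not the project's actual requirements.

```python
def build_extras(requirements_by_extra):
    """Build an extras_require-style dict and add an aggregated "all" group."""
    extras = {name: list(reqs) for name, reqs in requirements_by_extra.items()}
    combined = []
    for reqs in extras.values():
        for req in reqs:
            if req not in combined:  # keep order, drop duplicates
                combined.append(req)
    extras["all"] = combined
    return extras


# Placeholder package names; the real lists live in the requirements files.
extras_require = build_extras(
    {
        "dev": ["black", "pylint"],
        "ja": ["mecab-python3", "unidic-lite"],
    }
)
```

With a mapping like this passed to `setup(extras_require=...)`, MeCab would only be installed when the `ja` extra is requested explicitly.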
commit dd612fd72e
Author: JiangCheng <jiangcheng@kezaihui.com>
Date: Mon Jun 5 16:04:54 2023 +0800
If the file download fails, delete the file path that was created
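The cleanup described here could look roughly like the sketch below. `download_into` and the `fetch` callback are hypothetical names used for illustration; the actual download code in the repository may be structured differently.

```python
import os
import shutil


def download_into(output_path, fetch):
    """Create output_path, run fetch(output_path), and clean up on failure."""
    # Only remove the directory later if this call created it.
    created = not os.path.exists(output_path)
    os.makedirs(output_path, exist_ok=True)
    try:
        fetch(output_path)
    except Exception:
        if created:
            shutil.rmtree(output_path, ignore_errors=True)
        raise  # re-raise so the caller still sees the download error
```

Removing the directory keeps a later run from mistaking a half-created path for a finished download.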
* Draft ONNX export for VITS
Could not get it to work with variable-length output sequences
* Fixup for onnx constant output
* Make style
* Remove commented code
* initial commit
* Tortoise inference
* revert path change
* style fix
* remove accidental remove
* style fixes
* style fixes
* removed unwanted assets and deps
* remove changes
* remove cvvp
* style fix black
* added tortoise config and updated config and args, refactoring the code
* added tortoise to api
* Pull mel_norm from url
* Use TTS cleaners
* Let download model files
* add ability to pass tortoise presets through coqui api
* fix tests
* fix style and tests
* fix tts commandline for tortoise
* Add config.json to tortoise
* Use kwargs
* Use regular model api for loading tortoise
* Add load from dir to synthesizer
* Fix Tortoise floats
* Use model_dir when there are multiple urls
* Use `synthesize` when exists
* lint fixes and resolve preset bug
* resolve a download bug and update model link
* fix json
* do tortoise inference from voice dir
* fix
* fix test
* fix speaker id and remove assets
* update inference_tests.yml
* replace inference_test.yml
* fix extra dir as None
* fix tests
* remove space
* Reformat docstring
* Add docs
* Update docs
* lint fixes
---------
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
Python 3.11 complains about the mutable default and other members
were already adapted to use the factory, so I expect this line just
went unnoticed until now.
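A small sketch of the factory pattern referred to above. The class and field names are hypothetical; the point is that Python 3.11 rejects unhashable dataclass defaults (such as another dataclass instance), while `field(default_factory=...)` builds a fresh value per instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioSettings:
    sample_rate: int = 22050


@dataclass
class ModelConfig:
    # Writing `audio: AudioSettings = AudioSettings()` raises ValueError on
    # Python 3.11 because the default instance is unhashable (mutable).
    audio: AudioSettings = field(default_factory=AudioSettings)
    test_sentences: List[str] = field(default_factory=list)


a = ModelConfig()
b = ModelConfig()
a.test_sentences.append("hello")  # does not leak into b
```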
* Warn when lang is not avail
* Make style
* Implement Coqui Studio API
* Test
* Update docs
* Set action
* Make style
* Make lint
* Update README
* Make style
* Fix action
* Run actions
Torch now requires the `return_complex` parameter to be passed explicitly to `torch.stft` for real inputs.
This turned the warning into an error:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
self._fit()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
self.train_epoch()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
return model.train_step(*input_args)
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
mel_slice_hat = wav_to_mel(
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
spec = torch.stft(
File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
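A minimal sketch of one way to satisfy this requirement, assuming a batch of real-valued waveforms of shape `(batch, samples)`. The helper name and the default sizes are illustrative and are not taken from `vits.py`.

```python
import torch


def magnitude_spectrogram(wav, n_fft=1024, hop_length=256, win_length=1024):
    """Return the STFT magnitude of a (batch, samples) real-valued tensor."""
    window = torch.hann_window(win_length, device=wav.device)
    spec = torch.stft(
        wav,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        center=True,
        return_complex=True,  # passed explicitly, as the error message asks
    )
    # spec is a complex tensor; abs() gives the magnitude.
    return spec.abs()


mag = magnitude_spectrogram(torch.randn(2, 4096))
```

Code that previously relied on the real-valued `(…, 2)` layout can recover it from the complex result with `torch.view_as_real`.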
* added basic Mary-TTS API endpoints to server
- imported `parse_qs` from `urllib.parse` to parse HTTP POST parameters
- imported `render_template_string` from `flask` to return text as endpoint result
- added new routes:
- `/locales` - returns list of locales (currently locale of active model)
- `/voices` - returns list of voices (currently locale and name of active model)
- `/process` - accepts synth. request (GET and POST) with parameter `INPUT_TEXT` (other parameters ignored since we have only one active model)
* better log messages for Mary-TTS API
- smaller tweaks to log output
* use f-string in log print to please linter
* updated server.py to match 'make style' result
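As a rough illustration of the POST handling mentioned above, `parse_qs` can pull `INPUT_TEXT` out of a URL-encoded request body. The helper name `extract_input_text` is hypothetical and is not taken from `server.py`.

```python
from urllib.parse import parse_qs


def extract_input_text(raw_body: bytes, encoding: str = "utf-8") -> str:
    """Return the INPUT_TEXT parameter from a URL-encoded body, or "" if absent."""
    params = parse_qs(raw_body.decode(encoding))
    # parse_qs maps each key to a list of values; take the first one.
    return params.get("INPUT_TEXT", [""])[0]


text = extract_input_text(b"INPUT_TEXT=Hello+world&LOCALE=en_US")
```

Other parameters such as `LOCALE` are parsed but simply not used here, matching the note that they are ignored while only one model is active.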
* Fix typo in function definition
* Swap hasattr out
hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
* Use packaging.version for version comparisons
The distutils package is deprecated [1] and relies on PEP 386 [2] version
comparisons, which have been superseded by PEP 440 [3], which is implemented
through the packaging module.
With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:
> TypeError: '<' not supported between instances of 'str' and 'int'
This is fixed by this migration.
[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/
* Improve espeak version detection robustness
On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.
This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.
This improves the version detection by simply looking for the version
after the "text-to-speech:" token.
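The lookup described above can be sketched as follows. The function name and the two sample version strings are illustrative assumptions about what `espeak --version` prints, not captured output.

```python
def parse_espeak_version(version_output: str) -> str:
    """Return the token that follows "text-to-speech:" in the version output."""
    tokens = version_output.split()
    try:
        idx = tokens.index("text-to-speech:")
    except ValueError:
        raise ValueError(f"unexpected version output: {version_output!r}") from None
    return tokens[idx + 1]


# Illustrative strings: classic espeak vs. espeak symlinked to espeak-ng.
classic = "eSpeak text-to-speech: 1.48.03  04.Mar.14  Data at: /usr/lib/espeak-data"
ng = "eSpeak NG text-to-speech: 1.50  Data at: /usr/lib/espeak-ng-data"
```

Taking the third word would give `1.48.03` for the first string but `text-to-speech:` for the second, which is the breakage described above.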
* Replace distutils.copy_tree with shutil.copytree
The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
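A minimal sketch of such a replacement, assuming Python 3.8 or newer for `dirs_exist_ok`. The wrapper name `copy_directory` is hypothetical; whether the actual change passes `dirs_exist_ok=True` is not stated here.

```python
import shutil


def copy_directory(src, dst):
    """Recursively copy src into dst, merging into dst if it already exists."""
    # dirs_exist_ok=True mirrors copy_tree's behaviour of copying into an
    # existing destination directory instead of raising FileExistsError.
    return shutil.copytree(src, dst, dirs_exist_ok=True)
```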
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow's decoder parameters from the config
* Update the Trainer requirement version for a compatible one (#2276)
* Bump up to v0.10.2
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow's decoder parameters from the config
* fixing documentation
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fixed bug related to yourtts speaker embeddings issue
* Reverted code for base_tts
* Bug fix on VITS d_vector_file type
* Ignore the test speakers on YourTTS recipe
* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference
* Update YourTTS config file
* Update ModelManager._update_path to deal with list attributes
* Fix lint checks
* Remove unused code
* Fix unit tests
* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files
* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
* Adding pretrained Overflow model
* Stabilize HMM
* Fixup model manager
* Return `audio_unique_name` by default
* Distribute max split size over datasets
* Fixup eval_split_size
* Make style
* Add YourTTS VCTK recipe
* Fix lint
* Add compute_embeddings and resample_files functions so they can be reused
* Add automatic download and speaker embedding computation for YourTTS VCTK recipe
* Add parameter for eval metadata file on compute embeddings function
* Adding encoder
* currently modifying hmm
* Adding hmm
* Adding overflow
* Adding overflow setting up flat start
* Removing runs
* adding normalization parameters
* Fixing models on same device
* Training overflow and plotting evaluations
* Adding inference
* At the end of an epoch the test sentences end up on CPU instead of GPU
* Adding figures from model during training to monitor
* reverting tacotron2 training recipe
* fixing inference on gpu for test sentences on config
* moving helpers and texts within overflows source code
* renaming to overflow
* moving loss to the model file
* Fixing the rename
* Model is training but not plotting the audio for the config's test sentences
* Formatting logs
* Changing model name to camelcase
* Fixing test log
* Fixing plotting bug
* Adding some tests
* Adding more tests to overflow
* Adding all tests for overflow
* making changes to camel case in config
* Adding information about parameters and docstring
* removing compute_mel_statistics moved statistic computation to the model instead
* Added overflow in readme
* Adding more test cases; it no longer saves transition_p as a tensor, so it can be dumped as JSON
* Cache fsspec downloaded files
* Use diff paths for test
* Make fsspec caching optional
* Decom GPU docker tests
* Make progress bar optional for better CI log
* Check path local
* mailabs formatter: back/forward slash in file path fix
* formatters.mailabs() path rework for Windows os
* new formatter added "mailabs_win"
* lint test fix commit
* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility
* Black small style fix
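The separator change in the mailabs formatter can be illustrated with a short sketch. The directory layout (`by_book/<gender>/<speaker>/...`) and the helper name are assumptions for the example; the real formatter may extract the speaker differently.

```python
import os


def speaker_from_path(wav_path: str) -> str:
    """Return the speaker folder name from a by_book/<gender>/<speaker>/... path."""
    # normpath converts separators to the platform's own, so splitting on
    # os.sep works on both Windows and POSIX systems.
    parts = os.path.normpath(wav_path).split(os.sep)
    idx = parts.index("by_book")
    return parts[idx + 2]


example = os.path.join("data", "by_book", "female", "speaker_a", "book", "wavs", "a.wav")
speaker = speaker_from_path(example)
```

Hard-coding `"/"` in the split would leave the whole Windows path as a single element, which is the kind of failure `os.sep` avoids.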
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Change name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter