* Don't install MeCab by default
* Add optional [ja] deps, like [dev] etc
* Add JA requirements file
* Add JA requirements to requirements_all
This should help the tests run.
commit dd612fd72e
Author: JiangCheng <jiangcheng@kezaihui.com>
Date: Mon Jun 5 16:04:54 2023 +0800
Failed to download the file and need to delete the created file path
* Draft ONNX export for VITS
Could not get it work to output variable length sequence
* Fixup for onnx constant output
* Make style
* Remove commented code
* initial commit
* Tortoise inference
* revert path change
* style fix
* remove accidental remove
* style fixes
* style fixes
* removed unwanted assests and deps
* remove changes
* remove cvvp
* style fix black
* added tortoise config and updated config and args, refactoring the code
* added tortoise to api
* Pull mel_norm from url
* Use TTS cleaners
* Let download model files
* add ability to pass tortoise presets through coqui api
* fix tests
* fix style and tests
* fix tts commandline for tortoise
* Add config.json to tortoise
* Use kwargs
* Use regular model api for loading tortoise
* Add load from dir to synthesizer
* Fix Tortoise floats
* Use model_dir when there are multiple urls
* Use `synthesize` when exists
* lint fixes and resolve preset bug
* resolve a download bug and update model link
* fix json
* do tortoise inference from voice dir
* fix
* fix test
* fix speaker id and remove assests
* update inference_tests.yml
* replace inference_test.yml
* fix extra dir as None
* fix tests
* remove space
* Reformat docstring
* Add docs
* Update docs
* lint fixes
---------
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
Python 3.11 complains about the mutable default and other members
were already adapted to use the factory, so I expect this line just
went unnoticed until now.
* Warn when lang is not avail
* Make style
* Implement Coqui Studio API
* Test
* Update docs
* Set action
* Make style
* Make lint
* Update README
* Make style
* Fix action
* Run actions
Torch set default value for `return_complex=True` for `torch.stft` method
This turned warning into error:-
```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
self._fit()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
self.train_epoch()
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
return model.train_step(*input_args)
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
mel_slice_hat = wav_to_mel(
File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
spec = torch.stft(
File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
* added basic Mary-TTS API endpoints to server
- imported `parse_qs` from `urllib.parse` to parse HTTP POST parameters
- imported `render_template_string` from `flask` to return text as endpoint result
- added new routes:
- `/locales` - returns list of locales (currently locale of active model)
- `/voices` - returns list of voices (currently locale and name of active model)
- `/process` - accepts synth. request (GET and POST) with parameter `INPUT_TEXT` (other parameters ignored since we have only one active model)
* better log messages for Mary-TTS API
- smaller tweaks to log output
* use f-string in log print to please linter
* updated server.py to match 'make style' result
* Fix typo in function definiton
* Swap hasattr out
hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
* Use packaging.version for version comparisons
The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.
With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:
> TypeError: '<' not supported between instances of 'str' and 'int'
This is fixed by this migration.
[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/
* Improve espeak version detection robustness
On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.
This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.
This improves the version detection by simply looking for the version
after the "text-to-speech:" token.
* Replace distuils.copy_tree with shutil.copytree
The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow\s decoder parameters from the config
* Update the Trainer requirement version for a compatible one (#2276)
* Bump up to v0.10.2
* Adding neural HMM TTS
* Adding tests
* Adding neural hmm on readme
* renaming training recipe
* Removing overflow\s decoder parameters from the config
* fixing documentation
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fixed bug related to yourtts speaker embeddings issue
* Reverted code for base_tts
* Bug fix on VITS d_vector_file type
* Ignore the test speakers on YourTTS recipe
* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference
* Update YourTTS config file
* Update ModelManager._update_path to deal with list attributes
* Fix lint checks
* Remove unused code
* Fix unit tests
* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files
* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
* Adding pretrained Overflow model
* Stabilize HMM
* Fixup model manager
* Return `audio_unique_name` by default
* Distribute max split size over datasets
* Fixup eval_split_size
* Make style
* Add YourTTS VCTK recipe
* Fix lint
* Add compute_embeddings and resample_files functions to be able to reuse it
* Add automatic download and speaker embedding computation for YourTTS VCTK recipe
* Add parameter for eval metadata file on compute embeddings function
* Adding encoder
* currently modifying hmm
* Adding hmm
* Adding overflow
* Adding overflow setting up flat start
* Removing runs
* adding normalization parameters
* Fixing models on same device
* Training overflow and plotting evaluations
* Adding inference
* At the end of epoch the test sentences are coming on cpu instead of gpu
* Adding figures from model during training to monitor
* reverting tacotron2 training recipe
* fixing inference on gpu for test sentences on config
* moving helpers and texts within overflows source code
* renaming to overflow
* moving loss to the model file
* Fixing the rename
* Model training but not plotting the test config sentences's audios
* Formatting logs
* Changing model name to camelcase
* Fixing test log
* Fixing plotting bug
* Adding some tests
* Adding more tests to overflow
* Adding all tests for overflow
* making changes to camel case in config
* Adding information about parameters and docstring
* removing compute_mel_statistics moved statistic computation to the model instead
* Added overflow in readme
* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
* Cache fsspec downloaded files
* Use diff paths for test
* Make fsspec caching optional
* Decom GPU docker tests
* Make progress bar optional for better CI log
* Check path local
* mailabs formatter: back/forward slash in file path fix
* formatters.mailabs() path rework for Windows os
* new formatter added "mailabs_win"
* lint test fix commit
* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility
* Black small style fix
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Chane name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* Update requirements.txt
install jamo for korean
* Update formatters.py
add KSS formatter
KSS is a korean single speech dataset (12hours)
* Add files via upload
add phonemizer for korean
* Add files via upload
add korean phonemizer
* Update requirements.txt
* change code style with `black` and `pylint`
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation-2
* isort
* edit about separator
write test case and add 'nltk' for requirements.txt
* add korean g2p (g2pkk)
* isort
* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)
TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)
* black
* fix imports in tune_wavegrad
* load_config returns Coqpit object instead None
* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program
* fix var order in the result of batch collating
* make style
* make style with black and isort
Running `tts --text "$text" --out_path …` with a somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.
This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particular frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].
This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.
[1] https://github.com/mozilla/TTS/issues/734
* Update wavenet.py
Current version does not use "in_channels" argument.
In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor.
However, since it is a generic implementation, I believe it is better to update it for a more general use.
* "in_channels -> hidden_channels"
* Fix for Floor Function Warning
Fix for Floor Function Warning
* Adding double quotes to fix formatting
Adding double quotes to fix formatting
* Update glow_tts.py
* Update glow_tts.py