Commit Graph

834 Commits

Author SHA1 Message Date
Eren Gölge 1a6a5710fd Make lint 2023-04-17 15:02:56 +02:00
Eren Gölge 2533a18d62 Add BN tests 2023-04-17 13:37:10 +02:00
Eren Gölge 2d49c05259 Remove import 2023-04-17 13:05:29 +02:00
Eren Gölge cd83991067 Add BN phonemizer 2023-04-17 12:54:00 +02:00
Matthew Boakes 4c829e74a1 Update Librosa Version To V0.10.0 2023-04-05 00:59:20 +01:00
Yingzhi WANG 95fa2c9fd6
fix typo (#2475) 2023-04-03 23:31:09 +02:00
p0p 91cf1b2da9
[minor] batch["speaker_ids"] getting set two times (#2470)
* [minor] batch["speaker_ids"] getting set two times

just to make it consistent with language_ids

* Update vits.py

style.
2023-04-03 11:35:21 +02:00
Eren Gölge d309f50e53
Implement FreeVC (#2451)
* Update .gitignore

* Draft FreeVC implementation

* Tests and relevant updates

* Update API tests

* Add missings

* Update requirements

* :(

* Lazy handle for vc

* Update docs for voice conversion

* Make style
2023-03-25 18:33:23 +01:00
Khalid Bashir 14c80dd1fd
vits.py training fixed due to return_complex (#2418)
Torch set default value for `return_complex=True` for `torch.stft` method
This turned warning into error:-
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
    return model.train_step(*input_args)
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
    mel_slice_hat = wav_to_mel(
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
    spec = torch.stft(
  File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
2023-03-19 00:22:04 +01:00
Roee Shenberg 3c15f0619a
Bug fixes in OverFlow audio generation (#2380) 2023-03-15 12:02:11 +01:00
Daniel Vera Nieto dfb48737fb Style fixed 2023-03-13 16:11:15 +01:00
Dani Vera 0d12229b64
Update vits.py
This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.
2023-03-10 18:35:16 +01:00
manmay nakhashi 624513018d
add energy by default to Fastspeech2 config (#2326)
* add energy by default

* added energy to base tts

* fix energy dataset

* fix styles

* fix test
2023-03-06 10:20:25 +01:00
thennal10 d39bc74f57
OverFlow with test sentences (#2253)
* Fix typo in function definiton

* Swap hasattr out

hasattr(self, "speaker_manager")  and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
2023-03-01 09:11:30 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
Martin Weinelt 994be163e1
Use packaging.version for version comparisons (#2310)
* Use packaging.version for version comparisons

The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.

With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:

> TypeError: '<' not supported between instances of 'str' and 'int'

This is fixed by this migration.

[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/

* Improve espeak version detection robustness

On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.

This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.

This improves the version detection by simply looking for the version
after the "text-to-speech:" token.

* Replace distuils.copy_tree with shutil.copytree

The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
2023-01-29 23:47:00 +01:00
Gerard Sant Muniesa c59b3f75b8
Add Catalan text cleaners for Catalan support (#2295) 2023-01-23 11:56:30 +01:00
Shivam Mehta d83ee8fe45
Adding neural HMM TTS Model (#2272)
* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* Update the Trainer requirement version for a compatible one (#2276)

* Bump up to v0.10.2

* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* fixing documentation

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-01-23 11:53:04 +01:00
Eren Gölge 497f22b20b
Cache speaker encoder model (#2284) 2023-01-23 11:49:51 +01:00
Eren G??lge 6e3f74fc29 Fix #2191 2023-01-15 23:11:57 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add comupte_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer receipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Eren Gölge a9167cf239
Fixup overflow (#2218)
* Update overflow config

* Pulling shuffle and drop_last  from config

* Print training stats for overflow
2022-12-15 00:56:48 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting the test config sentences's audios

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
2022-12-12 12:44:15 +01:00
p0p4k 2e153d54a8
Adding missing key to formatter (#2194)
quick fix for #2156.
 added 'root_path' key.
2022-12-12 12:25:37 +01:00
Eren Gölge fdeefcc612
Handle espeak 1.48.15 (#2203) 2022-12-12 11:23:45 +01:00
Edresson Casanova ee20e30958 Fix VITS multi-speaker voice conversion inference 2022-12-05 09:15:01 -03:00
Eren Gölge 9321b22203
Fix scheduler order 2022-12-05 12:26:15 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
freezerain fcbfca869f
Fix back/forward slash in file path in mailabs formatter (#1938)
* mailabs formatter: back/forward slash in file path fix

* formatters.mailabs() path rework for Windows os

* new formatter added "mailabs_win"

* lint test fix commit

* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility

* Black small style fix
2022-11-01 12:54:40 +01:00
Victor Shepardson 5307a2229b
Fix Capacitron training (#2086) 2022-11-01 12:52:06 +01:00
Eren Gölge dae79b0acd
Remove `/` prefix from the relative path (#2065) 2022-10-10 13:32:27 +02:00
Eren Gölge 843fa6f3fa
Check num of columns in coqui format (#2066)
* Check 4 colums in coqui format

* Fix encoding

* Fixup
2022-10-10 12:13:32 +02:00
Edresson Casanova f3b947e706
Minors bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Edresson Casanova d6ad9a05b4
Fix colliding dataset cache file names (#1994)
* Fix colliding dataset cache file names

* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Julian Weber 896e46d0e5
Fix vc (#1971) 2022-09-16 12:01:26 +02:00
Eren Gölge b95cf3363c
Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Chane name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
happylittlecat 4546b4cbd8
Add espeak support for Chinese (#1905)
* fix description

* add espeak support for chinese

* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman 5abbe56642
Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a korean single speech dataset (12hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Eren Gölge fcb0bb58ae
Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge 4333492341
Fix BCE loss issue (#1872)
* Fix BCE loss issue

* Remove import
2022-08-15 11:27:21 +02:00
manmay nakhashi e4db7c51b5
Update capacitron_layers.py (#1664)
crashing because of dimension miss match   at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
Eren Gölge d46fbc240c
Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi 7fd9b89ebf
fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia 75ac9e3f0c
Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike
2022-08-07 13:57:40 +02:00