1652 Commits

Author SHA1 Message Date
Eren Gölge 14d45b5347
Bump up to v0.10.2 2023-01-11 01:06:02 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer recipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Eren Gölge f814d52394 Bump up to v0.10.1 2022-12-26 14:29:46 +01:00
Eren Gölge 8c32a6998a Add pth files to manager 2022-12-26 14:29:25 +01:00
Eren Gölge cf765cb3f2 Add ca and fa models 2022-12-26 14:29:10 +01:00
Eren Gölge 46b0ad37e7 Bump up to v0.10.0 2022-12-15 11:19:23 +01:00
Eren Gölge a9167cf239
Fixup overflow (#2218)
* Update overflow config

* Pulling shuffle and drop_last  from config

* Print training stats for overflow
2022-12-15 00:56:48 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
Edresson Casanova 3b1a28fa95
Add YourTTS VCTK recipe (#2198)
* Add YourTTS VCTK recipe

* Fix lint

* Add compute_embeddings and resample_files functions to be able to reuse it

* Add automatic download and speaker embedding computation for YourTTS VCTK recipe

* Add parameter for eval metadata file on compute embeddings function
2022-12-12 16:14:25 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting audio for the test config sentences

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases, now it doesn't save transition_p like tensor and can be dumped as json
2022-12-12 12:44:15 +01:00
p0p4k 2e153d54a8
Adding missing key to formatter (#2194)
quick fix for #2156.
 added 'root_path' key.
2022-12-12 12:25:37 +01:00
Eren Gölge 1ddc484b49
Python API implementation (#2195)
* Draft implementation

* Fix style

* Add api tests

* Fix lint

* Update docs

* Update tests

* Set env

* Fixup

* Fixup

* Fix lint

* Revert
2022-12-12 12:04:20 +01:00
Eren Gölge fdeefcc612
Handle espeak 1.48.15 (#2203) 2022-12-12 11:23:45 +01:00
Edresson Casanova ee20e30958 Fix VITS multi-speaker voice conversion inference 2022-12-05 09:15:01 -03:00
Eren Gölge 9321b22203
Fix scheduler order 2022-12-05 12:26:15 +01:00
Eren Gölge bc6120c330 [ci skip]Bump up to v0.9.0 2022-11-16 16:45:02 +01:00
logan hart ff9b63d02a
Add neon models (#2140)
* Add neon ljspeech vits model

* Add neon german model

* Update .models.json

* Add neon spanish model

* Add french model

* Add Dutch model

* Add Hungarian model

* Add Greek model

* Remove unneeded description

* Update .models.json

* Update .models.json

* Handling neon models

* Add all neon models

* Update .models.json

* Split zoo_tests

* Update test names

* Update model testing

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-11-16 16:12:39 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
Eren Gölge b686c09704 Fix #2062 2022-11-07 09:22:43 +01:00
freezerain fcbfca869f
Fix back/forward slash in file path in mailabs formatter (#1938)
* mailabs formatter: back/forward slash in file path fix

* formatters.mailabs() path rework for Windows os

* new formatter added "mailabs_win"

* lint test fix commit

* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility

* Black small style fix
2022-11-01 12:54:40 +01:00
Victor Shepardson 5307a2229b
Fix Capacitron training (#2086) 2022-11-01 12:52:06 +01:00
Eren Gölge dae79b0acd
Remove `/` prefix from the relative path (#2065) 2022-10-10 13:32:27 +02:00
Eren Gölge 843fa6f3fa
Check num of columns in coqui format (#2066)
* Check 4 columns in coqui format

* Fix encoding

* Fixup
2022-10-10 12:13:32 +02:00
Edresson Casanova f3b947e706
Minor bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Eren Gölge 5f5d441ee5
Write non-speech files in a TXT (#2048)
* Write non-speech files in a txt

* Save 16-bit wav out of vad
2022-10-06 13:25:54 +02:00
Edresson Casanova d6ad9a05b4
Fix colliding dataset cache file names (#1994)
* Fix colliding dataset cache file names

* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Eren Gölge 0a112f7841
Add metafile arg (#1977) 2022-09-16 14:41:49 +02:00
Julian Weber 896e46d0e5
Fix vc (#1971) 2022-09-16 12:01:26 +02:00
Eren Gölge b95cf3363c
Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Change name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova 371772c355
Replace pyworld by pyin (#1946)
* Replace pyworld by pyin

* Fix unit tests
2022-09-09 10:43:14 +02:00
happylittlecat 4546b4cbd8
Add espeak support for Chinese (#1905)
* fix description

* add espeak support for chinese

* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman 5abbe56642
Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a korean single speech dataset (12hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Edresson Casanova 159eeeef64
Fix find unique phonemes script (#1928)
* Fix find unique phonemes script

* Fix unit tests
2022-09-08 10:17:35 +02:00
KyuubiYoru 3b7dff568a
Fixes a race condition with multiple simultaneous get requests. (#1807)
* Fixes a race condition with multiple simultaneous get requests.

* Removed unused import

* Removed unused threading import

* Changed lock style to notation

* make style

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-09-08 10:16:16 +02:00
Julian Weber bb59718c03
Add capacitron v2 model (#1768)
* Add capacitron v2 in .models.json

* Put right commit hash
2022-09-08 09:43:56 +02:00
Edresson Casanova 096b35f639
Add VCTK speaker encoder recipe (#1912) 2022-08-26 16:19:03 +02:00
Eren Gölge e5430a6519
Add new DE Thorsten models (#1898)
- Tacotron2-DDC
- HifiGAN vocoder
2022-08-22 11:27:39 +02:00
Eren Gölge 8845f06fd9 Bump up to v0.8.0 2022-08-22 11:26:47 +02:00
Stanislav Kachnov 2c9f00a808
Fix tune wavegrad (#1844)
* fix imports in tune_wavegrad

* load_config returns Coqpit object instead of None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort
2022-08-22 09:55:32 +02:00
Eren Gölge fcb0bb58ae
Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge 7442bcefa5
Remove deprecated files (#1873)
- samplers.py is moved
- distribute.py is replaced by the 👟Trainer
2022-08-15 12:16:37 +02:00
Eren Gölge 4333492341
Fix BCE loss issue (#1872)
* Fix BCE loss issue

* Remove import
2022-08-15 11:27:21 +02:00
manmay nakhashi e4db7c51b5
Update capacitron_layers.py (#1664)
crashing because of dimension mismatch at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
Eren Gölge d46fbc240c
Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi 7fd9b89ebf
fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia 75ac9e3f0c
Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike
2022-08-07 13:57:40 +02:00