Commit Graph

150 Commits

Author SHA1 Message Date
Edresson Casanova 4d3f23b5d3
Add CML-TTS dataset YourTTS training recipe (#2934) 2023-09-12 11:49:14 +02:00
Eren Gölge 69f080eb47
Fix DelightfulTTS (#2823)
* Fix tests

* Make style
2023-07-31 13:52:45 +02:00
Aleś Bułojčyk d124f78430
Recipe for Belarusian TTS (#2756)
* Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com>

* Recipe for Belarusian TTS

---------

Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>
2023-07-31 10:26:21 +02:00
logan hart 6fdb88f8e2
Add Delightful-TTS implementation (#2095)
* add configs

* Update config file

* Add model configs

* Add model layers

* Add layer files

* Add layer modules

* change config names

* Add emotion manager

* fIX missing ap bug

* Fix missing ap bug

* Add base TTS e2e class

* Fix wrong variable name in load_tts_samples

* Add training script

* Remove range predictor and gaussian upsampling

* Add helper function

* Add vctk recipe

* Add conformer docs

* Fix linting in conformer.py

* Add Docs

* remove duplicate import

* refactor args

* Fix bugs

* Removew emotion embedding

* remove unused arg

* Remove emotion embedding arg

* Remove emotion embedding arg

* fix style issues

* Fix bugs

* Fix bugs

* Add unittests

* make style

* fix formatter bug

* fix test

* Add pyworld compute pitch func

* Update requirments.txt

* Fix dataset Bug

* Chnge layer norm to instance norm

* Add missing import

* Remove emotions.py

* remove ssim loss

* Add init layers func to aligner

* refactor model layers

* remove audio_config arg

* Rename loss func

* Rename to delightful-tts

* Rename loss func

* Remove unused modules

* refactor imports

* replace audio config with audio processor

* Add change sample rate option

* remove broken resample func

* update recipe

* fix style, add config docs

* fix tests and multispeaker embd dim

* remove pyworld

* Make style and fix inference

* Split tts tests

* Fixup

* Fixup

* Fixup

* Add argument names

* Set "random" speaker in the model Tortoise/Bark

* Use a diff f0_cache path for delightfull tts

* Fix delightful speaker handling

* Fix lint

* Make style

---------

Co-authored-by: loganhart420 <loganartpersonal@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-07-24 13:41:26 +02:00
manmay nakhashi 624513018d
add energy by default to Fastspeech2 config (#2326)
* add energy by default

* added energy to base tts

* fix energy dataset

* fix styles

* fix test
2023-03-06 10:20:25 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add comupte_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer receipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
p0p4k 2e153d54a8
Adding missing key to formatter (#2194)
quick fix for #2156.
 added 'root_path' key.
2022-12-12 12:25:37 +01:00
freezerain fcbfca869f
Fix back/forward slash in file path in mailabs formatter (#1938)
* mailabs formatter: back/forward slash in file path fix

* formatters.mailabs() path rework for Windows os

* new formatter added "mailabs_win"

* lint test fix commit

* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility

* Black small style fix
2022-11-01 12:54:40 +01:00
Eren Gölge dae79b0acd
Remove `/` prefix from the relative path (#2065) 2022-10-10 13:32:27 +02:00
Eren Gölge 843fa6f3fa
Check num of columns in coqui format (#2066)
* Check 4 colums in coqui format

* Fix encoding

* Fixup
2022-10-10 12:13:32 +02:00
Edresson Casanova d6ad9a05b4
Fix colliding dataset cache file names (#1994)
* Fix colliding dataset cache file names

* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Eren Gölge b95cf3363c
Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Chane name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
harmlessman 5abbe56642
Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a korean single speech dataset (12hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
WeberJulian 6126c23498
Add synpaflex formatter (#1616)
* Add synpaflex formatter

* Fix formatter

* Make style
2022-06-20 13:36:26 +02:00
WeberJulian f09ea11c71
Internal formatter (#1629)
* Add coqui formatter

* Make style
2022-06-08 14:31:03 +02:00
Noran Raskin a790df4e94
Training recipes for thorsten dataset (#1020)
* Fix style

* Fix isort

* Remove tensorboardX from requirements

Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com>
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-05-30 12:07:31 +02:00
Eren Gölge fd56fabb21
Fix #1380 (#1409) 2022-03-16 12:38:27 +01:00
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argumnet to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Active evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit tests fixs

* Remove useless tests for speedup the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unitests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova 36e9ea2f97
Open bible dataset formatter (#1365)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference

* Fix the bug in find unique chars script

* Add OpenBible formatter

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-03-11 10:43:31 +01:00
Edresson Casanova f381e29b91
REBASED: Add support for the speaker encoder training using torch spectrograms (#1348)
* Add support for the speaker encoder training using torch spectrograms

* Remove useless function in speaker encoder dataset class
2022-03-10 14:54:51 +01:00
Eren Gölge c670365507 Fix VCTK recipe and formatter 2022-03-08 14:20:34 +01:00
Eren Gölge 1425a023fe Make style and lint 2022-03-02 13:25:35 +01:00
Eren Gölge 6a9f8074f0 Fix TTSDataset 2022-03-01 07:57:48 +01:00
Eren Gölge 424d04e4f6 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge ff23dce081 Update TTSDataset 2022-02-25 11:31:20 +01:00
Eren Gölge 750903d2ba Add VCTK formatter docstring 2022-02-25 11:30:24 +01:00
Eren Gölge 7dfd753d91 Add a cheap trick to avoid short audio clips 2022-02-25 11:26:59 +01:00
Eren Gölge 1f0c8179da Make style 2022-02-25 11:26:59 +01:00
Eren Gölge 1932401e8d Fix dataset preprocessing 2022-02-25 11:26:59 +01:00
Eren Gölge ef63c99524 Implement `start_by_longest` option for TTSDatase 2022-02-25 11:26:18 +01:00
Eren Gölge 47fbddc8d4 Fix docstring 2022-02-25 11:25:48 +01:00
Eren Gölge 2fe16de8e3 Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge 5176ae9e53 Fixes small compat. issues 2022-02-25 11:21:19 +01:00
Eren Gölge c0746f23df Fix `too many open files` 2022-02-25 11:16:30 +01:00
Eren Gölge df0d58bf09 Update VCTK recipes 2022-02-25 11:16:30 +01:00
Eren Gölge 28d98da422 Update VCTK formatter 2022-02-25 11:15:46 +01:00
Eren Gölge c9972e6f14 Make lint 2022-02-25 11:07:34 +01:00
Eren Gölge 90cc45dd4e Update data loader tests 2022-02-25 11:05:54 +01:00
Eren Gölge 04df0a3d9f Refactor TTSDataset 2022-02-25 11:05:06 +01:00
Eren Gölge b2bb954a51 Refactor TTSDataset to use TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge 196ae74273 Update data loader tests 2022-02-25 11:05:06 +01:00
Eren Gölge 98057a00ae Make style 2022-02-25 10:57:35 +01:00
Eren Gölge 176b712c1a Refactor TTSDataset 2022-02-25 10:57:35 +01:00
Eren Gölge e4049aa31a Refactor TTSDataset to use TTSTokenizer 2022-02-25 10:27:46 +01:00