Edresson Casanova
5dcc16d193
Bug fix in MP3 and FLAC compute length on TTSDataset ( #3092 )
...
* Bug Fix on XTTS load
* Bug fix in MP3 length on TTSDataset
* Update TTS/tts/datasets/dataset.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Uses mutagen for all audio formats
* Add dataloader test with all supported audio formats
* Use mutagen.File
* Update
* Fix aux unit tests
* Bug fix on unit tests
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
2023-12-27 13:23:43 -03:00
logan hart
6fdb88f8e2
Add Delightful-TTS implementation ( #2095 )
...
* add configs
* Update config file
* Add model configs
* Add model layers
* Add layer files
* Add layer modules
* change config names
* Add emotion manager
* Fix missing ap bug
* Fix missing ap bug
* Add base TTS e2e class
* Fix wrong variable name in load_tts_samples
* Add training script
* Remove range predictor and gaussian upsampling
* Add helper function
* Add vctk recipe
* Add conformer docs
* Fix linting in conformer.py
* Add Docs
* remove duplicate import
* refactor args
* Fix bugs
* Remove emotion embedding
* remove unused arg
* Remove emotion embedding arg
* Remove emotion embedding arg
* fix style issues
* Fix bugs
* Fix bugs
* Add unittests
* make style
* fix formatter bug
* fix test
* Add pyworld compute pitch func
* Update requirements.txt
* Fix dataset Bug
* Change layer norm to instance norm
* Add missing import
* Remove emotions.py
* remove ssim loss
* Add init layers func to aligner
* refactor model layers
* remove audio_config arg
* Rename loss func
* Rename to delightful-tts
* Rename loss func
* Remove unused modules
* refactor imports
* replace audio config with audio processor
* Add change sample rate option
* remove broken resample func
* update recipe
* fix style, add config docs
* fix tests and multispeaker embd dim
* remove pyworld
* Make style and fix inference
* Split tts tests
* Fixup
* Fixup
* Fixup
* Add argument names
* Set "random" speaker in the model Tortoise/Bark
* Use a different f0_cache path for delightful tts
* Fix delightful speaker handling
* Fix lint
* Make style
---------
Co-authored-by: loganhart420 <loganartpersonal@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-07-24 13:41:26 +02:00
manmay nakhashi
624513018d
add energy by default to Fastspeech2 config ( #2326 )
...
* add energy by default
* added energy to base tts
* fix energy dataset
* fix styles
* fix test
2023-03-06 10:20:25 +01:00
Eren Gölge
914280a556
Bump up to v0.11.0 ( #2329 )
...
* Make style
* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
manmay nakhashi
bc422f2f3c
Fastspeech2 ( #2073 )
...
* added EnergyDataset
* add energy to Dataset
* add compute_energy
* added energy params
* added energy to forward_tts
* added plot_avg_energy for visualisation
* Update forward_tts.py
* create file
* added fastspeech2 recipe
* add fastspeech2 config
* removed energy from fast pitch
* add energy loss to forward tts
* Update fastspeech2_config.py
* change run_name
* Update numpy_transforms.py
* fix typo
* fix typo
* fix typo
* linting issues
* use_energy default value --> False
* Update numpy_transforms.py
* linting fixes
* fix typo
* linting_fix
* linting_fix
* fix
* fixes
* fixes
* lint fix
* lint fixes
* added training test
* wrong import
* wrong import
* trailing whitespace
* style fix
* changed class name because of error
* class name change
* class name change
* change class name
* fixed styles
2023-01-15 22:39:22 +01:00
Julian Weber
a07397733b
Multilingual tokenizer ( #2229 )
...
* Implement multilingual tokenizer
* Add multi_phonemizer recipe
* Fix lint
* Add TestMultiPhonemizer
* Fix lint
* make style
2023-01-02 10:03:19 +01:00
Eren Gölge
ecea43ec81
Adding pre-trained Overflow model ( #2211 )
...
* Adding pretrained Overflow model
* Stabilize HMM
* Fixup model manager
* Return `audio_unique_name` by default
* Distribute max split size over datasets
* Fixup eval_split_size
* Make style
2022-12-14 16:55:48 +01:00
Edresson Casanova
d6ad9a05b4
Fix colliding dataset cache file names ( #1994 )
...
* Fix colliding dataset cache file names
* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova
3faccbda97
Fix dataset handling with the new embedding file keys ( #1991 )
2022-09-19 23:44:14 +02:00
Eren Gölge
6a9f8074f0
Fix TTSDataset
2022-03-01 07:57:48 +01:00
Eren Gölge
424d04e4f6
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
ff23dce081
Update TTSDataset
2022-02-25 11:31:20 +01:00
Eren Gölge
7dfd753d91
Add a cheap trick to avoid short audio clips
2022-02-25 11:26:59 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
1932401e8d
Fix dataset preprocessing
2022-02-25 11:26:59 +01:00
Eren Gölge
ef63c99524
Implement `start_by_longest` option for TTSDataset
2022-02-25 11:26:18 +01:00
Eren Gölge
5176ae9e53
Fixes small compat. issues
2022-02-25 11:21:19 +01:00
Eren Gölge
c0746f23df
Fix `too many open files`
2022-02-25 11:16:30 +01:00
Eren Gölge
c9972e6f14
Make lint
2022-02-25 11:07:34 +01:00
Eren Gölge
90cc45dd4e
Update data loader tests
2022-02-25 11:05:54 +01:00
Eren Gölge
04df0a3d9f
Refactor TTSDataset ⚡️
2022-02-25 11:05:06 +01:00
Eren Gölge
b2bb954a51
Refactor TTSDataset to use TTSTokenizer
2022-02-25 11:05:06 +01:00
Eren Gölge
196ae74273
Update data loader tests
2022-02-25 11:05:06 +01:00
Eren Gölge
98057a00ae
Make style
2022-02-25 10:57:35 +01:00
Eren Gölge
176b712c1a
Refactor TTSDataset ⚡️
2022-02-25 10:57:35 +01:00
Eren Gölge
e4049aa31a
Refactor TTSDataset to use TTSTokenizer
2022-02-25 10:27:46 +01:00
Eren Gölge
127118c637
Update TTS.tts formatters ( #1228 )
...
* Return Dict from tts formatters
* Make style
2022-02-11 23:03:43 +01:00
Eren Gölge
d724984be1
Fix language assignment
2022-01-02 11:11:24 +00:00
WeberJulian
a63998c048
Fix phoneme language
2022-01-01 21:08:13 +01:00
Eren Gölge
704dddcffa
Make style
2021-12-20 11:54:10 +00:00
WeberJulian
631addf33b
fix d-vector
2021-12-20 11:54:10 +00:00
WeberJulian
120332d53f
Fix phonemes
2021-12-20 11:54:10 +00:00
WeberJulian
1340938159
fix phonemes per language
2021-12-20 11:54:10 +00:00
WeberJulian
1472b6df49
make style
2021-12-20 11:54:10 +00:00
WeberJulian
0804806727
fix f0_cache_path in dataset
2021-12-20 11:54:10 +00:00
WeberJulian
3b5592abcf
fix test vits
2021-12-20 11:54:10 +00:00
WeberJulian
2a2b5767c2
fix collate_fn
2021-12-20 11:54:10 +00:00
Julian WEBER
78c2d12a91
PitchExtractor
2021-12-20 11:54:10 +00:00
Julian WEBER
b3abd01793
Merge dataset
2021-12-20 11:54:10 +00:00
Edresson
f1f016314e
Fix the bug in M-AILABS formatter
2021-12-20 11:54:09 +00:00
Edresson
f996afedb0
Implement multilingual dataloader support
2021-12-20 11:54:09 +00:00
Eren Gölge
82fed4add2
Make style
2021-10-21 16:05:51 +00:00
Eren Gölge
a0a5d580e9
Approximate audio length from file size
2021-10-18 08:54:02 +00:00
Eren Gölge
8ada870a57
Refactor `trainer.py` for v2
2021-09-30 14:16:34 +00:00