Edresson Casanova
f81892483d
REBASED: Transform Speaker Encoder into a Generic Encoder and Implement Emotion Encoder training support ( #1349 )
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argument to disable storage
* Add Perfect Sampler and remove storage
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Activate evaluation in speaker encoder test and use multispeaker dataset for this test
* Unit test fixes
* Remove useless tests to speed up the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unit tests
* Add Perfect Batch Sampler unit test
* Add a function to compute encoder accuracy
2022-03-11 14:43:40 +01:00
Edresson Casanova
36e9ea2f97
Open bible dataset formatter ( #1365 )
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
* Fix the bug in find unique chars script
* Add OpenBible formatter
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-03-11 10:43:31 +01:00
Edresson Casanova
dbe9da7f15
Add Voice conversion inference support ( #1337 )
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova
917f417ac4
Add alphas to control language and speaker balancer ( #1216 )
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the sampler weights to float to save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Edresson Casanova
f381e29b91
REBASED: Add support for the speaker encoder training using torch spectrograms ( #1348 )
* Add support for the speaker encoder training using torch spectrograms
* Remove useless function in speaker encoder dataset class
2022-03-10 14:54:51 +01:00
Eren Gölge
c670365507
Fix VCTK recipe and formatter
2022-03-08 14:20:34 +01:00
Eren Gölge
8feb41d361
Bump up to v0.6.1
2022-03-07 15:57:44 +01:00
Eren Gölge
ee02bc3823
Bump up to v0.6.0
2022-03-07 12:08:22 +01:00
Eren Gölge
dc280819be
Add new models
2022-03-07 12:08:09 +01:00
Eren Gölge
e9d9028b4d
Revert cleaner name
2022-03-06 12:57:06 +01:00
Eren Gölge
764c7fa4a4
Rename phoneme_cleaners
2022-03-06 12:09:54 +01:00
Eren Gölge
dd4287de1f
Update models
2022-03-03 20:23:00 +01:00
Eren Gölge
6cb00be795
Update your_tts model URL
2022-03-02 18:04:49 +01:00
Eren Gölge
1425a023fe
Make style and lint
2022-03-02 13:25:35 +01:00
Eren Gölge
c68885b3fd
Update Vits speaker encoder init
2022-03-02 13:20:23 +01:00
Eren Gölge
27b67b7945
Fix import
2022-03-02 09:15:20 +01:00
Eren Gölge
942df0fb05
Update vits dataset
2022-03-02 09:14:32 +01:00
Eren Gölge
6a9f8074f0
Fix TTSDataset
2022-03-01 07:57:48 +01:00
Eren Gölge
690de1ab06
Update Characters and add more tests
2022-02-25 11:32:44 +01:00
Eren Gölge
9063397892
Fix FastSpeech config
2022-02-25 11:31:56 +01:00
Eren Gölge
1e414b3a09
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
acc83cd3e6
Update Vits model API
2022-02-25 11:31:56 +01:00
Eren Gölge
fe656659be
Implement BaseTTS
2022-02-25 11:31:56 +01:00
Eren Gölge
bed4afd4ee
Implement BaseVocabulary
2022-02-25 11:31:56 +01:00
Eren Gölge
e0f9be76c0
Update test_run in wavernn and wavegrad
2022-02-25 11:31:56 +01:00
Eren Gölge
bf540f4323
Update imports for trainer
2022-02-25 11:31:56 +01:00
Eren Gölge
4c43eda414
Update BaseTrainerModel
2022-02-25 11:31:56 +01:00
Eren Gölge
83c5ddc5b7
Update imports
2022-02-25 11:31:56 +01:00
Eren Gölge
14c117978d
Fix return outputs
2022-02-25 11:31:56 +01:00
Eren Gölge
424d04e4f6
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
8b3ba02c95
Add vocab_dict to model config
2022-02-25 11:31:20 +01:00
Eren Gölge
ff23dce081
Update TTSDataset
2022-02-25 11:31:20 +01:00
Eren Gölge
750903d2ba
Add VCTK formatter docstring
2022-02-25 11:30:24 +01:00
Eren Gölge
52a7896668
Update VITS loss
2022-02-25 11:30:24 +01:00
Eren Gölge
c68962c574
Update forward tts binary loss
2022-02-25 11:30:24 +01:00
Eren Gölge
c11944022d
Revert rand_segment again
2022-02-25 11:30:24 +01:00
Eren Gölge
00c7600103
Update Vits model API
2022-02-25 11:30:24 +01:00
Eren Gölge
935a604046
Delete trainer_utils
2022-02-25 11:29:41 +01:00
Eren Gölge
d0c27a9661
Update synthesis.py
2022-02-25 11:29:41 +01:00
Eren Gölge
35fc7270ff
Implement BaseTTS
2022-02-25 11:28:47 +01:00
Eren Gölge
2bad098625
Implement BaseVocabulary
2022-02-25 11:28:47 +01:00
Eren Gölge
833de62e30
Update base_vocoder
2022-02-25 11:28:14 +01:00
Eren Gölge
fc3b6d2861
Update gan
2022-02-25 11:28:14 +01:00
Eren Gölge
20a677c623
Update test_run in wavernn and wavegrad
2022-02-25 11:28:14 +01:00
Eren Gölge
be3a03126a
Update imports for trainer
2022-02-25 11:28:14 +01:00
Eren Gölge
c911729896
Update BaseTrainerModel
2022-02-25 11:28:14 +01:00
Eren Gölge
1e219fef0a
Revert drop_last
2022-02-25 11:26:59 +01:00
Eren Gölge
7dfd753d91
Add a cheap trick to avoid short audio clips
2022-02-25 11:26:59 +01:00
Eren Gölge
1a43e05460
Fix VITS loss bug
Fake and real features were passed to the loss function
in the wrong argument order
2022-02-25 11:26:59 +01:00
Eren Gölge
4b96bfe925
Fix train logging
2022-02-25 11:26:59 +01:00
Eren Gölge
ab8a4ca2c3
Revert random segment
2022-02-25 11:26:59 +01:00
Eren Gölge
8622226f3f
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
27db089d6c
Change TrainingArgs -> TrainerArgs
2022-02-25 11:26:59 +01:00
Eren Gölge
aa81454721
Update BaseTrainingConfig
2022-02-25 11:26:59 +01:00
Eren Gölge
d3a58ed07a
Fix default values
2022-02-25 11:26:59 +01:00
Eren Gölge
54c6bb2a8c
Fix add speaker VITS
2022-02-25 11:26:59 +01:00
Eren Gölge
590b04fb89
Fix espeak_wrapper
2022-02-25 11:26:59 +01:00
Eren Gölge
a013566d15
Delete trainer related code
2022-02-25 11:26:59 +01:00
Eren Gölge
38314194e7
Set `drop_last`
2022-02-25 11:26:59 +01:00
Eren Gölge
f70e4bb8c6
Add new speakers to the vits model
2022-02-25 11:26:59 +01:00
Eren Gölge
d5c0e17548
Load right char class dynamically
2022-02-25 11:26:59 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
b3ed6ff6b7
Update FastPitchConfig
2022-02-25 11:26:59 +01:00
Eren Gölge
1932401e8d
Fix dataset preprocessing
2022-02-25 11:26:59 +01:00
Eren Gölge
34c4be5e49
Update forwardtts
2022-02-25 11:26:59 +01:00
Eren Gölge
bb37462794
Update language manager
2022-02-25 11:26:59 +01:00
Eren Gölge
5169d4eb32
Plot pitch over input characters
2022-02-25 11:26:59 +01:00
Eren Gölge
cd5d1497cf
Add pitch_fmin pitch_fmax args to the audio
2022-02-25 11:26:59 +01:00
Eren Gölge
1445a46e9e
Update synthesizer to use init_from_config
2022-02-25 11:26:59 +01:00
Eren Gölge
7058fcc3ff
Take file extension as an argument
2022-02-25 11:26:59 +01:00
Eren Gölge
13482dde1f
Update GAN model
2022-02-25 11:26:59 +01:00
Eren Gölge
2829027d8b
Refactor VITS model
2022-02-25 11:26:59 +01:00
Eren Gölge
ef63c99524
Implement `start_by_longest` option for TTSDataset
2022-02-25 11:26:18 +01:00
Eren Gölge
c4c471d61d
Allow padding for shorter segments
2022-02-25 11:25:48 +01:00
Eren Gölge
47fbddc8d4
Fix docstring
2022-02-25 11:25:48 +01:00
Eren Gölge
bc2243bac4
Fix tests
2022-02-25 11:25:00 +01:00
Eren Gölge
146fbfd7c9
Extend unittests
2022-02-25 11:25:00 +01:00
Eren Gölge
2fe16de8e3
Make lint
2022-02-25 11:25:00 +01:00
Eren Gölge
7b49a4aa2b
Fix glow_tts_config missing field
2022-02-25 11:24:13 +01:00
Eren Gölge
07b0a80d57
Fix tokenizer init_from_config
2022-02-25 11:24:13 +01:00
Eren Gölge
50e17097a7
Add verbose option to AudioProcessor
2022-02-25 11:24:13 +01:00
Eren Gölge
235f7d9b02
Extend glow_tts model tests
2022-02-25 11:24:13 +01:00
Eren Gölge
8e248913d6
Update train_tts for the new API
2022-02-25 11:24:13 +01:00
Eren Gölge
001da8afc8
Update Vits for the new model API
2022-02-25 11:21:19 +01:00
Eren Gölge
5176ae9e53
Fixes small compat. issues
2022-02-25 11:21:19 +01:00
Eren Gölge
131bc0cfc0
Fix synthesis.py 🔧
2022-02-25 11:18:00 +01:00
Eren Gölge
c0746f23df
Fix `too many open files`
2022-02-25 11:16:30 +01:00
Eren Gölge
df0d58bf09
Update VCTK recipes
2022-02-25 11:16:30 +01:00
Eren Gölge
730f7c0df4
Add file_ext args to resample.py
2022-02-25 11:15:46 +01:00
Eren Gölge
28d98da422
Update VCTK formatter
2022-02-25 11:15:46 +01:00
Eren Gölge
4d99fee3e2
Update spec extractor
2022-02-25 11:12:44 +01:00
Eren Gölge
38a0b3b6c7
Update train_tts.py
2022-02-25 11:11:35 +01:00
Eren Gölge
cfaa51fddc
Update BaseTTS config
2022-02-25 11:11:35 +01:00
Eren Gölge
4c5cb44eeb
Update setup_model
2022-02-25 11:11:35 +01:00
Eren Gölge
7c4243fba7
Update GlowTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
bacf79f4fb
Update AlignTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
18f726af65
Update ForwardTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
d0ec4b91e5
Update Tacotron models
2022-02-25 11:11:35 +01:00
Eren Gölge
ea965a5683
Update VITS for the new API
2022-02-25 11:11:35 +01:00
Eren Gölge
f802a931a3
Pass samples to init_from_config in SpeakerManager
2022-02-25 11:07:34 +01:00