Edresson Casanova
8d228ab22a
Trick for upsampling to high sampling rates using the VITS model ( #1456 )
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Fix speaker encoder loading bug at inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Edresson Casanova
060e0f9368
Add EmbeddingManager and BaseIDManager ( #1374 )
2022-03-31 13:41:16 +02:00
Edresson Casanova
37896e1743
Bug fix in freeze encoder ( #1391 )
* Fix the bug in freeze encoder
* Remove emb_l definition for non-multilingual training
* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge
0870a4faa2
Make style ( #1405 )
2022-03-16 12:13:55 +01:00
Edresson Casanova
dbe9da7f15
Add Voice conversion inference support ( #1337 )
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova
917f417ac4
Add alphas to control language and speaker balancer ( #1216 )
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the sampler weights to float to save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge
dd4287de1f
Update models
2022-03-03 20:23:00 +01:00
Eren Gölge
1425a023fe
Make style and lint
2022-03-02 13:25:35 +01:00
Eren Gölge
c68885b3fd
Update Vits speaker encoder init
2022-03-02 13:20:23 +01:00
Eren Gölge
942df0fb05
Update vits dataset
2022-03-02 09:14:32 +01:00
Eren Gölge
1e414b3a09
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
acc83cd3e6
Update Vits model API
2022-02-25 11:31:56 +01:00
Eren Gölge
83c5ddc5b7
Update imports
2022-02-25 11:31:56 +01:00
Eren Gölge
14c117978d
Fix return outputs
2022-02-25 11:31:56 +01:00
Eren Gölge
424d04e4f6
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
00c7600103
Update Vits model API
2022-02-25 11:30:24 +01:00
Eren Gölge
4b96bfe925
Fix train logging
2022-02-25 11:26:59 +01:00
Eren Gölge
ab8a4ca2c3
Revert random segment
2022-02-25 11:26:59 +01:00
Eren Gölge
8622226f3f
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
54c6bb2a8c
Fix add speaker VITS
2022-02-25 11:26:59 +01:00
Eren Gölge
f70e4bb8c6
Add new speakers to the vits model
2022-02-25 11:26:59 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
2829027d8b
Refactor VITS model
2022-02-25 11:26:59 +01:00
Eren Gölge
146fbfd7c9
Extend unittests
2022-02-25 11:25:00 +01:00
Eren Gölge
2fe16de8e3
Make lint
2022-02-25 11:25:00 +01:00
Eren Gölge
001da8afc8
Update Vits for the new model API
2022-02-25 11:21:19 +01:00
Eren Gölge
ea965a5683
Update VITS for the new API
2022-02-25 11:11:35 +01:00
Eren Gölge
93957d58a1
Refactoring VITS for the tokenizer API
2022-02-25 11:05:06 +01:00
Eren Gölge
7575367b9f
Refactoring VITS for the tokenizer API
2022-02-25 10:57:35 +01:00
Eren Gölge
127118c637
Update TTS.tts formatters ( #1228 )
* Return Dict from tts formatters
* Make style
2022-02-11 23:03:43 +01:00
WeberJulian
e778bad626
Add argument to enable dp speaker conditioning
2022-01-06 15:07:27 +01:00
WeberJulian
e1accb6e28
Fix train_tts.py and uncomment code ( #1051 )
* Fix SE loading and language embedding logic
* remove trailing white space
* Uncomment resampling code for SCL
2022-01-03 17:44:57 +01:00
Eren Gölge
36cef5966b
Fix resnet speaker encoder
2021-12-30 15:36:35 +00:00
Eren Gölge
348b5c96a2
Fix speaker encoder test
2021-12-30 15:36:35 +00:00
Eren Gölge
7129b04d46
Update VITS model
2021-12-30 14:08:17 +00:00
Eren Gölge
d29c3780d1
Use speaker_encoder from speaker manager in Vits
2021-12-20 11:54:10 +00:00
Eren Gölge
649dc9e9da
Remove redundant code
2021-12-20 11:54:10 +00:00
Eren Gölge
704dddcffa
Make style
2021-12-20 11:54:10 +00:00
WeberJulian
6b03943526
Move multilingual logic out of the trainer
2021-12-20 11:54:10 +00:00
Edresson
67dda0abe1
Add the SCL resample TODO
2021-12-20 11:54:10 +00:00
WeberJulian
8b52fb89d1
Fix merge bug
2021-12-20 11:54:10 +00:00
WeberJulian
09eda31a3f
Fix tests
2021-12-20 11:54:10 +00:00
Edresson
78a23e19df
Fix pylint checks
2021-12-20 11:54:10 +00:00
WeberJulian
4cd0e4eb0d
Remove self.audio_config from VITS
2021-12-20 11:54:10 +00:00
Edresson
d39200e69b
Remove torchaudio requirement
2021-12-20 11:54:10 +00:00
WeberJulian
2e516869a1
Fix trailing whitespace
2021-12-20 11:54:10 +00:00
WeberJulian
ffc269eaf4
Update docstring
2021-12-20 11:54:10 +00:00
Edresson
12968532fe
Add the language embedding dim in the duration predictor class
2021-12-20 11:54:10 +00:00
Edresson
f34596d957
Fix function name
2021-12-20 11:54:10 +00:00
Edresson
9daa33d1fd
Remove unusable speaker manager function
2021-12-20 11:54:10 +00:00
Edresson
6fc3b9e679
Remove the unusable fine-tuning model
2021-12-20 11:54:10 +00:00
WeberJulian
da6c1e858c
Fix small issues
2021-12-20 11:54:10 +00:00
WeberJulian
e8af6a9f08
Fix use_speaker_embedding logic
2021-12-20 11:54:10 +00:00
WeberJulian
120332d53f
Fix phonemes
2021-12-20 11:54:10 +00:00
WeberJulian
e995a63bd6
fix linter
2021-12-20 11:54:10 +00:00
WeberJulian
1472b6df49
make style
2021-12-20 11:54:10 +00:00
WeberJulian
3b5592abcf
fix test vits
2021-12-20 11:54:10 +00:00
Julian WEBER
9a2f91327c
get_aux_input
2021-12-20 11:54:10 +00:00
Edresson
1bd1a0546b
Add audio resample in the speaker consistency loss
2021-12-20 11:54:10 +00:00
Edresson
1c6bcda950
Add freeze vocoder generator and flow-based decoder option
2021-12-20 11:54:10 +00:00
WeberJulian
2b952d8b97
freeze vits parts
2021-12-20 11:54:10 +00:00
Edresson
9de4539422
Update the VITS model docs
2021-12-20 11:54:10 +00:00
Edresson
eeb8ac07d9
Add voice conversion fine tuning mode
2021-12-20 11:54:10 +00:00
Edresson
690b37d0ab
Add support for using the speaker encoder as a loss function in the VITS model
2021-12-20 11:54:09 +00:00
Edresson
de78556655
Fix the optimizer parameters bug in multilingual and multispeaker training
2021-12-20 11:54:09 +00:00
Edresson
9be5b75da3
Fix bug after merge
2021-12-20 11:54:09 +00:00
Edresson
7ef3ddc6ff
Fix unit tests
2021-12-20 11:54:09 +00:00
Edresson
36dcd11453
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
c53693c155
Implement vocoder fine-tuning as in the SC-GlowTTS paper
2021-12-20 11:54:09 +00:00
Edresson
c334d39acc
Add voice conversion support for the VITS model trained with external speaker embeddings
2021-12-20 11:54:09 +00:00
Edresson
e997889ba8
Fix bug in VITS multilingual inference
2021-12-20 11:54:09 +00:00
Edresson
7c0b8ec572
Fix bugs in the non-multilingual VITS inference
2021-12-20 11:54:09 +00:00
Edresson
3fbbebd74d
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
ac9416fb86
Add multilingual inference support
2021-12-20 11:54:09 +00:00
Edresson
dcb2374bc9
Add multilingual training support to the VITS model
2021-12-20 11:54:09 +00:00
Edresson
5f1c18187f
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
d91c595c5a
Implement training support with d_vecs in the VITS model
2021-12-20 11:54:09 +00:00
Edresson
e0ad838066
Randomly select a speaker from the speaker manager for the test sentences
2021-12-20 11:54:09 +00:00
Edresson
eb3e8affe1
Save speaker embeddings/ids before starting training
2021-12-20 11:54:09 +00:00
Eren Gölge
2df0752e73
Model zoo tests ( #900 )
* Fix VITS model multi-speaker init
* Remove gdrive support in model manager
* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge
00becf2671
Fix import statements
2021-10-25 19:29:16 +02:00
Eren Gölge
82fed4add2
Make style
2021-10-21 16:05:51 +00:00
Eren Gölge
3da79a4de4
Comment Tacotron2 model
2021-10-20 18:14:04 +00:00
Eren Gölge
c514351c0e
Refactor multi-speaker init in BaseTTS-Tacotron1-2
2021-10-18 08:55:45 +00:00
Eren Gölge
fcbfc53cb7
Fix linter
2021-10-15 10:24:19 +00:00
Eren Gölge
073a2d2eb0
Refactor VITS multi-speaker initialization
2021-10-15 10:20:00 +00:00
Eren Gölge
0565457faa
Fix #846
2021-10-14 14:46:14 +00:00
Eren Gölge
37959ad0c7
Make linter
2021-09-30 23:02:16 +00:00
Eren Gölge
45889804c2
Update VITS
2021-09-30 14:47:56 +00:00
Eren Gölge
3c16013199
Fix Vits imports
2021-09-10 08:26:34 +00:00
Eren Gölge
bfc6ceac29
Move MAS to `TTS.tts.utils.helpers`
2021-09-09 10:57:19 +00:00
Eren Gölge
4761853c5c
Fix imports
2021-09-08 13:34:40 +00:00
Eren Gölge
c1513ec4cd
Plot pitch over spectrogram
2021-09-06 15:16:58 +00:00
Eren Gölge
2b7e55f01f
Fix vits args types
2021-08-30 23:24:20 +00:00
Eren Gölge
18da8f5dbd
Update pylint 2.10.2 and fix lint issues
2021-08-30 08:10:35 +00:00
Eren Gölge
2620f62ea8
Move duration_loss inside VitsGeneratorLoss
2021-08-27 07:07:07 +00:00
Eren Gölge
49e1181ea4
Fixes for the vits model
2021-08-26 17:15:09 +00:00
Eren Gölge
3ab8cef99e
Fix VITS model SPD
2021-08-18 14:55:46 +00:00
Eren Gölge
06018251e6
Add VITS and GlowTTS class docs 🗒️
2021-08-09 18:02:36 +00:00
Eren Gölge
f7a72552f1
Make duration predictor dropout configurable
2021-08-09 18:02:36 +00:00