Edresson
7ef3ddc6ff
Fix unit tests
2021-12-20 11:54:09 +00:00
Edresson
36dcd11453
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
c53693c155
Implement vocoder Fine Tuning like SC-GlowTTS paper
2021-12-20 11:54:09 +00:00
Edresson
c334d39acc
Add voice conversion support for the model VITS trained with external speaker embedding
2021-12-20 11:54:09 +00:00
Edresson
e997889ba8
Fix bug in VITS multilingual inference
2021-12-20 11:54:09 +00:00
Edresson
7c0b8ec572
Fix bugs in the non-multilingual VITS inference
2021-12-20 11:54:09 +00:00
Edresson
3fbbebd74d
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
ac9416fb86
Add multilingual inference support
2021-12-20 11:54:09 +00:00
Edresson
dcb2374bc9
Add multilingual training support to the VITS model
2021-12-20 11:54:09 +00:00
Edresson
f996afedb0
Implement multilingual dataloader support
2021-12-20 11:54:09 +00:00
Edresson
5f1c18187f
Fix pylint issues
2021-12-20 11:54:09 +00:00
Edresson
d91c595c5a
Implement training support with d_vecs in the VITS model
2021-12-20 11:54:09 +00:00
Edresson
e0ad838066
Randomly select a speaker from the speaker manager for the test sentences
2021-12-20 11:54:09 +00:00
Edresson
eb3e8affe1
Save speaker embeddings/ids before starting training
2021-12-20 11:54:09 +00:00
Eren Gölge
2ed9e3c241
Fix constant use of noise augment
2021-11-08 09:20:34 +01:00
Eren Gölge
2df0752e73
Model zoo tests ( #900 )
...
* Fix VITS model multi-speaker init
* Remove gdrive support in model manager
* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge
00becf2671
Fix import statements
2021-10-25 19:29:16 +02:00
Eren Gölge
2b7d159383
Update BaseTTS for multi-speaker training
2021-10-21 16:29:06 +00:00
Eren Gölge
82fed4add2
Make style
2021-10-21 16:05:51 +00:00
Eren Gölge
cea8e1739b
Update AlignTTS to use SpeakerManager
2021-10-20 18:22:41 +00:00
Eren Gölge
0e768dd4c5
Update comments
2021-10-20 18:21:26 +00:00
Eren Gölge
7c2cb7cc30
Update BaseTTS
2021-10-20 18:18:22 +00:00
Eren Gölge
330ee7d208
Comment BaseTacotron and remove unused funcs
2021-10-20 18:17:25 +00:00
Eren Gölge
aa25f70b95
Update ForwardTTS for multi-speaker
2021-10-20 18:16:41 +00:00
Eren Gölge
0ebc2a400e
Implement `_set_speaker_embedding` in GlowTTS
2021-10-20 18:15:20 +00:00
Eren Gölge
3da79a4de4
Comment Tacotron2 model
2021-10-20 18:14:04 +00:00
Eren Gölge
c514351c0e
Refactor multi-speaker init in BaseTTS-Tacotron1-2
2021-10-18 08:55:45 +00:00
Eren Gölge
127571423c
Update multi-speaker init in BaseTTS
2021-10-18 08:54:41 +00:00
Eren Gölge
a0a5d580e9
Approximate audio length from file size
2021-10-18 08:54:02 +00:00
Eren Gölge
fcbfc53cb7
Fix linter
2021-10-15 10:24:19 +00:00
Eren Gölge
073a2d2eb0
Refactor VITS multi-speaker initialization
2021-10-15 10:20:00 +00:00
Eren Gölge
0565457faa
Fix #846
2021-10-14 14:46:14 +00:00
Eren Gölge
4dbe7ed0de
Fix all-zero duration case for GlowTTS
2021-10-01 09:24:26 +00:00
Eren Gölge
37959ad0c7
Make linter
2021-09-30 23:02:16 +00:00
Eren Gölge
4163b4f2e4
Update Tacotron models
2021-09-30 14:47:56 +00:00
Eren Gölge
45889804c2
Update VITS
2021-09-30 14:47:56 +00:00
Eren Gölge
fd95926009
Update GlowTTS
2021-09-30 14:47:56 +00:00
Eren Gölge
a156a40b47
Update ForwardTTS for Trainer_v2
2021-09-30 14:19:19 +00:00
Eren Gölge
d9df33f837
Update `align_tts` for trainer_v2
2021-09-30 14:18:10 +00:00
Eren Gölge
8ada870a57
Refactor `trainer.py` for v2
2021-09-30 14:16:34 +00:00
Eren Gölge
2766dd1d6e
Fix #813 - GlowTTS training ( #814 )
...
* Fix #813
* Update glow_tts recipe
* Fix glow-tts test
* Linter fix
* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge
cbbc9e0172
Add FastSpeechConfig
2021-09-11 10:20:37 +00:00
Eren Gölge
d97952611d
Remove unused import
2021-09-10 17:31:41 +00:00
Eren Gölge
d5f256b34c
Update tacotron `r` init
2021-09-10 17:26:23 +00:00
Eren Gölge
ab37fa9c39
Edit AlignTTS
2021-09-10 17:25:00 +00:00
Eren Gölge
66732025e1
Add `base_model` field to `forward_tts` configs
2021-09-10 17:23:48 +00:00
Eren Gölge
a89eb12aca
Fix glow_tts imports
2021-09-10 08:29:51 +00:00
Eren Gölge
0541a25e90
Remove `fastpitch.py` and `speedy_speech.py`
2021-09-10 08:27:48 +00:00
Eren Gölge
3c16013199
Fix Vits imports
2021-09-10 08:26:34 +00:00
Eren Gölge
8b7e094bde
Implement `forward_tts`
...
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)
- Tests for `forward-tts`
- Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge
bfc6ceac29
Move MAS to `TTS.tts.utils.helpers`
2021-09-09 10:57:19 +00:00
Eren Gölge
4761853c5c
Fix imports
2021-09-08 13:34:40 +00:00
Eren Gölge
c1513ec4cd
Plot pitch over spectrogram
2021-09-06 15:16:58 +00:00
Eren Gölge
d847a68e42
Reformat multi-speaker handling in GlowTTS
2021-09-06 15:16:58 +00:00
Eren Gölge
8d41060d36
Plot unnormalized pitch by `FastPitch`
2021-09-06 15:16:58 +00:00
Eren Gölge
2b59da802c
Fix loader setup in `base_tts`
2021-09-06 15:16:58 +00:00
Eren Gölge
2bf9e83c49
FastPitch refactor and commenting
2021-09-06 15:16:58 +00:00
Eren Gölge
648655fa03
Add `PitchExtractor` and return dict by `collate`
2021-09-06 15:16:58 +00:00
Eren Gölge
59d52a4cd8
Disable autocast for criterions
2021-09-06 15:16:58 +00:00
Eren Gölge
98a7271ce8
Refactor FastPitchv2
2021-09-06 15:16:58 +00:00
Eren Gölge
e429afbce4
Enable aligner for FastPitch
2021-09-06 15:16:58 +00:00
Eren Gölge
81c228a2d8
Update FastPitch: don't detach duration network inputs
2021-09-06 15:16:58 +00:00
Eren Gölge
ca29033ef4
Refactor FastPitch model
2021-09-06 15:16:58 +00:00
Eren Gölge
5d59100a88
Don't use align_score for models with duration predictor
2021-09-06 15:16:58 +00:00
Eren Gölge
b7caad39e0
Make detaching the duration predictor input optional
2021-09-06 15:16:58 +00:00
Eren Gölge
bc396c393f
Add FastPitch model and FastPitchconfig
2021-09-06 15:16:58 +00:00
Eren Gölge
e802b24ad0
Compute mean and std pitch
2021-09-06 15:16:58 +00:00
Eren Gölge
d085642ac1
Cache pitch features
...
Cache the features at the beginning of `BaseTTS` training.
2021-09-06 15:16:58 +00:00
Eren Gölge
7590c7db7a
Fix `base_tacotron` `aux_input` handling
2021-09-06 15:16:58 +00:00
Eren Gölge
994f2be2c1
Add compute_f0 field
2021-09-06 15:16:58 +00:00
Eren Gölge
2b7e55f01f
Fix vits args types
2021-08-30 23:24:20 +00:00
Eren Gölge
18da8f5dbd
Update pylint 2.10.2 and fix lint issues
2021-08-30 08:10:35 +00:00
Eren Gölge
f186856e5d
Add option to sort input sequence by audio len
2021-08-30 08:10:35 +00:00
Eren Gölge
2620f62ea8
Move duration_loss inside VitsGeneratorLoss
2021-08-27 07:07:07 +00:00
Eren Gölge
49e1181ea4
Fixes for the vits model
2021-08-26 17:15:09 +00:00
Eren Gölge
3ab8cef99e
Fix VITS model SPD
2021-08-18 14:55:46 +00:00
Eren Gölge
7c0d564965
Synchronize DDP processes
2021-08-13 10:40:50 +00:00
Eren Gölge
ecf5f17dca
Fix distribute.py and ddp training
2021-08-12 22:22:32 +00:00
Eren Gölge
c8b9ca3d71
Fix Tacotron num_char init
2021-08-10 08:56:34 +00:00
Eren Gölge
6af03ac476
Fix `num_char` init in Tacotron models
2021-08-09 21:46:15 +00:00
Eren Gölge
06018251e6
Add VITS and GlowTTS class docs 🗒️
2021-08-09 18:02:36 +00:00
Eren Gölge
f7a72552f1
Make duration predictor dropout configurable
2021-08-09 18:02:36 +00:00
Eren Gölge
c312acac7d
Implement VITS model 🚀
...
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge
232a5abb6a
Update `tts.setup_model`
...
Run `model.make_symbols()` if available to set the symbol list
2021-08-09 18:02:36 +00:00
Eren Gölge
e4648ffef1
Fix multi-speaker init of Tacotron models & tests
2021-08-09 18:02:36 +00:00
Eren Gölge
01324c8e70
Update `base_tts.py`
...
Enable calling `make_symbols()` from the model if defined.
Compatibility changes for end2end `tts` models in batch formatting.
Changes in multi-speaker initialization.
Modify `test_run()` to work with dict output of `synthesis`
2021-08-09 18:02:36 +00:00
Agrin Hilmkil
ced4cfdbbf
Allow saving / loading checkpoints from cloud paths ( #683 )
...
* Allow saving / loading checkpoints from cloud paths
Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.
Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.
* Append suffix _fsspec to save/load function names
* Add a lower bound to the fsspec dependency
Skips the 0 major version.
* Add missing changes from refactor
* Use fsspec for remaining artifacts
* Add test case with path requiring fsspec
* Avoid writing logs to file unless output_path is local
* Document the possibility of using paths supported by fsspec
* Fix style and lint
* Add missing lint fixes
* Add type annotations to new functions
* Use Coqpit method for converting config to dict
* Fix type annotation in semi-new function
* Add return type for load_fsspec
* Fix bug where fs not always created
* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
Eren Gölge
d9e18e009b
Skip phoneme cache pre-compute if the path exists
2021-08-09 18:02:36 +00:00
Eren Gölge
fc0c4600bd
Fix stopnet training
2021-07-24 11:39:54 +02:00
WeberJulian
25832eb97b
Changes for review
2021-07-15 11:38:45 +02:00
WeberJulian
c79a82ed07
refix linter
2021-07-13 23:12:18 +02:00
WeberJulian
7d92b30946
Fix tests
2021-07-13 23:00:34 +02:00
WeberJulian
32974dd6a9
Fix test sentences synthesis
2021-07-13 16:07:13 +02:00
eren golge
3c0454490f
Fix #616
2021-07-06 14:44:03 +02:00
Eren Gölge
f382e4c700
Fix linter warnings
2021-07-03 13:30:24 +02:00
Eren Gölge
196876feb1
Fix `ModelManager` model download
2021-07-02 10:47:05 +02:00
Eren Gölge
9352cb4136
Format Align TTS docstrings
2021-07-02 10:45:58 +02:00
Eren Gölge
95ad72f38f
Fix glow tts initialization
2021-07-02 10:45:37 +02:00
Eren Gölge
40b0b5365e
Let `get_characters` return `num_chars`
2021-07-02 10:45:00 +02:00
Eren Gölge
2e1a428b83
Update glowtts docstrings and docs
2021-06-30 14:30:55 +02:00
Eren Gölge
9790eddada
Fix wrong argument name 🛠️
2021-06-28 17:03:47 +02:00
Eren Gölge
51005cdab4
Update `tts.models.setup_model`
2021-06-28 17:03:19 +02:00
Eren Gölge
7b8c15ac49
Create base 🐸 TTS model abstraction for tts models
2021-06-28 17:03:19 +02:00
Eren Gölge
c7aad884cd
Implement unified trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
6d7b5fbcde
`tts` model abstraction with `TTSModel`
2021-06-28 17:03:19 +02:00
Eren Gölge
00c82c516d
rename to
2021-06-28 17:03:19 +02:00
Eren Gölge
25238e0658
fix glow-tts `inference()`
2021-06-28 17:03:19 +02:00
Eren Gölge
419735f440
refactor and fix multi-speaker training in Trainer and Tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
269e5a734e
add max_decoder_steps argument to tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
db6a97d1a2
rename external speaker embedding arguments as `d_vectors`
2021-06-28 17:03:19 +02:00
Eren Gölge
f82f1970b8
change `to(device)` to `type_as` in models
2021-06-28 17:03:19 +02:00
Eren Gölge
1fa15c195a
docstring fix
2021-06-28 17:03:19 +02:00
Eren Gölge
1c8a3d7c86
make style
2021-06-28 17:03:19 +02:00
Eren Gölge
b22b7620c3
update glow-tts output shapes to match [B, T, C]
2021-06-28 17:03:19 +02:00
Eren Gölge
8381379938
format `cond_input` with a function in Tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
6c495c6a6e
fix glow-tts inference and forward functions for handling `cond_input`
...
and refactor its test
2021-06-28 17:03:19 +02:00
Eren Gölge
421194880d
linter fixes
2021-06-28 17:03:19 +02:00
Eren Gölge
d96ebcd6d3
make style
2021-06-28 17:03:19 +02:00
Eren Gölge
b500338faa
make style
2021-06-28 17:03:19 +02:00
Eren Gölge
bb355b7441
update align_tts.py model for the trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
c70d0c9dae
update `speedy_speech.py` model for trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
4e910993f1
update tacotron model to return `model_outputs`
2021-06-28 17:03:19 +02:00
Eren Gölge
bb4deee64c
update glow-tts for the trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
9134c7dfb6
update `sequence_mask` import globally
2021-06-28 17:03:19 +02:00
Eren Gölge
535a458f40
update Tacotron models for the trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
bdbfc95618
add `gradual_training` argument to tacotron.py
2021-06-28 17:03:19 +02:00
Eren Gölge
5a2e75f0ee
add missing imports for tacotron.py
2021-06-28 17:03:19 +02:00
Eren Gölge
da7d10e53c
move `setup_model()` to `models/__init__.py`
2021-06-28 17:03:19 +02:00
Alexander Korolev
c1eb9bdcca
fix speaker dim inference
2021-06-01 15:15:26 +02:00
Alexander Korolev
5b89ef2c6e
fix speaker-embeddings dimension during inference
2021-06-01 11:06:35 +02:00
Eren Gölge
c57f0b46bb
reintro use_gst for backwards compat
2021-05-11 11:29:18 +02:00
Eren Gölge
05d9543ed8
init GST module using gst config in Tacotron models
2021-05-11 11:29:17 +02:00
Eren Gölge
a21c0b5585
config update 2 WIP
2021-05-11 11:28:35 +02:00
Eren Gölge
f7582107da
Merge pull request #453 from Edresson/dev
...
Script for spectrogram extraction using teacher forcing and Glow-TTS inference with MAS.
2021-05-06 17:53:28 +02:00
Eren Gölge
8cb27267a4
formatting
2021-05-03 14:26:35 +02:00
Edresson
8228091f92
add script for extraction of tts spectrograms
2021-04-23 14:17:46 -03:00
Eren Gölge
d42748082a
update argument name external_speaker_embedding_dim -> speaker_embedding_dim
...
add inference_noise_scale argument to glow-tts
2021-04-23 18:04:37 +02:00
Eren Gölge
c955a12428
set the default layer size compatible with scglow
2021-04-23 18:04:37 +02:00
Eren Gölge
9cc17be53a
formatting and a small bug fix in Tacotron model
2021-04-15 16:36:51 +02:00
Eren Gölge
3de5a89154
optionally enable prenet dropout at inference time for tacotron models
2021-04-13 13:24:56 +02:00
Eren Gölge
b735076bb4
linter fixes
2021-04-12 13:14:11 +02:00
Eren Gölge
a7f6045644
Merge branch 'reformat' into hifigan-reformat
2021-04-12 12:00:17 +02:00
Eren Gölge
f519012dea
reformatting and styling
2021-04-12 11:47:39 +02:00
Eren Gölge
e5b9607bc3
isort all imports
2021-04-09 00:45:20 +02:00
Eren Gölge
0e79fa86ad
format with black and pylint 2.7.3
2021-04-09 00:38:08 +02:00
Eren Gölge
a3a840fd78
linter fixes
2021-03-30 14:39:16 +02:00
Eren Gölge
6b2e13bf62
compute normalized logp using torch primitives
2021-03-30 14:39:16 +02:00
Eren Gölge
7a382a5c2b
stowed aligntts commit and small refactoring with feed_forward layers
2021-03-30 14:39:16 +02:00
Eren Gölge
aec0b78aff
duration predictor fix 2
2021-03-30 14:39:16 +02:00
Eren Gölge
07269e639b
fix duration predictor in AlignTTS
2021-03-30 14:39:16 +02:00
Eren Gölge
2b3e12ea49
correct imports after refactoring, add AlignTTS (old SSMAS) and some formatting
2021-03-30 14:39:16 +02:00
Eren Gölge
ecb6b0d6ad
rename GlowTtts as GlowTTS
2021-03-30 14:39:16 +02:00
Eren Gölge
0514330869
fix mozilla/TTS#685
2021-03-18 13:33:23 +01:00
Eren Gölge
5c657715f2
fix #382
2021-03-16 17:31:48 +01:00
Eren Gölge
9a48ba3821
a ton of linter updates
2021-03-08 05:06:54 +01:00
Eren Gölge
c990b3a59c
linter fixes and test fixes
2021-01-22 02:32:35 +01:00
root
1faf565e3a
add load_checkpoint func to tts models
2021-01-20 02:10:56 +00:00
erogol
428c224b88
comment update
2021-01-12 17:31:04 +01:00
erogol
79c841ccd3
mass refactoring and update
2021-01-11 17:26:58 +01:00
erogol
d382d759b3
small fixes and test fixes
2021-01-08 15:48:40 +01:00
erogol
a6259041d3
docstring for speedyspeech
2021-01-07 14:35:22 +01:00
erogol
14d33662ea
input shapes for tacotron models
2021-01-06 13:19:40 +01:00
erogol
f288e9a260
docstrings for tacotron models
2021-01-06 13:19:40 +01:00
erogol
7586fbc4de
SS refactoring
2021-01-06 13:19:40 +01:00
erogol
e82d31b6ac
glow-tts refactoring
2021-01-06 13:19:40 +01:00
erogol
aa40fe1aa0
SS model refactoring for multi speaker
2021-01-06 13:19:40 +01:00
erogol
eb555855e4
small fixes
2021-01-06 13:19:40 +01:00
erogol
ac5c9217d1
positional encoding masking for SS
2021-01-06 13:19:40 +01:00
erogol
fede46e96e
pylint and test fixes
2021-01-06 13:19:40 +01:00
erogol
cf869e8922
add SS files
2021-01-06 13:19:40 +01:00
erogol
fa6907fa0e
update glow-tts parameters and fix rel-attn-win size
2021-01-06 13:19:40 +01:00
erogol
7b20d8cbd3
implement residual BN convolution and add it as an alternative encoder for glow-tts. also generic layers to layers/generic
2021-01-06 13:19:40 +01:00
erogol
7c3cdced1a
make speaker_mapping a global variable to prevent reload. Fix glow-tts training
2020-12-01 03:23:25 +01:00
Edresson
07345099ee
GlowTTS zero-shot TTS Support
2020-10-24 15:58:39 -03:00
Edresson
b7f9ebd32b
add check arguments for GlowTTS and multispeaker training bug fix
2020-10-19 17:17:58 -03:00
Edresson
99d5a0ac07
add Speaker Conditional GST support
2020-09-29 16:09:27 -03:00
erogol
10258724d1
linter fixes
2020-09-22 03:54:16 +02:00
erogol
e0b9fa887f
glow-tts modules added
2020-09-21 14:15:40 +02:00
erogol
c008003506
do not check sample rate when loading stats file for normalization, to enable interpolation for a vocoder with a different sample rate
2020-09-18 12:52:19 +02:00
erogol
3660c57f1e
time separable convolution encoder, huber loss for duration predictor
2020-09-17 03:10:58 +02:00
erogol
f1a75468c2
fix arguments
2020-09-12 04:00:25 +02:00
erogol
45fbc0d003
convolution encoder with GLU and res connections
2020-09-12 03:40:21 +02:00
erogol
15e6ab3912
glow-tts module renaming updates
2020-09-12 03:33:36 +02:00
erogol
df19428ec6
rename the project to old TTS
2020-09-09 12:27:23 +02:00