Eren Gölge
f7587fc134
Fix SSIM loss correction
2022-07-13 10:47:12 +02:00
Eren Gölge
bc1f93c299
Fix device allocation
2022-07-12 19:05:25 +02:00
Eren Gölge
49bac724c0
Implement VitsAudioConfig ( #1556 )
...
* Implement VitsAudioConfig
* Update VITS LJSpeech recipe
* Update VITS VCTK recipe
* Make style
* Add missing decorator
* Add missing param
* Make style
* Update recipes
* Fix test
* Bug fix
* Exclude tests folder
* Make linter
* Make style
2022-07-12 18:49:58 +02:00
Eren G??lge
48a4f3647f
Make lint
2022-07-12 14:58:26 +02:00
Eren G??lge
2cf89b88c9
Make style
2022-07-12 14:12:57 +02:00
Eren G??lge
a6f73a18cb
Fix BCELoss adressing #1192
2022-07-12 14:11:34 +02:00
Eren G??lge
c17ff17a18
Fix SSIM loss
2022-07-12 12:35:24 +02:00
a-froghyar
8be21ec387
Capacitron ( #977 )
...
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore:rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom
methods in taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: fix train tts t2 capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
code-review-doctor
fa887ef5f9
Fix issue probably-meant-fstring found at https://codereview.doctor ( #1532 )
2022-05-07 13:33:40 +02:00
Edresson Casanova
8d228ab22a
Trick to Upsampling to High sampling rates using VITS model ( #1456 )
...
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Bug fix in speaker encoder load in inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Eren Gölge
424d04e4f6
Make stlye
2022-02-25 11:31:56 +01:00
Eren Gölge
52a7896668
Update VITS loss
2022-02-25 11:30:24 +01:00
Eren Gölge
1a43e05460
Fix VITS loss bug
...
Fake and real features were given in the wrong args order to
the loss function
2022-02-25 11:26:59 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
34c4be5e49
Update forwardtts
2022-02-25 11:26:59 +01:00
Eren Gölge
146fbfd7c9
Extend unittests
2022-02-25 11:25:00 +01:00
Eren Gölge
127118c637
Update TTS.tts formatters ( #1228 )
...
* Return Dict from tts formatters
* Make style
2022-02-11 23:03:43 +01:00
Edresson Casanova
0860d73cf8
Remove Tensorflow requeriment ( #1225 )
...
* Remove TF modules
* Remove TF unit tests
* Remove TF vocoder modules
* Remove TF convert scripts
* Remove TF requirement
* Remove the Docs TF instructions
* Remove TF inference support
2022-02-10 16:14:54 +01:00
Eren Gölge
704dddcffa
Make style
2021-12-20 11:54:10 +00:00
Edresson
12968532fe
Add the language embedding dim in the duration predictor class
2021-12-20 11:54:10 +00:00
Edresson
8c22d5ac49
Turn more clear the VITS loss function
2021-12-20 11:54:10 +00:00
Edresson
6fc3b9e679
Remove the unusable fine-tuning model
2021-12-20 11:54:10 +00:00
WeberJulian
1472b6df49
make style
2021-12-20 11:54:10 +00:00
Edresson
eeb8ac07d9
Add voice conversion fine tuning mode
2021-12-20 11:54:10 +00:00
Edresson
690b37d0ab
Add support to use the speaker encoder as loss function in VITS model
2021-12-20 11:54:09 +00:00
Edresson
c53693c155
Implement vocoder Fine Tuning like SC-GlowTTS paper
2021-12-20 11:54:09 +00:00
Edresson
dcb2374bc9
Add multilingual training support to the VITS model
2021-12-20 11:54:09 +00:00
Eren Gölge
b6b14a76af
Fix VITS stochastic duration predictor
2021-11-08 09:20:11 +01:00
Eren Gölge
0e768dd4c5
Update comments
2021-10-20 18:21:26 +00:00
Eren Gölge
fd95926009
Update GlowTTS
2021-09-30 14:47:56 +00:00
Eren Gölge
2766dd1d6e
Fix #813 - GlowTTS training ( #814 )
...
* Fix #813
* Update glow_tts recipe
* Fix glow-tts test
* Linter fix
* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge
26f76fce22
Remove SpeedySpeech from .models.json
2021-09-10 17:47:27 +00:00
Eren Gölge
d6e29ef98a
Style update
2021-09-10 08:30:33 +00:00
Eren Gölge
570d5971be
Implement `ForwardTTSLoss`
2021-09-10 08:29:12 +00:00
Eren Gölge
bfc6ceac29
Move MAS to `TTS.tts.utils.helpers`
2021-09-09 10:57:19 +00:00
Eren Gölge
4761853c5c
Fix imports
2021-09-08 13:34:40 +00:00
Eren Gölge
2b59da802c
Fix loader setup in `base_tts`
2021-09-06 15:16:58 +00:00
Eren Gölge
29248536c9
Update `PositionalEncoding`
2021-09-06 15:16:58 +00:00
Eren Gölge
4672889549
Update `generic.FFTransformer`
2021-09-06 15:16:58 +00:00
Eren Gölge
2bf9e83c49
FastPitch refactor and commenting
2021-09-06 15:16:58 +00:00
Eren Gölge
59b24e66cf
Add `AlignerNetwork`
2021-09-06 15:16:58 +00:00
Eren Gölge
debf772ec5
Implement binary alignment loss
2021-09-06 15:16:58 +00:00
Eren Gölge
e429afbce4
Enable aligner for FastPitch
2021-09-06 15:16:58 +00:00
Eren Gölge
fac9dbe661
Update FastPitchLoss
2021-09-06 15:16:58 +00:00
Eren Gölge
b81560607b
Update docstrings
2021-09-06 15:16:58 +00:00
Eren Gölge
8fffd4e813
Don't print computed phonemes
...
It causes noise in logs
2021-09-06 15:16:58 +00:00
Eren Gölge
db32162eae
Fix `FastPitchLoss`
2021-09-06 15:16:58 +00:00
Eren Gölge
c8d999b010
Add FastPitchLoss
2021-09-06 15:16:58 +00:00
Eren Gölge
18da8f5dbd
Update pylint 2.10.2 and fix lint issues
2021-08-30 08:10:35 +00:00
Eren Gölge
2620f62ea8
Move duration_loss inside VitsGeneratorLoss
2021-08-27 07:07:07 +00:00
Eren Gölge
49e1181ea4
Fixes for the vits model
2021-08-26 17:15:09 +00:00
Eren Gölge
3ab8cef99e
Fix VITS model SPD
2021-08-18 14:55:46 +00:00
Eren Gölge
c312acac7d
Implement VITS model 🚀
...
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge
e4648ffef1
Fix multi-speaker init of Tacotron models & tests
2021-08-09 18:02:36 +00:00
Eren Gölge
fc0c4600bd
Fix stopnet training
2021-07-24 11:39:54 +02:00
Eren Gölge
2e1a428b83
Update glowtts docstrings and docs
2021-06-30 14:30:55 +02:00
Eren Gölge
ae6405bb76
Docstrings for `Trainer`
2021-06-28 17:03:47 +02:00
Eren Gölge
d42d1c02ea
Use `torch.linalg.qr` for pytorch > `v1.9.0`
2021-06-28 17:03:47 +02:00
Eren Gölge
a5d5bc9063
Print `max_decoder_steps` when model reaches the limit
2021-06-28 17:03:47 +02:00
Eren Gölge
269e5a734e
add max_decoder_steps argument to tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
db6a97d1a2
rename external speaker embedding arguments as `d_vectors`
2021-06-28 17:03:19 +02:00
Eren Gölge
b500338faa
make style
2021-06-28 17:03:19 +02:00
Eren Gölge
9203b863d9
update align_tts_loss for trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
9134c7dfb6
update `sequence_mask` import globally
2021-06-28 17:03:19 +02:00
Eren Gölge
ca302db7b0
add sequence_mask to `utils.data`
2021-06-28 17:03:19 +02:00
Eren Gölge
844abb3b1d
`setup_loss()` in `layer/__init__.py`
2021-06-28 17:03:19 +02:00
Adam Froghyar
7ddc885f37
deleted a line the broke GravesAttention
2021-05-10 15:42:59 +02:00
Eren Gölge
8cb27267a4
formatting
2021-05-03 14:26:35 +02:00
Eren Gölge
9cc17be53a
formatting and a small bug fix in Tacotron model
2021-04-15 16:36:51 +02:00
Eren Gölge
3de5a89154
optionally enable prenet dropout at inference time for tacotron models
2021-04-13 13:24:56 +02:00
Eren Gölge
87ee6ceb57
style update #3
2021-04-09 01:17:15 +02:00
Eren Gölge
18d9ec8036
format with black
2021-04-09 00:54:59 +02:00
Eren Gölge
e5b9607bc3
isort all imports
2021-04-09 00:45:20 +02:00
Eren Gölge
0e79fa86ad
format with black and pylint 2.7.3
2021-04-09 00:38:08 +02:00
Eren Gölge
44b4cb5ba5
DCA comment
2021-04-06 16:24:50 +02:00
Eren Gölge
a3a840fd78
linter fixes
2021-03-30 14:39:16 +02:00
Eren Gölge
6b2e13bf62
compute normalized logp using torch primitives
2021-03-30 14:39:16 +02:00
Eren Gölge
7a382a5c2b
stowed aligntts commit and small refactoring with feed_forward layers
2021-03-30 14:39:16 +02:00
Eren Gölge
d542a50818
fix losses for alignTTS
2021-03-30 14:39:16 +02:00
Eren Gölge
18cc7b95ec
update l1 and huber to mse loss
2021-03-30 14:39:16 +02:00
Eren Gölge
896d33ed49
update losses to hande alingtts phases
2021-03-30 14:39:16 +02:00
Eren Gölge
c2d29e5cd4
FFTransformer encoder for aligntts
2021-03-30 14:39:16 +02:00
Eren Gölge
460a2d3e26
FFTransformer Decoder for AlignTTS
2021-03-30 14:39:16 +02:00
Eren Gölge
aa29f5b199
aligntts loss
2021-03-30 14:39:16 +02:00
Eren Gölge
a831468cab
align tts MDN layer
2021-03-30 14:39:16 +02:00
Eren Gölge
4396f8e2da
continue refactoring
2021-03-30 14:39:16 +02:00
Eren Gölge
2b3e12ea49
correct imports after refactoring, add AlignTTS (old SSMAS) and some formatting
2021-03-30 14:39:16 +02:00
Eren Gölge
d9c405f0c3
create feedforward folder for SS layers
2021-03-30 14:39:16 +02:00
Eren Gölge
a8cf1ae6b4
fix wavenet running with no input mask
2021-03-30 14:39:16 +02:00
Eren Gölge
32e8b56c45
linter fix
2021-03-18 13:33:23 +01:00
Eren Gölge
65533f33e9
fix #374
2021-03-18 13:33:00 +01:00
Eren Gölge
9a48ba3821
a ton of linter updates
2021-03-08 05:06:54 +01:00
Eren Gölge
08581deb61
linter updates
2021-03-08 02:53:02 +01:00
Eren Gölge
b464cab9b8
setup.py update and pylint fixes
2021-01-26 02:57:50 +01:00
Eren Gölge
660d61aeeb
maximum_path_numpy and CYTHON adabtable import
2021-01-26 02:57:07 +01:00
root
5c87753e88
glow-tts fix for saving inverse weight
2021-01-20 02:09:42 +00:00
erogol
bbc8d665a1
move attention layers to a sperate file
2021-01-11 17:27:30 +01:00
erogol
79c841ccd3
mass refactoring and update
2021-01-11 17:26:58 +01:00
erogol
1d961d6f8a
cladd renaming
2021-01-11 17:26:11 +01:00
erogol
c0a2aa68d3
formatting
2021-01-11 17:25:39 +01:00
erogol
b206162d11
more docstrings
2021-01-11 17:25:04 +01:00
erogol
6e9043c5d2
rename convbnblocks and handle none mask
2021-01-11 17:22:34 +01:00
erogol
921fa5db92
remove attentions from common layers
2021-01-11 15:06:42 +01:00
erogol
cc2b1e043d
docstrings for common layers
2021-01-11 15:06:12 +01:00
erogol
a6f40fef2e
stage missing files
2021-01-08 16:02:56 +01:00
erogol
d382d759b3
small fixes and test fixes
2021-01-08 15:48:40 +01:00
erogol
a6259041d3
docstring for speedyspeech
2021-01-07 14:35:22 +01:00
erogol
de2a542f83
glow-tts bug fix
2021-01-07 13:40:32 +01:00
erogol
5a45af48f1
fix
2021-01-06 13:19:40 +01:00
erogol
e7fad928e7
doc strings for the all glow-tts layers
2021-01-06 13:19:40 +01:00
erogol
d3b7284be4
glow-tts comments and refactoring
2021-01-06 13:19:40 +01:00
erogol
7586fbc4de
SS refactoring
2021-01-06 13:19:40 +01:00
erogol
e82d31b6ac
glow ttss refactoring
2021-01-06 13:19:40 +01:00
erogol
29f4329d7f
update glow-tts layers and add some comments
2021-01-06 13:19:40 +01:00
erogol
eb555855e4
small fixes
2021-01-06 13:19:40 +01:00
erogol
5901a00576
argument rename
2021-01-06 13:19:40 +01:00
erogol
4ef083f0f1
select decoder type for SS
2021-01-06 13:19:40 +01:00
erogol
3fa408a5ea
change order BN + ReLU to ReLU + BN for SS
2021-01-06 13:19:40 +01:00
erogol
ac5c9217d1
positional encoding masking for SS
2021-01-06 13:19:40 +01:00
erogol
fede46e96e
pylint and test fixes
2021-01-06 13:19:40 +01:00
erogol
cf869e8922
add SS files
2021-01-06 13:19:40 +01:00
erogol
dc4a16d62e
speedy speehc losses
2021-01-06 13:19:40 +01:00
erogol
d62cac7252
fix glow-tts prenet bug fix
2021-01-06 13:19:40 +01:00
erogol
fa6907fa0e
update glow-tts parameters and fix rel-attn-win size
2021-01-06 13:19:40 +01:00
erogol
7b20d8cbd3
implement residual BN convolution and add it as an alternative encoder for glow-tts. also generic layers to layers/generic
2021-01-06 13:19:40 +01:00
erogol
070146e143
add monotonic dynamic convolution attention
2021-01-06 13:18:41 +01:00
erogol
f6c96b0ac2
Merge branch 'dev'
2020-11-25 15:29:06 +01:00
erogol
d94782a076
reset the way ga_loss is stored in return_dict
2020-11-02 13:18:56 +01:00
erogol
a108d0ee81
check nan loss in glow-tts loss
2020-11-02 13:12:19 +01:00
erogol
b8ac9aba9d
check against NaN loss in tacotron_loss
2020-11-02 12:44:41 +01:00
erogol
183fe56d95
Merge branch 'ssim_loss' into dev
2020-10-29 23:49:09 +01:00
erogol
946a0c0fb9
bug fixes for single speaker glow-tts, enable torch based amp. Make amp optional for wavegrad. Bug fixes for synthesis setup for glow-tts
2020-10-29 15:45:50 +01:00
erogol
fdaed45f58
optional loss masking for stoptoken predictor
2020-10-28 18:40:54 +01:00
erogol
e49cc3bbcd
bug fix
2020-10-28 18:34:34 +01:00
erogol
9cef923d99
ssim loss for tacotron models
2020-10-28 15:24:18 +01:00
erogol
a6f564c8c8
pylint fixes
2020-10-27 12:35:10 +01:00
erogol
8de7c13708
fix no loss masking loss computation
2020-10-27 12:17:38 +01:00
Alexander Korolev
47d74ced1c
Update losses.py
...
Seems like in the latest dev merge, this change was reverted. Any specific reason for this?
Without it the problem as stated here https://github.com/mozilla/TTS/issues/473 occurs.
2020-10-23 14:15:01 +02:00
erogol
c2c4126a18
remove merge conflicts
2020-10-08 01:35:27 +02:00
erogol
6f0654f9a8
differential spectral loss
2020-10-08 01:30:42 +02:00
erogol
4e93f90108
bug fix
2020-10-08 01:29:30 +02:00
erogol
bb9b70ee27
differential spectral loss and loss weight settings
2020-10-08 01:29:30 +02:00
Edresson
99d5a0ac07
add Speaker Conditional GST support
2020-09-29 16:09:27 -03:00
erogol
6a70c63f24
correct glow-tts loss
2020-09-27 03:28:42 +02:00
erogol
665f7ca714
linter fix
2020-09-24 12:57:54 +02:00
erogol
10258724d1
linter fixes
2020-09-22 03:54:16 +02:00
erogol
e0b9fa887f
glow-tts modules added
2020-09-21 14:15:40 +02:00
erogol
e4c6386603
change import for normalization layer
2020-09-21 13:09:52 +02:00
erogol
c008003506
do not check sample rate as loading stats file for normalization to enable interpolation for different sample rate vocoder
2020-09-18 12:52:19 +02:00
erogol
3660c57f1e
time seperable convolution encoder, huber loss for duration predictor
2020-09-17 03:10:58 +02:00