coqui-tts

Commit Graph

Author	SHA1	Message	Date
Eren Gölge	f7587fc134	Fix SSIM loss correction	2022-07-13 10:47:12 +02:00
Eren Gölge	bc1f93c299	Fix device allocation	2022-07-12 19:05:25 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
Eren Gölge	48a4f3647f	Make lint	2022-07-12 14:58:26 +02:00
Eren Gölge	2cf89b88c9	Make style	2022-07-12 14:12:57 +02:00
Eren Gölge	a6f73a18cb	Fix BCELoss adressing #1192	2022-07-12 14:11:34 +02:00
Eren Gölge	c17ff17a18	Fix SSIM loss	2022-07-12 12:35:24 +02:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
code-review-doctor	fa887ef5f9	Fix issue probably-meant-fstring found at https://codereview.doctor (#1532 )	2022-05-07 13:33:40 +02:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456 ) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	52a7896668	Update VITS loss	2022-02-25 11:30:24 +01:00
Eren Gölge	1a43e05460	Fix VITS loss bug Fake and real features were given in the wrong args order to the loss function	2022-02-25 11:26:59 +01:00
Eren Gölge	1f0c8179da	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	34c4be5e49	Update forwardtts	2022-02-25 11:26:59 +01:00
Eren Gölge	146fbfd7c9	Extend unittests	2022-02-25 11:25:00 +01:00
Eren Gölge	127118c637	Update TTS.tts formatters (#1228 ) * Return Dict from tts formatters * Make style	2022-02-11 23:03:43 +01:00
Edresson Casanova	0860d73cf8	Remove Tensorflow requeriment (#1225 ) * Remove TF modules * Remove TF unit tests * Remove TF vocoder modules * Remove TF convert scripts * Remove TF requirement * Remove the Docs TF instructions * Remove TF inference support	2022-02-10 16:14:54 +01:00
Eren Gölge	704dddcffa	Make style	2021-12-20 11:54:10 +00:00
Edresson	12968532fe	Add the language embedding dim in the duration predictor class	2021-12-20 11:54:10 +00:00
Edresson	8c22d5ac49	Turn more clear the VITS loss function	2021-12-20 11:54:10 +00:00
Edresson	6fc3b9e679	Remove the unusable fine-tuning model	2021-12-20 11:54:10 +00:00
WeberJulian	1472b6df49	make style	2021-12-20 11:54:10 +00:00
Edresson	eeb8ac07d9	Add voice conversion fine tuning mode	2021-12-20 11:54:10 +00:00
Edresson	690b37d0ab	Add support to use the speaker encoder as loss function in VITS model	2021-12-20 11:54:09 +00:00
Edresson	c53693c155	Implement vocoder Fine Tuning like SC-GlowTTS paper	2021-12-20 11:54:09 +00:00
Edresson	dcb2374bc9	Add multilingual training support to the VITS model	2021-12-20 11:54:09 +00:00
Eren Gölge	b6b14a76af	Fix VITS stochastic duration predictor	2021-11-08 09:20:11 +01:00
Eren Gölge	0e768dd4c5	Update comments	2021-10-20 18:21:26 +00:00
Eren Gölge	fd95926009	Update GlowTTS	2021-09-30 14:47:56 +00:00
Eren Gölge	2766dd1d6e	Fix #813 - GlowTTS training (#814 ) * Fix #813 * Update glow_tts recipe * Fix glow-tts test * Linter fix * Run data dep init only in training	2021-09-17 20:06:55 +02:00
Eren Gölge	26f76fce22	Remove SpeedySpeech from .models.json	2021-09-10 17:47:27 +00:00
Eren Gölge	d6e29ef98a	Style update	2021-09-10 08:30:33 +00:00
Eren Gölge	570d5971be	Implement `ForwardTTSLoss`	2021-09-10 08:29:12 +00:00
Eren Gölge	bfc6ceac29	Move MAS to `TTS.tts.utils.helpers`	2021-09-09 10:57:19 +00:00
Eren Gölge	4761853c5c	Fix imports	2021-09-08 13:34:40 +00:00
Eren Gölge	2b59da802c	Fix loader setup in `base_tts`	2021-09-06 15:16:58 +00:00
Eren Gölge	29248536c9	Update `PositionalEncoding`	2021-09-06 15:16:58 +00:00
Eren Gölge	4672889549	Update `generic.FFTransformer`	2021-09-06 15:16:58 +00:00
Eren Gölge	2bf9e83c49	FastPitch refactor and commenting	2021-09-06 15:16:58 +00:00
Eren Gölge	59b24e66cf	Add `AlignerNetwork`	2021-09-06 15:16:58 +00:00
Eren Gölge	debf772ec5	Implement binary alignment loss	2021-09-06 15:16:58 +00:00
Eren Gölge	e429afbce4	Enable aligner for FastPitch	2021-09-06 15:16:58 +00:00
Eren Gölge	fac9dbe661	Update FastPitchLoss	2021-09-06 15:16:58 +00:00
Eren Gölge	b81560607b	Update docstrings	2021-09-06 15:16:58 +00:00
Eren Gölge	8fffd4e813	Don't print computed phonemes It causes noise in logs	2021-09-06 15:16:58 +00:00
Eren Gölge	db32162eae	Fix `FastPitchLoss`	2021-09-06 15:16:58 +00:00
Eren Gölge	c8d999b010	Add FastPitchLoss	2021-09-06 15:16:58 +00:00
Eren Gölge	18da8f5dbd	Update pylint 2.10.2 and fix lint issues	2021-08-30 08:10:35 +00:00
Eren Gölge	2620f62ea8	Move duration_loss inside VitsGeneratorLoss	2021-08-27 07:07:07 +00:00
Eren Gölge	49e1181ea4	Fixes for the vits model	2021-08-26 17:15:09 +00:00
Eren Gölge	3ab8cef99e	Fix VITS model SPD	2021-08-18 14:55:46 +00:00
Eren Gölge	c312acac7d	Implement VITS model 🚀 VITS model implementation built on Glow TTS and HiFiGAN layers.	2021-08-09 18:02:36 +00:00
Eren Gölge	e4648ffef1	Fix multi-speaker init of Tacotron models & tests	2021-08-09 18:02:36 +00:00
Eren Gölge	fc0c4600bd	Fix stopnet training	2021-07-24 11:39:54 +02:00
Eren Gölge	2e1a428b83	Update glowtts docstrings and docs	2021-06-30 14:30:55 +02:00
Eren Gölge	ae6405bb76	Docstrings for `Trainer`	2021-06-28 17:03:47 +02:00
Eren Gölge	d42d1c02ea	Use `torch.linalg.qr` for pytorch > `v1.9.0`	2021-06-28 17:03:47 +02:00
Eren Gölge	a5d5bc9063	Print `max_decoder_steps` when model reaches the limit	2021-06-28 17:03:47 +02:00
Eren Gölge	269e5a734e	add max_decoder_steps argument to tacotron models	2021-06-28 17:03:19 +02:00
Eren Gölge	db6a97d1a2	rename external speaker embedding arguments as `d_vectors`	2021-06-28 17:03:19 +02:00
Eren Gölge	b500338faa	make style	2021-06-28 17:03:19 +02:00
Eren Gölge	9203b863d9	update align_tts_loss for trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	9134c7dfb6	update `sequence_mask` import globally	2021-06-28 17:03:19 +02:00
Eren Gölge	ca302db7b0	add sequence_mask to `utils.data`	2021-06-28 17:03:19 +02:00
Eren Gölge	844abb3b1d	`setup_loss()` in `layer/__init__.py`	2021-06-28 17:03:19 +02:00
Adam Froghyar	7ddc885f37	deleted a line the broke GravesAttention	2021-05-10 15:42:59 +02:00
Eren Gölge	8cb27267a4	formatting	2021-05-03 14:26:35 +02:00
Eren Gölge	9cc17be53a	formatting and a small bug fix in Tacotron model	2021-04-15 16:36:51 +02:00
Eren Gölge	3de5a89154	optionally enable prenet dropout at inference time for tacotron models	2021-04-13 13:24:56 +02:00
Eren Gölge	87ee6ceb57	style update #3	2021-04-09 01:17:15 +02:00
Eren Gölge	18d9ec8036	format with black	2021-04-09 00:54:59 +02:00
Eren Gölge	e5b9607bc3	isort all imports	2021-04-09 00:45:20 +02:00
Eren Gölge	0e79fa86ad	format with black and pylint 2.7.3	2021-04-09 00:38:08 +02:00
Eren Gölge	44b4cb5ba5	DCA comment	2021-04-06 16:24:50 +02:00
Eren Gölge	a3a840fd78	linter fixes	2021-03-30 14:39:16 +02:00
Eren Gölge	6b2e13bf62	compute normalized logp using torch primitives	2021-03-30 14:39:16 +02:00
Eren Gölge	7a382a5c2b	stowed aligntts commit and small refactoring with feed_forward layers	2021-03-30 14:39:16 +02:00
Eren Gölge	d542a50818	fix losses for alignTTS	2021-03-30 14:39:16 +02:00
Eren Gölge	18cc7b95ec	update l1 and huber to mse loss	2021-03-30 14:39:16 +02:00
Eren Gölge	896d33ed49	update losses to hande alingtts phases	2021-03-30 14:39:16 +02:00
Eren Gölge	c2d29e5cd4	FFTransformer encoder for aligntts	2021-03-30 14:39:16 +02:00
Eren Gölge	460a2d3e26	FFTransformer Decoder for AlignTTS	2021-03-30 14:39:16 +02:00
Eren Gölge	aa29f5b199	aligntts loss	2021-03-30 14:39:16 +02:00
Eren Gölge	a831468cab	align tts MDN layer	2021-03-30 14:39:16 +02:00
Eren Gölge	4396f8e2da	continue refactoring	2021-03-30 14:39:16 +02:00
Eren Gölge	2b3e12ea49	correct imports after refactoring, add AlignTTS (old SSMAS) and some formatting	2021-03-30 14:39:16 +02:00
Eren Gölge	d9c405f0c3	create feedforward folder for SS layers	2021-03-30 14:39:16 +02:00
Eren Gölge	a8cf1ae6b4	fix wavenet running with no input mask	2021-03-30 14:39:16 +02:00
Eren Gölge	32e8b56c45	linter fix	2021-03-18 13:33:23 +01:00
Eren Gölge	65533f33e9	fix #374	2021-03-18 13:33:00 +01:00
Eren Gölge	9a48ba3821	a ton of linter updates	2021-03-08 05:06:54 +01:00
Eren Gölge	08581deb61	linter updates	2021-03-08 02:53:02 +01:00
Eren Gölge	b464cab9b8	setup.py update and pylint fixes	2021-01-26 02:57:50 +01:00
Eren Gölge	660d61aeeb	maximum_path_numpy and CYTHON adabtable import	2021-01-26 02:57:07 +01:00
root	5c87753e88	glow-tts fix for saving inverse weight	2021-01-20 02:09:42 +00:00
erogol	bbc8d665a1	move attention layers to a sperate file	2021-01-11 17:27:30 +01:00
erogol	79c841ccd3	mass refactoring and update	2021-01-11 17:26:58 +01:00
erogol	1d961d6f8a	cladd renaming	2021-01-11 17:26:11 +01:00
erogol	c0a2aa68d3	formatting	2021-01-11 17:25:39 +01:00
erogol	b206162d11	more docstrings	2021-01-11 17:25:04 +01:00
erogol	6e9043c5d2	rename convbnblocks and handle none mask	2021-01-11 17:22:34 +01:00
erogol	921fa5db92	remove attentions from common layers	2021-01-11 15:06:42 +01:00
erogol	cc2b1e043d	docstrings for common layers	2021-01-11 15:06:12 +01:00
erogol	a6f40fef2e	stage missing files	2021-01-08 16:02:56 +01:00
erogol	d382d759b3	small fixes and test fixes	2021-01-08 15:48:40 +01:00
erogol	a6259041d3	docstring for speedyspeech	2021-01-07 14:35:22 +01:00
erogol	de2a542f83	glow-tts bug fix	2021-01-07 13:40:32 +01:00
erogol	5a45af48f1	fix	2021-01-06 13:19:40 +01:00
erogol	e7fad928e7	doc strings for the all glow-tts layers	2021-01-06 13:19:40 +01:00
erogol	d3b7284be4	glow-tts comments and refactoring	2021-01-06 13:19:40 +01:00
erogol	7586fbc4de	SS refactoring	2021-01-06 13:19:40 +01:00
erogol	e82d31b6ac	glow ttss refactoring	2021-01-06 13:19:40 +01:00
erogol	29f4329d7f	update glow-tts layers and add some comments	2021-01-06 13:19:40 +01:00
erogol	eb555855e4	small fixes	2021-01-06 13:19:40 +01:00
erogol	5901a00576	argument rename	2021-01-06 13:19:40 +01:00
erogol	4ef083f0f1	select decoder type for SS	2021-01-06 13:19:40 +01:00
erogol	3fa408a5ea	change order BN + ReLU to ReLU + BN for SS	2021-01-06 13:19:40 +01:00
erogol	ac5c9217d1	positional encoding masking for SS	2021-01-06 13:19:40 +01:00
erogol	fede46e96e	pylint and test fixes	2021-01-06 13:19:40 +01:00
erogol	cf869e8922	add SS files	2021-01-06 13:19:40 +01:00
erogol	dc4a16d62e	speedy speehc losses	2021-01-06 13:19:40 +01:00
erogol	d62cac7252	fix glow-tts prenet bug fix	2021-01-06 13:19:40 +01:00
erogol	fa6907fa0e	update glow-tts parameters and fix rel-attn-win size	2021-01-06 13:19:40 +01:00
erogol	7b20d8cbd3	implement residual BN convolution and add it as an alternative encoder for glow-tts. also generic layers to layers/generic	2021-01-06 13:19:40 +01:00
erogol	070146e143	add monotonic dynamic convolution attention	2021-01-06 13:18:41 +01:00
erogol	f6c96b0ac2	Merge branch 'dev'	2020-11-25 15:29:06 +01:00
erogol	d94782a076	reset the way ga_loss is stored in return_dict	2020-11-02 13:18:56 +01:00
erogol	a108d0ee81	check nan loss in glow-tts loss	2020-11-02 13:12:19 +01:00
erogol	b8ac9aba9d	check against NaN loss in tacotron_loss	2020-11-02 12:44:41 +01:00
erogol	183fe56d95	Merge branch 'ssim_loss' into dev	2020-10-29 23:49:09 +01:00
erogol	946a0c0fb9	bug fixes for single speaker glow-tts, enable torch based amp. Make amp optional for wavegrad. Bug fixes for synthesis setup for glow-tts	2020-10-29 15:45:50 +01:00
erogol	fdaed45f58	optional loss masking for stoptoken predictor	2020-10-28 18:40:54 +01:00
erogol	e49cc3bbcd	bug fix	2020-10-28 18:34:34 +01:00
erogol	9cef923d99	ssim loss for tacotron models	2020-10-28 15:24:18 +01:00
erogol	a6f564c8c8	pylint fixes	2020-10-27 12:35:10 +01:00
erogol	8de7c13708	fix no loss masking loss computation	2020-10-27 12:17:38 +01:00
Alexander Korolev	47d74ced1c	Update losses.py Seems like in the latest dev merge, this change was reverted. Any specific reason for this? Without it the problem as stated here https://github.com/mozilla/TTS/issues/473 occurs.	2020-10-23 14:15:01 +02:00
erogol	c2c4126a18	remove merge conflicts	2020-10-08 01:35:27 +02:00
erogol	6f0654f9a8	differential spectral loss	2020-10-08 01:30:42 +02:00
erogol	4e93f90108	bug fix	2020-10-08 01:29:30 +02:00
erogol	bb9b70ee27	differential spectral loss and loss weight settings	2020-10-08 01:29:30 +02:00
Edresson	99d5a0ac07	add Speaker Conditional GST support	2020-09-29 16:09:27 -03:00
erogol	6a70c63f24	correct glow-tts loss	2020-09-27 03:28:42 +02:00
erogol	665f7ca714	linter fix	2020-09-24 12:57:54 +02:00
erogol	10258724d1	linter fixes	2020-09-22 03:54:16 +02:00
erogol	e0b9fa887f	glow-tts modules added	2020-09-21 14:15:40 +02:00
erogol	e4c6386603	change import for normalization layer	2020-09-21 13:09:52 +02:00
erogol	c008003506	do not check sample rate as loading stats file for normalization to enable interpolation for different sample rate vocoder	2020-09-18 12:52:19 +02:00
erogol	3660c57f1e	time seperable convolution encoder, huber loss for duration predictor	2020-09-17 03:10:58 +02:00


254 Commits