coqui-tts

Commit Graph

Author	SHA1	Message	Date
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Edresson Casanova	e5d8ec2402	Change the VITS upsampling interpolation trick to linear (#1564 )	2022-05-13 10:52:39 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561 ) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00
Eren Gölge	6e460b7e42	Add an assert for the upsampling trick (#1538 )	2022-05-12 19:55:24 +02:00
Eren Gölge	4857967063	🐍 Python 3.10.x support and drop Python 3.6 support (#1565 ) * Update requirements * Update CI for p3.10 * Update numpy requirement * Drop 🐍p3.6 support Numpy also dropped support for p3.6 * Bind cython v0.29.28 * Bind pyworld to v0.2.10 > 0.2.10 is not p3.10.x compatible * Update Dockerfile	2022-05-12 15:50:25 +02:00
Edresson Casanova	a97eed696a	Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560 )	2022-05-12 15:15:18 +02:00
Eren Gölge	e45ae57aef	Merge pull request #1550 from coqui-ai/fix-upsampling-asserts Fix VITS upsampling asserts	2022-05-12 14:51:41 +02:00
Edresson Casanova	175ca06388	Add reinit text encoder and duration predictor parameter (#1562 ) * Add reinit encoder and duration predictor option * Add .data to prevent any overlooked autograd hook	2022-05-12 09:08:36 -03:00
Edresson Casanova	182711043c	Fix the VITS upsampling asserts Fix style	2022-05-12 09:08:29 -03:00
Eren Gölge	2fc38f67d2	Update SpeakerManager init in Synthesizer	2022-05-11 11:32:27 +02:00
Eren Gölge	c3f8c4d5eb	Return default SpeakerManager if no d_vector_file	2022-05-11 11:31:45 +02:00
Eren Gölge	121e9ed685	Pass use_cuda to init_encoder	2022-05-11 11:31:17 +02:00
Eren Gölge	c18bd21b3f	Return durations at VITS inference	2022-05-11 11:30:05 +02:00
Eren Gölge	5021a03de0	Use torch.no_grad for VITS inference	2022-05-11 11:29:36 +02:00
Eren Gölge	3f03e3012c	Fix batch_group_size in VITS	2022-05-07 13:44:44 +02:00
code-review-doctor	fa887ef5f9	Fix issue probably-meant-fstring found at https://codereview.doctor (#1532 )	2022-05-07 13:33:40 +02:00
Eren Gölge	a0a9279e4b	Fix GAN optimizer order commit `212d330929` Author: Edresson Casanova <edresson1@gmail.com> Date: Fri Apr 29 16:29:44 2022 -0300 Fix unit test commit `44456b0483` Author: Edresson Casanova <edresson1@gmail.com> Date: Fri Apr 29 07:28:39 2022 -0300 Fix style commit `d545beadb9` Author: Edresson Casanova <edresson1@gmail.com> Date: Thu Apr 28 17:08:04 2022 -0300 Change order of HIFI-GAN optimizers to be equal than the original repository commit `657c5442e5` Author: Edresson Casanova <edresson1@gmail.com> Date: Thu Apr 28 15:40:16 2022 -0300 Remove audio padding before mel spec extraction commit `76b274e690` Merge: `379ccd7b` `6233f4fc` Author: Edresson Casanova <edresson1@gmail.com> Date: Wed Apr 27 07:28:48 2022 -0300 Merge pull request #1541 from coqui-ai/comp_emb_fix Bug fix in compute embedding without eval partition commit `379ccd7ba6` Author: WeberJulian <julian.weber@hotmail.fr> Date: Wed Apr 27 10:42:26 2022 +0200 returns y_mask in VITS inference (#1540) * returns y_mask * make style	2022-05-07 13:29:11 +02:00
Edresson Casanova	60034674f9	Remove audio padding before mel spec extraction	2022-05-07 13:12:09 +02:00
WeberJulian	fbdf76b2fc	returns y_mask in VITS inference (#1540 ) * returns y_mask * make style	2022-05-03 13:49:24 +02:00
Edresson Casanova	6233f4fcd7	Bug fix in compute embedding without eval partition	2022-04-26 13:58:03 -03:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456 ) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Eren Gölge	c410bc58ef	Bump to v0.6.2	2022-04-20 11:46:26 +02:00
WeberJulian	30bea7d53c	Update manage.py (#1514 )	2022-04-19 14:27:32 +02:00
Eren Gölge	7133f8f47d	Print Model's license when downloading (#1512 ) * Print model license while downloading * Make style * Add a new license link * Make style	2022-04-19 14:18:49 +02:00
WeberJulian	4953636b14	Add African models (#1511 ) * Add african models * Set default license for all models	2022-04-19 14:18:30 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374 )	2022-03-31 13:41:16 +02:00
WeberJulian	1b22f03e98	Fix G2P backend of the released models (#1461 ) * Fix enforce phonemizer * Add new models * Fix .model.json	2022-03-30 12:47:11 +02:00
WeberJulian	c66a6241fd	Enforce phonemizer definition for synthesis (#1441 ) * Enforce phonemizer definition for synthesis * Fix train_tts, tokenizer init can now edit config * Add small change to trigger CI pipeline * fix wrong output path for one tts_test * Fix style * Test config overides by args and tokenizer * Fix style	2022-03-25 23:15:33 +01:00
Edresson Casanova	37896e1743	Bug fix in freeze encoder (#1391 ) * Fix the bug in freeze encoder * Remove emb_l definition for non-multilingual training * Fix unit tests	2022-03-24 18:16:04 +01:00
Edresson Casanova	3435bc8fca	Fix style tests	2022-03-23 15:05:32 -03:00
Edresson Casanova	0ae1e0248c	Fix the bug for emptly audio files	2022-03-23 14:39:31 -03:00
Edresson Casanova	ea53d6feb3	Replace webrtcvad by silero-vad	2022-03-23 14:39:31 -03:00
Eren Gölge	3af01cfe3b	Update base model wrt 👟 (#1406 )	2022-03-23 17:24:20 +01:00
Eren Gölge	1c3623af33	Fix model manager (#1436 ) * Fix manager * Make style	2022-03-23 12:57:14 +01:00
Eren Gölge	72d85e53c9	Update model file extension (#1422 ) * Update model file ext to ```.pth``` * Update docs * Rename more * Find model files	2022-03-22 17:55:00 +01:00
Eren Gölge	fd56fabb21	Fix #1380 (#1409 )	2022-03-16 12:38:27 +01:00
Eren Gölge	0870a4faa2	Make style (#1405 )	2022-03-16 12:13:55 +01:00
WeberJulian	690c96ed28	Fix default phonemizer for ja and zh (#1399 )	2022-03-16 12:13:22 +01:00
Edresson Casanova	f81892483d	REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349 ) * Rename Speaker encoder module to encoder * Add a generic emotion dataset formatter * Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config * Add class map in emotion config * Add Base encoder config * Add evaluation encoder script * Fix the bug in plot_embeddings * Enable Weight decay for encoder training * Add argumnet to disable storage * Add Perfect Sampler and remove storage * Add evaluation during encoder training * Fix lint checks * Remove useless config parameter * Active evaluation in speaker encoder test and use multispeaker dataset for this test * Unit tests fixs * Remove useless tests for speedup the aux_tests * Use get_optimizer in Encoder * Add BaseEncoder Class * Fix the unitests * Add Perfect Batch Sampler unit test * Add compute encoder accuracy in a function	2022-03-11 14:43:40 +01:00
Edresson Casanova	36e9ea2f97	Open bible dataset formatter (#1365 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference * Fix the bug in find unique chars script * Add OpenBible formatter Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-03-11 10:43:31 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Edresson Casanova	917f417ac4	Add alphas to control language and speaker balancer (#1216 ) * Add alphas to control language and speaker balancer * Add docs for speaker and language samplers * Change the Samplers weights to float for save memory * Change the test_samplers to unittest format * Add get_sampler method in BaseTTS * Fix rebase issues * Add language and speaker samplers support for DDP training * Rename distributed sampler wrapper * Remove the DistributedSamplerWrapper and use the one from Trainer * Bugfix after rebase * Move the samplers config to tts config	2022-03-10 14:56:09 +01:00
Edresson Casanova	f381e29b91	REBASED: Add support for the speaker encoder training using torch spectrograms (#1348 ) * Add support for the speaker encoder training using torch spectrograms * Remove useless function in speaker encoder dataset class	2022-03-10 14:54:51 +01:00
Eren Gölge	c670365507	Fix VCTK recipe and formatter	2022-03-08 14:20:34 +01:00
Eren Gölge	8feb41d361	Bump up to v0.6.1	2022-03-07 15:57:44 +01:00
Eren Gölge	ee02bc3823	Bump up to v0.6.0	2022-03-07 12:08:22 +01:00
Eren Gölge	dc280819be	Add new models	2022-03-07 12:08:09 +01:00
Eren Gölge	e9d9028b4d	Revert cleaner name	2022-03-06 12:57:06 +01:00
Eren Gölge	764c7fa4a4	Rename phoneme_cleaners	2022-03-06 12:09:54 +01:00
Eren Gölge	dd4287de1f	Update models	2022-03-03 20:23:00 +01:00
Eren Gölge	6cb00be795	Update your_tts model URL	2022-03-02 18:04:49 +01:00
Eren Gölge	1425a023fe	Make style and lint	2022-03-02 13:25:35 +01:00
Eren Gölge	c68885b3fd	Update Vits speaker encoder init	2022-03-02 13:20:23 +01:00
Eren Gölge	27b67b7945	Fix import	2022-03-02 09:15:20 +01:00
Eren Gölge	942df0fb05	Update vits dataset	2022-03-02 09:14:32 +01:00
Eren Gölge	6a9f8074f0	Fix TTSDataset	2022-03-01 07:57:48 +01:00
Eren Gölge	690de1ab06	Update Characters and add more tests	2022-02-25 11:32:44 +01:00
Eren Gölge	9063397892	Fix FastSpeech config	2022-02-25 11:31:56 +01:00
Eren Gölge	1e414b3a09	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	acc83cd3e6	Update Vits model API	2022-02-25 11:31:56 +01:00
Eren Gölge	fe656659be	Implement BaseTTS	2022-02-25 11:31:56 +01:00
Eren Gölge	bed4afd4ee	Implement BaseVocabulary	2022-02-25 11:31:56 +01:00
Eren Gölge	e0f9be76c0	Update test_run in wavernn and wavegrad	2022-02-25 11:31:56 +01:00
Eren Gölge	bf540f4323	Update imports for trainer	2022-02-25 11:31:56 +01:00
Eren Gölge	4c43eda414	Update BaseTrainerModel	2022-02-25 11:31:56 +01:00
Eren Gölge	83c5ddc5b7	Update imports	2022-02-25 11:31:56 +01:00
Eren Gölge	14c117978d	Fix return outputs	2022-02-25 11:31:56 +01:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	8b3ba02c95	Add vocab_dict to model config	2022-02-25 11:31:20 +01:00
Eren Gölge	ff23dce081	Update TTSDataset	2022-02-25 11:31:20 +01:00
Eren Gölge	750903d2ba	Add VCTK formatter docstring	2022-02-25 11:30:24 +01:00
Eren Gölge	52a7896668	Update VITS loss	2022-02-25 11:30:24 +01:00
Eren Gölge	c68962c574	Update forward tts binary loss	2022-02-25 11:30:24 +01:00
Eren Gölge	c11944022d	Revert back again rand_segment	2022-02-25 11:30:24 +01:00
Eren Gölge	00c7600103	Update Vits model API	2022-02-25 11:30:24 +01:00
Eren Gölge	935a604046	Delete trainer_utils	2022-02-25 11:29:41 +01:00
Eren Gölge	d0c27a9661	Update synthesis.py	2022-02-25 11:29:41 +01:00
Eren Gölge	35fc7270ff	Implement BaseTTS	2022-02-25 11:28:47 +01:00
Eren Gölge	2bad098625	Implement BaseVocabulary	2022-02-25 11:28:47 +01:00
Eren Gölge	833de62e30	Update base_vocoder	2022-02-25 11:28:14 +01:00
Eren Gölge	fc3b6d2861	Update gan	2022-02-25 11:28:14 +01:00
Eren Gölge	20a677c623	Update test_run in wavernn and wavegrad	2022-02-25 11:28:14 +01:00
Eren Gölge	be3a03126a	Update imports for trainer	2022-02-25 11:28:14 +01:00
Eren Gölge	c911729896	Update BaseTrainerModel	2022-02-25 11:28:14 +01:00
Eren Gölge	1e219fef0a	Revert drop_last	2022-02-25 11:26:59 +01:00
Eren Gölge	7dfd753d91	Add a cheap trick to avoid short audio clips	2022-02-25 11:26:59 +01:00
Eren Gölge	1a43e05460	Fix VITS loss bug Fake and real features were given in the wrong args order to the loss function	2022-02-25 11:26:59 +01:00
Eren Gölge	4b96bfe925	Fix train logging	2022-02-25 11:26:59 +01:00
Eren Gölge	ab8a4ca2c3	Revert random segment	2022-02-25 11:26:59 +01:00
Eren Gölge	8622226f3f	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	27db089d6c	Change TrainingArgs -> TrainerArgs	2022-02-25 11:26:59 +01:00
Eren Gölge	aa81454721	Update BaseTrainingConfig	2022-02-25 11:26:59 +01:00
Eren Gölge	d3a58ed07a	Fix default values	2022-02-25 11:26:59 +01:00
Eren Gölge	54c6bb2a8c	Fix add speaker VITS	2022-02-25 11:26:59 +01:00
Eren Gölge	590b04fb89	Fix espeak_wrapper	2022-02-25 11:26:59 +01:00
Eren Gölge	a013566d15	Delete trainer related code	2022-02-25 11:26:59 +01:00
Eren Gölge	38314194e7	Set `drop_last`	2022-02-25 11:26:59 +01:00
Eren Gölge	f70e4bb8c6	Add new speakers to the vits model	2022-02-25 11:26:59 +01:00
Eren Gölge	d5c0e17548	Load right char class dynamically	2022-02-25 11:26:59 +01:00
Eren Gölge	1f0c8179da	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	b3ed6ff6b7	Update FastPitchConfig	2022-02-25 11:26:59 +01:00
Eren Gölge	1932401e8d	Fix dataset preprocessing	2022-02-25 11:26:59 +01:00
Eren Gölge	34c4be5e49	Update forwardtts	2022-02-25 11:26:59 +01:00
Eren Gölge	bb37462794	Update language manager	2022-02-25 11:26:59 +01:00
Eren Gölge	5169d4eb32	Plot pitch over input characters	2022-02-25 11:26:59 +01:00
Eren Gölge	cd5d1497cf	Add pitch_fmin pitch_fmax args to the audio	2022-02-25 11:26:59 +01:00
Eren Gölge	1445a46e9e	Update synthesizer to use iinit_from_config	2022-02-25 11:26:59 +01:00
Eren Gölge	7058fcc3ff	Take file extension as an argument	2022-02-25 11:26:59 +01:00
Eren Gölge	13482dde1f	Update GAN model	2022-02-25 11:26:59 +01:00
Eren Gölge	2829027d8b	Refactor VITS model	2022-02-25 11:26:59 +01:00
Eren Gölge	ef63c99524	Implement `start_by_longest` option for TTSDatase	2022-02-25 11:26:18 +01:00
Eren Gölge	c4c471d61d	Allow padding for shorter segments	2022-02-25 11:25:48 +01:00
Eren Gölge	47fbddc8d4	Fix docstring	2022-02-25 11:25:48 +01:00
Eren Gölge	bc2243bac4	Fix tests	2022-02-25 11:25:00 +01:00
Eren Gölge	146fbfd7c9	Extend unittests	2022-02-25 11:25:00 +01:00
Eren Gölge	2fe16de8e3	Make lint	2022-02-25 11:25:00 +01:00
Eren Gölge	7b49a4aa2b	Fix glow_tts_config missing field	2022-02-25 11:24:13 +01:00
Eren Gölge	07b0a80d57	Fix tokenizer init_from_config	2022-02-25 11:24:13 +01:00
Eren Gölge	50e17097a7	Add verbose option to AudioProcessor	2022-02-25 11:24:13 +01:00
Eren Gölge	235f7d9b02	Extend glow_tts model tests	2022-02-25 11:24:13 +01:00
Eren Gölge	8e248913d6	Update train_tts for the new API	2022-02-25 11:24:13 +01:00
Eren Gölge	001da8afc8	Update Vits for the new model API	2022-02-25 11:21:19 +01:00
Eren Gölge	5176ae9e53	Fixes small compat. issues	2022-02-25 11:21:19 +01:00
Eren Gölge	131bc0cfc0	Fix synthesis.py 🔧	2022-02-25 11:18:00 +01:00
Eren Gölge	c0746f23df	Fix `too many open files`	2022-02-25 11:16:30 +01:00
Eren Gölge	df0d58bf09	Update VCTK recipes	2022-02-25 11:16:30 +01:00
Eren Gölge	730f7c0df4	Add file_ext args to resample.py	2022-02-25 11:15:46 +01:00
Eren Gölge	28d98da422	Update VCTK formatter	2022-02-25 11:15:46 +01:00
Eren Gölge	4d99fee3e2	Update spec extractor	2022-02-25 11:12:44 +01:00
Eren Gölge	38a0b3b6c7	Update train_tts.py	2022-02-25 11:11:35 +01:00
Eren Gölge	cfaa51fddc	Update BaseTTS config	2022-02-25 11:11:35 +01:00
Eren Gölge	4c5cb44eeb	Update setup_model	2022-02-25 11:11:35 +01:00
Eren Gölge	7c4243fba7	Update GlowTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	bacf79f4fb	Update AlignTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	18f726af65	Update ForwardTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	d0ec4b91e5	Update Tacotron models	2022-02-25 11:11:35 +01:00
Eren Gölge	ea965a5683	Update VITS for the new API	2022-02-25 11:11:35 +01:00
Eren Gölge	f802a931a3	Pass samples to init_from_config in SpeakerManager	2022-02-25 11:07:34 +01:00
Eren Gölge	bde68d9f25	Use the same phonemizer for `en` to `en-us`	2022-02-25 11:07:34 +01:00
Eren Gölge	8649d4fd36	Allow None pad and blank tokens	2022-02-25 11:07:34 +01:00
Eren Gölge	c9972e6f14	Make lint	2022-02-25 11:07:34 +01:00
Eren Gölge	30cfafce56	Add init_from_config	2022-02-25 11:05:54 +01:00
Eren Gölge	90cc45dd4e	Update data loader tests	2022-02-25 11:05:54 +01:00
Eren Gölge	93957d58a1	Refactorin VITS for the tokenizer API	2022-02-25 11:05:06 +01:00
Eren Gölge	04df0a3d9f	Refactor TTSDataset ⚡️	2022-02-25 11:05:06 +01:00
Eren Gölge	9bb347a52b	Update for tokenizer API	2022-02-25 11:05:06 +01:00
Eren Gölge	452dbc43d8	Update imports for symbols -> characters	2022-02-25 11:05:06 +01:00
Eren Gölge	8071fa0020	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	b6c2bfdf08	Refactor synthesis.py for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	b2bb954a51	Refactor TTSDataset to use TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	84091096a6	Refactor Synthesizer class for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	196ae74273	Update data loader tests	2022-02-25 11:05:06 +01:00
Eren Gölge	98057a00ae	Make style	2022-02-25 10:57:35 +01:00
Eren Gölge	7575367b9f	Refactorin VITS for the tokenizer API	2022-02-25 10:57:35 +01:00
Eren Gölge	4cd690e4c1	Updates BaseTTS and configs	2022-02-25 10:57:35 +01:00
Eren Gölge	176b712c1a	Refactor TTSDataset ⚡️	2022-02-25 10:57:35 +01:00
Eren Gölge	4597d4e5b6	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	1df1d6c4a9	Update for tokenizer API	2022-02-25 10:48:03 +01:00
Eren Gölge	2d8ce98d2a	Update imports for symbols -> characters	2022-02-25 10:48:03 +01:00
Eren Gölge	9a95e15483	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	d0eb642d88	Refactor synthesis.py for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	3476be30d7	Refactor Synthesizer class for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	9397a56b13	Allow init_from_config from model or audio config	2022-02-25 10:48:03 +01:00
Eren Gölge	a71a013276	Fix the wrong default loss name for GAN models	2022-02-25 10:48:03 +01:00
Eren Gölge	04202da1ac	Make style	2022-02-25 10:48:03 +01:00
Eren Gölge	3b63d713b9	Fix espeak wrapper cmd call	2022-02-25 10:48:03 +01:00
Eren Gölge	4894998e6b	Fix print_logs	2022-02-25 10:48:03 +01:00
Eren Gölge	4e8f9d6f10	Fix IPAPhonemes init_from_config	2022-02-25 10:48:03 +01:00
Eren Gölge	0fe39166fe	Discard OOV chars in tokenizer Discard but store OOV chars with a warninig message when the OOV char first recognized	2022-02-25 10:48:03 +01:00
Eren Gölge	c39aaafbfc	Update EspeakWrapper for espeak-ng	2022-02-25 10:48:03 +01:00
Eren Gölge	bb389479a4	Update setup_model for TTS.tts models	2022-02-25 10:48:03 +01:00
Eren Gölge	9b83e665fc	Add init_from_config as an abstract class	2022-02-25 10:48:03 +01:00
Eren Gölge	3eca5ad060	Update config fields for phonemizer	2022-02-25 10:48:03 +01:00
Eren Gölge	d2525abe8c	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	73d27ebd45	Fix GlowTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	87bf940676	Print duplicate characters	2022-02-25 10:48:03 +01:00
Eren Gölge	3de9f38d16	Add init_from_config to SpeakerManager	2022-02-25 10:48:03 +01:00
Eren Gölge	d8ec7086b6	Update `synthesis` for the new API	2022-02-25 10:48:03 +01:00
Eren Gölge	4e83bf3968	Allow choosing phonemizer	2022-02-25 10:48:02 +01:00
Eren Gölge	22f0c58fe1	Print language codes	2022-02-25 10:48:02 +01:00
Eren Gölge	693fb4dd39	Modify init_from_config for IPAPhonemes	2022-02-25 10:48:02 +01:00
Eren Gölge	acc6eef625	Update for tokenizer API	2022-02-25 10:48:02 +01:00
Eren Gölge	e1b4c4ca43	Add init_from_config to GAN	2022-02-25 10:48:02 +01:00
Eren Gölge	353f913efc	Fix #985	2022-02-25 10:48:02 +01:00
Eren Gölge	ba3b60c90f	Test TTSTokenizer	2022-02-25 10:48:02 +01:00
Eren Gölge	79a84410f2	Test punctuations	2022-02-25 10:48:02 +01:00
Eren Gölge	d8bdeb8b8f	Fix Punctuation	2022-02-25 10:48:02 +01:00
Eren Gölge	ff7c385838	Fix BasePhonemizer	2022-02-25 10:48:02 +01:00
Eren Gölge	10d435ce77	Fixup	2022-02-25 10:48:02 +01:00
Eren Gölge	f0655bfffc	Fix ja_jp_phonemizer	2022-02-25 10:48:02 +01:00
Eren Gölge	20e5dd3678	Add doc examples	2022-02-25 10:48:02 +01:00
Eren Gölge	fbad17e084	Update imports for symbols -> characters	2022-02-25 10:48:02 +01:00
Eren Gölge	a1df4f9887	Test character classes	2022-02-25 10:45:24 +01:00
Eren Gölge	bd461ace33	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:45:24 +01:00
Eren Gölge	5a9653978a	Refactor synthesis.py for TTSTokenizer	2022-02-25 10:45:24 +01:00
Eren Gölge	e5785b34b0	Style fix	2022-02-25 10:27:46 +01:00
Eren Gölge	e4049aa31a	Refactor TTSDataset to use TTSTokenizer	2022-02-25 10:27:46 +01:00
Eren Gölge	2480bbe937	Remove OLD TOKENIZATION ROUTINES	2022-02-25 09:32:54 +01:00
Eren Gölge	53f696615b	Add init_from_config to AudioProcessor	2022-02-25 09:32:54 +01:00
Eren Gölge	3d86edfc81	Refactor Synthesizer class for TTSTokenizer	2022-02-25 09:32:54 +01:00
Eren Gölge	8d85af84cd	Implement Punctuation class	2022-02-25 09:32:54 +01:00
Eren Gölge	1aca58afaf	Fix imports in cleaners.py	2022-02-25 09:32:54 +01:00
Eren Gölge	0344645e90	Implement TTSTokenizer	2022-02-25 09:32:54 +01:00
Eren Gölge	2fb1f70503	Implement BaseCharacters, IPAPhonemes, Graphemes	2022-02-25 09:32:54 +01:00
Eren Gölge	1bee40af40	Create language folders under `TTS.tts.utils.text`	2022-02-25 09:32:54 +01:00
Eren Gölge	c1119bc291	Implement BasePhonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	dcd01356e0	Create `text/english` folder	2022-02-25 09:32:54 +01:00
Eren Gölge	80867c8e8c	Implement multi-phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	5e4f78add3	Implement espeak wrapper	2022-02-25 09:32:54 +01:00
Eren Gölge	e03a05c816	Implement gruut wrapper	2022-02-25 09:32:54 +01:00
Eren Gölge	172ba0c5e7	Implement JA_JP phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	ca02b82218	Implement ZH_CH phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	a51b031bff	Merge branch 'dev' into dev-fix-glowtts-infer	2022-02-21 12:01:40 +03:00
Edresson Casanova	28a7464975	Fix the bug in split dataset function (#1251 ) * Fix the bug in split_dataset * Make eval_split_size configurable * Change test_loader to use load_tts_samples function * Change eval_split_portion to eval_split_size and permits to set the absolute number of samples in eval * Fix samplers unit test * Add data unit test on GitHub workflow	2022-02-21 11:59:36 +03:00
Edresson Casanova	bc5db13d06	Fix the bug in extract tts spectrogram script	2022-02-19 19:24:00 +00:00
Edresson Casanova	ba6e56e01c	Fix Glow-TTS multi-speaker inference	2022-02-18 19:25:29 +00:00
Eren Gölge	127118c637	Update TTS.tts formatters (#1228 ) * Return Dict from tts formatters * Make style	2022-02-11 23:03:43 +01:00
Eren Gölge	5e3f499a69	Fix #1187 (#1227 )	2022-02-11 13:27:59 +01:00
Edresson Casanova	0860d73cf8	Remove Tensorflow requeriment (#1225 ) * Remove TF modules * Remove TF unit tests * Remove TF vocoder modules * Remove TF convert scripts * Remove TF requirement * Remove the Docs TF instructions * Remove TF inference support	2022-02-10 16:14:54 +01:00
Eren Gölge	44c7d1a826	Merge pull request #1054 from WeberJulian/partial_embedding_compute Partial embedding compute	2022-02-06 20:13:55 +01:00
WeberJulian	c7f5e005e1	Compute embedding for new audios only	2022-01-06 15:41:38 +01:00
WeberJulian	e778bad626	Add argument to enable dp speaker conditioning	2022-01-06 15:07:27 +01:00
WeberJulian	e1accb6e28	Fix train_tts.py and uncomment code (#1051 ) * Fix SE loading and language embedding logic * remove trailing white space * Uncomment resmapling code for SCL	2022-01-03 17:44:57 +01:00
Eren Gölge	58c38de58d	Bump up to v0.5.0	2022-01-03 15:04:03 +00:00
Eren Gölge	5840d89802	Keep proj_dim in speaker encoder models	2022-01-03 15:03:34 +00:00
Eren Gölge	03bcae1ba5	Merge pull request #1050 from coqui-ai/fix_synthesizer_init Fix if else statement	2022-01-03 15:59:29 +01:00
Eren Gölge	fc09e319d4	Prioritize the given encoder path over config	2022-01-03 14:24:19 +00:00
Eren Gölge	7fad969a1f	Fix if else statement	2022-01-03 14:16:11 +00:00
Eren Gölge	d724984be1	Fix language assignment	2022-01-02 11:11:24 +00:00
WeberJulian	a63998c048	Fix phoneme language	2022-01-01 21:08:13 +01:00
Eren Gölge	7ef458a59c	Updake default vocoder for uk model	2022-01-01 16:09:42 +00:00
Eren Gölge	e55f5ee59e	Make linter	2022-01-01 15:50:04 +00:00
Eren Gölge	38f5a11125	Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev	2022-01-01 15:38:46 +00:00
Eren Gölge	c5512af82b	Update uk vocoder url	2022-01-01 15:38:21 +00:00
Eren Gölge	d37cfe474a	Merge branch 'pr/Edresson/731-rebased' into dev	2022-01-01 15:37:35 +00:00
Eren Gölge	33711afa01	Update yourTTS url	2022-01-01 15:37:08 +00:00
Eren Gölge	8fd1ee1926	Print urls when BadZipError	2022-01-01 15:26:35 +00:00
Eren Gölge	61874bc0a0	Fix your_tts inference from the listed models	2021-12-31 13:45:05 +00:00
Eren Gölge	8100135a7e	Add the YourTTS entry to the models	2021-12-31 12:22:08 +00:00
Eren Gölge	36cef5966b	Fix resnet speaker encoder	2021-12-30 15:36:35 +00:00
Eren Gölge	348b5c96a2	Fix speaker encoder test	2021-12-30 15:36:35 +00:00
Eren Gölge	7129b04d46	Update VITS model	2021-12-30 14:08:17 +00:00
Eren Gölge	638091f41d	Update Speaker Encoder models	2021-12-30 12:02:06 +00:00
Eren Gölge	6189fdfaea	Fix Training HiFiGan -- avg loss not decreasing #1003	2021-12-30 10:48:55 +00:00
Eren Gölge	275c759993	Fix #1037	2021-12-23 15:57:10 +00:00
Eren Gölge	5c5ddd2ba7	Init speaker manager for speaker encoder	2021-12-22 15:51:53 +00:00
Eren Gölge	633dcc9c56	Implement RMS volume normalization	2021-12-22 15:51:14 +00:00
Eren Gölge	8d2bb284ac	Add UK vocoder models	2021-12-21 13:13:35 +00:00
Eren Gölge	56378b12f7	Fix speaker encoder init	2021-12-21 12:26:25 +00:00
Eren Gölge	c9c1fa0548	Fix multi-speaker init in Synthesizer	2021-12-21 09:44:07 +00:00
Eren Gölge	f769595112	Add more listing options to ModelManager	2021-12-20 11:54:10 +00:00
Eren Gölge	a25269d897	Remove commented code	2021-12-20 11:54:10 +00:00
Eren Gölge	473414d4af	Implement init_speaker_encoder and change arg names	2021-12-20 11:54:10 +00:00
Eren Gölge	d29c3780d1	Use speaker_encoder from speaker manager in Vits	2021-12-20 11:54:10 +00:00
Eren Gölge	4d13b887f5	Change speaker_idx to speaker_name	2021-12-20 11:54:10 +00:00
Eren Gölge	4c50f6f4df	Add functions to get and check and argument in config and config.model_args	2021-12-20 11:54:10 +00:00
Eren Gölge	3c6d7f495c	Fixup	2021-12-20 11:54:10 +00:00
Eren Gölge	3818bd0c23	Fixup	2021-12-20 11:54:10 +00:00
Eren Gölge	79de38ca76	Rename setup_model to setup_speaker_encoder_model	2021-12-20 11:54:10 +00:00
Eren Gölge	35a781fb90	Fix synthesizer reading `use_language_embedding`	2021-12-20 11:54:10 +00:00
Eren Gölge	7a987db62b	Use torchaudio for ResNet speaker encoder	2021-12-20 11:54:10 +00:00
Eren Gölge	649dc9e9da	Remove redundant code	2021-12-20 11:54:10 +00:00
Eren Gölge	704dddcffa	Make style	2021-12-20 11:54:10 +00:00
WeberJulian	54b7fb4e4a	Fix zoo tests	2021-12-20 11:54:10 +00:00
WeberJulian	a564eb9f54	Add support for multi-lingual models in CLI	2021-12-20 11:54:10 +00:00
WeberJulian	2bbcb558dc	Prevent weighted sampler use when num_gpus > 1	2021-12-20 11:54:10 +00:00
WeberJulian	74cedfac38	Revert init multispeaker change	2021-12-20 11:54:10 +00:00
WeberJulian	9cfbacc622	Fix trailing space	2021-12-20 11:54:10 +00:00
WeberJulian	6b03943526	Move multilingual logic out of the trainer	2021-12-20 11:54:10 +00:00
Edresson	818dc4ccd8	Add Docstring for TorchSTFT	2021-12-20 11:54:10 +00:00
Edresson	67dda0abe1	Add the SCL resample TODO	2021-12-20 11:54:10 +00:00
WeberJulian	8b52fb89d1	Fix merge bug	2021-12-20 11:54:10 +00:00
WeberJulian	09eda31a3f	Fix tests	2021-12-20 11:54:10 +00:00
Edresson	78a23e19df	Fix pylint checks	2021-12-20 11:54:10 +00:00
WeberJulian	4cd0e4eb0d	Remove self.audio_config from VITS	2021-12-20 11:54:10 +00:00
Edresson	d39200e69b	Remove torchaudio requeriment	2021-12-20 11:54:10 +00:00
WeberJulian	2e516869a1	Fix trailing whitespace	2021-12-20 11:54:10 +00:00
WeberJulian	ffc269eaf4	Update docstring	2021-12-20 11:54:10 +00:00
Edresson	12968532fe	Add the language embedding dim in the duration predictor class	2021-12-20 11:54:10 +00:00
Edresson	4196a42de7	Get the number speaker from the Speaker Manager property	2021-12-20 11:54:10 +00:00
Edresson	f394d60695	Fix the bug in multispeaker vits	2021-12-20 11:54:10 +00:00
Edresson	90eac13bb2	Rename ununsed_speakers to ignored_speakers	2021-12-20 11:54:10 +00:00
Edresson	f34596d957	Fix function name	2021-12-20 11:54:10 +00:00
Edresson	45d0b04179	Lint fixs	2021-12-20 11:54:10 +00:00
Edresson	85418ffeaa	Fix the bug in extract tts spectrograms	2021-12-20 11:54:10 +00:00
Edresson	2b2cecaea2	Set the new_fields in copy_model_files as None by default	2021-12-20 11:54:10 +00:00
Edresson	34749f8727	Remove the call to get_speaker_manager	2021-12-20 11:54:10 +00:00
Edresson	b769b49e34	Remove the data from the set_d_vectors_from_file function	2021-12-20 11:54:10 +00:00
Edresson	9daa33d1fd	Remove unusable speaker manager function	2021-12-20 11:54:10 +00:00
Edresson	8c22d5ac49	Turn more clear the VITS loss function	2021-12-20 11:54:10 +00:00
Edresson	6fc3b9e679	Remove the unusable fine-tuning model	2021-12-20 11:54:10 +00:00
Edresson	352aa69eca	Create a module for the VAD script	2021-12-20 11:54:10 +00:00
WeberJulian	631addf33b	fix d-vector	2021-12-20 11:54:10 +00:00
WeberJulian	da6c1e858c	Fix small issues	2021-12-20 11:54:10 +00:00
WeberJulian	e8af6a9f08	Fix use_speaker_embedding logic	2021-12-20 11:54:10 +00:00
WeberJulian	23d789c072	Fix continue path	2021-12-20 11:54:10 +00:00
WeberJulian	120332d53f	Fix phonemes	2021-12-20 11:54:10 +00:00
WeberJulian	846bf16f02	fix imports for load_meta_data	2021-12-20 11:54:10 +00:00
WeberJulian	1340938159	fix phonemes per language	2021-12-20 11:54:10 +00:00
WeberJulian	e995a63bd6	fix linter	2021-12-20 11:54:10 +00:00
WeberJulian	1472b6df49	make style	2021-12-20 11:54:10 +00:00
WeberJulian	4d721bcabd	fix test sentence synthesis	2021-12-20 11:54:10 +00:00
WeberJulian	0804806727	fix f0_cache_path in dataset	2021-12-20 11:54:10 +00:00
WeberJulian	3b5592abcf	fix test vits	2021-12-20 11:54:10 +00:00
WeberJulian	2a2b5767c2	fix collate_fn	2021-12-20 11:54:10 +00:00
Julian WEBER	78c2d12a91	PitchExtractor	2021-12-20 11:54:10 +00:00
Julian WEBER	9a2f91327c	get_aux_input	2021-12-20 11:54:10 +00:00
Julian WEBER	b3abd01793	Merge dataset	2021-12-20 11:54:10 +00:00
Edresson	10ff90d6d2	Add remove silence VAD script	2021-12-20 11:54:10 +00:00
Edresson	1bd1a0546b	Add audio resample in the speaker consistency loss	2021-12-20 11:54:10 +00:00
Edresson	1c6bcda950	Add freeze vocoder generator and flow-based decoder option	2021-12-20 11:54:10 +00:00
WeberJulian	2b952d8b97	freeze vits parts	2021-12-20 11:54:10 +00:00
WeberJulian	005bba60b0	get_speaker_weighted_sampler	2021-12-20 11:54:10 +00:00
Edresson	9de4539422	Update the VITS model docs	2021-12-20 11:54:10 +00:00
Edresson	eeb8ac07d9	Add voice conversion fine tuning mode	2021-12-20 11:54:10 +00:00
Edresson	690b37d0ab	Add support to use the speaker encoder as loss function in VITS model	2021-12-20 11:54:09 +00:00
Edresson	9b011b1cb3	Add H/ASP original checkpoint support	2021-12-20 11:54:09 +00:00
Edresson	0bdfd3cb50	Add the ValueError in the restore checkpoint exception to avoid problems with the optimizer restauration when new keys are addition	2021-12-20 11:54:09 +00:00
Edresson	de78556655	Fix the optimizer parameters bug in multilingual and multispeaker training	2021-12-20 11:54:09 +00:00
Edresson	9be5b75da3	Fix bug after merge	2021-12-20 11:54:09 +00:00
Edresson	76251b619a	Fix d-vector multispeaker training bug	2021-12-20 11:54:09 +00:00
Edresson	7ef3ddc6ff	Fix unit tests	2021-12-20 11:54:09 +00:00
Edresson	36dcd11453	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	c53693c155	Implement vocoder Fine Tuning like SC-GlowTTS paper	2021-12-20 11:54:09 +00:00
Edresson	f1f016314e	Fix the bug in M-AILABS formatter	2021-12-20 11:54:09 +00:00
Edresson	c334d39acc	Add voice conversion support for the model VITS trained with external speaker embedding	2021-12-20 11:54:09 +00:00
Edresson	e997889ba8	Fix bug in VITS multilingual inference	2021-12-20 11:54:09 +00:00
Edresson	7c0b8ec572	Fix bugs in the non-multilingual VITS inference	2021-12-20 11:54:09 +00:00
Edresson	3fbbebd74d	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	ac9416fb86	Add multilingual inference support	2021-12-20 11:54:09 +00:00
Edresson	dcb2374bc9	Add multilingual training support to the VITS model	2021-12-20 11:54:09 +00:00
Edresson	f996afedb0	Implement multilingual dataloader support	2021-12-20 11:54:09 +00:00
Edresson	5f1c18187f	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	d91c595c5a	Implement training support with d_vecs in the VITS model	2021-12-20 11:54:09 +00:00
Edresson	6a7db67a91	Allow ignore speakers for all multispeaker datasets	2021-12-20 11:54:09 +00:00
Edresson	e0ad838066	Select randomly a speaker from the speaker manager for the test setences	2021-12-20 11:54:09 +00:00
Edresson	eb3e8affe1	Save speakers embeddings/ids before starting training	2021-12-20 11:54:09 +00:00
Eren Gölge	37803467aa	Merge pull request #1021 from loganhart420/dataset_downloaders Add addtional datasets	2021-12-20 10:42:20 +01:00
Reuben Morais	859ac1a54c	Include usage instructions in README	2021-12-17 11:37:19 +01:00
loganhart420	103c010eca	Add addtional datasets	2021-12-16 07:21:27 -05:00
Jörg Thalheim	bce143c738	server: fix compatibility with tts_models/en/ljspeech/fast_pitch (#893 )	2021-12-07 14:36:29 +01:00
Eren Gölge	babdd84f91	Fix GST inference commit d3e477875a7e46a101fcf95a1794442823750fe2 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Wed Nov 3 10:16:12 2021 +0000 Read .wav for GST conditioning from CL commit 074e6d0874d3b34fb6a4991fc17d66dccd413fbb Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 14:43:47 2021 +0100 Fix GST during inference in Tacotron2 commit fdece14585ab5a36eed1061a9a838d8e48aa6882 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Wed Nov 3 10:16:12 2021 +0000 Read .wav for GST conditioning from CL commit cd29e21b8d0a541ee298d2bf5f67223ad60be38f Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 14:43:47 2021 +0100 Fix GST during inference in Tacotron2 commit 908ce39370eadcc9fa8510cdb26c9ead87305427 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 12:49:37 2021 +0100 Make trim_db value negative commit 1008a2e0f72fa7ca7f0307424f570386f2f16d42 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 12:22:24 2021 +0100 Set find_endpoint db threshold in config.json	2021-12-07 13:28:49 +00:00
Eren Gölge	ce45d9e1af	Make style and lint	2021-12-01 10:42:52 +00:00
Eren Gölge	40cb8ac966	Fix #958	2021-12-01 10:33:34 +00:00
Eren Gölge	512ada7548	Fix callbacks against multi-gpu training	2021-12-01 10:32:14 +00:00
Eren Gölge	2ed9e3c241	Fix constant use of noise augment	2021-11-08 09:20:34 +01:00
Eren Gölge	b6b14a76af	Fix VITS stochastic duration predictor	2021-11-08 09:20:11 +01:00
Eren Gölge	dc3dd55dd9	Add collect_env_info.py	2021-11-08 08:59:08 +01:00
Eren Gölge	faafea4cf2	Fix style	2021-11-04 17:04:40 +01:00

... 5 6 7 8 9 ...

1872 Commits