coqui-tts

Commit Graph

Author	SHA1	Message	Date
Eren Gölge	bd461ace33	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:45:24 +01:00
Eren Gölge	5a9653978a	Refactor synthesis.py for TTSTokenizer	2022-02-25 10:45:24 +01:00
Eren Gölge	e5785b34b0	Style fix	2022-02-25 10:27:46 +01:00
Eren Gölge	e4049aa31a	Refactor TTSDataset to use TTSTokenizer	2022-02-25 10:27:46 +01:00
Eren Gölge	2480bbe937	Remove OLD TOKENIZATION ROUTINES	2022-02-25 09:32:54 +01:00
Eren Gölge	8d85af84cd	Implement Punctuation class	2022-02-25 09:32:54 +01:00
Eren Gölge	1aca58afaf	Fix imports in cleaners.py	2022-02-25 09:32:54 +01:00
Eren Gölge	0344645e90	Implement TTSTokenizer	2022-02-25 09:32:54 +01:00
Eren Gölge	2fb1f70503	Implement BaseCharacters, IPAPhonemes, Graphemes	2022-02-25 09:32:54 +01:00
Eren Gölge	1bee40af40	Create language folders under `TTS.tts.utils.text`	2022-02-25 09:32:54 +01:00
Eren Gölge	c1119bc291	Implement BasePhonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	dcd01356e0	Create `text/english` folder	2022-02-25 09:32:54 +01:00
Eren Gölge	80867c8e8c	Implement multi-phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	5e4f78add3	Implement espeak wrapper	2022-02-25 09:32:54 +01:00
Eren Gölge	e03a05c816	Implement gruut wrapper	2022-02-25 09:32:54 +01:00
Eren Gölge	172ba0c5e7	Implement JA_JP phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	ca02b82218	Implement ZH_CH phonemizer	2022-02-25 09:32:54 +01:00
Eren Gölge	a51b031bff	Merge branch 'dev' into dev-fix-glowtts-infer	2022-02-21 12:01:40 +03:00
Edresson Casanova	28a7464975	Fix the bug in split dataset function (#1251) * Fix the bug in split_dataset * Make eval_split_size configurable * Change test_loader to use load_tts_samples function * Change eval_split_portion to eval_split_size and permits to set the absolute number of samples in eval * Fix samplers unit test * Add data unit test on GitHub workflow	2022-02-21 11:59:36 +03:00
Edresson Casanova	ba6e56e01c	Fix Glow-TTS multi-speaker inference	2022-02-18 19:25:29 +00:00
Eren Gölge	127118c637	Update TTS.tts formatters (#1228) * Return Dict from tts formatters * Make style	2022-02-11 23:03:43 +01:00
Edresson Casanova	0860d73cf8	Remove Tensorflow requeriment (#1225) * Remove TF modules * Remove TF unit tests * Remove TF vocoder modules * Remove TF convert scripts * Remove TF requirement * Remove the Docs TF instructions * Remove TF inference support	2022-02-10 16:14:54 +01:00
WeberJulian	e778bad626	Add argument to enable dp speaker conditioning	2022-01-06 15:07:27 +01:00
WeberJulian	e1accb6e28	Fix train_tts.py and uncomment code (#1051) * Fix SE loading and language embedding logic * remove trailing white space * Uncomment resmapling code for SCL	2022-01-03 17:44:57 +01:00
Eren Gölge	d724984be1	Fix language assignment	2022-01-02 11:11:24 +00:00
WeberJulian	a63998c048	Fix phoneme language	2022-01-01 21:08:13 +01:00
Eren Gölge	36cef5966b	Fix resnet speaker encoder	2021-12-30 15:36:35 +00:00
Eren Gölge	348b5c96a2	Fix speaker encoder test	2021-12-30 15:36:35 +00:00
Eren Gölge	7129b04d46	Update VITS model	2021-12-30 14:08:17 +00:00
Eren Gölge	5c5ddd2ba7	Init speaker manager for speaker encoder	2021-12-22 15:51:53 +00:00
Eren Gölge	a25269d897	Remove commented code	2021-12-20 11:54:10 +00:00
Eren Gölge	d29c3780d1	Use speaker_encoder from speaker manager in Vits	2021-12-20 11:54:10 +00:00
Eren Gölge	79de38ca76	Rename setup_model to setup_speaker_encoder_model	2021-12-20 11:54:10 +00:00
Eren Gölge	649dc9e9da	Remove redundant code	2021-12-20 11:54:10 +00:00
Eren Gölge	704dddcffa	Make style	2021-12-20 11:54:10 +00:00
WeberJulian	a564eb9f54	Add support for multi-lingual models in CLI	2021-12-20 11:54:10 +00:00
WeberJulian	2bbcb558dc	Prevent weighted sampler use when num_gpus > 1	2021-12-20 11:54:10 +00:00
WeberJulian	74cedfac38	Revert init multispeaker change	2021-12-20 11:54:10 +00:00
WeberJulian	9cfbacc622	Fix trailing space	2021-12-20 11:54:10 +00:00
WeberJulian	6b03943526	Move multilingual logic out of the trainer	2021-12-20 11:54:10 +00:00
Edresson	67dda0abe1	Add the SCL resample TODO	2021-12-20 11:54:10 +00:00
WeberJulian	8b52fb89d1	Fix merge bug	2021-12-20 11:54:10 +00:00
WeberJulian	09eda31a3f	Fix tests	2021-12-20 11:54:10 +00:00
Edresson	78a23e19df	Fix pylint checks	2021-12-20 11:54:10 +00:00
WeberJulian	4cd0e4eb0d	Remove self.audio_config from VITS	2021-12-20 11:54:10 +00:00
Edresson	d39200e69b	Remove torchaudio requeriment	2021-12-20 11:54:10 +00:00
WeberJulian	2e516869a1	Fix trailing whitespace	2021-12-20 11:54:10 +00:00
WeberJulian	ffc269eaf4	Update docstring	2021-12-20 11:54:10 +00:00
Edresson	12968532fe	Add the language embedding dim in the duration predictor class	2021-12-20 11:54:10 +00:00
Edresson	90eac13bb2	Rename ununsed_speakers to ignored_speakers	2021-12-20 11:54:10 +00:00
Edresson	f34596d957	Fix function name	2021-12-20 11:54:10 +00:00
Edresson	45d0b04179	Lint fixs	2021-12-20 11:54:10 +00:00
Edresson	b769b49e34	Remove the data from the set_d_vectors_from_file function	2021-12-20 11:54:10 +00:00
Edresson	9daa33d1fd	Remove unusable speaker manager function	2021-12-20 11:54:10 +00:00
Edresson	8c22d5ac49	Turn more clear the VITS loss function	2021-12-20 11:54:10 +00:00
Edresson	6fc3b9e679	Remove the unusable fine-tuning model	2021-12-20 11:54:10 +00:00
WeberJulian	631addf33b	fix d-vector	2021-12-20 11:54:10 +00:00
WeberJulian	da6c1e858c	Fix small issues	2021-12-20 11:54:10 +00:00
WeberJulian	e8af6a9f08	Fix use_speaker_embedding logic	2021-12-20 11:54:10 +00:00
WeberJulian	120332d53f	Fix phonemes	2021-12-20 11:54:10 +00:00
WeberJulian	1340938159	fix phonemes per language	2021-12-20 11:54:10 +00:00
WeberJulian	e995a63bd6	fix linter	2021-12-20 11:54:10 +00:00
WeberJulian	1472b6df49	make style	2021-12-20 11:54:10 +00:00
WeberJulian	4d721bcabd	fix test sentence synthesis	2021-12-20 11:54:10 +00:00
WeberJulian	0804806727	fix f0_cache_path in dataset	2021-12-20 11:54:10 +00:00
WeberJulian	3b5592abcf	fix test vits	2021-12-20 11:54:10 +00:00
WeberJulian	2a2b5767c2	fix collate_fn	2021-12-20 11:54:10 +00:00
Julian WEBER	78c2d12a91	PitchExtractor	2021-12-20 11:54:10 +00:00
Julian WEBER	9a2f91327c	get_aux_input	2021-12-20 11:54:10 +00:00
Julian WEBER	b3abd01793	Merge dataset	2021-12-20 11:54:10 +00:00
Edresson	1bd1a0546b	Add audio resample in the speaker consistency loss	2021-12-20 11:54:10 +00:00
Edresson	1c6bcda950	Add freeze vocoder generator and flow-based decoder option	2021-12-20 11:54:10 +00:00
WeberJulian	2b952d8b97	freeze vits parts	2021-12-20 11:54:10 +00:00
WeberJulian	005bba60b0	get_speaker_weighted_sampler	2021-12-20 11:54:10 +00:00
Edresson	9de4539422	Update the VITS model docs	2021-12-20 11:54:10 +00:00
Edresson	eeb8ac07d9	Add voice conversion fine tuning mode	2021-12-20 11:54:10 +00:00
Edresson	690b37d0ab	Add support to use the speaker encoder as loss function in VITS model	2021-12-20 11:54:09 +00:00
Edresson	9b011b1cb3	Add H/ASP original checkpoint support	2021-12-20 11:54:09 +00:00
Edresson	de78556655	Fix the optimizer parameters bug in multilingual and multispeaker training	2021-12-20 11:54:09 +00:00
Edresson	9be5b75da3	Fix bug after merge	2021-12-20 11:54:09 +00:00
Edresson	76251b619a	Fix d-vector multispeaker training bug	2021-12-20 11:54:09 +00:00
Edresson	7ef3ddc6ff	Fix unit tests	2021-12-20 11:54:09 +00:00
Edresson	36dcd11453	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	c53693c155	Implement vocoder Fine Tuning like SC-GlowTTS paper	2021-12-20 11:54:09 +00:00
Edresson	f1f016314e	Fix the bug in M-AILABS formatter	2021-12-20 11:54:09 +00:00
Edresson	c334d39acc	Add voice conversion support for the model VITS trained with external speaker embedding	2021-12-20 11:54:09 +00:00
Edresson	e997889ba8	Fix bug in VITS multilingual inference	2021-12-20 11:54:09 +00:00
Edresson	7c0b8ec572	Fix bugs in the non-multilingual VITS inference	2021-12-20 11:54:09 +00:00
Edresson	3fbbebd74d	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	ac9416fb86	Add multilingual inference support	2021-12-20 11:54:09 +00:00
Edresson	dcb2374bc9	Add multilingual training support to the VITS model	2021-12-20 11:54:09 +00:00
Edresson	f996afedb0	Implement multilingual dataloader support	2021-12-20 11:54:09 +00:00
Edresson	5f1c18187f	Fix pylint issues	2021-12-20 11:54:09 +00:00
Edresson	d91c595c5a	Implement training support with d_vecs in the VITS model	2021-12-20 11:54:09 +00:00
Edresson	6a7db67a91	Allow ignore speakers for all multispeaker datasets	2021-12-20 11:54:09 +00:00
Edresson	e0ad838066	Select randomly a speaker from the speaker manager for the test setences	2021-12-20 11:54:09 +00:00
Edresson	eb3e8affe1	Save speakers embeddings/ids before starting training	2021-12-20 11:54:09 +00:00
Eren Gölge	babdd84f91	Fix GST inference commit d3e477875a7e46a101fcf95a1794442823750fe2 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Wed Nov 3 10:16:12 2021 +0000 Read .wav for GST conditioning from CL commit 074e6d0874d3b34fb6a4991fc17d66dccd413fbb Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 14:43:47 2021 +0100 Fix GST during inference in Tacotron2 commit fdece14585ab5a36eed1061a9a838d8e48aa6882 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Wed Nov 3 10:16:12 2021 +0000 Read .wav for GST conditioning from CL commit cd29e21b8d0a541ee298d2bf5f67223ad60be38f Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 14:43:47 2021 +0100 Fix GST during inference in Tacotron2 commit 908ce39370eadcc9fa8510cdb26c9ead87305427 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 12:49:37 2021 +0100 Make trim_db value negative commit 1008a2e0f72fa7ca7f0307424f570386f2f16d42 Author: George Rousssos <25833833+george-roussos@users.noreply.github.com> Date: Fri Oct 29 12:22:24 2021 +0100 Set find_endpoint db threshold in config.json	2021-12-07 13:28:49 +00:00
Eren Gölge	2ed9e3c241	Fix constant use of noise augment	2021-11-08 09:20:34 +01:00
Eren Gölge	b6b14a76af	Fix VITS stochastic duration predictor	2021-11-08 09:20:11 +01:00
Eren Gölge	faafea4cf2	Fix style	2021-11-04 17:04:40 +01:00
Eren Gölge	c5077c6c3f	Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev	2021-11-01 16:42:27 +01:00
Eren Gölge	20cebde1c9	Add docstring to MAI labs formatter	2021-11-01 16:41:55 +01:00
Eren Gölge	608f437545	Add a function to find unique chars	2021-11-01 16:41:33 +01:00
Eren Gölge	d6d780e758	Fix FastSpeech config	2021-11-01 16:41:15 +01:00
Michael Hansen	3bc043faeb	Upgrade to gruut 2.0 (#882)	2021-10-31 11:41:55 +01:00
Eren Gölge	2df0752e73	Model zoo tests (#900) * Fix VITS model multi-speaker init * Remove gdrive support in model manager * Add model zoo tests	2021-10-29 17:54:16 +02:00
Eren Gölge	035ed432bc	Doc update (#889) * Link source files from the docs * Update glowTTS recipes for docs * Add dataset downloaders	2021-10-26 17:41:33 +02:00
Eren Gölge	0cac3f330a	Enable custom formatter in load_tts_samples	2021-10-26 13:07:11 +02:00
Eren Gölge	00becf2671	Fix import statements	2021-10-25 19:29:16 +02:00
Eren Gölge	2b7d159383	Update BaseTTS for multi-speaker training	2021-10-21 16:29:06 +00:00
Eren Gölge	e62d3c5cf7	Use absolute imports for tts configs and models	2021-10-21 16:29:06 +00:00
Eren Gölge	82fed4add2	Make style	2021-10-21 16:05:51 +00:00
Eren Gölge	3cb07fb6b5	Fix SpeakerManager init with data items	2021-10-21 13:54:39 +00:00
Eren Gölge	aea90e2501	Comment synthesis.py	2021-10-21 13:53:45 +00:00
Eren Gölge	3ab009ca8d	Edit model configs for multi-speaker	2021-10-21 13:51:37 +00:00
Eren Gölge	cea8e1739b	Update AlignTTS to use SpeakerManager	2021-10-20 18:22:41 +00:00
Eren Gölge	0e768dd4c5	Update comments	2021-10-20 18:21:26 +00:00
Eren Gölge	7c2cb7cc30	Update BaseTTS	2021-10-20 18:18:22 +00:00
Eren Gölge	330ee7d208	Comment BaseTacotron and remove unused funcs	2021-10-20 18:17:25 +00:00
Eren Gölge	aa25f70b95	Update ForwardTTS for multi-speaker	2021-10-20 18:16:41 +00:00
Eren Gölge	0ebc2a400e	Implement `_set_speaker_embedding` in GlowTTS	2021-10-20 18:15:20 +00:00
Eren Gölge	3da79a4de4	Comment Tacotron2 model	2021-10-20 18:14:04 +00:00
Eren Gölge	c514351c0e	Refactor multi-speaker init in BaseTTS-Tacotron1-2	2021-10-18 08:55:45 +00:00
Eren Gölge	127571423c	Update multi-speaker init in BaseTTS	2021-10-18 08:54:41 +00:00
Eren Gölge	a0a5d580e9	Approximate audio length from file size	2021-10-18 08:54:02 +00:00
Eren Gölge	fcbfc53cb7	Fix linter	2021-10-15 10:24:19 +00:00
Eren Gölge	073a2d2eb0	Refactor VITS multi-speaker initialization	2021-10-15 10:20:00 +00:00
Eren Gölge	0565457faa	Fix #846	2021-10-14 14:46:14 +00:00
Eren Gölge	4dbe7ed0de	Fix all-zero duration case for GlowTTS	2021-10-01 09:24:26 +00:00
Eren Gölge	37959ad0c7	Make linter	2021-09-30 23:02:16 +00:00
Eren Gölge	043dca61b4	Rename `load_meta_data` as `load_tts_data`	2021-09-30 14:47:56 +00:00
Eren Gölge	9f23ad6a0f	Fix imports	2021-09-30 14:47:56 +00:00
Eren Gölge	4163b4f2e4	Update Tacotron models	2021-09-30 14:47:56 +00:00
Eren Gölge	45889804c2	Update VITS	2021-09-30 14:47:56 +00:00
Eren Gölge	fd95926009	Update GlowTTS	2021-09-30 14:47:56 +00:00
Eren Gölge	a156a40b47	Update ForwardTTS for Trainer_v2	2021-09-30 14:19:19 +00:00
Eren Gölge	d9df33f837	Update `align_tts` for trainer_v2	2021-09-30 14:18:10 +00:00
Eren Gölge	8ada870a57	Refactor `trainer.py` for v2	2021-09-30 14:16:34 +00:00
Eren Gölge	2766dd1d6e	Fix #813 - GlowTTS training (#814) * Fix #813 * Update glow_tts recipe * Fix glow-tts test * Linter fix * Run data dep init only in training	2021-09-17 20:06:55 +02:00
Eren Gölge	1ea011571a	Update SpeedySpeech config	2021-09-12 15:33:27 +00:00
Eren Gölge	cbbc9e0172	Add FastSpeechConfig	2021-09-11 10:20:37 +00:00
Eren Gölge	26f76fce22	Remove SpeedySpeech from .models.json	2021-09-10 17:47:27 +00:00
Eren Gölge	d97952611d	Remove unused import	2021-09-10 17:31:41 +00:00
Eren Gölge	d5f256b34c	Update tacotron `r` init	2021-09-10 17:26:23 +00:00
Eren Gölge	ab37fa9c39	Edit AlignTTS	2021-09-10 17:25:00 +00:00
Eren Gölge	66732025e1	Add `base_model` field to `forward_tts` configs	2021-09-10 17:23:48 +00:00
Eren Gölge	d6e29ef98a	Style update	2021-09-10 08:30:33 +00:00
Eren Gölge	a89eb12aca	Fix glow_tts imports	2021-09-10 08:29:51 +00:00
Eren Gölge	570d5971be	Implement `ForwardTTSLoss`	2021-09-10 08:29:12 +00:00
Eren Gölge	0541a25e90	Remove `fastpitch.py` and `speedy_speech.py`	2021-09-10 08:27:48 +00:00
Eren Gölge	3c16013199	Fix Vits imports	2021-09-10 08:26:34 +00:00
Eren Gölge	ed4b1d8514	Test `TTS.tts.utils.helpers`	2021-09-10 08:25:21 +00:00
Eren Gölge	8b7e094bde	Implement `forward_tts` - Generic API for feed-forward TTS models (FastPitch, SpeedySpeech) - Tests for `forward-tts` - Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`	2021-09-10 08:24:33 +00:00
Eren Gölge	bfc6ceac29	Move MAS to `TTS.tts.utils.helpers`	2021-09-09 10:57:19 +00:00
Eren Gölge	537c8576ec	Stage `TTS.tts.utils.helpers`	2021-09-08 13:35:18 +00:00
Eren Gölge	4761853c5c	Fix imports	2021-09-08 13:34:40 +00:00
Eren Gölge	c1513ec4cd	Plot pitch over spectrogram	2021-09-06 15:16:58 +00:00
Eren Gölge	d847a68e42	Reformat multi-speaker handling in GlowTTS	2021-09-06 15:16:58 +00:00
Eren Gölge	8d41060d36	Plot unnormalized pitch by `FastPitch`	2021-09-06 15:16:58 +00:00
Eren Gölge	2b59da802c	Fix loader setup in `base_tts`	2021-09-06 15:16:58 +00:00
Eren Gölge	76c4929ab2	Fix attn mask reading bug	2021-09-06 15:16:58 +00:00
Eren Gölge	91a70e80b2	Refactor TTSDataset Return a dict by `collate` Refactor batch handling in `collate` A couple of bug fixes	2021-09-06 15:16:58 +00:00
Eren Gölge	29248536c9	Update `PositionalEncoding`	2021-09-06 15:16:58 +00:00
Eren Gölge	4672889549	Update `generic.FFTransformer`	2021-09-06 15:16:58 +00:00
Eren Gölge	2bf9e83c49	FastPitch refactor and commenting	2021-09-06 15:16:58 +00:00
Eren Gölge	59b24e66cf	Add `AlignerNetwork`	2021-09-06 15:16:58 +00:00
Eren Gölge	648655fa03	Add `PitchExtractor` and return dict by `collate`	2021-09-06 15:16:58 +00:00
Eren Gölge	debf772ec5	Implement binary alignment loss	2021-09-06 15:16:58 +00:00
Eren Gölge	6e9d4062f2	Add `sort_by_audio_len` option	2021-09-06 15:16:58 +00:00
Eren Gölge	59d52a4cd8	Disable autcast for criterions	2021-09-06 15:16:58 +00:00
Eren Gölge	98a7271ce8	Refactor FastPitchv2	2021-09-06 15:16:58 +00:00
Eren Gölge	e429afbce4	Enable aligner for FastPitch	2021-09-06 15:16:58 +00:00
Eren Gölge	81c228a2d8	Update FastPitch don't detach duration network inputs	2021-09-06 15:16:58 +00:00
Eren Gölge	ca29033ef4	Refactor FastPitch model	2021-09-06 15:16:58 +00:00
Eren Gölge	42862f7fdb	Format style of the recipes	2021-09-06 15:16:58 +00:00
Eren Gölge	5d59100a88	Don't use align_score for models with duration predictor	2021-09-06 15:16:58 +00:00
Eren Gölge	fac9dbe661	Update FastPitchLoss	2021-09-06 15:16:58 +00:00
Eren Gölge	b81560607b	Update docstrings	2021-09-06 15:16:58 +00:00
Eren Gölge	57b3aec1b9	Update docstring format	2021-09-06 15:16:58 +00:00
Eren Gölge	7692bfe7f8	Update FastPitch config	2021-09-06 15:16:58 +00:00
Eren Gölge	b7caad39e0	Make optional to detach duration predictor input	2021-09-06 15:16:58 +00:00
Eren Gölge	545a00fc04	Use absolute paths of the attention masks	2021-09-06 15:16:58 +00:00
Eren Gölge	bc396c393f	Add FastPitch model and FastPitchconfig	2021-09-06 15:16:58 +00:00
Eren Gölge	e802b24ad0	Compute mean and std pitch	2021-09-06 15:16:58 +00:00
Eren Gölge	8fffd4e813	Don't print computed phonemes It causes noise in logs	2021-09-06 15:16:58 +00:00
Eren Gölge	d085642ac1	Cache pitch features Cache the features at the beginning of `BaseTTS` training.	2021-09-06 15:16:58 +00:00
Eren Gölge	7590c7db7a	Fix `base_tacotron` `aux_input` handling	2021-09-06 15:16:58 +00:00
Eren Gölge	db32162eae	Fix `FastPitchLoss`	2021-09-06 15:16:58 +00:00
Eren Gölge	994f2be2c1	Add comput_f0 field	2021-09-06 15:16:58 +00:00
Eren Gölge	c8d999b010	Add FastPitchLoss	2021-09-06 15:16:58 +00:00
Eren Gölge	fba257104d	Compute F0 using librosa	2021-09-06 15:16:58 +00:00
Katsuya Iida	165e5814af	Update Japanese phonemizer (#758) * Update default ja vocoder * update * Japanese phonemizer test * Run make style Co-authored-by: Eren Gölge <egolge@coqui.ai>	2021-09-01 09:33:15 +02:00
Eren Gölge	2b7e55f01f	Fix vits args types	2021-08-30 23:24:20 +00:00
Eren Gölge	18da8f5dbd	Update pylint 2.10.2 and fix lint issues	2021-08-30 08:10:35 +00:00
Eren Gölge	f186856e5d	Add option to sort input sequnce by audio len	2021-08-30 08:10:35 +00:00
Eren Gölge	2620f62ea8	Move duration_loss inside VitsGeneratorLoss	2021-08-27 07:07:07 +00:00
Eren Gölge	49e1181ea4	Fixes for the vits model	2021-08-26 17:15:09 +00:00
Eren Gölge	3ab8cef99e	Fix VITS model SPD	2021-08-18 14:55:46 +00:00
Eren Gölge	7c0d564965	Syncronize DDP processes	2021-08-13 10:40:50 +00:00
Eren Gölge	ecf5f17dca	Fix distribute.py and ddp training	2021-08-12 22:22:32 +00:00
Eren Gölge	c8b9ca3d71	Fix Tacotron num_char init	2021-08-10 08:56:34 +00:00
Eren Gölge	6af03ac476	Fix `num_char` init in Tacotron models	2021-08-09 21:46:15 +00:00
Eren Gölge	06018251e6	Add VITS and GlowTTS class docs 🗒️	2021-08-09 18:02:36 +00:00
Eren Gölge	6a7275881d	Add VitsConfig docstring	2021-08-09 18:02:36 +00:00
Eren Gölge	f7a72552f1	Make duration predictor dropout configurable	2021-08-09 18:02:36 +00:00
Eren Gölge	c312acac7d	Implement VITS model 🚀 VITS model implementation built on Glow TTS and HiFiGAN layers.	2021-08-09 18:02:36 +00:00
Eren Gölge	232a5abb6a	Update `tts.setup_model` Run `model.make_symbols()` if availabe to set the symbol list	2021-08-09 18:02:36 +00:00
Eren Gölge	f5a6aa974f	Modify `symbols.py` not to add _arpanet	2021-08-09 18:02:36 +00:00
Eren Gölge	003e5579e8	Enable `custom_symbols` in text processing Models can define their own custom symbols lists with custom `make_symbols()`	2021-08-09 18:02:36 +00:00
Eren Gölge	bd4e29b4dd	Add `compute_linear_spec=False` to `BaseTTSConfig`	2021-08-09 18:02:36 +00:00
Eren Gölge	e4648ffef1	Fix multi-speaker init of Tacotron models & tests	2021-08-09 18:02:36 +00:00
Eren Gölge	01324c8e70	Update `base_tts.py` Enable calling `make_symbols()` from the model if defined. Compatibility changes for end2end `tts` models in batch formatting. Changes in multi-speaker initialization. Modify `test_run()` to work with dict output iof `synthesis`	2021-08-09 18:02:36 +00:00
Agrin Hilmkil	ced4cfdbbf	Allow saving / loading checkpoints from cloud paths (#683 ) * Allow saving / loading checkpoints from cloud paths Allows saving and loading checkpoints directly from cloud paths like Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec. Note: The user will have to install the relevant dependency for each protocol. Otherwise fsspec will fail and specify which dependency is missing. * Append suffix _fsspec to save/load function names * Add a lower bound to the fsspec dependency Skips the 0 major version. * Add missing changes from refactor * Use fsspec for remaining artifacts * Add test case with path requiring fsspec * Avoid writing logs to file unless output_path is local * Document the possibility of using paths supported by fsspec * Fix style and lint * Add missing lint fixes * Add type annotations to new functions * Use Coqpit method for converting config to dict * Fix type annotation in semi-new function * Add return type for load_fsspec * Fix bug where fs not always created * Restore the experiment removal functionality	2021-08-09 18:02:36 +00:00
Eren Gölge	d9e18e009b	Skip phoneme cache pre-compute if the path exists	2021-08-09 18:02:36 +00:00
Eren Gölge	4b7b88dd3d	Add fullband-melgan DE vocoder	2021-07-26 15:38:30 +02:00
Eren Gölge	75b201c6c1	Merge pull request #673 from coqui-ai/fix_stopnet Fix stopnet training for Tacotron models	2021-07-24 12:25:38 +02:00
Eren Gölge	fc0c4600bd	Fix stopnet training	2021-07-24 11:39:54 +02:00
Eren Gölge	30eed347b6	Merge pull request #581 from Edresson/dev Compute speaker embeddings in batch for the LSTM Speaker Encoder and Compute embeddings/ finding chars using config file.	2021-07-23 17:22:51 +02:00
WeberJulian	25832eb97b	Changes for review	2021-07-15 11:38:45 +02:00
Edresson	b1620d1f3f	remove ignore generate eval flag	2021-07-15 03:34:28 -03:00
WeberJulian	c79a82ed07	refix linter	2021-07-13 23:12:18 +02:00
WeberJulian	7d92b30946	Fix tests	2021-07-13 23:00:34 +02:00
WeberJulian	32974dd6a9	Fix test sentences synthesis	2021-07-13 16:07:13 +02:00
Edresson	2e5baffa9c	Merge fix and eval split as argparse	2021-07-13 01:47:32 -03:00
eren golge	3c0454490f	Fix #616	2021-07-06 14:44:03 +02:00
Eren Gölge	c25a2184e7	Add docs for `SpeakerManager`	2021-07-03 13:55:27 +02:00
Eren Gölge	f382e4c700	Fix linter warnings	2021-07-03 13:30:24 +02:00
Eren Gölge	196876feb1	Fix `ModelManager` model download	2021-07-02 10:47:05 +02:00
Eren Gölge	9352cb4136	Format Align TTS docstrings	2021-07-02 10:45:58 +02:00
Eren Gölge	95ad72f38f	Fix glow tts initialization	2021-07-02 10:45:37 +02:00
Eren Gölge	40b0b5365e	Let `get_characters` return `num_chars`	2021-07-02 10:45:00 +02:00
Eren Gölge	0fa6a8c9b8	Fix glow tts default parameters	2021-07-02 10:44:23 +02:00
Eren Gölge	2e1a428b83	Update glowtts docstrings and docs	2021-06-30 14:30:55 +02:00
Eren Gölge	ae6405bb76	Docstrings for `Trainer`	2021-06-28 17:03:47 +02:00
Eren Gölge	d42d1c02ea	Use `torch.linalg.qr` for pytorch > `v1.9.0`	2021-06-28 17:03:47 +02:00
Eren Gölge	9790eddada	Fix wrong argument name 🛠️	2021-06-28 17:03:47 +02:00
Eren Gölge	932ab107ae	Docstring edit in `TTSDataset.py` ✍️	2021-06-28 17:03:47 +02:00
Eren Gölge	8c74f054f0	Enable support for 🐍 python 3.10 Bump up versions numpy 1.19.5 and TF 2.5.0	2021-06-28 17:03:47 +02:00
Eren Gölge	9455a2b01e	Apply small fixes for API compatibility	2021-06-28 17:03:47 +02:00
Eren Gölge	a5d5bc9063	Print `max_decoder_steps` when model reaches the limit	2021-06-28 17:03:47 +02:00
Eren Gölge	f23b228e24	Update `speaker_manager`	2021-06-28 17:03:47 +02:00
Eren Gölge	51005cdab4	Update `tts.models.setup_model`	2021-06-28 17:03:19 +02:00
Eren Gölge	7b8c15ac49	Create base 🐸TTS model abstraction for tts models	2021-06-28 17:03:19 +02:00
Eren Gölge	786170fe7d	Update tts model configs	2021-06-28 17:03:19 +02:00
Eren Gölge	98298ee671	Implement unified IO utils	2021-06-28 17:03:19 +02:00
Eren Gölge	c7aad884cd	Implement unified trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	6d7b5fbcde	`tts` model abstraction with `TTSModel`	2021-06-28 17:03:19 +02:00
Eren Gölge	d4dbd89752	fix calculation of `loader_start_time`	2021-06-28 17:03:19 +02:00
Eren Gölge	c754a0e17d	`TrainerAbstract` and related updates for `TrainerTTS`	2021-06-28 17:03:19 +02:00
Eren Gölge	00c82c516d	rename to	2021-06-28 17:03:19 +02:00
Eren Gölge	166f0aeb9a	merge if branches with the same implementation	2021-06-28 17:03:19 +02:00
Eren Gölge	03494ad642	adjust `distribute.py` for the `train_tts.py`	2021-06-28 17:03:19 +02:00
Eren Gölge	fdfb18d230	downsize melgan test model size	2021-06-28 17:03:19 +02:00
Eren Gölge	25238e0658	fix glow-tts `inference()`	2021-06-28 17:03:19 +02:00
Eren Gölge	419735f440	refactor and fix multi-speaker training in Trainer and Tacotron models	2021-06-28 17:03:19 +02:00
Eren Gölge	269e5a734e	add max_decoder_steps argument to tacotron models	2021-06-28 17:03:19 +02:00
Eren Gölge	2c38ef8441	use get_speaker_manager in Trainer and save speakers.json file when needed	2021-06-28 17:03:19 +02:00
Eren Gölge	802d461389	Compute d_vectors and speaker_ids separately in TTSDataset	2021-06-28 17:03:19 +02:00
Eren Gölge	db6a97d1a2	rename external speaker embedding arguments as `d_vectors`	2021-06-28 17:03:19 +02:00
Eren Gölge	9042ae9195	use `to_cuda()` for moving data in `format_batch()`	2021-06-28 17:03:19 +02:00
Eren Gölge	f82f1970b8	change `to(device)` to `type_as` in models	2021-06-28 17:03:19 +02:00
Eren Gölge	1fa15c195a	docstring fix	2021-06-28 17:03:19 +02:00
Eren Gölge	1c8a3d7c86	make style	2021-06-28 17:03:19 +02:00
Eren Gölge	30211512a4	fix type annotations	2021-06-28 17:03:19 +02:00
Eren Gölge	b22b7620c3	update glow-tts output shapes to match [B, T, C]	2021-06-28 17:03:19 +02:00
Eren Gölge	8381379938	formating `cond_input` with a function in Tacotron models	2021-06-28 17:03:19 +02:00
Eren Gölge	6c495c6a6e	fix glow-tts inference and forward functions for handling `cond_input` and refactor its test	2021-06-28 17:03:19 +02:00
Eren Gölge	f840268181	refactor `SpeakerManager`	2021-06-28 17:03:19 +02:00
Eren Gölge	421194880d	linter fixes	2021-06-28 17:03:19 +02:00
Eren Gölge	d96ebcd6d3	make style	2021-06-28 17:03:19 +02:00
Eren Gölge	b500338faa	make style	2021-06-28 17:03:19 +02:00
Eren Gölge	c680a07a20	fix `Synthesized` for the new `synthesis()`	2021-06-28 17:03:19 +02:00
Eren Gölge	bb355b7441	update align_tts.py model for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	9203b863d9	update align_tts_loss for trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	fc9a0fb8ce	update aling_tts_config for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	b8a4af4010	update `synthesis.py` for being more generic	2021-06-28 17:03:19 +02:00
Eren Gölge	c70d0c9dae	update `speedy_speech.py` model for trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	06ee57d816	update `speedy_speecy_config.py` for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	4e910993f1	update tacotron model to return `model_outputs`	2021-06-28 17:03:19 +02:00
Eren Gölge	bb4deee64c	update glow-tts for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	9134c7dfb6	update `sequence_mask` import globally	2021-06-28 17:03:19 +02:00
Eren Gölge	b2218e882a	update `glow_tts_config.py` for setting the optimizer and the scheduler	2021-06-28 17:03:19 +02:00
Eren Gölge	f4f83b6379	update `synthesis.py` for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	130781dab6	remove `tts.generic_utils` as all the functions are moved to other files	2021-06-28 17:03:19 +02:00
Eren Gölge	535a458f40	update Tacotron models for the trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	bdbfc95618	add `gradual_training` argument to tacotron.py	2021-06-28 17:03:19 +02:00
Eren Gölge	5a2e75f0ee	import missings for tacotron.py	2021-06-28 17:03:19 +02:00
Eren Gölge	da7d10e53c	mode `setup_model()` to `models/__init__.py`	2021-06-28 17:03:19 +02:00
Eren Gölge	ca302db7b0	add sequence_mask to `utils.data`	2021-06-28 17:03:19 +02:00
Eren Gölge	844abb3b1d	`setup_loss()` in `layer/__init__.py`	2021-06-28 17:03:19 +02:00
Eren Gölge	a20a1c7d06	rename preprocess.py -> formatters.py	2021-06-28 17:03:19 +02:00
Eren Gölge	b9bccbb243	move load_meta_data and related functions to `datasets/__init__.py`	2021-06-28 17:03:19 +02:00
Eren Gölge	d09385808a	set test_sentences in config	2021-06-28 17:03:19 +02:00
Eren Gölge	8def3c87af	trainer-API updates	2021-06-28 17:03:19 +02:00
Eren Gölge	42554cc711	rename MyDataset -> TTSDataset	2021-06-28 17:03:19 +02:00
Edresson	1c4e806f54	use speaker manager on compute embeddings script	2021-06-27 03:35:34 -03:00
Edresson Casanova	eb84bb2bc8	Merge branch 'dev' into dev	2021-06-26 15:32:19 -03:00
Michael Hansen	3f172b84d8	Fix linting issues	2021-06-25 14:41:31 +02:00
Michael Hansen	4d8426fa0a	Use eSpeak IPA lexicons by default for phoneme models	2021-06-25 14:41:05 +02:00
Michael Hansen	618b509204	Use combined characters available in TTS phonemes (like ç)	2021-06-25 14:41:05 +02:00
Michael Hansen	da6f6a4a01	Update docstring for clean_gruut_phonemes	2021-06-25 14:41:05 +02:00
Michael Hansen	47191f3ecc	Add tests for gruut phonemization	2021-06-25 14:41:05 +02:00
Michael Hansen	67869e77f9	Use gruut for phonemization	2021-06-25 14:41:05 +02:00
Edresson	28bec238ca	fix Lint checks	2021-06-18 14:33:50 -03:00
Edresson	83644056e3	fix Lint checks	2021-06-18 14:32:28 -03:00
Edresson Casanova	e78e3cd81e	Merge branch 'dev' into dev	2021-06-18 14:10:03 -03:00
Edresson	b74b510d3c	Compute embeddings and find characters using config file	2021-06-18 14:04:49 -03:00
Eren Gölge	49c5e5d820	maket style japanese PR	2021-06-02 11:44:46 +02:00
Eren Gölge	73b4083c6c	Merge pull request #502 from kaiidams/kaiidams/kokoro Japanese Tacotron 2 model	2021-06-02 10:20:08 +02:00
Alexander Korolev	c1eb9bdcca	fix speaker dim inference	2021-06-01 15:15:26 +02:00
Katsuya Iida	1cc18d1972	Move unittest of Japanese phonemizer.	2021-06-01 18:51:34 +09:00
Alexander Korolev	5b89ef2c6e	fix speaker-embeddings dimension during inference	2021-06-01 11:06:35 +02:00
Katsuya Iida	c4a5a73f18	update Kokoro config	2021-05-29 19:17:27 +09:00
Katsuya Iida	3a9ac2de4a	Merge remote-tracking branch 'coqui-ai/main' into kaiidams/kokoro	2021-05-29 09:39:23 +09:00
Katsuya Iida	d0c9c1ca5c	Move TTS/tts/utils/japanese	2021-05-29 09:21:47 +09:00
Edresson	099142d4dd	bug fix	2021-05-27 21:50:56 -03:00
Katsuya Iida	c4987e9d4e	Move import at the head of the file.	2021-05-28 00:22:57 +09:00
Eren Gölge	925c08cf95	replace unidecode with anyascii	2021-05-27 14:02:44 +02:00
Eren Gölge	c6f22aaa67	fix #509	2021-05-27 13:09:15 +02:00
Katsuya Iida	f921a05bdb	Fixed lint errors	2021-05-26 19:02:16 +09:00
Katsuya Iida	0536aa6d0f	Japanese Tacotron 2 model	2021-05-22 17:12:19 +09:00
Eren Gölge	5482a0f62d	type def for gradual_training	2021-05-19 14:03:26 +02:00
Eren Gölge	df6a98d0c3	type def for gradual_training	2021-05-19 14:00:44 +02:00
Eren Gölge	8a7c40736c	set use_phonemes false	2021-05-19 01:27:26 +02:00
Eren Gölge	ccfaa6b1d5	add `needs_phonemizer` field to models.json. If set true these models are only compatible with v0.0.13 or below.	2021-05-18 17:57:28 +02:00
Eren Gölge	a14fcf2a13	remove text_processing test	2021-05-18 17:57:28 +02:00
Eren Gölge	d7fae3f515	remove all espeaker and phonemizer deps	2021-05-18 17:57:28 +02:00
Eren Gölge	ced05e812a	move chinese phonemizer	2021-05-18 17:57:28 +02:00
Eren Gölge	218af1d9a2	change `list` to `List` in config	2021-05-18 17:30:27 +02:00
Eren Gölge	d1b469935d	tacotron DDC LJSpeech recipe	2021-05-17 11:42:14 +02:00
Eren Gölge	34a42d379f	update tacotron_config.py for checking `r` and the docstring	2021-05-17 11:35:30 +02:00
Eren Gölge	12722501bb	styling	2021-05-15 23:48:31 +02:00
Eren Gölge	8b1014d188	add docstrings with default value fixes	2021-05-15 23:45:10 +02:00
Eren Gölge	0213e1cbf4	update configs for tts models to match the field typed with the expected values	2021-05-12 00:57:38 +02:00
Eren Gölge	843d1b3d98	linter fixes	2021-05-11 11:30:00 +02:00
Eren Gölge	19fb1d743d	style update	2021-05-11 11:30:00 +02:00
Eren Gölge	21dd4d7960	fix load_config imports for Coqpit	2021-05-11 11:29:18 +02:00
Eren Gölge	c57f0b46bb	reintro use_gst for backwars compat	2021-05-11 11:29:18 +02:00
Eren Gölge	9ee70af9bb	code styling	2021-05-11 11:29:18 +02:00
Eren Gölge	7663bc63c1	add Coqpit configs for the TTS models	2021-05-11 11:29:17 +02:00
Eren Gölge	7227e8f1d2	update train_align_tts.py for coqpit	2021-05-11 11:29:17 +02:00
Eren Gölge	51a7e06945	glow_tts_config.py and train test on python	2021-05-11 11:29:17 +02:00
Eren Gölge	720fe13056	update glow_tts modules and training script for coqpit use	2021-05-11 11:29:17 +02:00
Eren Gölge	816e7ee698	remove default configs.json as replacing with Coqpit configs	2021-05-11 11:29:17 +02:00
Eren Gölge	647163397d	coqpit refactoring	2021-05-11 11:29:17 +02:00
Eren Gölge	eaa130e813	fix tacotron for coqpit	2021-05-11 11:29:17 +02:00
Eren Gölge	05d9543ed8	init GST module using gst config in Tacotron models	2021-05-11 11:29:17 +02:00
Eren Gölge	93a00373f6	move split_dataset	2021-05-11 11:29:17 +02:00
Eren Gölge	79d7215142	config refactor #5 WIP	2021-05-11 11:29:17 +02:00

922 Commits