coqui-tts

Commit Graph

Author	SHA1	Message	Date
Jindrich Matousek	97de55595f	Merge remote-tracking branch 'upstream/main'	2022-08-24 12:21:18 +02:00
Eren Gölge	946afa8197	v0.8.0 (#1810) * Fix checkpointing GAN models (#1641) * checkpoint sae step crash fix * checkpoint save step crash fix * Update gan.py updated requested changes * crash fix * Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469) Co-authored-by: Eren Gölge <erogol@hotmail.com> * Fix Publish CI (#1597) * Try out manylinux * temporary removal of useless pipeline * remove check and use only manylinux * Try --plat-name * Add install requirements * Add back other actions * Add PR trigger * Remove conditions * Fix sythax * Roll back some changes * Add other python versions * Add test pypi upload * Add username * Add back __token__ as username * Modify name of entry to testpypi * Set it to release only * Fix version checking * Fix tokenizer for punc only (#1717) * Remove redundant config field * Fix SSIM loss * Separate loss tests * Fix BCELoss adressing #1192 * Make style * Add durations as aux input for VITS (#1694) * Add durations as aux input for VITS * Make style * Fix tts_tests * Fix test_get_aux_input * Make lint * feat: updated recipes and lr fix (#1718) - updated the recipes activating more losses for more stable training - re-enabling guided attention loss - fixed a bug about not the correct lr fetched for logging * Implement VitsAudioConfig (#1556) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style * Fix device allocation * Fix SSIM loss correction * Fix aux tests (#1753) * Set n_jobs to 1 for resample script * Delete resample test * Set n_jobs 1 in vad test * delete vad test * Revert "Delete resample test" This reverts commit `bb7c8466af`. * Remove tests with resample * Fix for FloorDiv Function Warning (#1760) * Fix for Floor Function Warning Fix for Floor Function Warning * Adding double quotes to fix formatting Adding double quotes to fix formatting * Update glow_tts.py * Update glow_tts.py * Fix type in download_vctk.sh (#1739) typo in comment * Update decoder.py (#1792) Minor comment correction. * Update requirements.txt (#1791) Support for #1775 * Update README.md (#1776) Fix typo in different and code sample * Fix & update WaveRNN vocoder model (#1749) * Fixes KeyError bug. Adding logging to dashboard. * Make pep8 compliant * Make style compliant * Still fixing style * Fix rand_segment edge case (input_len == seg_len - 1) * Update requirements.txt; inflect==5.6 (#1809) New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808 Force inflect==5.6 (pydantic free) install to solve dependency issue. * Update README.md; download progress bar in CLI. (#1797) * Update README.md - minor PR - added model_info usage guide based on #1623 in README.md . * "added tqdm bar for model download" * Update manage.py * fixed style * fixed style * sort imports * Update wavenet.py (#1796) * Update wavenet.py Current version does not use "in_channels" argument. In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. However, since it is a generic implementation, I believe it is better to update it for a more general use. * "in_channels -> hidden_channels" * Adjust default to be able to process longer sentences (#1835) Running `tts --text "$text" --out_path …` with a somewhat longer sentences in the text will lead to warnings like “Decoder stopped with max_decoder_steps 500” and the sentences just being cut off in the resulting WAV file. This happens quite frequently when feeding longer texts (e.g. a blog post) to `tts`. It's particular frustrating since the error is not always obvious in the output. You have to notice that there are missing parts. This is something other users seem to have run into as well [1]. This patch simply increases the maximum number of steps allowed for the tacotron decoder to fix this issue, resulting in a smoother default behavior. [1] https://github.com/mozilla/TTS/issues/734 * Fix language flags generated by espeak-ng phonemizer (#1801) * fix language flags generated by espeak-ng phonemizer * Style * Updated language flag regex to consider all language codes alike * fix get_random_embeddings --> get_random_embedding (#1726) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding * Introduce numpy and torch transforms (#1705) * Refactor audio processing functions * Add tests for numpy transforms * Fix imports * Fix imports2 * Implement bucketed weighted sampling for VITS (#1871) * Update capacitron_layers.py (#1664) crashing because of dimension miss match at line no. 57 [batch, 256] vs [batch , 1, 512] enc_out = torch.cat([enc_out, speaker_embedding], dim=-1) * updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853) * Fix BCE loss issue (#1872) * Fix BCE loss issue * Remove import * Remove deprecated files (#1873) - samplers.py is moved - distribute.py is replaces by the 👟Trainer * Handle when no batch sampler (#1882) * Fix tune wavegrad (#1844) * fix imports in tune_wavegrad * load_config returns Coqpit object instead None * set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program * fix var order in the result of batch collating * make style * make style with black and isort * Bump up to v0.8.0 * Add new DE Thorsten models (#1898) - Tacotron2-DDC - HifiGAN vocoder Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com> Co-authored-by: camillem <camillem@users.noreply.github.com> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: a-froghyar <adamfroghyar@gmail.com> Co-authored-by: ivan provalov <iprovalo@yahoo.com> Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com> Co-authored-by: p0p4k <rajiv.punmiya@gmail.com> Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com> Co-authored-by: vanIvan <alfa1211@gmail.com> Co-authored-by: Lars Kiesow <lkiesow@uos.de> Co-authored-by: rbaraglia <baraglia.r@live.fr> Co-authored-by: jchai.me <jreus@users.noreply.github.com> Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>	2022-08-22 14:54:38 +02:00
Jindrich Matousek	1104c47524	Change rights to executable	2022-05-25 15:46:43 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374)	2022-03-31 13:41:16 +02:00
Eren Gölge	0870a4faa2	Make style (#1405)	2022-03-16 12:13:55 +01:00
Edresson Casanova	f81892483d	REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349) * Rename Speaker encoder module to encoder * Add a generic emotion dataset formatter * Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config * Add class map in emotion config * Add Base encoder config * Add evaluation encoder script * Fix the bug in plot_embeddings * Enable Weight decay for encoder training * Add argumnet to disable storage * Add Perfect Sampler and remove storage * Add evaluation during encoder training * Fix lint checks * Remove useless config parameter * Active evaluation in speaker encoder test and use multispeaker dataset for this test * Unit tests fixs * Remove useless tests for speedup the aux_tests * Use get_optimizer in Encoder * Add BaseEncoder Class * Fix the unitests * Add Perfect Batch Sampler unit test * Add compute encoder accuracy in a function	2022-03-11 14:43:40 +01:00
Edresson Casanova	f381e29b91	REBASED: Add support for the speaker encoder training using torch spectrograms (#1348) * Add support for the speaker encoder training using torch spectrograms * Remove useless function in speaker encoder dataset class	2022-03-10 14:54:51 +01:00
Eren Gölge	1e414b3a09	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	bf540f4323	Update imports for trainer	2022-02-25 11:31:56 +01:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	be3a03126a	Update imports for trainer	2022-02-25 11:28:14 +01:00
Eren Gölge	56378b12f7	Fix speaker encoder init	2021-12-21 12:26:25 +00:00
Eren Gölge	2e9b6b4f90	Refactor Speaker Encoder training	2021-09-30 14:47:56 +00:00
Ayush Chaurasia	936a47504d	Update Logger API, recipes	2021-08-09 18:34:00 +00:00
Ayush Chaurasia	f63cf46c55	Unified logger API	2021-08-09 18:34:00 +00:00
Ayush Chaurasia	f606741dc4	Add artifacts logging , wandb args	2021-08-09 18:31:16 +00:00
Agrin Hilmkil	ced4cfdbbf	Allow saving / loading checkpoints from cloud paths (#683) * Allow saving / loading checkpoints from cloud paths Allows saving and loading checkpoints directly from cloud paths like Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec. Note: The user will have to install the relevant dependency for each protocol. Otherwise fsspec will fail and specify which dependency is missing. * Append suffix _fsspec to save/load function names * Add a lower bound to the fsspec dependency Skips the 0 major version. * Add missing changes from refactor * Use fsspec for remaining artifacts * Add test case with path requiring fsspec * Avoid writing logs to file unless output_path is local * Document the possibility of using paths supported by fsspec * Fix style and lint * Add missing lint fixes * Add type annotations to new functions * Use Coqpit method for converting config to dict * Fix type annotation in semi-new function * Add return type for load_fsspec * Fix bug where fs not always created * Restore the experiment removal functionality	2021-08-09 18:02:36 +00:00
Edresson	4eac1c4651	bug fix on train_encoder and unit tests	2021-07-11 12:00:39 -03:00
Eren Gölge	c7aad884cd	Implement unified trainer	2021-06-28 17:03:19 +02:00
Eren Gölge	8f47f95998	correct import of `load_meta_data` remove redundant import	2021-06-28 17:03:19 +02:00
Eren Gölge	bec85ac58d	make style	2021-05-31 16:37:15 +02:00
Edresson	7448177b72	use SpeakerManager on compute embeddings script	2021-05-29 21:11:53 -03:00
Edresson	c90037c2e9	solve merge problems	2021-05-26 16:01:30 -03:00
Edresson Casanova	f89cb6aec2	Merge branch 'dev' into dev	2021-05-25 17:30:25 -03:00
Edresson	d570c2d790	pylint fix and data loader bug fix	2021-05-26 01:11:37 -03:00
Edresson	856ea19758	bug fix in dataloader and update inference	2021-05-18 03:43:16 -03:00
Edresson	3fcc748b2e	implement the Speaker Encoder H/ASP	2021-05-11 16:27:05 -03:00
Eren Gölge	843d1b3d98	linter fixes	2021-05-11 11:30:00 +02:00
Eren Gölge	19fb1d743d	style update	2021-05-11 11:30:00 +02:00
Eren Gölge	9f7599e3c3	fix train_encoder for coqpit	2021-05-11 11:29:18 +02:00
Eren Gölge	3fde2001b1	train_encoder refactoring for coqpit	2021-05-11 11:29:18 +02:00
Edresson	85ccad7e0a	add Audio data augamentation Addtive and RIR	2021-05-11 00:59:57 -03:00
Edresson	77d85c6cc5	add softmaxproto loss and bug fix in data loader	2021-05-10 17:08:38 -03:00
Eren Gölge	f519012dea	reformatting and styling	2021-04-12 11:47:39 +02:00
Eren Gölge	08581deb61	linter updates	2021-03-08 02:53:02 +01:00
erogol	d5a0190c4b	update copy_config_file to copy_model_files	2021-01-06 13:19:40 +01:00
Eren Gölge	2473b2dc62	Merge pull request #559 from krzim/patch-1 Fix import to grab the encoder model save function	2020-12-10 00:19:32 +01:00
Qingping Hou	b0b97d636f	speed up metafile build for voxceleb	2020-11-14 23:45:17 -08:00
Qingping Hou	0cc3650ef6	support loading config in yaml	2020-11-14 00:13:53 -08:00
krzim	2202e171c5	Fix import to grab the encoder model save function I saw that this was recently changed but I'm not sure if it should have been. This is the correct function given the arguments provided to it in the train loop.	2020-10-29 18:03:11 -04:00
erogol	154f90bc44	format speaker encoder imports	2020-09-28 11:19:19 +02:00
mueller91	cfeeef7a7f	fix: broken imports and missing files after merging in latest commits from mozilla/dev into mueller91/dev. speaker_encoder's config.json and visuals.py are missing in the current dev branch of MozillaTTS, and some imports are broken.	2020-09-22 20:10:41 +02:00
mueller91	1fe5eb054f	Merge branch 'dev' of https://github.com/mozilla/TTS into dev Conflicts: TTS/bin/train_encoder.py requirements.txt	2020-09-22 19:58:53 +02:00
mueller91	df4caec4b7	add: check_config for speaker_encoder	2020-09-22 19:52:09 +02:00
erogol	8150d5727e	Merge branch 'dev' of https://github.com/mozilla/TTS into dev	2020-09-21 14:21:55 +02:00
mueller	6b0621c794	cleanup	2020-09-17 16:46:43 +02:00
mueller	a273b1a210	add: add random noise to dataset	2020-09-17 14:23:40 +02:00
mueller	e36a3067e4	add: save wavs instead feats to storage. This is done in order to mitigate staleness when caching and loading from data storage	2020-09-17 14:14:30 +02:00
mueller	1511076fde	add: Configurable encoder dataset storage to reduce disk I/O add: Averaged time for data loader to console and Tensorboard output	2020-09-17 12:29:38 +02:00
erogol	f9001a4bdd	refactor and fix compat issues for speaker encoder	2020-09-11 17:17:07 +02:00

51 Commits