coqui-tts

Commit Graph

Author	SHA1	Message	Date
Jindrich Matousek	a60b423f76	Merge remote-tracking branch 'upstream/main'	2023-04-13 13:21:06 +02:00
Eren Gölge	ad8b9bf2be	🐸 Coqui Studio API integration (#2484 ) * Warn when lang is not avail * Make style * Implement Coqui Studio API * Test * Update docs * Set action * Make style * Make lint * Update README * Make style * Fix action * Run actions	2023-04-05 15:06:50 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Jindrich Matousek	cbdec704dc	Merge branch 'coqui-ai:main' into main	2023-02-20 08:32:10 +01:00
Jindrich Matousek	7761e5cee0	Merge branch 'coqui-ai:main' into main	2023-02-08 15:17:28 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Eren Gölge	85b3a04b37	Merge branch 'api_model_path' into dev	2023-02-06 11:18:00 +01:00
marius851000	1f4d8bf0f1	Fix tts-server for multi-lingual models (#2257 )	2023-02-06 10:54:34 +01:00
Eren Gölge	7fddabc8ac	Implement cloning in API	2023-01-30 13:35:48 +01:00
Jindrich Matousek	f278da4fc9	Merge branch 'coqui-ai:main' into main	2022-12-28 14:12:58 +01:00
Eren Gölge	1ddc484b49	Python API implementation (#2195 ) * Draft implementation * Fix style * Add api tests * Fix lint * Update docs * Update tests * Set env * Fixup * Fixup * Fix lint * Revert	2022-12-12 12:04:20 +01:00
Jindrich Matousek	5c0d71c746	Merge remote-tracking branch 'upstream/main'	2022-12-03 17:29:38 +01:00
logan hart	ff9b63d02a	Add neon models (#2140 ) * Add neon ljspeech vits model * Add neon german model * Update .models.json * Add neon spanish model * Add french model * Add Dutch model * Add Hungarian model * Add Greek model * Remove uneeded description * Update .models.json * Update .models.json * Handling neon models * Add all neon models * Update .models.json * Split zoo_tests * Update test names * Update model testing Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-11-16 16:12:39 +01:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
Jindrich Matousek	97de55595f	Merge remote-tracking branch 'upstream/main'	2022-08-24 12:21:18 +02:00
Eren Gölge	946afa8197	v0.8.0 (#1810 ) * Fix checkpointing GAN models (#1641) * checkpoint sae step crash fix * checkpoint save step crash fix * Update gan.py updated requested changes * crash fix * Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469) Co-authored-by: Eren Gölge <erogol@hotmail.com> * Fix Publish CI (#1597) * Try out manylinux * temporary removal of useless pipeline * remove check and use only manylinux * Try --plat-name * Add install requirements * Add back other actions * Add PR trigger * Remove conditions * Fix sythax * Roll back some changes * Add other python versions * Add test pypi upload * Add username * Add back __token__ as username * Modify name of entry to testpypi * Set it to release only * Fix version checking * Fix tokenizer for punc only (#1717) * Remove redundant config field * Fix SSIM loss * Separate loss tests * Fix BCELoss adressing #1192 * Make style * Add durations as aux input for VITS (#1694) * Add durations as aux input for VITS * Make style * Fix tts_tests * Fix test_get_aux_input * Make lint * feat: updated recipes and lr fix (#1718) - updated the recipes activating more losses for more stable training - re-enabling guided attention loss - fixed a bug about not the correct lr fetched for logging * Implement VitsAudioConfig (#1556) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style * Fix device allocation * Fix SSIM loss correction * Fix aux tests (#1753) * Set n_jobs to 1 for resample script * Delete resample test * Set n_jobs 1 in vad test * delete vad test * Revert "Delete resample test" This reverts commit `bb7c8466af`. 
* Remove tests with resample * Fix for FloorDiv Function Warning (#1760) * Fix for Floor Function Warning Fix for Floor Function Warning * Adding double quotes to fix formatting Adding double quotes to fix formatting * Update glow_tts.py * Update glow_tts.py * Fix type in download_vctk.sh (#1739) typo in comment * Update decoder.py (#1792) Minor comment correction. * Update requirements.txt (#1791) Support for #1775 * Update README.md (#1776) Fix typo in different and code sample * Fix & update WaveRNN vocoder model (#1749) * Fixes KeyError bug. Adding logging to dashboard. * Make pep8 compliant * Make style compliant * Still fixing style * Fix rand_segment edge case (input_len == seg_len - 1) * Update requirements.txt; inflect==5.6 (#1809) New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808 Force inflect==5.6 (pydantic free) install to solve dependency issue. * Update README.md; download progress bar in CLI. (#1797) * Update README.md - minor PR - added model_info usage guide based on #1623 in README.md . * "added tqdm bar for model download" * Update manage.py * fixed style * fixed style * sort imports * Update wavenet.py (#1796) * Update wavenet.py Current version does not use "in_channels" argument. In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. However, since it is a generic implementation, I believe it is better to update it for a more general use. * "in_channels -> hidden_channels" * Adjust default to be able to process longer sentences (#1835) Running `tts --text "$text" --out_path …` with a somewhat longer sentences in the text will lead to warnings like “Decoder stopped with max_decoder_steps 500” and the sentences just being cut off in the resulting WAV file. This happens quite frequently when feeding longer texts (e.g. 
a blog post) to `tts`. It's particular frustrating since the error is not always obvious in the output. You have to notice that there are missing parts. This is something other users seem to have run into as well [1]. This patch simply increases the maximum number of steps allowed for the tacotron decoder to fix this issue, resulting in a smoother default behavior. [1] https://github.com/mozilla/TTS/issues/734 * Fix language flags generated by espeak-ng phonemizer (#1801) * fix language flags generated by espeak-ng phonemizer * Style * Updated language flag regex to consider all language codes alike * fix get_random_embeddings --> get_random_embedding (#1726) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding * Introduce numpy and torch transforms (#1705) * Refactor audio processing functions * Add tests for numpy transforms * Fix imports * Fix imports2 * Implement bucketed weighted sampling for VITS (#1871) * Update capacitron_layers.py (#1664) crashing because of dimension miss match at line no. 
57 [batch, 256] vs [batch , 1, 512] enc_out = torch.cat([enc_out, speaker_embedding], dim=-1) * updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853) * Fix BCE loss issue (#1872) * Fix BCE loss issue * Remove import * Remove deprecated files (#1873) - samplers.py is moved - distribute.py is replaces by the 👟Trainer * Handle when no batch sampler (#1882) * Fix tune wavegrad (#1844) * fix imports in tune_wavegrad * load_config returns Coqpit object instead None * set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program * fix var order in the result of batch collating * make style * make style with black and isort * Bump up to v0.8.0 * Add new DE Thorsten models (#1898) - Tacotron2-DDC - HifiGAN vocoder Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com> Co-authored-by: camillem <camillem@users.noreply.github.com> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: a-froghyar <adamfroghyar@gmail.com> Co-authored-by: ivan provalov <iprovalo@yahoo.com> Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com> Co-authored-by: p0p4k <rajiv.punmiya@gmail.com> Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com> Co-authored-by: vanIvan <alfa1211@gmail.com> Co-authored-by: Lars Kiesow <lkiesow@uos.de> Co-authored-by: rbaraglia <baraglia.r@live.fr> Co-authored-by: jchai.me <jreus@users.noreply.github.com> Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>	2022-08-22 14:54:38 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
Jindrich Matousek	2f81e8701e	Merge branch 'main' of https://github.com/coqui-ai/TTS into coqui-ai-main	2022-06-24 16:06:48 +02:00
Eren Gölge	829e2c24f9	v0.7.1 (#1676 ) * Add Thorsten VITS model (#1675) Co-authored-by: Eren Gölge <egolge@coqui.ai> * Remove GL message Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-21 14:11:39 +02:00
Eren Gölge	3328be7a8e	Remove GL message	2022-06-21 12:39:31 +02:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: 
double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Eren Gölge	2fc38f67d2	Update SpeakerManager init in Synthesizer	2022-05-11 11:32:27 +02:00
jmaty	1aa05feeb4	merge local changes and official repo update 0.6.2	2022-05-05 14:22:45 +02:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456 ) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Jindrich Matousek	51d7ad161c	Better WA for glottal stop: now works also for multiple sentences in a single input text	2022-04-06 14:37:51 +02:00
Jindrich Matousek	aae77dac07	WA: when [!] is not at the end of a sentence, it is used as a glottal stop in the phonetic input and sentences are NOT delimited by [!]	2022-04-06 10:06:06 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374 )	2022-03-31 13:41:16 +02:00
WeberJulian	1b22f03e98	Fix G2P backend of the released models (#1461 ) * Fix enforce phonemizer * Add new models * Fix .model.json	2022-03-30 12:47:11 +02:00
WeberJulian	c66a6241fd	Enforce phonemizer definition for synthesis (#1441 ) * Enforce phonemizer definition for synthesis * Fix train_tts, tokenizer init can now edit config * Add small change to trigger CI pipeline * fix wrong output path for one tts_test * Fix style * Test config overides by args and tokenizer * Fix style	2022-03-25 23:15:33 +01:00
Eren Gölge	0870a4faa2	Make style (#1405 )	2022-03-16 12:13:55 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Eren Gölge	942df0fb05	Update vits dataset	2022-03-02 09:14:32 +01:00
Eren Gölge	1f0c8179da	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	1445a46e9e	Update synthesizer to use iinit_from_config	2022-02-25 11:26:59 +01:00
Eren Gölge	2fe16de8e3	Make lint	2022-02-25 11:25:00 +01:00
Eren Gölge	c9972e6f14	Make lint	2022-02-25 11:07:34 +01:00
Eren Gölge	9bb347a52b	Update for tokenizer API	2022-02-25 11:05:06 +01:00
Eren Gölge	84091096a6	Refactor Synthesizer class for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	1df1d6c4a9	Update for tokenizer API	2022-02-25 10:48:03 +01:00
Eren Gölge	3476be30d7	Refactor Synthesizer class for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	acc6eef625	Update for tokenizer API	2022-02-25 10:48:02 +01:00
Eren Gölge	3d86edfc81	Refactor Synthesizer class for TTSTokenizer	2022-02-25 09:32:54 +01:00
Eren Gölge	fc09e319d4	Prioritize the given encoder path over config	2022-01-03 14:24:19 +00:00
Eren Gölge	7fad969a1f	Fix if else statement	2022-01-03 14:16:11 +00:00
Eren Gölge	8fd1ee1926	Print urls when BadZipError	2022-01-01 15:26:35 +00:00
Eren Gölge	61874bc0a0	Fix your_tts inference from the listed models	2021-12-31 13:45:05 +00:00
Eren Gölge	5c5ddd2ba7	Init speaker manager for speaker encoder	2021-12-22 15:51:53 +00:00
Eren Gölge	56378b12f7	Fix speaker encoder init	2021-12-21 12:26:25 +00:00
Eren Gölge	c9c1fa0548	Fix multi-speaker init in Synthesizer	2021-12-21 09:44:07 +00:00


105 Commits