coqui-tts

Commit Graph

Author	SHA1	Message	Date
Aarni Koskela	38f6f8f0bb	Run `make style` & re-enable it in CI (#3127 )	2023-11-06 11:36:37 +01:00
David Garvey	a151d70242	Add stdout option (#3027 ) * add add cli options for play and speed --play argument uses simpleaudio to play the tts wav --speed <float 0.0-2.0> passes speed argument to Coqui Studio models * remove simpleaudio not referenced in file * fix simpleaudio dependency version * add ALSA headers for simpleaudio compilation * Dockerfile ALSA headers for simpleaudio * base changes to use stdout instead of play audio Considering conversion to pipe wav data for audio playback with ohter program like aplay. This is incomplete code. Using to get feedback before proceeding with implementation. * remove play for pipe_out arg that suppresses stdout removed play and simpleaudio dependency in place of pipe fuctionality to allow passing wav file data to a program dedicated to playing audio. * scipy.io.wavfile.write fails with /dev/null target * Streaming inference for XTTS 🚀 (#3035) * v0.17.7 * Redownload XTTS with the local and remote config do not match * Remove unused method * Print a message when it is already donwloaded * Try-except to present error when the user dont have connection * Fix style * 0.17.8 * v0.17.8 --------- Co-authored-by: Julian Weber <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com> Co-authored-by: Edresson Casanova <edresson1@gmail.com> Co-authored-by: ggoknar <ggoknar@coqui.ai>	2023-10-16 12:07:21 +02:00
Dusty Hagstrom	13cd076a7f	Synthesizer skips over embeddings file if model only has one speaker (#2587 ) * It looks like the Neon model is special in that t does not have a speaker_name and it wants to get the only item available. This was blocking a valid model with one speaker and a d_vector_file from being executed to get the embedding. * Update synthesizer.py oh my how embarrassing	2023-10-16 11:55:45 +02:00
Eren Gölge	623ea41634	Fix model tests (#2943 )	2023-09-14 15:21:48 +02:00
Eren Gölge	4033db5f4b	🔥 XTTS implementation	2023-09-13 17:51:24 +02:00
Jake Tae	409db505d2	Add device support in TTS and Synthesizer (#2855 ) * fix: resolve merge conflicts * fix: retain backwards compatability in functions * feature: utilize device for voice transfer * feature: use device for vocoder * chore: cleanup vocoder cpu logic * fix: add necessary vocoder output device check * fix: add necessary vocoder output device check * fix: indentation * fix: check if waveform is pt tensor before cpu conversion --------- Co-authored-by: Jake Tae <jaketae@Jakes-MacBook-Pro-2.local>	2023-08-14 21:04:44 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
Eren Gölge	c844b6570a	Inference API for 🐶Bark (#2685 ) * Add bark requirements * Draft Bark implementation * Download HF models * Update synthesizer * Add bark model * Make style * Update pylintrc * Update model URLs * Update Bark Config * Fix here and ther * Make style * Make lint * Update requirements * Update requirements	2023-06-28 11:55:27 +02:00
Eren Gölge	e785d101a1	Port Fairseq TTS models (#2628 ) * Load fairseq models * Add docs and missing files * Managing fairseq models and docs for API * Make style * Use scarf URL * Add tests * Fix URL * Pass cpu * Make lint * Fixup * Make lint * fixup * Fixup * Change tokenization order * Update README * Fixup * Fixup	2023-06-05 11:15:13 +02:00
Shukrullo Turgunov	0d5e68a09f	fix typo (#2647 ) * fix typo * typo fix	2023-06-05 09:58:16 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
prakharpbuf	c1875f68df	typos and minor fixes (#2508 ) * Update tacotron1-2.md * Update README.md * Update Tutorial_2_train_your_first_TTS_model.ipynb * Update synthesizer.py There is no arg called --speaker_name * Update formatting_your_dataset.md * Update AnalyzeDataset.ipynb * Update AnalyzeDataset.ipynb * Update AnalyzeDataset.ipynb * Update finetuning.md * Update train_yourtts.py * Update train_yourtts.py * Update train_yourtts.py * Update finetuning.md	2023-04-26 15:22:57 +02:00
Eren Gölge	ad8b9bf2be	🐸 Coqui Studio API integration (#2484 ) * Warn when lang is not avail * Make style * Implement Coqui Studio API * Test * Update docs * Set action * Make style * Make lint * Update README * Make style * Fix action * Run actions	2023-04-05 15:06:50 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Eren G??lge	85b3a04b37	Merge branch 'api_model_path' into dev	2023-02-06 11:18:00 +01:00
marius851000	1f4d8bf0f1	Fix tts-server for multi-lingual models (#2257 )	2023-02-06 10:54:34 +01:00
Eren G??lge	7fddabc8ac	Implement cloning in API	2023-01-30 13:35:48 +01:00
Eren Gölge	1ddc484b49	Python API implementation (#2195 ) * Draft implementation * Fix style * Add api tests * Fix lint * Update docs * Update tests * Set env * Fixup * Fixup * Fix lint * Revert	2022-12-12 12:04:20 +01:00
logan hart	ff9b63d02a	Add neon models (#2140 ) * Add neon ljspeech vits model * Add neon german model * Update .models.json * Add neon spanish model * Add french model * Add Dutch model * Add Hungarian model * Add Greek model * Remove uneeded description * Update .models.json * Update .models.json * Handling neon models * Add all neon models * Update .models.json * Split zoo_tests * Update test names * Update model testing Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-11-16 16:12:39 +01:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
Eren G??lge	3328be7a8e	Remove GL message	2022-06-21 12:39:31 +02:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Eren Gölge	2fc38f67d2	Update SpeakerManager init in Synthesizer	2022-05-11 11:32:27 +02:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456 ) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374 )	2022-03-31 13:41:16 +02:00
WeberJulian	1b22f03e98	Fix G2P backend of the released models (#1461 ) * Fix enforce phonemizer * Add new models * Fix .model.json	2022-03-30 12:47:11 +02:00
WeberJulian	c66a6241fd	Enforce phonemizer definition for synthesis (#1441 ) * Enforce phonemizer definition for synthesis * Fix train_tts, tokenizer init can now edit config * Add small change to trigger CI pipeline * fix wrong output path for one tts_test * Fix style * Test config overides by args and tokenizer * Fix style	2022-03-25 23:15:33 +01:00
Eren Gölge	0870a4faa2	Make style (#1405 )	2022-03-16 12:13:55 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Eren Gölge	942df0fb05	Update vits dataset	2022-03-02 09:14:32 +01:00
Eren Gölge	1f0c8179da	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	1445a46e9e	Update synthesizer to use iinit_from_config	2022-02-25 11:26:59 +01:00
Eren Gölge	2fe16de8e3	Make lint	2022-02-25 11:25:00 +01:00
Eren Gölge	c9972e6f14	Make lint	2022-02-25 11:07:34 +01:00
Eren Gölge	9bb347a52b	Update for tokenizer API	2022-02-25 11:05:06 +01:00
Eren Gölge	84091096a6	Refactor Synthesizer class for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	1df1d6c4a9	Update for tokenizer API	2022-02-25 10:48:03 +01:00
Eren Gölge	3476be30d7	Refactor Synthesizer class for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	acc6eef625	Update for tokenizer API	2022-02-25 10:48:02 +01:00
Eren Gölge	3d86edfc81	Refactor Synthesizer class for TTSTokenizer	2022-02-25 09:32:54 +01:00
Eren Gölge	fc09e319d4	Prioritize the given encoder path over config	2022-01-03 14:24:19 +00:00
Eren Gölge	7fad969a1f	Fix if else statement	2022-01-03 14:16:11 +00:00
Eren Gölge	8fd1ee1926	Print urls when BadZipError	2022-01-01 15:26:35 +00:00
Eren Gölge	61874bc0a0	Fix your_tts inference from the listed models	2021-12-31 13:45:05 +00:00
Eren Gölge	5c5ddd2ba7	Init speaker manager for speaker encoder	2021-12-22 15:51:53 +00:00
Eren Gölge	56378b12f7	Fix speaker encoder init	2021-12-21 12:26:25 +00:00
Eren Gölge	c9c1fa0548	Fix multi-speaker init in Synthesizer	2021-12-21 09:44:07 +00:00

1 2 3

105 Commits