Commit Graph

414 Commits

Author SHA1 Message Date
Enno Hermann 4a2684be34
fix(bin.synthesize): more informative error for wrong --language argument (#3294)
In multilingual models, the target language is specified via the
`--language_idx` argument. However, the `tts` CLI also accepts a `--language`
argument for use with Coqui Studio, so it is easy to choose the wrong one,
resulting in the following confusing error at synthesis time:

```
AssertionError:   Language None is not supported. Supported languages are
['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar',
'zh-cn', 'hu', 'ko', 'ja']
```

This commit adds a better error message when `--language` is passed for a
non-studio model.

Fixes #3270, fixes #3291
2023-11-24 12:24:42 +01:00
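The check described in commit 4a2684be34 might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the commit: the helper name `check_language_args`, the `is_studio_model` parameter, and the wording of the message are hypothetical. Only the `--language` and `--language_idx` flags come from the commit message.

```python
import argparse


def check_language_args(args: argparse.Namespace, is_studio_model: bool) -> None:
    """Fail early if --language is passed for a non-studio model.

    Hypothetical helper: name, signature and message text are illustrative.
    """
    if args.language is not None and not is_studio_model:
        raise ValueError(
            "--language is only used with Coqui Studio models; "
            "use --language_idx to select the language of a multilingual model."
        )


parser = argparse.ArgumentParser()
parser.add_argument("--language", default=None)
parser.add_argument("--language_idx", default=None)

# Correct usage for a multilingual, non-studio model: passes silently.
check_language_args(parser.parse_args(["--language_idx", "en"]), is_studio_model=False)
```

Raising at argument-checking time means the user sees the hint before any model is loaded, instead of the `AssertionError` at synthesis time.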
Enno Hermann 0fb0d67de7 refactor: use save_checkpoint()/save_best_model() from Trainer 2023-11-17 01:18:23 +01:00
Enno Hermann 96678c7ba2 refactor: use copy_model_files() from Trainer 2023-11-17 01:18:23 +01:00
Enno Hermann 3c2d5a9e03
Remove duplicate AudioProcessor code and fix ExtractTTSpectrogram.ipynb (#3230)
* chore: remove unused argument

* refactor(audio.processor): remove duplicate stft+griffin_lim

* chore(audio.processor): remove unused compute_stft_paddings

Same function available in numpy_transforms

* refactor(audio.processor): remove duplicate db_to_amp

* refactor(audio.processor): remove duplicate amp_to_db

* refactor(audio.processor): remove duplicate linear_to_mel

* refactor(audio.processor): remove duplicate mel_to_linear

* refactor(audio.processor): remove duplicate build_mel_basis

* refactor(audio.processor): remove duplicate stft_parameters

* refactor(audio.processor): use pre-/deemphasis from numpy_transforms

* refactor(audio.processor): use rms_volume_norm from numpy_transforms

* chore(audio.processor): remove duplicate assert

Already checked in numpy_transforms.compute_f0

* refactor(audio.processor): use find_endpoint from numpy_transforms

* refactor(audio.processor): use trim_silence from numpy_transforms

* refactor(audio.processor): use volume_norm from numpy_transforms

* refactor(audio.processor): use load_wav from numpy_transforms

* fix(bin.extract_tts_spectrograms): set quantization bits

* fix(ExtractTTSpectrogram.ipynb): adapt to current TTS code

Fixes #2447, #2574

* refactor(audio.processor): remove duplicate quantization methods
2023-11-16 10:57:06 +01:00
Eren Gölge a24ebcd8a6
Fix coqui api (#3168) 2023-11-08 10:51:23 +01:00
Aarni Koskela 38f6f8f0bb
Run `make style` & re-enable it in CI (#3127) 2023-11-06 11:36:37 +01:00
David Garvey a151d70242
Add stdout option (#3027)
* add cli options for play and speed
--play argument uses simpleaudio to play the tts wav
--speed <float 0.0-2.0> passes speed argument to Coqui Studio models

* remove simpleaudio not referenced in file

* fix simpleaudio dependency version

* add ALSA headers for simpleaudio compilation

* Dockerfile ALSA headers for simpleaudio

* base changes to use stdout instead of play audio
Considering conversion to pipe wav data for audio playback with other program
like aplay.

This is incomplete code. Using to get feedback before proceeding with
implementation.

* remove play for pipe_out arg that suppresses stdout
removed play and simpleaudio dependency in place of pipe
functionality to allow passing wav file data to a program
dedicated to playing audio.

* scipy.io.wavfile.write fails with /dev/null target

* Streaming inference for XTTS 🚀 (#3035)

* v0.17.7

* Redownload XTTS when the local and remote config do not match

* Remove unused method

* Print a message when it is already downloaded

* Try-except to present error when the user doesn't have a connection

* Fix style

* 0.17.8

* v0.17.8

---------

Co-authored-by: Julian Weber <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: ggoknar <ggoknar@coqui.ai>
2023-10-16 12:07:21 +02:00
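Commit a151d70242 replaces direct playback with piping WAV data to a dedicated player such as `aplay`. A minimal sketch of that idea follows, assuming 16-bit mono PCM samples and using only the standard-library `wave` module. The function names `wav_bytes` and `pipe_wav` are hypothetical, and this is not the implementation from the commit.

```python
import io
import struct
import sys
import wave


def wav_bytes(samples, sample_rate):
    """Encode 16-bit mono PCM samples (ints in [-32768, 32767]) as WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


def pipe_wav(samples, sample_rate, stream=None):
    """Write the encoded WAV to a binary stream (stdout by default)."""
    stream = stream if stream is not None else sys.stdout.buffer
    stream.write(wav_bytes(samples, sample_rate))
    stream.flush()
```

Encoding into an in-memory buffer first keeps the header consistent even when the final target is a pipe that cannot seek.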
Aarni Koskela 0a82f063cc Late-import main TTS libraries in `tts` CLI 2023-09-26 15:38:56 +03:00
Aarni Koskela 5c047cf304 Ensure `tts` CLI tool readme and usage help is in sync 2023-09-26 15:38:56 +03:00
Eren Gölge 4033db5f4b 🔥 XTTS implementation 2023-09-13 17:51:24 +02:00
Jake Tae b79b6f0762
feature: add device flag to tts cli (#2875) 2023-08-28 11:20:12 +02:00
Eren Gölge 3a104d5c49
Update Studio API for XTTS (#2861)
* Update Studio API for XTTS

* Update the docs

* Update README.md

* Update README.md

Update README
2023-08-13 12:04:12 +02:00
logan hart 6fdb88f8e2
Add Delightful-TTS implementation (#2095)
* add configs

* Update config file

* Add model configs

* Add model layers

* Add layer files

* Add layer modules

* change config names

* Add emotion manager

* Fix missing ap bug

* Fix missing ap bug

* Add base TTS e2e class

* Fix wrong variable name in load_tts_samples

* Add training script

* Remove range predictor and gaussian upsampling

* Add helper function

* Add vctk recipe

* Add conformer docs

* Fix linting in conformer.py

* Add Docs

* remove duplicate import

* refactor args

* Fix bugs

* Remove emotion embedding

* remove unused arg

* Remove emotion embedding arg

* Remove emotion embedding arg

* fix style issues

* Fix bugs

* Fix bugs

* Add unittests

* make style

* fix formatter bug

* fix test

* Add pyworld compute pitch func

* Update requirements.txt

* Fix dataset Bug

* Change layer norm to instance norm

* Add missing import

* Remove emotions.py

* remove ssim loss

* Add init layers func to aligner

* refactor model layers

* remove audio_config arg

* Rename loss func

* Rename to delightful-tts

* Rename loss func

* Remove unused modules

* refactor imports

* replace audio config with audio processor

* Add change sample rate option

* remove broken resample func

* update recipe

* fix style, add config docs

* fix tests and multispeaker embd dim

* remove pyworld

* Make style and fix inference

* Split tts tests

* Fixup

* Fixup

* Fixup

* Add argument names

* Set "random" speaker in the model Tortoise/Bark

* Use a diff f0_cache path for delightful tts

* Fix delightful speaker handling

* Fix lint

* Make style

---------

Co-authored-by: loganhart420 <loganartpersonal@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-07-24 13:41:26 +02:00
PiaoYang 630327c4e6
Update compute_embeddings.py (#2668)
* [Typo] Fix variable name. More readable description.

Update train_yourtts.py

Reformat.

Reformat using black again.

* Add `old_append`. Fix bool argparse.

* Reformat.
2023-07-04 11:37:47 +02:00
Eren Gölge c844b6570a
Inference API for 🐶Bark (#2685)
* Add bark requirements

* Draft Bark implementation

* Download HF models

* Update synthesizer

* Add bark model

* Make style

* Update pylintrc

* Update model URLs

* Update Bark Config

* Fix here and there

* Make style

* Make lint

* Update requirements

* Update requirements
2023-06-28 11:55:27 +02:00
Eren Gölge 8e415732dd Fixup 2023-06-06 09:41:46 +02:00
Eren Gölge 547a72c97d Fixup 2023-06-05 22:38:56 +02:00
Eren Gölge 50b1074779 Make `tts` ready 2023-06-05 11:29:10 +02:00
manmay nakhashi a3d5801c44
Tortoise TTS inference (#2547)
* initial commit

* Tortoise inference

* revert path change

* style fix

* remove accidental remove

* style fixes

* style fixes

* removed unwanted assets and deps

* remove changes

* remove cvvp

* style fix black

* added tortoise config and updated config and args, refactoring the code

* added tortoise to api

* Pull mel_norm from url

* Use TTS cleaners

* Let download model files

* add ability to pass tortoise presets through coqui api

* fix tests

* fix style and tests

* fix tts commandline for tortoise

* Add config.json to tortoise

* Use kwargs

* Use regular model api for loading tortoise

* Add load from dir to synthesizer

* Fix Tortoise floats

* Use model_dir when there are multiple urls

* Use `synthesize` when exists

* lint fixes and resolve preset bug

* resolve a download bug and update model link

* fix json

* do tortoise inference from voice dir

* fix

* fix test

* fix speaker id and remove assets

* update inference_tests.yml

* replace inference_test.yml

* fix extra dir as None

* fix tests

* remove space

* Reformat docstring

* Add docs

* Update docs

* lint fixes

---------

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-05-16 00:58:21 +02:00
Eren Gölge 9b5822d625
Update VAD for silence trimming. (#2604)
* Update vad for mp3 and fault tolerance

* Make style

* Remove import

* Remove stupid defaults
2023-05-11 11:09:23 +02:00
Eren Gölge dba5cec497
Merge pull request #2509 from coqui-ai/update_vad
Update VAD
2023-04-13 19:35:17 +02:00
Eren Gölge 5a9bda13f3 Make style 2023-04-13 14:19:06 +02:00
Eren Gölge c9375e4b8b Make style 2023-04-13 14:17:06 +02:00
Eren Gölge 758ef84cc2 Using 🐸Studio models with `tts` command 2023-04-13 14:14:41 +02:00
Eren Gölge 537dc0e933 Update VAD 2023-04-13 00:39:46 +02:00
Eren Gölge d309f50e53
Implement FreeVC (#2451)
* Update .gitignore

* Draft FreeVC implementation

* Tests and relevant updates

* Update API tests

* Add missings

* Update requirements

* :(

* Lazy handle for vc

* Update docs for voice conversion

* Make style
2023-03-25 18:33:23 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
Martin Weinelt 994be163e1
Use packaging.version for version comparisons (#2310)
* Use packaging.version for version comparisons

The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.

With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:

> TypeError: '<' not supported between instances of 'str' and 'int'

This is fixed by this migration.

[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/

* Improve espeak version detection robustness

On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.

This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.

This improves the version detection by simply looking for the version
after the "text-to-speech:" token.

* Replace distutils.copy_tree with shutil.copytree

The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
2023-01-29 23:47:00 +01:00
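The espeak version detection described in commit 994be163e1 can be sketched as below. The two sample output strings are assumptions written for illustration, and the function names are hypothetical. The commit itself compares versions with `packaging.version`; this sketch swaps in plain integer tuples so that it runs with the standard library alone.

```python
import re


def parse_espeak_version(version_output):
    """Return the version string that follows the 'text-to-speech:' token."""
    match = re.search(r"text-to-speech:\s*(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError(f"no version found in: {version_output!r}")
    return match.group(1)


def version_key(version):
    """Turn '1.48.03' into (1, 48, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Assumed sample outputs, for illustration only.
espeak_out = "eSpeak text-to-speech: 1.48.03  04.Mar.14  Data at: /usr/share/espeak-data"
espeak_ng_out = "eSpeak NG text-to-speech: 1.50  Data at: /usr/lib/espeak-ng-data"

# Taking the third word works for the first string but not the second,
# where the extra "NG" shifts every word by one position.
third_words = (espeak_out.split()[2], espeak_ng_out.split()[2])
```

Anchoring on the `text-to-speech:` token gives the same result for both layouts, and comparing parsed numbers avoids the string-ordering trap where `"1.10" < "1.9"`.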
Edresson Casanova 3b1a28fa95
Add YourTTS VCTK recipe (#2198)
* Add YourTTS VCTK recipe

* Fix lint

* Add compute_embeddings and resample_files functions to be able to reuse it

* Add automatic download and speaker embedding computation for YourTTS VCTK recipe

* Add parameter for eval metadata file on compute embeddings function
2022-12-12 16:14:25 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
Edresson Casanova f3b947e706
Minor bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Eren Gölge 5f5d441ee5
Write non-speech files in a TXT (#2048)
* Write non-speech files in a txt

* Save 16-bit wav out of vad
2022-10-06 13:25:54 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Eren Gölge 0a112f7841
Add metafile arg (#1977) 2022-09-16 14:41:49 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Change name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova 159eeeef64
Fix find unique phonemes script (#1928)
* Fix find unique phonemes script

* Fix unit tests
2022-09-08 10:17:35 +02:00
Stanislav Kachnov 2c9f00a808
Fix tune wavegrad (#1844)
* fix imports in tune_wavegrad

* load_config returns Coqpit object instead of None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort
2022-08-22 09:55:32 +02:00
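Two of the fixes in commit 2c9f00a808 concern general Python patterns: a `store_true` action for `--use_cuda`, and running the work only when the module is executed as the main program. A minimal sketch of both follows. `build_parser` and `main` are hypothetical names, and nothing here is taken from `tune_wavegrad` itself.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative tuning script")
    # store_true makes --use_cuda a switch: absent -> False, present -> True.
    parser.add_argument("--use_cuda", action="store_true")
    return parser


def main(argv=None):
    """Parse the arguments and return the flag; argv=None reads sys.argv."""
    args = build_parser().parse_args(argv)
    return args.use_cuda


if __name__ == "__main__":
    # Runs only when executed as a script, not when the module is imported.
    print("use_cuda:", main())
```

Without an action, argparse expects a value after the flag, so a bare `--use_cuda` on the command line is rejected.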
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
camillem 5c821d9fa1
Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-06-27 10:32:43 +02:00
p0p4k 71281ff1e4
Add support for model_info in CLI (#1623)
* model_info

* model_info

* model_info_by_idx and name

* model_info_by_idx and name

* model_info

* Update manage.py

* fixed linter

* fixed linter

* fixed linter

* fixed linter

* fixed return style checks

* fixed linter

* fixed linter

* fixed idx always positive

* added comments

* fix parser.args check

* fix parser.args check

* Make style

Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-06-20 23:28:17 +02:00
Eren Gölge f70e82cd19
Use fsspec and torch for embedding file IO (#1581)
* Use fsspec and torch for embedding file

* Fixup

* Fix load and save files

* Fix compute embedding script

* Set use_cuda to true if available

* Add dummy speakers.pth file

* Make style

* Change default speakers file extension

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-01 13:49:42 +02:00
a-froghyar 8be21ec387
Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training  operations from the trainer into custom
methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
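Several bullets in commit 8be21ec387 describe a separate optimizer class that merges two optimizers and gains `state_dict` and `load_state_dict` methods so that training can be continued. The general pattern can be sketched without any framework, as below. `PairedOptimizer` and `CountingOptimizer` are hypothetical names, and the list-based state layout is an assumption, not a description of `CapacitronOptimizer`.

```python
class PairedOptimizer:
    """Drive two optimizers as one (hypothetical sketch, not the project's class)."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def step(self):
        self.primary.step()
        self.secondary.step()

    def zero_grad(self):
        self.primary.zero_grad()
        self.secondary.zero_grad()

    def state_dict(self):
        # One entry per wrapped optimizer, in a fixed order.
        return [self.primary.state_dict(), self.secondary.state_dict()]

    def load_state_dict(self, states):
        primary_state, secondary_state = states
        self.primary.load_state_dict(primary_state)
        self.secondary.load_state_dict(secondary_state)


class CountingOptimizer:
    """Stand-in with the same four methods, used only to exercise the wrapper."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1

    def zero_grad(self):
        pass

    def state_dict(self):
        return {"steps": self.steps}

    def load_state_dict(self, state):
        self.steps = state["steps"]
```

Because the wrapper exposes the same methods as a single optimizer, a training loop that saves and restores optimizer state does not need to know that two optimizers are involved.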
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova 6233f4fcd7 Bug fix in compute embedding without eval partition 2022-04-26 13:58:03 -03:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
WeberJulian c66a6241fd
Enforce phonemizer definition for synthesis (#1441)
* Enforce phonemizer definition for synthesis

* Fix train_tts, tokenizer init can now edit config

* Add small change to trigger CI pipeline

* fix wrong output path for one tts_test

* Fix style

* Test config overrides by args and tokenizer

* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova 3435bc8fca Fix style tests 2022-03-23 15:05:32 -03:00
Edresson Casanova ea53d6feb3 Replace webrtcvad by silero-vad 2022-03-23 14:39:31 -03:00
Eren Gölge 72d85e53c9
Update model file extension (#1422)
* Update model file ext to `.pth`

* Update docs

* Rename more

* Find model files
2022-03-22 17:55:00 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00