* add add cli options for play and speed
--play argument uses simpleaudio to play the tts wav
--speed <float 0.0-2.0> passes speed argument to Coqui Studio models
* remove simpleaudio not referenced in file
* fix simpleaudio dependency version
* add ALSA headers for simpleaudio compilation
* Dockerfile ALSA headers for simpleaudio
* base changes to use stdout instead of play audio
Considering conversion to pipe wav data for audio playback with ohter program
like aplay.
This is incomplete code. Using to get feedback before proceeding with
implementation.
* remove play for pipe_out arg that suppresses stdout
removed play and simpleaudio dependency in place of pipe
fuctionality to allow passing wav file data to a program
dedicated to playing audio.
* scipy.io.wavfile.write fails with /dev/null target
* Streaming inference for XTTS 🚀 (#3035)
* v0.17.7
* Redownload XTTS with the local and remote config do not match
* Remove unused method
* Print a message when it is already donwloaded
* Try-except to present error when the user dont have connection
* Fix style
* 0.17.8
* v0.17.8
---------
Co-authored-by: Julian Weber <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: ggoknar <ggoknar@coqui.ai>
* initial commit
* Tortoise inference
* revert path change
* style fix
* remove accidental remove
* style fixes
* style fixes
* removed unwanted assests and deps
* remove changes
* remove cvvp
* style fix black
* added tortoise config and updated config and args, refactoring the code
* added tortoise to api
* Pull mel_norm from url
* Use TTS cleaners
* Let download model files
* add ability to pass tortoise presets through coqui api
* fix tests
* fix style and tests
* fix tts commandline for tortoise
* Add config.json to tortoise
* Use kwargs
* Use regular model api for loading tortoise
* Add load from dir to synthesizer
* Fix Tortoise floats
* Use model_dir when there are multiple urls
* Use `synthesize` when exists
* lint fixes and resolve preset bug
* resolve a download bug and update model link
* fix json
* do tortoise inference from voice dir
* fix
* fix test
* fix speaker id and remove assests
* update inference_tests.yml
* replace inference_test.yml
* fix extra dir as None
* fix tests
* remove space
* Reformat docstring
* Add docs
* Update docs
* lint fixes
---------
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Use packaging.version for version comparisons
The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.
With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:
> TypeError: '<' not supported between instances of 'str' and 'int'
This is fixed by this migration.
[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/
* Improve espeak version detection robustness
On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.
This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.
This improves the version detection by simply looking for the version
after the "text-to-speech:" token.
* Replace distuils.copy_tree with shutil.copytree
The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
* Add YourTTS VCTK recipe
* Fix lint
* Add compute_embeddings and resample_files functions to be able to reuse it
* Add automatic download and speaker embedding computation for YourTTS VCTK recipe
* Add parameter for eval metadata file on compute embeddings function
* Cache fsspec downloaded files
* Use diff paths for test
* Make fsspec caching optional
* Decom GPU docker tests
* Make progress bar optional for better CI log
* Check path local
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Chane name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* fix imports in tune_wavegrad
* load_config returns Coqpit object instead None
* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program
* fix var order in the result of batch collating
* make style
* make style with black and isort
* Use fsspec and torch for embedding file
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore:rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom
methods in taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: fix train tts t2 capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overides by args and tokenizer
* Fix style
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argumnet to disable storage
* Add Perfect Sampler and remove storage
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Active evaluation in speaker encoder test and use multispeaker dataset for this test
* Unit tests fixs
* Remove useless tests for speedup the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unitests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
* Fix the bug in find unique chars script
* Add OpenBible formatter
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference