Commit Graph

105 Commits

Author SHA1 Message Date
Jindrich Matousek a60b423f76 Merge remote-tracking branch 'upstream/main' 2023-04-13 13:21:06 +02:00
Eren Gölge ad8b9bf2be
🐸 Coqui Studio API integration (#2484)
* Warn when lang is not avail

* Make style

* Implement Coqui Studio API

* Test

* Update docs

* Set action

* Make style

* Make lint

* Update README

* Make style

* Fix action

* Run actions
2023-04-05 15:06:50 +02:00
Eren Gölge d309f50e53
Implement FreeVC (#2451)
* Update .gitignore

* Draft FreeVC implementation

* Tests and relevant updates

* Update API tests

* Add missings

* Update requirements

* :(

* Lazy handle for vc

* Update docs for voice conversion

* Make style
2023-03-25 18:33:23 +01:00
Jindrich Matousek cbdec704dc
Merge branch 'coqui-ai:main' into main 2023-02-20 08:32:10 +01:00
Jindrich Matousek 7761e5cee0
Merge branch 'coqui-ai:main' into main 2023-02-08 15:17:28 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
Eren G??lge 85b3a04b37 Merge branch 'api_model_path' into dev 2023-02-06 11:18:00 +01:00
marius851000 1f4d8bf0f1
Fix tts-server for multi-lingual models (#2257) 2023-02-06 10:54:34 +01:00
Eren G??lge 7fddabc8ac Implement cloning in API 2023-01-30 13:35:48 +01:00
Jindrich Matousek f278da4fc9
Merge branch 'coqui-ai:main' into main 2022-12-28 14:12:58 +01:00
Eren Gölge 1ddc484b49
Python API implementation (#2195)
* Draft implementation

* Fix style

* Add api tests

* Fix lint

* Update docs

* Update tests

* Set env

* Fixup

* Fixup

* Fix lint

* Revert
2022-12-12 12:04:20 +01:00
Jindrich Matousek 5c0d71c746 Merge remote-tracking branch 'upstream/main' 2022-12-03 17:29:38 +01:00
logan hart ff9b63d02a
Add neon models (#2140)
* Add neon ljspeech vits model

* Add neon german model

* Update .models.json

* Add neon spanish model

* Add french model

* Add Dutch model

* Add Hungarian model

* Add Greek model

* Remove uneeded description

* Update .models.json

* Update .models.json

* Handling neon models

* Add all neon models

* Update .models.json

* Split zoo_tests

* Update test names

* Update model testing

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-11-16 16:12:39 +01:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Chane name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Jindrich Matousek 97de55595f Merge remote-tracking branch 'upstream/main' 2022-08-24 12:21:18 +02:00
Eren Gölge 946afa8197
v0.8.0 (#1810)
* Fix checkpointing GAN models (#1641)

* checkpoint sae step crash fix

* checkpoint save step crash fix

* Update gan.py

updated requested changes

* crash fix

* Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)

Co-authored-by: Eren Gölge <erogol@hotmail.com>

* Fix Publish CI (#1597)

* Try out manylinux

* temporary removal of useless pipeline

* remove check and use only manylinux

* Try --plat-name

* Add install requirements

* Add back other actions

* Add PR trigger

* Remove conditions

* Fix sythax

* Roll back some changes

* Add other python versions

* Add test pypi upload

* Add username

* Add back __token__ as username

* Modify name of entry to testpypi

* Set it to release only

* Fix version checking

* Fix tokenizer for punc only (#1717)

* Remove redundant config field

* Fix SSIM loss

* Separate loss tests

* Fix BCELoss adressing  #1192

* Make style

* Add durations as aux input for VITS (#1694)

* Add durations as aux input for VITS

* Make style

* Fix tts_tests

* Fix test_get_aux_input

* Make lint

* feat: updated recipes and lr fix (#1718)

- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug about not the correct lr fetched for logging

* Implement VitsAudioConfig (#1556)

* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style

* Fix device allocation

* Fix SSIM loss correction

* Fix aux tests (#1753)

* Set n_jobs to 1 for resample script

* Delete resample test

* Set n_jobs 1 in vad test

* delete vad test

* Revert "Delete resample test"

This reverts commit bb7c8466af.

* Remove tests with resample

* Fix for FloorDiv Function Warning (#1760)

* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py

* Fix type in download_vctk.sh (#1739)

typo in comment

* Update decoder.py (#1792)

Minor comment correction.

* Update requirements.txt (#1791)

Support for #1775

* Update README.md (#1776)

Fix typo in different and code sample

* Fix & update WaveRNN vocoder model (#1749)

* Fixes KeyError bug. Adding logging to dashboard.

* Make pep8 compliant

* Make style compliant

* Still fixing style

* Fix rand_segment edge case (input_len == seg_len - 1)

* Update requirements.txt; inflect==5.6 (#1809)

New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808 
Force inflect==5.6 (pydantic free) install to solve dependency issue.

* Update README.md; download progress bar in CLI. (#1797)

* Update README.md

- minor PR
- added model_info usage guide based on #1623 in README.md .

* "added tqdm bar for model download"

* Update manage.py

* fixed style

* fixed style

* sort imports

* Update wavenet.py (#1796)

* Update wavenet.py

Current version does not use "in_channels" argument. 
In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. 
However, since it is a generic implementation, I believe it is better to update it for a more general use.

* "in_channels -> hidden_channels"

* Adjust default to be able to process longer sentences (#1835)

Running `tts --text "$text" --out_path …` with a somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.

This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particular frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].

This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.

[1] https://github.com/mozilla/TTS/issues/734

* Fix language flags generated by espeak-ng phonemizer (#1801)

* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike

* fix get_random_embeddings --> get_random_embedding (#1726)

* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding

* Introduce numpy and torch transforms (#1705)

* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2

* Implement bucketed weighted sampling for VITS (#1871)

* Update capacitron_layers.py (#1664)

crashing because of dimension miss match   at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)

* updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853)

* Fix BCE loss issue (#1872)

* Fix BCE loss issue

* Remove import

* Remove deprecated files (#1873)

- samplers.py is moved
- distribute.py is replaces by the 👟Trainer

* Handle when no batch sampler (#1882)

* Fix tune wavegrad (#1844)

* fix imports in tune_wavegrad

* load_config returns Coqpit object instead None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort

* Bump up to v0.8.0

* Add new DE Thorsten models (#1898)

- Tacotron2-DDC
- HifiGAN vocoder

Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com>
Co-authored-by: camillem <camillem@users.noreply.github.com>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: a-froghyar <adamfroghyar@gmail.com>
Co-authored-by: ivan provalov <iprovalo@yahoo.com>
Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com>
Co-authored-by: p0p4k <rajiv.punmiya@gmail.com>
Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com>
Co-authored-by: vanIvan <alfa1211@gmail.com>
Co-authored-by: Lars Kiesow <lkiesow@uos.de>
Co-authored-by: rbaraglia <baraglia.r@live.fr>
Co-authored-by: jchai.me <jreus@users.noreply.github.com>
Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>
2022-08-22 14:54:38 +02:00
Eren Gölge 49bac724c0
Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style
2022-07-12 18:49:58 +02:00
Jindrich Matousek 2f81e8701e Merge branch 'main' of https://github.com/coqui-ai/TTS into coqui-ai-main 2022-06-24 16:06:48 +02:00
Eren Gölge 829e2c24f9
v0.7.1 (#1676)
* Add Thorsten VITS model (#1675)

Co-authored-by: Eren Gölge <egolge@coqui.ai>

* Remove GL message

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-21 14:11:39 +02:00
Eren G??lge 3328be7a8e Remove GL message 2022-06-21 12:39:31 +02:00
a-froghyar 8be21ec387
Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training  operations from the trainer into custom
methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Eren Gölge 2fc38f67d2 Update SpeakerManager init in Synthesizer 2022-05-11 11:32:27 +02:00
jmaty 1aa05feeb4 merge local changes and official repo update 0.6.2 2022-05-05 14:22:45 +02:00
Edresson Casanova 8d228ab22a
Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Jindrich Matousek 51d7ad161c Better WA for glottal stop: now works also for multiple sentences in a single input text 2022-04-06 14:37:51 +02:00
Jindrich Matousek aae77dac07 WA: when [!] is not at the end of a sentence, it is used as a glottal stop in the phonetic input and sentences are NOT delimited by [!] 2022-04-06 10:06:06 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
WeberJulian 1b22f03e98
Fix G2P backend of the released models (#1461)
* Fix enforce phonemizer

* Add new models

* Fix .model.json
2022-03-30 12:47:11 +02:00
WeberJulian c66a6241fd
Enforce phonemizer definition for synthesis (#1441)
* Enforce phonemizer definition for synthesis

* Fix train_tts, tokenizer init can now edit config

* Add small change to trigger CI pipeline

* fix wrong output path for one tts_test

* Fix style

* Test config overides by args and tokenizer

* Fix style
2022-03-25 23:15:33 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00
Edresson Casanova dbe9da7f15
Add Voice conversion inference support (#1337)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Eren Gölge 942df0fb05 Update vits dataset 2022-03-02 09:14:32 +01:00
Eren Gölge 1f0c8179da Make style 2022-02-25 11:26:59 +01:00
Eren Gölge 1445a46e9e Update synthesizer to use iinit_from_config 2022-02-25 11:26:59 +01:00
Eren Gölge 2fe16de8e3 Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge c9972e6f14 Make lint 2022-02-25 11:07:34 +01:00
Eren Gölge 9bb347a52b Update for tokenizer API 2022-02-25 11:05:06 +01:00
Eren Gölge 84091096a6 Refactor Synthesizer class for TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge 1df1d6c4a9 Update for tokenizer API 2022-02-25 10:48:03 +01:00
Eren Gölge 3476be30d7 Refactor Synthesizer class for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge acc6eef625 Update for tokenizer API 2022-02-25 10:48:02 +01:00
Eren Gölge 3d86edfc81 Refactor Synthesizer class for TTSTokenizer 2022-02-25 09:32:54 +01:00
Eren Gölge fc09e319d4 Prioritize the given encoder path over config 2022-01-03 14:24:19 +00:00
Eren Gölge 7fad969a1f Fix if else statement 2022-01-03 14:16:11 +00:00
Eren Gölge 8fd1ee1926 Print urls when BadZipError 2022-01-01 15:26:35 +00:00
Eren Gölge 61874bc0a0 Fix your_tts inference from the listed models 2021-12-31 13:45:05 +00:00
Eren Gölge 5c5ddd2ba7 Init speaker manager for speaker encoder 2021-12-22 15:51:53 +00:00
Eren Gölge 56378b12f7 Fix speaker encoder init 2021-12-21 12:26:25 +00:00
Eren Gölge c9c1fa0548 Fix multi-speaker init in Synthesizer 2021-12-21 09:44:07 +00:00