Commit Graph

1738 Commits

Author SHA1 Message Date
Eren Gölge 6ee94f8bad Fixup 2023-01-30 14:02:25 +01:00
Eren Gölge 713e8c8d04 Add pretrained model 2023-01-30 13:55:17 +01:00
Eren Gölge 7fddabc8ac Implement cloning in API 2023-01-30 13:35:48 +01:00
Eren Gölge 335b8ed44e Add vocoder path 2023-01-30 12:59:29 +01:00
Martin Weinelt 994be163e1
Use packaging.version for version comparisons (#2310)
* Use packaging.version for version comparisons

The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.

With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:

> TypeError: '<' not supported between instances of 'str' and 'int'

This is fixed by this migration.

[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/

* Improve espeak version detection robustness

On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.

This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.

This improves the version detection by simply looking for the version
after the "text-to-speech:" token.

* Replace distutils.copy_tree with shutil.copytree

The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
2023-01-29 23:47:00 +01:00
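The espeak version detection described in the commit above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `parse_espeak_version` and the two sample banners are assumptions made for the example, modelled on typical `--version` output.

```python
def parse_espeak_version(banner: str) -> str:
    """Return the token that follows "text-to-speech:" in a version banner.

    Hypothetical helper: the name and signature are illustrative only.
    """
    words = banner.split()
    # Locate the marker instead of relying on a fixed word position,
    # because the position differs between espeak and espeak-ng banners.
    marker = words.index("text-to-speech:")
    return words[marker + 1]


# Assumed sample banners (not captured from a real system).
classic_banner = "eSpeak text-to-speech: 1.48.03  04.Mar.14  Data at: /usr/share/espeak-data"
ng_banner = "eSpeak NG text-to-speech: 1.50  Data at: /usr/share/espeak-ng-data"

classic_version = parse_espeak_version(classic_banner)
ng_version = parse_espeak_version(ng_banner)

# Taking the third word works for the classic banner only; for the
# espeak-ng banner it yields the marker itself rather than a version.
third_word_of_ng = ng_banner.split()[2]
```

With the espeak-ng style banner, the fixed-position lookup returns the `text-to-speech:` token, which is the failure mode the commit message describes; searching for the marker handles both layouts.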
Eren Gölge cf076345e7 Make style 2023-01-23 13:49:51 +01:00
Eren Gölge 13334d507c Load model from path 2023-01-23 13:45:45 +01:00
Gerard Sant Muniesa c59b3f75b8
Add Catalan text cleaners for Catalan support (#2295) 2023-01-23 11:56:30 +01:00
Shivam Mehta d83ee8fe45
Adding neural HMM TTS Model (#2272)
* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow's decoder parameters from the config

* Update the Trainer requirement version for a compatible one (#2276)

* Bump up to v0.10.2

* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow's decoder parameters from the config

* fixing documentation

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-01-23 11:53:04 +01:00
Eren Gölge 497f22b20b
Cache speaker encoder model (#2284) 2023-01-23 11:49:51 +01:00
Eren Gölge 6e3f74fc29 Fix #2191 2023-01-15 23:11:57 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add compute_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Eren Gölge 14d45b5347
Bump up to v0.10.2 2023-01-11 01:06:02 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer recipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Jindrich Matousek f278da4fc9
Merge branch 'coqui-ai:main' into main 2022-12-28 14:12:58 +01:00
Eren Gölge a31af762e8
v0.10.1 (#2242)
* Add Ukrainian LADA (female) voice

* Add ca and fa models

* Add pth files to manager

* Bump up to v0.10.1

Co-authored-by: Yehor Smoliakov <yehors@ukr.net>
2022-12-26 15:46:21 +01:00
Eren Gölge f814d52394 Bump up to v0.10.1 2022-12-26 14:29:46 +01:00
Eren Gölge 8c32a6998a Add pth files to manager 2022-12-26 14:29:25 +01:00
Eren Gölge cf765cb3f2 Add ca and fa models 2022-12-26 14:29:10 +01:00
Eren Gölge 46b0ad37e7 Bump up to v0.10.0 2022-12-15 11:19:23 +01:00
Eren Gölge a9167cf239
Fixup overflow (#2218)
* Update overflow config

* Pulling shuffle and drop_last  from config

* Print training stats for overflow
2022-12-15 00:56:48 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
Edresson Casanova 3b1a28fa95
Add YourTTS VCTK recipe (#2198)
* Add YourTTS VCTK recipe

* Fix lint

* Add compute_embeddings and resample_files functions to be able to reuse it

* Add automatic download and speaker embedding computation for YourTTS VCTK recipe

* Add parameter for eval metadata file on compute embeddings function
2022-12-12 16:14:25 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting the test config sentences's audios

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases; the model no longer saves tensors such as transition_p, so it can be dumped as JSON
2022-12-12 12:44:15 +01:00
p0p4k 2e153d54a8
Adding missing key to formatter (#2194)
quick fix for #2156.
 added 'root_path' key.
2022-12-12 12:25:37 +01:00
Eren Gölge 1ddc484b49
Python API implementation (#2195)
* Draft implementation

* Fix style

* Add api tests

* Fix lint

* Update docs

* Update tests

* Set env

* Fixup

* Fixup

* Fix lint

* Revert
2022-12-12 12:04:20 +01:00
Eren Gölge fdeefcc612
Handle espeak 1.48.15 (#2203) 2022-12-12 11:23:45 +01:00
Edresson Casanova ee20e30958 Fix VITS multi-speaker voice conversion inference 2022-12-05 09:15:01 -03:00
Eren Gölge 9321b22203
Fix scheduler order 2022-12-05 12:26:15 +01:00
Jindrich Matousek 5c0d71c746 Merge remote-tracking branch 'upstream/main' 2022-12-03 17:29:38 +01:00
Eren Gölge bc6120c330 [ci skip]Bump up to v0.9.0 2022-11-16 16:45:02 +01:00
logan hart ff9b63d02a
Add neon models (#2140)
* Add neon ljspeech vits model

* Add neon german model

* Update .models.json

* Add neon spanish model

* Add french model

* Add Dutch model

* Add Hungarian model

* Add Greek model

* Remove unneeded description

* Update .models.json

* Update .models.json

* Handling neon models

* Add all neon models

* Update .models.json

* Split zoo_tests

* Update test names

* Update model testing

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-11-16 16:12:39 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
Eren Gölge b686c09704 Fix #2062 2022-11-07 09:22:43 +01:00
freezerain fcbfca869f
Fix back/forward slash in file path in mailabs formatter (#1938)
* mailabs formatter: back/forward slash in file path fix

* formatters.mailabs() path rework for Windows os

* new formatter added "mailabs_win"

* lint test fix commit

* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility

* Black small style fix
2022-11-01 12:54:40 +01:00
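The path-separator problem behind the mailabs formatter fix above can be illustrated with a small sketch. It is not the formatter's code: the helper `speaker_from_path` and the folder layout are hypothetical, and `posixpath` / `ntpath` are used only so that both platforms' behaviour can be shown on any machine.

```python
import ntpath
import posixpath


def speaker_from_path(path: str, sep: str) -> str:
    """Hypothetical helper: treat the parent folder name as the speaker name."""
    return path.split(sep)[-2]


# Assumed folder layout, built with each platform's own join function.
parts = ("by_book", "female", "speaker_a", "metadata.csv")
posix_file = posixpath.join(*parts)
windows_file = ntpath.join(*parts)

# Splitting on the platform separator (what os.sep resolves to) works on both.
posix_speaker = speaker_from_path(posix_file, posixpath.sep)
windows_speaker = speaker_from_path(windows_file, ntpath.sep)

# A hard-coded "/" leaves a Windows-style path as a single unsplit piece.
pieces_with_slash = windows_file.split("/")
```

In real code, `os.sep` (or `os.path` / `pathlib` helpers) picks the right separator for the running platform, which is the approach named in the commit message.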
Victor Shepardson 5307a2229b
Fix Capacitron training (#2086) 2022-11-01 12:52:06 +01:00
Eren Gölge dae79b0acd
Remove `/` prefix from the relative path (#2065) 2022-10-10 13:32:27 +02:00
Eren Gölge 843fa6f3fa
Check num of columns in coqui format (#2066)
* Check 4 columns in coqui format

* Fix encoding

* Fixup
2022-10-10 12:13:32 +02:00
Edresson Casanova f3b947e706
Minor bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Eren Gölge 5f5d441ee5
Write non-speech files in a TXT (#2048)
* Write non-speech files in a txt

* Save 16-bit wav out of vad
2022-10-06 13:25:54 +02:00
Edresson Casanova d6ad9a05b4
Fix colliding dataset cache file names (#1994)
* Fix colliding dataset cache file names

* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Eren Gölge 0a112f7841
Add metafile arg (#1977) 2022-09-16 14:41:49 +02:00
Julian Weber 896e46d0e5
Fix vc (#1971) 2022-09-16 12:01:26 +02:00
Eren Gölge b95cf3363c
Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Jindrich Matousek 8cfbe23d9e Parse speaker name in artic dataset to extract language and append language item
Add comments
2022-09-13 17:32:25 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Change name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova 371772c355
Replace pyworld by pyin (#1946)
* Replace pyworld by pyin

* Fix unit tests
2022-09-09 10:43:14 +02:00
happylittlecat 4546b4cbd8
Add espeak support for Chinese (#1905)
* fix description

* add espeak support for chinese

* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman 5abbe56642
Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a korean single speech dataset (12hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Edresson Casanova 159eeeef64
Fix find unique phonemes script (#1928)
* Fix find unique phonemes script

* Fix unit tests
2022-09-08 10:17:35 +02:00
KyuubiYoru 3b7dff568a
Fixes a race condition with multiple simultaneous get requests. (#1807)
* Fixes a race condition with multiple simultaneous get requests.

* Removed unused import

* Removed unused threading import

* Changed lock style to notation

* make style

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-09-08 10:16:16 +02:00
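The locking approach mentioned in the commit above (guarding a shared resource against simultaneous requests, with the lock taken through a `with` block) can be sketched as follows. The `Synthesizer` class and `handle_request` function are hypothetical stand-ins, not the server's real code.

```python
import threading


class Synthesizer:
    """Hypothetical stand-in for a shared, non-thread-safe model wrapper."""

    def __init__(self) -> None:
        self.calls = 0

    def tts(self, text: str) -> str:
        # Read-modify-write on shared state: unsafe without a lock.
        current = self.calls
        self.calls = current + 1
        return text.upper()


lock = threading.Lock()
synth = Synthesizer()


def handle_request(text: str) -> str:
    # The with-statement acquires the lock and always releases it,
    # even if tts() raises, so requests are served one at a time.
    with lock:
        return synth.tts(text)


threads = [
    threading.Thread(target=handle_request, args=(f"request {i}",))
    for i in range(50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The context-manager form is usually preferred over explicit `acquire()` / `release()` calls because the release cannot be forgotten on an error path.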
Julian Weber bb59718c03
Add capacitron v2 model (#1768)
* Add capacitron v2 in .models.json

* Put right commit hash
2022-09-08 09:43:56 +02:00
Edresson Casanova 096b35f639
Add VCTK speaker encoder recipe (#1912) 2022-08-26 16:19:03 +02:00
Jindrich Matousek ec4501d31c Make artic formatter compatible with changes made to other formatters (root_path is a part of items) 2022-08-26 15:36:01 +02:00
Jindrich Matousek 97de55595f Merge remote-tracking branch 'upstream/main' 2022-08-24 12:21:18 +02:00
Eren Gölge 946afa8197
v0.8.0 (#1810)
* Fix checkpointing GAN models (#1641)

* checkpoint sae step crash fix

* checkpoint save step crash fix

* Update gan.py

updated requested changes

* crash fix

* Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)

Co-authored-by: Eren Gölge <erogol@hotmail.com>

* Fix Publish CI (#1597)

* Try out manylinux

* temporary removal of useless pipeline

* remove check and use only manylinux

* Try --plat-name

* Add install requirements

* Add back other actions

* Add PR trigger

* Remove conditions

* Fix syntax

* Roll back some changes

* Add other python versions

* Add test pypi upload

* Add username

* Add back __token__ as username

* Modify name of entry to testpypi

* Set it to release only

* Fix version checking

* Fix tokenizer for punc only (#1717)

* Remove redundant config field

* Fix SSIM loss

* Separate loss tests

* Fix BCELoss addressing #1192

* Make style

* Add durations as aux input for VITS (#1694)

* Add durations as aux input for VITS

* Make style

* Fix tts_tests

* Fix test_get_aux_input

* Make lint

* feat: updated recipes and lr fix (#1718)

- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug where the wrong lr was fetched for logging

* Implement VitsAudioConfig (#1556)

* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style

* Fix device allocation

* Fix SSIM loss correction

* Fix aux tests (#1753)

* Set n_jobs to 1 for resample script

* Delete resample test

* Set n_jobs 1 in vad test

* delete vad test

* Revert "Delete resample test"

This reverts commit bb7c8466af.

* Remove tests with resample

* Fix for FloorDiv Function Warning (#1760)

* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py

* Fix typo in download_vctk.sh (#1739)

typo in comment

* Update decoder.py (#1792)

Minor comment correction.

* Update requirements.txt (#1791)

Support for #1775

* Update README.md (#1776)

Fix typo in different and code sample

* Fix & update WaveRNN vocoder model (#1749)

* Fixes KeyError bug. Adding logging to dashboard.

* Make pep8 compliant

* Make style compliant

* Still fixing style

* Fix rand_segment edge case (input_len == seg_len - 1)

* Update requirements.txt; inflect==5.6 (#1809)

New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808 
Force inflect==5.6 (pydantic free) install to solve dependency issue.

* Update README.md; download progress bar in CLI. (#1797)

* Update README.md

- minor PR
- added model_info usage guide based on #1623 in README.md .

* "added tqdm bar for model download"

* Update manage.py

* fixed style

* fixed style

* sort imports

* Update wavenet.py (#1796)

* Update wavenet.py

Current version does not use the "in_channels" argument.
In glowTTS, we use normalizing flows and so "input dim" == "output dim" (channels and length). So, the existing code just uses a hidden_channel-sized tensor as input to the first layer and also outputs a hidden_channel-sized tensor.
However, since it is a generic implementation, I believe it is better to update it for more general use.

* "in_channels -> hidden_channels"

* Adjust default to be able to process longer sentences (#1835)

Running `tts --text "$text" --out_path …` with somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.

This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particularly frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].

This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.

[1] https://github.com/mozilla/TTS/issues/734

* Fix language flags generated by espeak-ng phonemizer (#1801)

* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike

* fix get_random_embeddings --> get_random_embedding (#1726)

* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding

* Introduce numpy and torch transforms (#1705)

* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2

* Implement bucketed weighted sampling for VITS (#1871)

* Update capacitron_layers.py (#1664)

crashing because of a dimension mismatch at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)

* updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853)

* Fix BCE loss issue (#1872)

* Fix BCE loss issue

* Remove import

* Remove deprecated files (#1873)

- samplers.py is moved
- distribute.py is replaced by the 👟Trainer

* Handle when no batch sampler (#1882)

* Fix tune wavegrad (#1844)

* fix imports in tune_wavegrad

* load_config returns Coqpit object instead None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort

* Bump up to v0.8.0

* Add new DE Thorsten models (#1898)

- Tacotron2-DDC
- HifiGAN vocoder

Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com>
Co-authored-by: camillem <camillem@users.noreply.github.com>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: a-froghyar <adamfroghyar@gmail.com>
Co-authored-by: ivan provalov <iprovalo@yahoo.com>
Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com>
Co-authored-by: p0p4k <rajiv.punmiya@gmail.com>
Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com>
Co-authored-by: vanIvan <alfa1211@gmail.com>
Co-authored-by: Lars Kiesow <lkiesow@uos.de>
Co-authored-by: rbaraglia <baraglia.r@live.fr>
Co-authored-by: jchai.me <jreus@users.noreply.github.com>
Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>
2022-08-22 14:54:38 +02:00
Eren Gölge e5430a6519
Add new DE Thorsten models (#1898)
- Tacotron2-DDC
- HifiGAN vocoder
2022-08-22 11:27:39 +02:00
Eren Gölge 8845f06fd9 Bump up to v0.8.0 2022-08-22 11:26:47 +02:00
Stanislav Kachnov 2c9f00a808
Fix tune wavegrad (#1844)
* fix imports in tune_wavegrad

* load_config returns Coqpit object instead None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort
2022-08-22 09:55:32 +02:00
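Two of the changes listed in the commit above, making `--use_cuda` a `store_true` flag and starting the work only when the module runs as the main program, follow a common pattern that can be sketched like this. The parser description and the `main` function are hypothetical; only the `--use_cuda` flag name comes from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical tuning script")
    # store_true turns the option into a switch:
    # absent -> False, present -> True, and it takes no value.
    parser.add_argument("--use_cuda", action="store_true", help="run on GPU")
    return parser


def main(argv=None) -> bool:
    """Parse arguments and return the flag (stand-in for the real tuning work)."""
    args = build_parser().parse_args(argv)
    return args.use_cuda


if __name__ == "__main__":
    # Executed only when the file is run as a script, not when it is imported.
    print("use_cuda:", main())
```

Without `action="store_true"`, argparse expects a value after the option, so a bare `--use_cuda` on the command line is rejected with an error.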
Eren Gölge fcb0bb58ae
Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge 7442bcefa5
Remove deprecated files (#1873)
- samplers.py is moved
- distribute.py is replaced by the 👟Trainer
2022-08-15 12:16:37 +02:00
Eren Gölge 4333492341
Fix BCE loss issue (#1872)
* Fix BCE loss issue

* Remove import
2022-08-15 11:27:21 +02:00
manmay nakhashi e4db7c51b5
Update capacitron_layers.py (#1664)
crashing because of a dimension mismatch at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
Jindrich Matousek af2aee5ba9 Fix train_log name 2022-08-09 11:00:06 +02:00
Eren Gölge d46fbc240c
Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi 7fd9b89ebf
fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia 75ac9e3f0c
Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike
2022-08-07 13:57:40 +02:00
Lars Kiesow 8c645080ac
Adjust default to be able to process longer sentences (#1835)
Running `tts --text "$text" --out_path …` with somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.

This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particularly frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].

This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.

[1] https://github.com/mozilla/TTS/issues/734
2022-08-07 13:51:29 +02:00
p0p4k 903a77c197
Update wavenet.py (#1796)
* Update wavenet.py

Current version does not use the "in_channels" argument.
In glowTTS, we use normalizing flows and so "input dim" == "output dim" (channels and length). So, the existing code just uses a hidden_channel-sized tensor as input to the first layer and also outputs a hidden_channel-sized tensor.
However, since it is a generic implementation, I believe it is better to update it for more general use.

* "in_channels -> hidden_channels"
2022-08-01 12:20:37 +02:00
p0p4k 4fe50801b5
Update README.md; download progress bar in CLI. (#1797)
* Update README.md

- minor PR
- added model_info usage guide based on #1623 in README.md .

* "added tqdm bar for model download"

* Update manage.py

* fixed style

* fixed style

* sort imports
2022-08-01 12:17:47 +02:00
Eren Gölge 7d8b1665c8 Fix rand_segment edge case (input_len == seg_len - 1) 2022-08-01 11:37:45 +02:00
vanIvan 5094499eba
Fix & update WaveRNN vocoder model (#1749)
* Fixes KeyError bug. Adding logging to dashboard.

* Make pep8 compliant

* Make style compliant

* Still fixing style
2022-07-26 15:05:11 +02:00
p0p4k 10195c4eba
Update decoder.py (#1792)
Minor comment correction.
2022-07-26 13:06:06 +02:00
Jindrich Matousek 61508bf336 Fix artic_multispeaker formatter 2022-07-20 21:12:16 +02:00
ivan provalov 903d9c791a
Fix for FloorDiv Function Warning (#1760)
* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py
2022-07-20 11:31:22 +02:00
Eren Gölge f7587fc134
Fix SSIM loss correction 2022-07-13 10:47:12 +02:00
Eren Gölge bc1f93c299
Fix device allocation 2022-07-12 19:05:25 +02:00
Eren Gölge 49bac724c0
Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style
2022-07-12 18:49:58 +02:00
a-froghyar 34b80e0280
feat: updated recipes and lr fix (#1718)
- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug where the wrong lr was fetched for logging
2022-07-12 15:00:53 +02:00
Eren Gölge 48a4f3647f Make lint 2022-07-12 14:58:26 +02:00
WeberJulian c614f21982
Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS

* Make style

* Fix tts_tests

* Fix test_get_aux_input
2022-07-12 14:25:21 +02:00
Eren Gölge 2cf89b88c9 Make style 2022-07-12 14:12:57 +02:00
Eren Gölge a6f73a18cb Fix BCELoss addressing #1192 2022-07-12 14:11:34 +02:00
Eren Gölge c17ff17a18 Fix SSIM loss 2022-07-12 12:35:24 +02:00
Eren Gölge f1e35596e8 Remove redundant config field 2022-07-11 13:39:41 +02:00
Jindrich Matousek a7d2e9b475 Support ignored speakers in artic multi-speaker formatter 2022-07-10 22:31:41 +02:00
Jindrich Matousek 1896db7e2c Add formatter for artic multispeaker dataset 2022-07-10 22:08:11 +02:00
Jindrich Matousek 8e758ca8fe Set speaker name to the directory name containing speaker's data 2022-07-10 15:24:17 +02:00
Jindrich Matousek 3270dda162 Refactor artic formatter 2022-07-10 11:37:40 +02:00
Jindrich Matousek 9758971baa Add artic formatter 2022-07-10 11:27:02 +02:00
WeberJulian 5cef6facb0
Fix tokenizer for punc only (#1717) 2022-07-06 22:59:41 +02:00
Jindrich Matousek d214ac1405 fix outputs[0] coming as None
proposed by manmay-nakhashi in https://github.com/coqui-ai/TTS/pull/1641
2022-06-28 15:22:04 +02:00
camillem 5c821d9fa1
Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-06-27 10:32:43 +02:00
Jindrich Matousek 2f81e8701e Merge branch 'main' of https://github.com/coqui-ai/TTS into coqui-ai-main 2022-06-24 16:06:48 +02:00
manmay nakhashi 577ec406f4
Fix checkpointing GAN models (#1641)
* checkpoint sae step crash fix

* checkpoint save step crash fix

* Update gan.py

updated requested changes

* crash fix
2022-06-22 12:07:46 +02:00
Eren Gölge 00e67092d8 Bump up to v0.7.1 2022-06-21 14:12:55 +02:00
Eren Gölge 3328be7a8e Remove GL message 2022-06-21 12:39:31 +02:00