Commit Graph

928 Commits

Author SHA1 Message Date
Edresson Casanova 4d3f23b5d3
Add CML-TTS dataset YourTTS training recipe (#2934) 2023-09-12 11:49:14 +02:00
Aleś Bułojčyk fead04f779
Add phonemizer for Belarusian language (#2856) 2023-08-28 11:20:45 +02:00
Eren Gölge a7a96d08dd
Fix loading Bark (#2893)
* Fixup hubert path

* Make style
2023-08-26 11:59:00 +02:00
Jake Tae 409db505d2
Add device support in TTS and Synthesizer (#2855)
* fix: resolve merge conflicts

* fix: retain backwards compatability in functions

* feature: utilize device for voice transfer

* feature: use device for vocoder

* chore: cleanup vocoder cpu logic

* fix: add necessary vocoder output device check

* fix: add necessary vocoder output device check

* fix: indentation

* fix: check if waveform is pt tensor before cpu conversion

---------

Co-authored-by: Jake Tae <jaketae@Jakes-MacBook-Pro-2.local>
2023-08-14 21:04:44 +02:00
Eren Gölge 3a104d5c49
Update Studio API for XTTS (#2861)
* Update Studio API for XTTS

* Update the docs

* Update README.md

* Update README.md

Update README
2023-08-13 12:04:12 +02:00
Eren G??lge 37b558ccb9 Make style 2023-08-11 12:55:23 +02:00
Eren G??lge 9a8352b8da Fix import error with Bark 2023-08-11 03:33:59 +02:00
Eren Gölge 4186f42b21
Handle missing JA phonemizer (#2843)
* Handle missing JA phonemizer

* Make style
2023-08-07 13:19:38 +02:00
Javier 4e7f8cd021
Add fairseq onnx support and strict configuration, fixes some onnx errors (#2831) 2023-08-04 11:02:59 +02:00
Eren Gölge 69f080eb47
Fix DelightfulTTS (#2823)
* Fix tests

* Make style
2023-07-31 13:52:45 +02:00
Eren Gölge 483888b9d8
Add kwargs to ignore extra arguments w/o error (#2822) 2023-07-31 11:37:35 +02:00
Aleś Bułojčyk d124f78430
Recipe for Belarusian TTS (#2756)
* Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com>

* Recipe for Belarusian TTS

---------

Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>
2023-07-31 10:26:21 +02:00
Javier c140df5a58
Adds multi-language support for VITS onnx, fixes onnx inference error when speaker_id is None or not passed, fixes onnx exporting for models with init_discriminator=false (#2816) 2023-07-31 10:19:49 +02:00
Eren Gölge 8aacb81849
Fix Tortoise load (#2791)
* Remove key prunning in tortoise

* Make lint
2023-07-24 13:42:47 +02:00
logan hart 6fdb88f8e2
Add Delightful-TTS implementation (#2095)
* add configs

* Update config file

* Add model configs

* Add model layers

* Add layer files

* Add layer modules

* change config names

* Add emotion manager

* fIX missing ap bug

* Fix missing ap bug

* Add base TTS e2e class

* Fix wrong variable name in load_tts_samples

* Add training script

* Remove range predictor and gaussian upsampling

* Add helper function

* Add vctk recipe

* Add conformer docs

* Fix linting in conformer.py

* Add Docs

* remove duplicate import

* refactor args

* Fix bugs

* Removew emotion embedding

* remove unused arg

* Remove emotion embedding arg

* Remove emotion embedding arg

* fix style issues

* Fix bugs

* Fix bugs

* Add unittests

* make style

* fix formatter bug

* fix test

* Add pyworld compute pitch func

* Update requirments.txt

* Fix dataset Bug

* Chnge layer norm to instance norm

* Add missing import

* Remove emotions.py

* remove ssim loss

* Add init layers func to aligner

* refactor model layers

* remove audio_config arg

* Rename loss func

* Rename to delightful-tts

* Rename loss func

* Remove unused modules

* refactor imports

* replace audio config with audio processor

* Add change sample rate option

* remove broken resample func

* update recipe

* fix style, add config docs

* fix tests and multispeaker embd dim

* remove pyworld

* Make style and fix inference

* Split tts tests

* Fixup

* Fixup

* Fixup

* Add argument names

* Set "random" speaker in the model Tortoise/Bark

* Use a diff f0_cache path for delightfull tts

* Fix delightful speaker handling

* Fix lint

* Make style

---------

Co-authored-by: loganhart420 <loganartpersonal@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-07-24 13:41:26 +02:00
Eren Gölge 0de12ec5aa
API tests (#2790)
* Separate API tests and only run when uplifted

* Make style
2023-07-24 12:14:21 +02:00
Paul O'Leary McCann c0aabb8596
Make Japanese-specific dependencies optional (#2776)
* Don't install MeCab by default

* Add optional [ja] deps, like [dev] etc

* Add JA requirements file

* Add JA requirements to requirements_all

This should help the tests run.
2023-07-24 11:28:27 +02:00
Eren Gölge 672ec3b35e
Fix #2749 (#2750) 2023-07-08 11:40:44 +02:00
Eren Gölge a2984fb435
Fix #2745 (#2748) 2023-07-07 20:23:27 +02:00
Eren Gölge 7b5c8422c8
Export multispeaker onnx (#2743) 2023-07-06 13:36:50 +02:00
ZhouGongZaiShi d5f16d77c2
delete meaningless print() (#2662) 2023-07-04 11:38:17 +02:00
Eren G??lge cb9c320691 Fixup 2023-06-30 14:13:11 +02:00
Eren G??lge 91cc11d636 Remove commented codes 2023-06-28 12:14:37 +02:00
Eren G??lge 6b9ebf5aab Merge branch 'p3_11' into dev 2023-06-28 12:13:04 +02:00
Eren Gölge c844b6570a
Inference API for 🐶Bark (#2685)
* Add bark requirements

* Draft Bark implementation

* Download HF models

* Update synthesizer

* Add bark model

* Make style

* Update pylintrc

* Update model URLs

* Update Bark Config

* Fix here and ther

* Make style

* Make lint

* Update requirements

* Update requirements
2023-06-28 11:55:27 +02:00
Eren G??lge a13b1352a4 Fixup 2023-06-26 19:30:26 +02:00
Eren G??lge 17ac188958 Drop fairseq for Hubert 2023-06-26 19:27:48 +02:00
Eren G??lge c03768bb53 Make style 2023-06-26 17:16:26 +02:00
Eren G??lge a1c431e6a9 Fixups 2023-06-26 12:55:18 +02:00
Eren Gölge fff8b762bc
Merge branch 'dev' into bark 2023-06-21 15:49:05 +02:00
Eren Gölge 4cf8652392
Fix Tortoise load (#2697)
* Handle missing gpt weights

* Make style

* Fix lint
2023-06-21 15:42:01 +02:00
Eren G??lge cf98ae04df Make lint 2023-06-21 12:05:08 +02:00
Eren G??lge 3b9fca2398 Make style 2023-06-21 12:02:06 +02:00
Eren G??lge 0f8932a6a9 Fix here and ther 2023-06-21 11:59:27 +02:00
Eren G??lge 03c347b7f3 Update Bark Config 2023-06-21 11:58:18 +02:00
Eren G??lge f4c88ed677 Make style 2023-06-19 14:22:32 +02:00
Eren G??lge 37b708dac7 Add bark model 2023-06-19 14:16:06 +02:00
Eren G??lge f59da4dba5 Draft Bark implementation 2023-06-12 14:32:39 +02:00
Tsai Meng-Ting d65819422b
Update stochastic_duration_predictor.py (#2663)
fix a typo
2023-06-12 11:10:54 +02:00
Eren Gölge e785d101a1
Port Fairseq TTS models (#2628)
* Load fairseq models

* Add docs and missing files

* Managing fairseq models and docs for API

* Make style

* Use scarf URL

* Add tests

* Fix URL

* Pass cpu

* Make lint

* Fixup

* Make lint

* fixup

* Fixup

* Change tokenization order

* Update README

* Fixup

* Fixup
2023-06-05 11:15:13 +02:00
Eren Gölge 9e99e0f42d Disable reduction 2023-05-18 11:12:51 +02:00
Eren Gölge 4de797bb11
Draft ONNX export for VITS (#2563)
* Draft ONNX export for VITS

Could not get it work to output variable length sequence

* Fixup for onnx constant output

* Make style

* Remove commented code
2023-05-16 01:07:56 +02:00
manmay nakhashi a3d5801c44
Tortoise TTS inference (#2547)
* initial commit

* Tortoise inference

* revert path change

* style fix

* remove accidental remove

* style fixes

* style fixes

* removed unwanted assests and deps

* remove changes

* remove cvvp

* style fix black

* added tortoise config and updated config and args, refactoring the code

* added tortoise to api

* Pull mel_norm from url

* Use TTS cleaners

* Let download model files

* add ability to pass tortoise presets through coqui api

* fix tests

* fix style and tests

* fix tts commandline for tortoise

* Add config.json to tortoise

* Use kwargs

* Use regular model api for loading tortoise

* Add load from dir to synthesizer

* Fix Tortoise floats

* Use model_dir when there are multiple urls

* Use `synthesize` when exists

* lint fixes and resolve preset bug

* resolve a download bug and update model link

* fix json

* do tortoise inference from voice dir

* fix

* fix test

* fix speaker id and remove assests

* update inference_tests.yml

* replace inference_test.yml

* fix extra dir as None

* fix tests

* remove space

* Reformat docstring

* Add docs

* Update docs

* lint fixes

---------

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-05-16 00:58:21 +02:00
Michael Görner 27e237ed08
use default_factory for audio parameter (#2576)
Python 3.11 complains about the mutable default and other members
were already adapted to use the factory, so I expect this line just
went unnoticed until now.
2023-05-08 11:17:36 +02:00
Eren Gölge 1a6a5710fd Make lint 2023-04-17 15:02:56 +02:00
Eren Gölge 2533a18d62 Add BN tests 2023-04-17 13:37:10 +02:00
Eren Gölge 2d49c05259 Remove import 2023-04-17 13:05:29 +02:00
Eren Gölge cd83991067 Add BN phonemizer 2023-04-17 12:54:00 +02:00
Matthew Boakes 4c829e74a1 Update Librosa Version To V0.10.0 2023-04-05 00:59:20 +01:00
Yingzhi WANG 95fa2c9fd6
fix typo (#2475) 2023-04-03 23:31:09 +02:00
p0p 91cf1b2da9
[minor] batch["speaker_ids"] getting set two times (#2470)
* [minor] batch["speaker_ids"] getting set two times

just to make it consistent with language_ids

* Update vits.py

style.
2023-04-03 11:35:21 +02:00
Eren Gölge d309f50e53
Implement FreeVC (#2451)
* Update .gitignore

* Draft FreeVC implementation

* Tests and relevant updates

* Update API tests

* Add missings

* Update requirements

* :(

* Lazy handle for vc

* Update docs for voice conversion

* Make style
2023-03-25 18:33:23 +01:00
Khalid Bashir 14c80dd1fd
vits.py training fixed due to return_complex (#2418)
Torch set default value for `return_complex=True` for `torch.stft` method
This turned warning into error:-
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
    return model.train_step(*input_args)
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
    mel_slice_hat = wav_to_mel(
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
    spec = torch.stft(
  File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
2023-03-19 00:22:04 +01:00
Roee Shenberg 3c15f0619a
Bug fixes in OverFlow audio generation (#2380) 2023-03-15 12:02:11 +01:00
Daniel Vera Nieto dfb48737fb Style fixed 2023-03-13 16:11:15 +01:00
Dani Vera 0d12229b64
Update vits.py
This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.
2023-03-10 18:35:16 +01:00
manmay nakhashi 624513018d
add energy by default to Fastspeech2 config (#2326)
* add energy by default

* added energy to base tts

* fix energy dataset

* fix styles

* fix test
2023-03-06 10:20:25 +01:00
thennal10 d39bc74f57
OverFlow with test sentences (#2253)
* Fix typo in function definiton

* Swap hasattr out

hasattr(self, "speaker_manager")  and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
2023-03-01 09:11:30 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
Martin Weinelt 994be163e1
Use packaging.version for version comparisons (#2310)
* Use packaging.version for version comparisons

The distutils package is deprecated¹ and relies on PEP 386² version
comparisons, which have been superseded by PEP 440³ which is implemented
through the packaging module.

With more recent distutils versions, provided through setuptools
vendoring, we are seeing the following exception during version
comparisons:

> TypeError: '<' not supported between instances of 'str' and 'int'

This is fixed by this migration.

[1] https://docs.python.org/3/library/distutils.html
[2] https://peps.python.org/pep-0386/
[3] https://peps.python.org/pep-0440/

* Improve espeak version detection robustness

On many modern systems espeak is just a symlink to espeak-ng. In that
case looking for the 3rd word in the version output will break the
version comparison, when it finds `text-to-speech:`, instead of a proper
version.

This will not break during runtime, where espeak-ng would be
prioritized, but the phonemizer and tokenizer tests force the backend
to `espeak`, which exhibits this breakage.

This improves the version detection by simply looking for the version
after the "text-to-speech:" token.

* Replace distuils.copy_tree with shutil.copytree

The distutils module is deprecated and slated for removal in Python
3.12. Its usage should be replaced, in this case by a compatible method
from shutil.
2023-01-29 23:47:00 +01:00
Gerard Sant Muniesa c59b3f75b8
Add Catalan text cleaners for Catalan support (#2295) 2023-01-23 11:56:30 +01:00
Shivam Mehta d83ee8fe45
Adding neural HMM TTS Model (#2272)
* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* Update the Trainer requirement version for a compatible one (#2276)

* Bump up to v0.10.2

* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* fixing documentation

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-01-23 11:53:04 +01:00
Eren Gölge 497f22b20b
Cache speaker encoder model (#2284) 2023-01-23 11:49:51 +01:00
Eren G??lge 6e3f74fc29 Fix #2191 2023-01-15 23:11:57 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add comupte_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer receipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Eren Gölge a9167cf239
Fixup overflow (#2218)
* Update overflow config

* Pulling shuffle and drop_last  from config

* Print training stats for overflow
2022-12-15 00:56:48 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting the test config sentences's audios

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
2022-12-12 12:44:15 +01:00
p0p4k 2e153d54a8
Adding missing key to formatter (#2194)
quick fix for #2156.
 added 'root_path' key.
2022-12-12 12:25:37 +01:00
Eren Gölge fdeefcc612
Handle espeak 1.48.15 (#2203) 2022-12-12 11:23:45 +01:00
Edresson Casanova ee20e30958 Fix VITS multi-speaker voice conversion inference 2022-12-05 09:15:01 -03:00
Eren Gölge 9321b22203
Fix scheduler order 2022-12-05 12:26:15 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
freezerain fcbfca869f
Fix back/forward slash in file path in mailabs formatter (#1938)
* mailabs formatter: back/forward slash in file path fix

* formatters.mailabs() path rework for Windows os

* new formatter added "mailabs_win"

* lint test fix commit

* mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility

* Black small style fix
2022-11-01 12:54:40 +01:00
Victor Shepardson 5307a2229b
Fix Capacitron training (#2086) 2022-11-01 12:52:06 +01:00
Eren Gölge dae79b0acd
Remove `/` prefix from the relative path (#2065) 2022-10-10 13:32:27 +02:00
Eren Gölge 843fa6f3fa
Check num of columns in coqui format (#2066)
* Check 4 colums in coqui format

* Fix encoding

* Fixup
2022-10-10 12:13:32 +02:00
Edresson Casanova f3b947e706
Minors bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Edresson Casanova d6ad9a05b4
Fix colliding dataset cache file names (#1994)
* Fix colliding dataset cache file names

* Remove unused code
2022-09-21 12:54:07 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Julian Weber 896e46d0e5
Fix vc (#1971) 2022-09-16 12:01:26 +02:00
Eren Gölge b95cf3363c
Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Chane name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
happylittlecat 4546b4cbd8
Add espeak support for Chinese (#1905)
* fix description

* add espeak support for chinese

* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman 5abbe56642
Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a korean single speech dataset (12hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Eren Gölge fcb0bb58ae
Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge 4333492341
Fix BCE loss issue (#1872)
* Fix BCE loss issue

* Remove import
2022-08-15 11:27:21 +02:00
manmay nakhashi e4db7c51b5
Update capacitron_layers.py (#1664)
crashing because of dimension miss match   at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
Eren Gölge d46fbc240c
Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi 7fd9b89ebf
fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia 75ac9e3f0c
Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike
2022-08-07 13:57:40 +02:00
Lars Kiesow 8c645080ac
Adjust default to be able to process longer sentences (#1835)
Running `tts --text "$text" --out_path …` with a somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.

This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particular frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].

This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.

[1] https://github.com/mozilla/TTS/issues/734
2022-08-07 13:51:29 +02:00
p0p4k 903a77c197
Update wavenet.py (#1796)
* Update wavenet.py

Current version does not use "in_channels" argument. 
In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. 
However, since it is a generic implementation, I believe it is better to update it for a more general use.

* "in_channels -> hidden_channels"
2022-08-01 12:20:37 +02:00
Eren G??lge 7d8b1665c8 Fix rand_segment edge case (input_len == seg_len - 1) 2022-08-01 11:37:45 +02:00
p0p4k 10195c4eba
Update decoder.py (#1792)
Minor comment correction.
2022-07-26 13:06:06 +02:00
ivan provalov 903d9c791a
Fix for FloorDiv Function Warning (#1760)
* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py
2022-07-20 11:31:22 +02:00
Eren Gölge f7587fc134
Fix SSIM loss correction 2022-07-13 10:47:12 +02:00