From c10f9a3699182a91d4c01afd65143ea39de44382 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eren=20G=C3=B6lge?=
Date: Mon, 13 Mar 2023 12:42:20 +0100
Subject: [PATCH] Update docs (#2389)

* Update docs index

* Add MaryTTS docs

* Update docs index

* Add Overflow docs
---
 docs/source/index.md           |  3 ++-
 docs/source/marytts.md         |  0
 docs/source/models/overflow.md | 36 ++++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/marytts.md
 create mode 100644 docs/source/models/overflow.md

diff --git a/docs/source/index.md b/docs/source/index.md
index 3f27ffb8..51735928 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -28,6 +28,7 @@
     formatting_your_dataset
     what_makes_a_good_dataset
     tts_datasets
+    marytts
 
 .. toctree::
     :maxdepth: 2
@@ -48,10 +49,10 @@
     models/vits.md
     models/forward_tts.md
     models/tacotron1-2.md
+    models/overflow.md
 
 .. toctree::
     :maxdepth: 2
     :caption: `vocoder` Models
 
 ```
-
diff --git a/docs/source/marytts.md b/docs/source/marytts.md
new file mode 100644
index 00000000..e69de29b
diff --git a/docs/source/models/overflow.md b/docs/source/models/overflow.md
new file mode 100644
index 00000000..09e270ea
--- /dev/null
+++ b/docs/source/models/overflow.md
@@ -0,0 +1,36 @@
+# Overflow TTS
+
+Neural HMMs are a type of neural transducer recently proposed for
+sequence-to-sequence modelling in text-to-speech. They combine the best features
+of classic statistical speech synthesis and modern neural TTS, requiring less
+data and fewer training updates, and are less prone to gibberish output caused
+by neural attention failures. In this paper, we combine neural HMM TTS with
+normalising flows for describing the highly non-Gaussian distribution of speech
+acoustics. The result is a powerful, fully probabilistic model of durations and
+acoustics that can be trained using exact maximum likelihood. Compared to
+dominant flow-based acoustic models, our approach integrates autoregression for
+improved modelling of long-range dependences such as utterance-level prosody.
+Experiments show that a system based on our proposal gives more accurate
+pronunciations and better subjective speech quality than comparable methods,
+whilst retaining the original advantages of neural HMMs. Audio examples and code
+are available at https://shivammehta25.github.io/OverFlow/.
+
+
+## Important resources & papers
+- HMM: https://de.wikipedia.org/wiki/Hidden_Markov_Model
+- OverflowTTS paper: https://arxiv.org/abs/2211.06892
+- Neural HMM: https://arxiv.org/abs/2108.13320
+- Audio Samples: https://shivammehta25.github.io/OverFlow/
+
+
+## OverflowConfig
+```{eval-rst}
+.. autoclass:: TTS.tts.configs.overflow_config.OverflowConfig
+    :members:
+```
+
+## Overflow Model
+```{eval-rst}
+.. autoclass:: TTS.tts.models.overflow.Overflow
+    :members:
+```
\ No newline at end of file