mirror of https://github.com/coqui-ai/TTS.git
Add bark docs
This commit is contained in:
parent a035b25340
commit 797eab2dd0

@@ -52,6 +52,7 @@
    models/tacotron1-2.md
    models/overflow.md
    models/tortoise.md
    models/bark.md

.. toctree::
    :maxdepth: 2

@@ -0,0 +1,103 @@
# Bark 🐶

Bark is a multi-lingual TTS model created by [Suno-AI](https://www.suno.ai/). It can generate conversational speech as well as music and sound effects.
It is architecturally very similar to Google's [AudioLM](https://arxiv.org/abs/2209.03143). For more information, please refer to the [Suno-AI repo](https://github.com/suno-ai/bark).

## Acknowledgements
- 👑 [Suno-AI](https://www.suno.ai/) for training and open-sourcing this model.
- 👑 [serp-ai](https://github.com/serp-ai/bark-with-voice-clone) for controlled voice cloning.

## Example Use

```python
from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

text = "Hello, my name is Manmay, how are you?"

config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# Synthesize with a random speaker.
output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)

# Clone a speaker.
# This assumes a reference file at `bark_voices/speaker_n/speaker.wav` or `bark_voices/speaker_n/speaker.npz`.
output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")
```
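
The `synthesize` calls above return a dictionary rather than writing audio to disk. Below is a minimal sketch for saving the result; it assumes the waveform is exposed under a `"wav"` key and that `config.sample_rate` carries the output sample rate, so verify both against your installed version:

```python
# Sketch only: the "wav" key and the sample_rate attribute are assumptions,
# not guaranteed by these docs; check them against your TTS version.
import numpy as np
import scipy.io.wavfile

wav = np.asarray(output_dict["wav"]).squeeze()
scipy.io.wavfile.write("bark_output.wav", config.sample_rate, wav.astype(np.float32))
```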

Using 🐸TTS API:

```python
from TTS.api import TTS

# Load the model on GPU.
# Bark is really slow on CPU, so we recommend using a GPU.
tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)

# Clone a new speaker.
# This expects an mp3 or wav file such as `bark_voices/new_speaker/speaker.wav`
# (see the layout sketch after this block).
# It computes the cloning values and stores them in `bark_voices/new_speaker/speaker.npz`.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="bark_voices/",
                speaker="ljspeech")

# When you run it again, it reuses the stored values to generate the voice.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="bark_voices/",
                speaker="ljspeech")

# Random speaker.
tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)
tts.tts_to_file("hello world", file_path="out.wav")
```
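
For the cloning calls above, the voice directory needs to contain a reference recording before the first run. A minimal sketch of preparing that layout follows; the folder name `new_speaker` and the clip `my_reference_clip.wav` are purely illustrative:

```python
# Sketch of the expected layout: `bark_voices/<speaker>/speaker.wav` (or a
# precomputed `speaker.npz`). File names below are illustrative only.
from pathlib import Path
import shutil

voice_dir = Path("bark_voices") / "new_speaker"
voice_dir.mkdir(parents=True, exist_ok=True)

# A short, clean recording of the target voice.
shutil.copy("my_reference_clip.wav", voice_dir / "speaker.wav")
```

Passing `speaker="new_speaker"` together with `voice_dir="bark_voices/"` to `tts_to_file` should then pick this voice up, and after the first run the computed cloning values are cached as `bark_voices/new_speaker/speaker.npz`.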

Using 🐸TTS Command line:

```console
# Cloning the `ljspeech` voice
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav" \
    --voice_dir bark_voices/ \
    --speaker_idx "ljspeech" \
    --progress_bar True

# Random voice generation
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav" \
    --progress_bar True
```

## Important resources & papers
- Original Repo: https://github.com/suno-ai/bark
- Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone
- AudioLM: https://arxiv.org/abs/2209.03143

## BarkConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.bark_config.BarkConfig
    :members:
```

## BarkArgs
```{eval-rst}
.. autoclass:: TTS.tts.models.bark.BarkArgs
    :members:
```

## Bark Model
```{eval-rst}
.. autoclass:: TTS.tts.models.bark.Bark
    :members:
```