diff --git a/docs/source/index.md b/docs/source/index.md
index 6b55ebd8..5ef3d88c 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -52,6 +52,7 @@
     models/tacotron1-2.md
     models/overflow.md
     models/tortoise.md
+    models/bark.md
 
 .. toctree::
     :maxdepth: 2
diff --git a/docs/source/models/bark.md b/docs/source/models/bark.md
new file mode 100644
index 00000000..d07cca3f
--- /dev/null
+++ b/docs/source/models/bark.md
@@ -0,0 +1,103 @@
+# Bark 🐶
+
+Bark is a multilingual TTS model created by [Suno-AI](https://www.suno.ai/). It can generate conversational speech as well as music and sound effects.
+It is architecturally very similar to Google's [AudioLM](https://arxiv.org/abs/2209.03143). For more information, please refer to [Suno-AI's repo](https://github.com/suno-ai/bark).
+
+
+## Acknowledgements
+- 👑[Suno-AI](https://www.suno.ai/) for training and open-sourcing this model.
+- 👑[serp-ai](https://github.com/serp-ai/bark-with-voice-clone) for controlled voice cloning.
+
+
+## Example Use
+
+```python
+from TTS.tts.configs.bark_config import BarkConfig
+from TTS.tts.models.bark import Bark
+
+text = "Hello, my name is Manmay, how are you?"
+
+config = BarkConfig()
+model = Bark.init_from_config(config)
+model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)
+
+# With a random speaker
+output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)
+
+# Cloning a speaker.
+# This assumes a speaker file at `bark_voices/ljspeech/speaker.wav` or `bark_voices/ljspeech/speaker.npz`
+output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")
+```
+
+Using the 🐸TTS API:
+
+```python
+from TTS.api import TTS
+
+# Load the model to GPU
+# Bark is really slow on CPU, so we recommend using GPU.
+tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)
+
+
+# Cloning a new speaker
+# This expects to find an mp3 or wav file like `bark_voices/ljspeech/speaker.wav`
+# It computes the cloning values and stores them in `bark_voices/ljspeech/speaker.npz`
+tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
+                file_path="output.wav",
+                voice_dir="bark_voices/",
+                speaker="ljspeech")
+
+
+# When you run it again, it uses the stored values to generate the voice.
+tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
+                file_path="output.wav",
+                voice_dir="bark_voices/",
+                speaker="ljspeech")
+
+
+# Random speaker
+tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)
+tts.tts_to_file("hello world", file_path="out.wav")
+```
+
+Using the 🐸TTS command line:
+
+```console
+# Cloning the `ljspeech` voice
+tts --model_name tts_models/multilingual/multi-dataset/bark \
+--text "This is an example." \
+--out_path "output.wav" \
+--voice_dir bark_voices/ \
+--speaker_idx "ljspeech" \
+--progress_bar True
+
+# Random voice generation
+tts --model_name tts_models/multilingual/multi-dataset/bark \
+--text "This is an example." \
+--out_path "output.wav" \
+--progress_bar True
+```
+
+
+## Important resources & papers
+- Original Repo: https://github.com/suno-ai/bark
+- Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone
+- AudioLM: https://arxiv.org/abs/2209.03143
+
+## BarkConfig
+```{eval-rst}
+.. autoclass:: TTS.tts.configs.bark_config.BarkConfig
+    :members:
+```
+
+## BarkArgs
+```{eval-rst}
+.. autoclass:: TTS.tts.models.bark.BarkArgs
+    :members:
+```
+
+## Bark Model
+```{eval-rst}
+.. autoclass:: TTS.tts.models.bark.Bark
+    :members:
+```
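
The cloning examples in this patch depend on a per-speaker folder under `voice_dir` that holds either a reference recording (`speaker.wav` or an mp3) or previously computed cloning values (`speaker.npz`). A minimal pre-flight check for that layout might look like the sketch below. `find_speaker_file` is a hypothetical helper and not part of 🐸TTS, and the preference order (`.npz`, then `.wav`, then `.mp3`) is an assumption based on the doc comments, not the library's confirmed lookup logic.

```python
from pathlib import Path
from typing import Optional


def find_speaker_file(voice_dir: str, speaker: str) -> Optional[Path]:
    """Hypothetical helper: locate a speaker file under `voice_dir/speaker/`.

    Assumed preference order: stored cloning values (.npz) first,
    then a reference recording (.wav, then .mp3). Returns None if the
    speaker folder is missing or contains none of these.
    """
    speaker_dir = Path(voice_dir) / speaker
    if not speaker_dir.is_dir():
        return None
    for pattern in ("*.npz", "*.wav", "*.mp3"):
        # Sort so the result is deterministic when several files match.
        matches = sorted(speaker_dir.glob(pattern))
        if matches:
            return matches[0]
    return None
```

Calling `find_speaker_file("bark_voices/", "ljspeech")` before `tts.tts_to_file(...)` gives an early, readable failure when the folder is misnamed, instead of an error surfacing from inside the model.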