Update dataset formatting docs

Eren Gölge 2022-02-14 10:49:25 +00:00
parent 58c38de58d
commit 2db67e3356
2 changed files with 51 additions and 7 deletions


@@ -58,23 +58,68 @@ If you use a different dataset format than the LJSpeech or the other public data
If your dataset is in a new language or it needs special normalization steps, then you need a new `text_cleaner`.
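For reference, a cleaner is just a function that takes raw text and returns normalized text. Below is a minimal, illustrative sketch of such a function; the function name and the normalization steps are placeholders for whatever your language or dataset needs, and how you register it with 🐸TTS (the built-in cleaners live under `TTS.tts.utils.text.cleaners`) may differ between versions.
```python
import re

# Minimal illustrative cleaner: takes a raw transcription and returns
# normalized text. The name and the exact normalization steps are
# placeholders, not part of the 🐸TTS API.
def my_custom_cleaner(text: str) -> str:
    text = text.lower()
    text = text.replace("’", "'")      # normalize apostrophes
    text = re.sub(r"\s+", " ", text)   # collapse repeated whitespace
    return text.strip()
```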
What you get out of a `formatter` is a `List[Dict]` in the following format.
```
>>> formatter(metafile_path)
[["audio1.wav", "This is my sentence.", "MyDataset"],
["audio1.wav", "This is maybe a sentence.", "MyDataset"],
...
[
    {"audio_file": "audio1.wav", "text": "This is my sentence.", "speaker_name": "MyDataset", "language": "lang_code"},
    {"audio_file": "audio1.wav", "text": "This is maybe a sentence.", "speaker_name": "MyDataset", "language": "lang_code"},
    ...
]
```
Each entry is parsed as ```{"audio_file": "<filename>", "text": "<transcription>", "speaker_name": "<speaker_name>"}```.
```<speaker_name>``` is the dataset name for single speaker datasets, and it is mainly used
in multi-speaker models to map each sample to its speaker. For now, we only focus on single speaker datasets.
The purpose of a `formatter` is to parse your manifest file and load the audio file paths and transcriptions.
Then, the output is passed to the `Dataset`, which computes features from the audio signals, calls text normalization routines, and converts raw text to
phonemes if needed.
## Loading your dataset
Load one of the datasets supported by 🐸TTS.
```python
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
# dataset config for one of the pre-defined datasets
dataset_config = BaseDatasetConfig(
name="vctk", meta_file_train="", language="en-us", path="dataset-path")
)
# load training samples
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
```
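As a quick sanity check (illustrative, not part of the original example), you can inspect the returned lists; each entry is a dict in the format described above.
```python
# Illustrative check: each sample is a dict as produced by the formatter.
print(f"{len(train_samples)} train / {len(eval_samples)} eval samples")
print(train_samples[0])
```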
Load a custom dataset with a custom formatter.
```python
import os

from TTS.tts.datasets import load_tts_samples

# custom formatter implementation
def formatter(root_path, manifest_file, **kwargs):  # pylint: disable=unused-argument
    """Assumes each line is formatted as ```<filename>|<transcription>```"""
    txt_file = os.path.join(root_path, manifest_file)
    items = []
    speaker_name = "my_speaker"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0])
            text = cols[1]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name})
    return items

# load training samples with the `dataset_config` defined above and the custom formatter
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True, formatter=formatter)
```
See `TTS.tts.datasets.TTSDataset`, a generic `Dataset` implementation for the `tts` models.
See `TTS.vocoder.datasets.*` for different `Dataset` implementations for the `vocoder` models.


@@ -27,7 +27,6 @@
    formatting_your_dataset
    what_makes_a_good_dataset
    tts_datasets
.. toctree::
    :maxdepth: 2