mirror of https://github.com/coqui-ai/TTS.git
docs: enhance TTS example scripts documentation with installation guide, examples, and troubleshooting
This commit is contained in:
parent
eef419b373
commit
b3e9f93dce
|
@ -0,0 +1,222 @@
|
|||
# TTS Example Scripts
|
||||
|
||||
This directory contains example scripts demonstrating how to use the TTS (Text-to-Speech) system.
|
||||
|
||||
## Available Scripts
|
||||
|
||||
1. `simple_tts.py` - The simplest way to use TTS with minimal setup
|
||||
2. `quick_tts.py` - Command-line interface for quick text-to-speech conversion
|
||||
3. `interactive_tts.py` - Interactive script with speaker selection and multi-line text input
|
||||
4. `example_tts.py` - Basic example showing TTS functionality
|
||||
|
||||
## Usage
|
||||
|
||||
### Simple TTS
|
||||
```bash
|
||||
python simple_tts.py
|
||||
```
|
||||
This will convert the default text to speech using speaker p335.
|
||||
|
||||
### Quick TTS
|
||||
```bash
|
||||
python quick_tts.py "Your text goes here"
|
||||
```
|
||||
Converts command-line text to speech immediately.
|
||||
|
||||
### Interactive TTS
|
||||
```bash
|
||||
python interactive_tts.py
|
||||
```
|
||||
Provides an interactive interface where you can:
|
||||
1. Choose from available speakers
|
||||
2. Enter multi-line text
|
||||
3. Generate speech with custom output filenames
|
||||
|
||||
## Output
|
||||
All scripts generate WAV files that can be played with any media player.
|
||||
- `simple_tts.py` generates `speech_[speaker_id].wav`
|
||||
- `quick_tts.py` generates `speech_output.wav`
|
||||
- `interactive_tts.py` generates `speech_output_[number].wav`
|
||||
|
||||
## Requirements
|
||||
- Python 3.9+
|
||||
- TTS library
|
||||
- espeak-ng (required for phonemization)
|
||||
|
||||
## Installation
|
||||
|
||||
### 1. Python Environment Setup
|
||||
```bash
|
||||
# Create a virtual environment
|
||||
python -m venv venv
|
||||
|
||||
# Activate virtual environment
|
||||
# On Windows
|
||||
venv\Scripts\activate
|
||||
# On macOS/Linux
|
||||
source venv/bin/activate
|
||||
|
||||
# Upgrade pip
|
||||
python -m pip install --upgrade pip
|
||||
```
|
||||
|
||||
### 2. Install Dependencies
|
||||
```bash
|
||||
# Install TTS library
|
||||
pip install TTS
|
||||
|
||||
# Install system dependencies
|
||||
# For macOS
|
||||
brew install espeak-ng
|
||||
|
||||
# For Ubuntu/Debian
|
||||
sudo apt-get install espeak-ng
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Basic Text-to-Speech
|
||||
```python
|
||||
from TTS.api import TTS
|
||||
|
||||
# Initialize TTS
|
||||
tts = TTS(model_name="tts_models/en/vctk/vits")
|
||||
|
||||
# Simple conversion
|
||||
tts.tts_to_file(text="Hello, world!", file_path="output.wav", speaker="p335")
|
||||
```
|
||||
|
||||
### Multi-line Text
|
||||
```python
|
||||
text = """
|
||||
This is a multi-line text example.
|
||||
It will be converted to speech with proper pauses.
|
||||
You can use it for longer content like articles or books.
|
||||
"""
|
||||
```
|
||||
|
||||
### Different Speakers
|
||||
```python
|
||||
# List available speakers
|
||||
tts = TTS(model_name="tts_models/en/vctk/vits")
|
||||
print("Available speakers:", tts.speakers)
|
||||
|
||||
# Try different speakers
|
||||
tts.tts_to_file(text="Same text, different voice.", file_path="speaker1.wav", speaker="p227")
|
||||
tts.tts_to_file(text="Same text, different voice.", file_path="speaker2.wav", speaker="p228")
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
### Model Options
|
||||
- Model Name: `tts_models/en/vctk/vits`
|
||||
- Sample Rate: 22050 Hz
|
||||
- Speaker IDs: p225-p376 available
|
||||
- Language: English
|
||||
|
||||
### Audio Settings
|
||||
```python
|
||||
# Available audio settings
|
||||
settings = {
|
||||
"sample_rate": 22050, # Audio sample rate
|
||||
"output_format": "wav", # Output format (wav, mp3)
|
||||
"speed": 1.0, # Speech speed (0.5-2.0)
|
||||
}
|
||||
```
|
||||
|
||||
### Performance Tips
|
||||
1. **Memory Usage**
|
||||
- Batch processing for multiple files
|
||||
- Clear cache between large generations
|
||||
- Monitor system resources
|
||||
|
||||
2. **Speed Optimization**
|
||||
- Use CPU for small tasks
|
||||
- Enable GPU for batch processing
|
||||
- Cache model for repeated use
|
||||
|
||||
## Development
|
||||
|
||||
### Setting up Development Environment
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone <repository-url>
|
||||
cd TTS
|
||||
|
||||
# Create development environment
|
||||
python -m venv venv
|
||||
source venv/bin/activate # or `venv\Scripts\activate` on Windows
|
||||
|
||||
# Install development dependencies
|
||||
pip install -r requirements.dev.txt
|
||||
```
|
||||
|
||||
### Running Tests
|
||||
```bash
|
||||
# Run all tests
|
||||
python -m pytest tests/
|
||||
|
||||
# Run specific test file
|
||||
python -m pytest tests/test_specific.py
|
||||
```
|
||||
|
||||
### Contributing
|
||||
1. Fork the repository
|
||||
2. Create a feature branch
|
||||
3. Make your changes
|
||||
4. Run tests
|
||||
5. Submit pull request
|
||||
|
||||
## Supported Languages
|
||||
The current model (`tts_models/en/vctk/vits`) supports English with multiple speakers.
|
||||
Each speaker has a unique voice characteristic. The available speakers can be viewed
|
||||
when running the interactive script.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **No audio output file generated**
|
||||
- Check if you have write permissions in the current directory
|
||||
- Ensure enough disk space is available
|
||||
- Verify that the text input is not empty
|
||||
|
||||
2. **espeak-ng not found**
|
||||
- Make sure espeak-ng is installed correctly
|
||||
- For macOS: `brew install espeak-ng`
|
||||
- For Ubuntu/Debian: `sudo apt-get install espeak-ng`
|
||||
- Add espeak-ng to your system PATH if needed
|
||||
|
||||
3. **Speaker not found error**
|
||||
- Use the interactive script to see available speaker IDs
|
||||
- Default speaker is "p335"
|
||||
- Make sure to use exact speaker ID (case sensitive)
|
||||
|
||||
4. **Model download issues**
|
||||
- Check your internet connection
|
||||
- Ensure you have enough disk space
|
||||
- Try removing the downloaded model and let it re-download
|
||||
|
||||
5. **Memory errors**
|
||||
- Try with shorter text inputs
|
||||
- Close other memory-intensive applications
|
||||
- Check if your system meets minimum requirements
|
||||
|
||||
### Advanced Usage
|
||||
|
||||
1. **Custom Output Location**
|
||||
- All scripts support custom output paths
|
||||
- Use absolute paths for reliable file saving
|
||||
- Ensure write permissions in target directory
|
||||
|
||||
2. **Voice Customization**
|
||||
- Try different speakers for variety
|
||||
- Use interactive mode to preview voices
|
||||
- Experiment with different text formats
|
||||
|
||||
### Getting Help
|
||||
If you encounter issues not covered here:
|
||||
1. Check the error message for specific details
|
||||
2. Verify your Python environment and dependencies
|
||||
3. Try running the example scripts with simple inputs first
|
||||
4. Check the TTS library documentation for advanced issues
|
Loading…
Reference in New Issue