ElevenLabs TTS Setup Guide
Overview
ElevenLabs provides high-quality, natural-sounding text-to-speech (TTS) that significantly improves the audio quality of your animations compared to free alternatives.
Why ElevenLabs?
- Superior Quality: Among the most natural-sounding AI voices available
- Fast Generation: Typically < 5 seconds for narration
- Reliable: Consistent output, no blank audio issues
- Multiple Voices: Wide selection of voices for different styles
- Emotional Range: Voices can convey emotion and emphasis
Getting Started
Step 1: Create an ElevenLabs Account
- Go to elevenlabs.io
- Click "Sign Up" (top right)
- Choose a plan:
  - Free Tier: 10,000 characters/month (~10 animations)
  - Starter: $5/month for 30,000 characters
  - Creator: $22/month for 100,000 characters
  - Pro: $99/month for 500,000 characters
Step 2: Get Your API Key
- Log in to your ElevenLabs account
- Click your profile icon (top right)
- Select "Profile"
- Find the "API Key" section
- Click "Copy" to copy your API key
- It looks like: sk_abc123def456...
Step 3: Configure the Project
Option A: Environment Variable (Recommended)
Create or edit the .env file in the project root:
# ElevenLabs Configuration
ELEVENLABS_API_KEY=sk_your_actual_api_key_here
# Optional: Hugging Face as fallback
HUGGINGFACE_API_KEY=hf_your_huggingface_key_here
Option B: Command Line Argument
python orchestrator.py "photosynthesis" --elevenlabs-key sk_your_api_key_here
Option C: Programmatic
from orchestrator import NeuroAnimOrchestrator

orchestrator = NeuroAnimOrchestrator(
    elevenlabs_api_key="sk_your_api_key_here",
    hf_api_key="hf_your_fallback_key_here"
)
Step 4: Install Dependencies
# Activate your virtual environment
source .venv/bin/activate # Linux/Mac
# or
.venv\Scripts\activate # Windows
# Install required packages
pip install httpx gtts pydub
Available Voices
The system comes with 9 pre-configured professional voices:
| Voice Name | ID | Description | Best For |
|---|---|---|---|
| rachel | 21m00Tcm4TlvDq8ikWAM | Clear, neutral female | Educational content, narration |
| adam | pNInz6obpgDQGcFmaJgB | Deep, confident male | Documentary, serious topics |
| antoni | ErXwobaYiN019PkySvjV | Well-rounded male | General narration |
| arnold | VR6AewLTigWG4xSOukaG | Crisp, articulate male | Technical content |
| bella | EXAVITQu4vr4xnSDxMaL | Soft, gentle female | Children's content |
| domi | AZnzlk1XvdvUeBnXmlld | Strong female | Assertive narration |
| elli | MF3mGyEYCl7XYWbV9V6O | Emotional, expressive female | Storytelling |
| josh | TxGEqnHWrfWFTfGW9XjX | Young, energetic male | Youth content |
| sam | yoZ06aMxZJJ28mfd3POQ | Raspy male | Character voices |
Using a Specific Voice
# In your code
tts_result = await tts_generator.generate_speech(
    text="Your narration text",
    output_path=audio_file,
    voice="adam"  # Change to any voice name
)
Using Custom Voices
If you've created custom voices in ElevenLabs:
# Use the voice ID directly
tts_result = await tts_generator.generate_speech(
    text="Your narration text",
    output_path=audio_file,
    voice="your_custom_voice_id_here"
)
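Since the voice argument accepts either a preset name or a raw voice ID, the lookup can be pictured as a simple mapping with a pass-through. The sketch below is an illustration only: the IDs are copied from the table above, but `PRESET_VOICES` and `resolve_voice_id` are hypothetical names, and the project's actual resolution logic may differ.

```python
# Voice names and IDs copied from the table in this guide.
PRESET_VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "antoni": "ErXwobaYiN019PkySvjV",
    "arnold": "VR6AewLTigWG4xSOukaG",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
    "sam": "yoZ06aMxZJJ28mfd3POQ",
}


def resolve_voice_id(voice: str) -> str:
    """Return the preset ID for a known name, else treat the input as a custom voice ID.

    Hypothetical helper: the real project may resolve voices differently.
    """
    key = voice.strip()
    if not key:
        raise ValueError("voice must be a non-empty name or ID")
    # Preset names are matched case-insensitively; anything else passes through unchanged.
    return PRESET_VOICES.get(key.lower(), key)
```

With this shape, `resolve_voice_id("adam")` yields the preset ID from the table, while an unrecognized string is returned unchanged and treated as a custom voice ID.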
Advanced Configuration
Voice Settings
You can fine-tune voice characteristics:
tts_result = await tts_generator.generate_speech(
    text="Your narration text",
    output_path=audio_file,
    voice="rachel",
    stability=0.5,  # 0.0-1.0: Lower = more expressive, Higher = more stable
    similarity_boost=0.75,  # 0.0-1.0: Higher = more similar to original voice
    style=0.0,  # 0.0-1.0: Style exaggeration
    use_speaker_boost=True  # Enhance clarity
)
Stability
- Low (0.0-0.3): More expressive and variable, good for storytelling
- Medium (0.4-0.6): Balanced, good for most content (default: 0.5)
- High (0.7-1.0): Very consistent, good for audiobooks
Similarity Boost
- Low (0.0-0.4): More creative interpretation
- Medium (0.5-0.7): Balanced
- High (0.8-1.0): Closest to the original voice
- Default: 0.75, which falls between the medium and high ranges
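Because every numeric setting is documented as a value between 0.0 and 1.0, it can help to validate them before making a request. The following is a minimal sketch under that assumption; `build_voice_settings` is a hypothetical helper, not part of the project, and the defaults simply mirror the values shown in this guide.

```python
def build_voice_settings(
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    use_speaker_boost: bool = True,
) -> dict:
    """Validate the 0.0-1.0 ranges described above and return the settings as a dict.

    Hypothetical helper for illustration; defaults mirror the values in this guide.
    """
    numeric = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in numeric.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {**numeric, "use_speaker_boost": use_speaker_boost}
```

Validating locally gives a clear error message before any quota is spent on a request that would be rejected.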
Model Selection
ElevenLabs offers different models:
tts_result = await tts_generator.generate_speech(
    text="Your narration text",
    output_path=audio_file,
    voice="rachel",
    model_id="eleven_monolingual_v1"  # Default, English only
    # model_id="eleven_multilingual_v2"  # Supports multiple languages
    # model_id="eleven_turbo_v2"  # Faster, slightly lower quality
)
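One way to pick between the three models is a small decision function based on the trade-offs listed in the comments above. This is only a sketch: `choose_model`, its `language` code argument, and the choice to offer the turbo model for English only are assumptions made for illustration.

```python
def choose_model(language: str = "en", prefer_speed: bool = False) -> str:
    """Pick a model ID from the trade-offs described in this guide.

    Hypothetical helper; `language` is assumed to be a code such as "en" or "de-DE".
    """
    base_language = language.lower().split("-")[0]
    if base_language != "en":
        # Only the multilingual model is described as supporting other languages.
        return "eleven_multilingual_v2"
    if prefer_speed:
        # Faster, at slightly lower quality.
        return "eleven_turbo_v2"
    return "eleven_monolingual_v1"
```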
Testing Your Setup
Quick Test Script
Create test_tts.py:
import asyncio
from pathlib import Path

from utils.tts import generate_speech_elevenlabs


async def test_elevenlabs():
    """Test ElevenLabs TTS."""
    text = "Hello! This is a test of ElevenLabs text to speech."
    output = Path("test_audio.mp3")
    try:
        result = await generate_speech_elevenlabs(
            text=text,
            output_path=output,
            voice="rachel"
        )
        print(f"Success! Audio saved to: {output}")
        print(f"Provider: {result['provider']}")
        print(f"File size: {result['file_size_bytes']} bytes")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    asyncio.run(test_elevenlabs())
Run it:
python test_tts.py
Test All Voices
import asyncio
from pathlib import Path

from utils.tts import TTSGenerator


async def test_all_voices():
    """Generate samples for a subset of the available voices."""
    tts = TTSGenerator()
    # Full voice list; the loop below samples a fixed subset to save quota.
    voices = await tts.get_available_voices()
    text = "This is a sample of my voice for educational animations."
    for voice_name in ["rachel", "adam", "bella"]:
        output = Path(f"voice_sample_{voice_name}.mp3")
        print(f"Generating {voice_name}...")
        result = await tts.generate_speech(
            text=text,
            output_path=output,
            voice=voice_name
        )
        print(f"{voice_name}: {output}")


if __name__ == "__main__":
    asyncio.run(test_all_voices())
How the Fallback System Works
The TTS system has automatic fallback:
1. Try ElevenLabs (if API key available)
   ↓ (if it fails)
2. Try Hugging Face TTS (if API key available)
   ↓ (if it fails)
3. Try Google TTS (free, always available)
You can disable fallback:
tts_generator = TTSGenerator(
    elevenlabs_api_key="your_key",
    fallback_enabled=False  # Fail immediately if ElevenLabs fails
)
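The fallback behavior described above can be sketched as a loop over providers in priority order. This is not the project's implementation: `generate_with_fallback` and the `(name, provider)` pair format are hypothetical, and each provider is assumed to be an async callable that returns audio bytes or raises on failure.

```python
from typing import Awaitable, Callable, Sequence

# Assumed provider shape: an async function that takes text and returns audio bytes.
Provider = Callable[[str], Awaitable[bytes]]


async def generate_with_fallback(
    text: str,
    providers: Sequence[tuple[str, Provider]],
    fallback_enabled: bool = True,
) -> tuple[str, bytes]:
    """Try each provider in order and return (provider_name, audio) from the first success."""
    errors = []
    for name, provider in providers:
        try:
            return name, await provider(text)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            if not fallback_enabled:
                # Mirror fallback_enabled=False: stop after the first failure.
                break
    raise RuntimeError("All TTS providers failed: " + "; ".join(errors))
```

Passing the providers as an ordered list keeps the priority explicit: ElevenLabs first, then Hugging Face, then Google TTS.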
Monitoring Usage
Check Your Usage
- Go to elevenlabs.io
- Log in
- Click "Usage" in the sidebar
- View your character usage and remaining quota
Estimate Costs
Rule of thumb: 1 minute of narration ≈ 150-200 words ≈ 900-1200 characters
Free Tier (10,000 chars/month):
- ~8-10 minutes of narration
- ~8-10 animations (assuming 1 min each)
Starter ($5/month, 30,000 chars):
- ~25-30 minutes of narration
- ~25-30 animations
Creator ($22/month, 100,000 chars):
- ~80-100 minutes of narration
- ~80-100 animations
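The estimates above follow directly from the rule of thumb of 900-1200 characters per minute. A short sketch of the arithmetic, using the plan quotas listed in this guide (`narration_minutes` is an illustrative helper name):

```python
# Rule of thumb from this guide: 1 minute of narration is about 900-1200 characters.
CHARS_PER_MINUTE_LOW = 900
CHARS_PER_MINUTE_HIGH = 1200

# Monthly character quotas as listed in this guide.
PLANS = {"Free": 10_000, "Starter": 30_000, "Creator": 100_000, "Pro": 500_000}


def narration_minutes(char_quota: int) -> tuple[float, float]:
    """Return (fewest, most) minutes of narration that a character quota covers."""
    fewest = char_quota / CHARS_PER_MINUTE_HIGH  # dense narration uses more characters
    most = char_quota / CHARS_PER_MINUTE_LOW
    return fewest, most


for plan, quota in PLANS.items():
    fewest, most = narration_minutes(quota)
    print(f"{plan}: about {fewest:.0f}-{most:.0f} minutes of narration")
```

For the free tier this works out to between 10,000 / 1,200 ≈ 8.3 and 10,000 / 900 ≈ 11.1 minutes, in line with the rounded figures above.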
Troubleshooting
Problem: "ElevenLabs API key not provided"
Solution:
- Check that your .env file exists
- Verify ELEVENLABS_API_KEY=sk_... is set correctly
- No quotes around the key
- No spaces around the =
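These checks can be automated with a few lines of standard-library Python. The sketch below is a hypothetical helper (`check_env_file` is not part of the project) that reports the problems from the checklist for a given .env file.

```python
from pathlib import Path


def check_env_file(path: str = ".env", key: str = "ELEVENLABS_API_KEY") -> list[str]:
    """Return a list of problems found for `key`; an empty list means the entry looks fine."""
    env_path = Path(path)
    if not env_path.exists():
        return [f"{env_path} does not exist"]
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not assignments
        name, value = line.split("=", 1)
        if name.strip() != key:
            continue
        problems = []
        if name != name.strip() or value != value.strip():
            problems.append("spaces around '='")
        value = value.strip()
        if value[:1] in ("'", '"') or value[-1:] in ("'", '"'):
            problems.append("quotes around the key")
        if not value.strip("'\""):
            problems.append("value is empty")
        return problems
    return [f"{key} is not set in {env_path}"]
```

Running `print(check_env_file())` from the project root lists whatever needs fixing, or prints an empty list when the entry looks well-formed.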
Problem: "401 Unauthorized"
Possible causes:
- API key is invalid
- API key has expired
- Account has been suspended
Solution:
- Check your key at elevenlabs.io/profile
Problem: "429 Too Many Requests"
Cause:
- You've exceeded your quota
Solutions:
- Wait for quota to reset (monthly)
- Upgrade your plan
- Enable fallback to Hugging Face/gTTS
Problem: "Audio file is blank/silent"
Solutions:
- Check the output file size (should be > 10KB)
- Try a different voice
- Check if text is too short (< 10 chars)
- Verify audio format is compatible
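The first and third checks are easy to script. A minimal sketch, assuming the 10KB and 10-character thresholds from the list above; `diagnose_silent_audio` is a hypothetical helper, and it does not inspect the audio content itself.

```python
from pathlib import Path

MIN_AUDIO_BYTES = 10 * 1024  # "should be > 10KB" per the checklist above
MIN_TEXT_CHARS = 10  # shorter text is flagged as "too short"


def diagnose_silent_audio(text: str, audio_path: str) -> list[str]:
    """Return likely reasons for blank audio based on text length and file size only."""
    problems = []
    if len(text.strip()) < MIN_TEXT_CHARS:
        problems.append(f"text is shorter than {MIN_TEXT_CHARS} characters")
    path = Path(audio_path)
    if not path.exists():
        problems.append(f"{path} was not created")
    elif path.stat().st_size <= MIN_AUDIO_BYTES:
        problems.append(f"{path} is only {path.stat().st_size} bytes")
    return problems
```

An empty list does not guarantee audible output; it only rules out the two cheapest causes before trying a different voice or format.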
Problem: "Slow generation"
Solutions:
- Use the eleven_turbo_v2 model
- Check your internet connection
- Reduce text length (split long narrations)
- Consider caching commonly used phrases
Problem: "Import Error: No module named 'httpx'"
Solution:
pip install httpx gtts pydub
Best Practices
1. Text Preparation
- Use proper punctuation: Helps with natural pauses
- Avoid special characters: Stick to alphanumeric and basic punctuation
- Break long text: Split into shorter segments for better pacing
- Add pauses: Use ... for longer pauses
Example:
text = """
Photosynthesis is the process by which plants create energy.
It happens in the chloroplasts... using sunlight, water, and carbon dioxide.
The result? Glucose and oxygen!
"""
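Splitting long text into shorter segments can be done on sentence boundaries so that each segment still reads naturally. The sketch below is one possible approach, not the project's method: `split_narration` and its 250-character default are assumptions, and a break is only made after `.`, `!`, or `?` when the next word starts with a capital letter, so a mid-sentence `...` pause stays intact.

```python
import re


def split_narration(text: str, max_chars: int = 250) -> list[str]:
    """Group whole sentences into segments of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after sentence-ending punctuation only when an uppercase letter follows.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own request, which also keeps individual generations short.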
2. Voice Selection
- Educational content: Rachel, Arnold
- Storytelling: Elli, Antoni
- Technical topics: Adam, Arnold
- Children's content: Bella, Josh
3. Caching
For repeated phrases, cache the audio:
import hashlib
from pathlib import Path


def get_cached_audio(text: str, voice: str) -> Path:
    """Get cached audio or generate if not exists."""
    text_hash = hashlib.md5(f"{text}:{voice}".encode()).hexdigest()
    cache_path = Path(f"audio_cache/{text_hash}.mp3")
    if cache_path.exists():
        return cache_path
    # Generate and cache
    cache_path.parent.mkdir(exist_ok=True)
    # ... generate audio ...
    return cache_path
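Because the TTS calls in this guide are asynchronous, a cache helper usually needs to be async as well. The following sketch fills in the generation step with a caller-supplied coroutine; `get_or_generate_audio` and the `generate(text, voice, path)` callback signature are hypothetical and would need adapting to the project's real generator.

```python
import hashlib
from pathlib import Path
from typing import Awaitable, Callable

# Assumed callback shape: writes audio for (text, voice) to the given path.
GenerateFn = Callable[[str, str, Path], Awaitable[None]]


async def get_or_generate_audio(
    text: str,
    voice: str,
    generate: GenerateFn,
    cache_dir: str = "audio_cache",
) -> Path:
    """Return the cached file for (text, voice), calling `generate` only on a cache miss."""
    # MD5 is used here only as a cache key, not for security.
    digest = hashlib.md5(f"{text}:{voice}".encode()).hexdigest()
    cache_path = Path(cache_dir) / f"{digest}.mp3"
    if not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        await generate(text, voice, cache_path)
    return cache_path
```

Keying on both text and voice means the same sentence spoken by a different voice gets its own cache entry.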
4. Error Handling
Always handle TTS errors gracefully:
try:
    audio = await tts_generator.generate_speech(...)
except Exception as e:
    logger.error(f"TTS failed: {e}")
    # Use fallback or text overlay instead
    return None
Security Best Practices
DO:
- Store API keys in the .env file
- Add .env to .gitignore
- Use environment variables in production
- Rotate keys periodically
- Use separate keys for dev/prod
DON'T:
- Commit API keys to git
- Share keys in public forums
- Hard-code keys in source files
- Use production keys for testing
- Share keys between team members
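In code, these rules usually come down to reading the key from the environment at startup and never printing it in full. A minimal sketch using only the standard library; `get_elevenlabs_key` and `mask_key` are hypothetical helper names, while `ELEVENLABS_API_KEY` is the variable name used throughout this guide.

```python
import os
from typing import Mapping


def get_elevenlabs_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key can appear in logs."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging `mask_key(get_elevenlabs_key())` confirms which key is loaded without exposing it.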
Cost Optimization Tips
- Use Free Tier First: Test with 10k chars/month
- Enable Fallback: Save quota by using free alternatives when needed
- Cache Audio: Don't regenerate same narration
- Optimize Text: Remove unnecessary words
- Batch Processing: Generate multiple animations in one session
- Monitor Usage: Set alerts in ElevenLabs dashboard
Getting Help
ElevenLabs Support
- Documentation: https://docs.elevenlabs.io
- Discord: https://discord.gg/elevenlabs
- Email: [email protected]
Project Issues
- GitHub Issues: [Your repo URL]
- Documentation: See README.md
- Examples: See example.py
Alternative TTS Providers
If ElevenLabs doesn't work for you:
Hugging Face (Free)
HUGGINGFACE_API_KEY=hf_your_key_here
- Pros: Free, open source
- Cons: Lower quality, slower
Google TTS (Free)
# No API key needed, automatic fallback
- Pros: Free, reliable, fast
- Cons: Robotic voice, limited customization
AWS Polly
# Requires AWS credentials
- Pros: Good quality, many voices
- Cons: AWS complexity, pay-per-use
Azure TTS
# Requires Azure subscription
- Pros: Good quality, multilingual
- Cons: Microsoft ecosystem, pricing
Next Steps
- Set up your API key
- Test with test_tts.py
- Generate your first animation
- Experiment with different voices
- Optimize settings for your content
Happy animating!