Static Audio Generation Skill

Generate pre-recorded audio files using ElevenLabs TTS for instant playback (no API latency, reduced cost, offline support).

When to Use This Skill

"Generate greeting audio" - Create greeting responses
"Add new startup message" - System status audio
"Create test audio files" - Testing wake word/STT
"Sync audio to BobFast5" - Cross-repo audio management
"Generate static TTS" - Any pre-recorded phrases

Quick Reference

Directory Structure

audio/static/
├── greetings/          # Greeting responses ("Yes wizard?", "I'm listening")
│   ├── yes_wizard.mp3
│   ├── im_listening.mp3
│   └── greetings.txt   # Index file
├── startup/            # Startup/shutdown/error messages
│   ├── initializing.mp3
│   ├── startup_complete.mp3
│   └── startup.txt     # Index file
└── testing/            # Test audio for wake word/STT testing
    ├── wake_up_bob.mp3
    ├── hey_bob.mp3
    └── what_time_is_it.mp3

Generation Commands

# Generate all greetings (from predefined list)
python generate_greeting_audio.py

# Generate all startup/shutdown/error messages
python generate_startup_audio.py

# Generate all static audio (comprehensive)
python generate_static_audio.py

Naming Convention

Rule: Lowercase, underscores, descriptive

"Yes wizard?" → yes_wizard.mp3
"I'm listening" → im_listening.mp3
"Startup complete. Listening for wake words." → startup_complete_listening_for_wake_words.mp3

Normalization function (tts/static_audio.py):

from tts.static_audio import normalize_phrase_to_filename
filename = normalize_phrase_to_filename("Yes wizard?")  # → "yes_wizard"
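
The naming rules above (lowercase, punctuation removed, spaces to underscores) can be illustrated with a minimal sketch. `normalize_phrase_sketch` is a hypothetical stand-in written for this example; the real `normalize_phrase_to_filename` in tts/static_audio.py may handle digits, non-ASCII characters, or other edge cases differently.

```python
import re


def normalize_phrase_sketch(phrase: str) -> str:
    """Hypothetical re-implementation of the documented naming rules."""
    lowered = phrase.lower()
    # Remove everything except ASCII letters, digits, and whitespace
    # (assumption: apostrophes and other punctuation are dropped, not replaced).
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Collapse runs of whitespace into single underscores.
    return "_".join(cleaned.split())


print(normalize_phrase_sketch("Yes wizard?"))    # yes_wizard
print(normalize_phrase_sketch("I'm listening"))  # im_listening
```

Comparing this sketch's output against the real function is a quick way to spot a phrase that normalizes unexpectedly.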

Generation Workflows

Workflow 1: Generate Greetings

Script: generate_greeting_audio.py

# 1. Define greetings list (edit script)
GREETINGS = [
    "Yes wizard?",
    "What do you need boss?",
    "I'm listening",
    "Yes?"
]

# 2. Run generation
python generate_greeting_audio.py

# Output:
# audio/static/greetings/yes_wizard.mp3
# audio/static/greetings/what_do_you_need_boss.mp3
# audio/static/greetings/im_listening.mp3
# audio/static/greetings/yes.mp3
# audio/static/greetings/greetings.txt (index)

When to add new greetings:

Adding personality variety
Testing different responses
Supporting new conversation states

Workflow 2: Generate Startup Messages

Script: generate_startup_audio.py

# 1. Define messages (edit script)
STARTUP_PHRASES = [
    "Initializing",
    "Found eye controller",
    "Startup complete. Listening for wake words.",
]

SHUTDOWN_PHRASES = ["Shutting down"]
ERROR_PHRASES = ["Configuration error"]

# 2. Run generation
python generate_startup_audio.py

# Output: audio/static/startup/*.mp3

When to add startup messages:

New component initialization feedback
Debugging startup sequence
User experience improvements

Workflow 3: Generate Test Audio

Purpose: Audio files for automated testing (wake word, STT, full pipeline)

Test audio types:

Wake word triggers: "Wake up Bob", "Hey Bob"
Commands: "What time is it?", "Tell me a joke"
Conversations: Full conversation test sequences

Generation options:

Option A: Use ElevenLabs (Bob's voice)

# Add to generate_static_audio.py or create test-specific script
TEST_PHRASES = [
    "Wake up Bob",
    "Hey Bob",
    "What time is it?",
    "Tell me a joke",
    "Can you speak louder?",
    "What is the weather like today?",
    "Goodbye Bob"
]

# Generate to audio/static/testing/

Option B: Record yourself

# Record 3 seconds
arecord -d 3 -f S16_LE -r 16000 -c 1 audio/static/testing/wake_up_bob.wav

# Convert to MP3 (optional); use the same directory the recording was saved to
ffmpeg -i audio/static/testing/wake_up_bob.wav -b:a 32k audio/static/testing/wake_up_bob.mp3

Option C: Use espeak (quick but robotic)

espeak "Wake up Bob" --stdout | \
    sox -t wav - -r 16000 -c 1 -b 16 audio/static/testing/wake_up_bob.wav

Workflow 4: Cross-Repo Sync (BobTheSkull5 → BobFast5)

When: After generating new audio files for testing on vision system

Method 1: Manual copy (Windows)

# Copy specific category
copy audio\static\testing\*.mp3 ..\BobFast5\audio\static\testing\

# Or use xcopy for directory sync
xcopy audio\static\testing ..\BobFast5\audio\static\testing\ /Y /S

Method 2: Use cross-repo-sync skill

# See cross-repo-sync skill for safe patterns

Method 3: Deploy to Pi (includes audio)

# deploy_to_pi.bat doesn't currently copy audio/ directory
# Add manual step or extend deployment script
pscp -pw peacock7 -r audio/static knarl@192.168.1.44:/home/knarl/BobTheSkull5/audio/
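
For a copy step that behaves the same on Windows and Linux, the manual copy in Method 1 could be sketched in Python as follows. The function name `sync_mp3_files` is hypothetical and not part of either repo; the sketch only copies top-level `.mp3` files and does not delete anything in the target.

```python
import shutil
from pathlib import Path


def sync_mp3_files(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy every .mp3 in source_dir into target_dir, overwriting existing copies."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_file in sorted(source_dir.glob("*.mp3")):
        # copy2 also preserves modification times, which helps when comparing repos.
        shutil.copy2(source_file, target_dir / source_file.name)
        copied.append(source_file.name)
    return copied
```

A call such as `sync_mp3_files(Path("audio/static/testing"), Path("../BobFast5/audio/static/testing"))` would mirror the `copy` command shown in Method 1, assuming the two repos sit side by side.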

ElevenLabs Configuration

Voice Settings (from BobConfig.py)

ELEVEN_LABS_VOICE_ID = "nPczCjzI2devNBz1zQrb"  # Brian (default Bob voice)
ELEVEN_LABS_MODEL = "eleven_turbo_v2_5"
TTS_STABILITY = 0.71
TTS_SIMILARITY_BOOST = 0.5
TTS_STYLE = 0.0
TTS_USE_SPEAKER_BOOST = True

Voice Selection Guide

Brian (default): Deep, authoritative, sarcastic personality
Use for: Greetings, conversation responses, personality-driven content

Alternative voices (if needed):

Calmer voice for error messages
Different voice for testing/debugging distinction

Cost Optimization

Strategy: Generate once, reuse forever

Greetings are played hundreds of times → synthesize once and reuse the file
Startup messages play on every boot → pre-generate
Test audio → generate once, replay in every test run

Synthesis cost: ~$0.18 per 1000 characters (turbo_v2_5)

Average greeting: ~15 characters = $0.0027 per file
Generate 10 greetings once = $0.027
Use 1000 times = $0.00003 per use (vs $0.0027 per dynamic TTS)
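
The arithmetic above can be reproduced with a short sketch. The price constant is the approximate figure quoted in this section, not an authoritative rate, and the function names are hypothetical.

```python
# Approximate figure quoted in this document for eleven_turbo_v2_5 (assumption).
PRICE_PER_1000_CHARS_USD = 0.18


def synthesis_cost_usd(text: str) -> float:
    """Cost of synthesizing one phrase once, billed per character."""
    return len(text) * PRICE_PER_1000_CHARS_USD / 1000


def amortized_cost_per_play_usd(phrases: list[str], total_plays: int) -> float:
    """One-time generation cost of all phrases, spread over every playback."""
    one_time_cost = sum(synthesis_cost_usd(phrase) for phrase in phrases)
    return one_time_cost / total_plays
```

With ten 15-character greetings played 1000 times in total, the amortized cost comes to about $0.000027 per play, which the figure above rounds to $0.00003.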

Common Use Cases

Use Case 1: Add New Greeting Variant

# 1. Edit generate_greeting_audio.py
GREETINGS = [
    "Yes wizard?",
    "What do you need boss?",
    "I'm listening",
    "Yes?",
    "Speak wizard",  # NEW
]

# 2. Generate
python generate_greeting_audio.py

# 3. Verify
ls audio/static/greetings/
# Should see: speak_wizard.mp3

# 4. Update state_machine.py to use new greeting (if needed)
# Edit GREETINGS list in state_machine/state_machine.py

# 5. Test playback
# Use test_audio_output.py or manually play

Use Case 2: Generate Test Suite Audio

#!/usr/bin/env python3
# Save this file as generate_test_audio.py (the shebang must stay on the first line)
from pathlib import Path
from dotenv import load_dotenv
from elevenlabs import ElevenLabs, VoiceSettings
from BobConfig import BobConfig
from tts.static_audio import normalize_phrase_to_filename

load_dotenv()
config = BobConfig()
config.load_from_env()

OUTPUT_DIR = Path("audio/static/testing")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

TEST_PHRASES = [
    "Wake up Bob",
    "Hey Bob",
    "What time is it?",
    "Tell me a joke",
    "Can you speak louder?",
    "What is the weather like today?",
    "Goodbye Bob"
]

client = ElevenLabs(api_key=config.ELEVEN_LABS_API_KEY)

for phrase in TEST_PHRASES:
    filename = f"{normalize_phrase_to_filename(phrase)}.mp3"
    filepath = OUTPUT_DIR / filename

    print(f"Generating: {phrase} → {filename}")

    audio_generator = client.text_to_speech.convert(
        voice_id=config.ELEVEN_LABS_VOICE_ID,
        text=phrase,
        model_id=config.ELEVEN_LABS_MODEL,
        voice_settings=VoiceSettings(
            stability=config.TTS_STABILITY,
            similarity_boost=config.TTS_SIMILARITY_BOOST,
            style=config.TTS_STYLE,
            use_speaker_boost=config.TTS_USE_SPEAKER_BOOST
        )
    )

    audio_data = b"".join(audio_generator)
    filepath.write_bytes(audio_data)
    print(f"  ✓ Saved ({len(audio_data)/1024:.1f} KB)\n")
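
Because every API call costs money, a script like the one above could skip phrases whose files already exist. The helper below is a sketch of that idea only; `phrases_missing_audio` is a hypothetical name, and `to_stem` stands for whatever normalization function is in use (for example `normalize_phrase_to_filename`).

```python
from pathlib import Path
from typing import Callable, Iterable


def phrases_missing_audio(
    phrases: Iterable[str],
    output_dir: Path,
    to_stem: Callable[[str], str],
) -> list[str]:
    """Return only the phrases that have no .mp3 in output_dir yet."""
    return [
        phrase
        for phrase in phrases
        if not (output_dir / f"{to_stem(phrase)}.mp3").is_file()
    ]
```

Looping over `phrases_missing_audio(TEST_PHRASES, OUTPUT_DIR, normalize_phrase_to_filename)` instead of `TEST_PHRASES` would leave existing files untouched; a full regeneration after a voice change would bypass the filter.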

Use Case 3: Batch Regenerate All Audio

# Regenerate everything (after voice change or quality update)
python generate_greeting_audio.py
python generate_startup_audio.py
python generate_static_audio.py

# Verify total file count
find audio/static -name "*.mp3" | wc -l

# Check total size
du -sh audio/static
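
On systems without `find` and `du` (for example a plain Windows shell), the same check could be sketched in Python. `summarize_mp3s` is a hypothetical helper, not an existing script in the repo.

```python
from pathlib import Path


def summarize_mp3s(root: Path) -> tuple[int, int]:
    """Return (file count, total size in bytes) for all .mp3 files under root."""
    mp3_files = [path for path in root.rglob("*.mp3") if path.is_file()]
    total_bytes = sum(path.stat().st_size for path in mp3_files)
    return len(mp3_files), total_bytes
```

Calling `summarize_mp3s(Path("audio/static"))` before and after a regeneration gives a quick sanity check that no file went missing.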

Troubleshooting

Error: "ELEVEN_LABS_API_KEY not found"

Problem: API key not in environment

Solution:

# Check .env file
grep ELEVEN_LABS_API_KEY .env

# Should show:
# BOBTHESKULL_ELEVEN_LABS_API_KEY=sk_...

# If missing, add it:
echo "BOBTHESKULL_ELEVEN_LABS_API_KEY=sk_your-key-here" >> .env

Error: "Audio generation failed"

Problem: API rate limit or network issue

Solution:

# Check API quota at elevenlabs.io dashboard
# Wait 1 minute and retry
# Or add retry logic with delay
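
The retry suggestion above could look roughly like this. It is a generic sketch, not code from the existing generation scripts: `retry_with_delay` is a hypothetical helper, the 60-second default mirrors the "wait 1 minute" advice, and catching every `Exception` is a simplification that a real script might narrow to the client's specific error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_delay(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on failure wait delay_seconds and try again, up to attempts times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay_seconds:.0f}s")
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Each API call would then be wrapped in a zero-argument function or lambda and passed as `action`; the `sleep` parameter exists so the delay can be replaced in tests.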

Files generated but playback fails

Problem: Incorrect audio format or corrupted file

Solution:

# Check file size (should be >1KB for typical greeting)
ls -lh audio/static/greetings/

# Test playback directly
mpv audio/static/greetings/yes_wizard.mp3

# Regenerate specific file if corrupted
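
The size check above can be combined with a look at the first bytes of the file. The sketch below assumes the usual MP3 layouts, where a file starts either with an `ID3` tag or directly with an MPEG frame sync (eleven set bits); `looks_like_mp3` is a hypothetical helper and only a heuristic, so a file that passes may still fail to decode.

```python
from pathlib import Path

MIN_SIZE_BYTES = 1024  # the ">1KB" rule of thumb from the check above


def looks_like_mp3(path: Path) -> bool:
    """Heuristic: file is larger than 1 KB and begins with an ID3 tag or MPEG frame sync."""
    if not path.is_file() or path.stat().st_size <= MIN_SIZE_BYTES:
        return False
    with path.open("rb") as handle:
        header = handle.read(3)
    if header == b"ID3":
        return True
    # MPEG audio frames begin with 11 set bits: 0xFF followed by a byte whose top 3 bits are set.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
```

Running this over `audio/static/greetings/` would flag truncated downloads or error responses that were saved with an `.mp3` extension.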

Filename normalization incorrect

Problem: Special characters in phrase causing issues

Solution:

# Check normalization
from tts.static_audio import normalize_phrase_to_filename
print(normalize_phrase_to_filename("Your phrase here"))

# Should convert:
# - Spaces → underscores
# - Punctuation → removed
# - Uppercase → lowercase
# Example: "Yes, wizard?" → "yes_wizard"

Pro Tips

Generate in batches - Create all related audio at once (all greetings, all startup messages)
Test before deploying - Play generated files locally before syncing to Pi
Version control audio - Commit generated MP3 files to git (they're small and rarely change)
Use index files - greetings.txt and startup.txt document what's available
Consistent voice settings - Don't change TTS settings mid-project or you'll need to regenerate everything
Organize by category - Use subdirectories (greetings/, startup/, testing/) for clarity
Name descriptively - startup_complete_listening_for_wake_words.mp3 is better than startup_msg_3.mp3
Test audio duration - Keep greetings short (1-2 seconds) for responsive feel
Create test variants - Generate same phrase with different emphases for testing
Document custom scripts - If you create generate_test_audio.py, add it to repo

Integration with Other Skills

Works well with:

cross-repo-sync - Syncing audio between BobTheSkull5 and BobFast5
audio-injection-testing - Using generated test audio for automated testing
pi-deployment - Deploying audio files to Raspberry Pi

Time Savings

Without skill:

10-15 minutes per audio file (setup, generation, naming, placement, verification)
Frequent errors in naming/directory structure
Manual cross-repo copying with mistakes

With skill:

3-5 minutes per audio file (documented process)
Consistent naming via normalization function
Clear cross-repo sync patterns

Estimated time savings: 2-3x faster

References

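Generation Scripts:

generate_greeting_audio.py - Generates all greetings
generate_startup_audio.py - Generates startup/shutdown/error messages
generate_static_audio.py - Generates all static audio (comprehensive)
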
Supporting Code:

tts/static_audio.py - Static audio playback and normalization
BobConfig.py - ElevenLabs configuration

Audio Directories:

audio/static/greetings/ - Greeting responses
audio/static/startup/ - Startup/shutdown/error messages
audio/static/testing/ - Test audio files
