Category: Productivity & Office

No API key required

Local STT (Nvidia Parakeet + Whisper Support)

Local STT with selectable backends: Parakeet (best English accuracy) or Whisper (fastest, multilingual).

Author: araa47 (clawhub)

Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

  • Parakeet (default): Best accuracy for English, correctly captures names and filler words
  • Whisper: Fastest inference, supports 99 languages

Usage

# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet

Options

  • -b/--backend: parakeet (default), whisper
  • -m/--model: Model variant (see below)
  • --no-int8: Disable int8 quantization
  • -q/--quiet: Suppress progress
  • --room-id: Matrix room ID for direct message
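To call the script from Python rather than a shell, the options above can be mapped onto an argument list. The following is a minimal sketch, not part of the skill itself: `build_command` and `transcribe` are hypothetical helper names, and it assumes the script writes the transcript to stdout. `--room-id` is left out because its behavior is not described beyond the one-line summary.

```python
import subprocess
from pathlib import Path

# Script location taken from the usage examples in this document.
SCRIPT = Path("~/.openclaw/skills/local-stt/scripts/local-stt.py").expanduser()


def build_command(audio, backend=None, model=None, int8=True, quiet=False):
    """Build the argv list for local-stt.py from the documented options.

    Hypothetical helper: only flags listed under "Options" are used.
    """
    cmd = [str(SCRIPT), str(audio)]
    if backend is not None:
        cmd += ["-b", backend]      # parakeet (default) or whisper
    if model is not None:
        cmd += ["-m", model]        # e.g. v2, v3, tiny, base
    if not int8:
        cmd.append("--no-int8")     # disable int8 quantization
    if quiet:
        cmd.append("--quiet")       # suppress progress output
    return cmd


def transcribe(audio, **options):
    """Run the script and return its stdout.

    Assumption: the transcript is printed to stdout.
    """
    result = subprocess.run(
        build_command(audio, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, `transcribe("audio.ogg", backend="whisper", quiet=True)` would run the same command as the explicit Whisper example under Usage, with `--quiet` added.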

Models

Parakeet (default backend)

| Model | Description |
|-------|-------------|
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |

Whisper

| Model | Description |
|-------|-------------|
| tiny | Fastest, lower accuracy |
| base (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |

Benchmark (24s audio)

| Backend/Model | Time | RTF | Notes |
|---------------|------|-----|-------|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
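The RTF column is the real-time factor: processing time divided by audio duration, so lower is faster. As a quick check, the short sketch below recomputes it from the timings in the table. The function name `real_time_factor` is illustrative only.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / length of the audio."""
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # clip length used in the benchmark

# Timings copied from the benchmark table.
timings = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, seconds in timings.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    # 1 / RTF is how many times faster than real time the model runs.
    print(f"{name}: RTF {rtf:.3f}x (about {1 / rtf:.0f}x faster than real time)")
```

An RTF of 0.018x means one second of audio takes roughly 18 ms to transcribe.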

openclaw.json

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
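To see what command this entry amounts to, the sketch below loads the same JSON and substitutes a file path for the `{{MediaPath}}` placeholder. How OpenClaw itself expands the template is not documented here, so the plain string replacement, the `expand_cli_entry` helper, and the sample path are all assumptions made for illustration.

```python
import json
import os

# Same configuration as shown above, embedded so the example is self-contained.
CONFIG_TEXT = """
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
"""


def expand_cli_entry(entry, media_path):
    """Return (argv, timeout) for one "cli" model entry.

    Assumption: {{MediaPath}} is replaced by the audio file path as-is.
    """
    command = os.path.expanduser(entry["command"])
    args = [arg.replace("{{MediaPath}}", media_path) for arg in entry["args"]]
    return [command, *args], entry.get("timeoutSeconds")


config = json.loads(CONFIG_TEXT)
entry = config["tools"]["media"]["audio"]["models"][0]

# "/tmp/voice-note.ogg" is a made-up path for the example.
argv, timeout = expand_cli_entry(entry, "/tmp/voice-note.ogg")
print(argv, timeout)
```

Since no `-b` or `-m` flag appears in `args`, this entry runs the default backend and model (Parakeet v2). The 30-second timeout is far above the sub-second timings in the benchmark, although those figures were measured on a 24-second clip only.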