Back to skills
extension
Category: Development & EngineeringNo API key required

K8s Self Hosted Whisper Api

Transcribe audio via the self-hosted Whisper ASR instance running on Kubernetes. Use this skill whenever the user wants to transcribe audio files, convert sp...

personAuthor: xavjerhubclawhub

Self-Hosted Whisper API (curl)

Transcribe an audio file via the Whisper ASR webservice at http://whisper-asr.whisper-asr.svc.cluster.local:9000.

Uses the onerahmet/openai-whisper-asr-webservice API (/asr endpoint).

Quick start

{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

Defaults:

  • Endpoint: http://whisper-asr.whisper-asr.svc.cluster.local:9000/asr
  • Task: transcribe
  • Output: txt

Useful flags

{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --language en --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language de
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --output srt --out /tmp/subtitles.srt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --output vtt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --translate
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --vad-filter --json
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --word-timestamps --json

Notes

  • Supported --output formats: txt, json, vtt, srt, tsv
  • --translate produces an English transcript regardless of source language
  • --vad-filter enables voice activity detection to skip silent sections
  • --word-timestamps adds word-level timing (use with --json)
  • The model is configured on the server side (ASR_MODEL env var), not per request
  • Swagger docs available at http://whisper-asr.whisper-asr.svc.cluster.local:9000/docs
  • No authentication required