返回 Skill 列表
extension
分类: 效率与办公无需 API Key

Audio

处理、提升并转换音频文件,包含降噪、归一化、格式转换、转录和播客工作流。

person作者: ivangdavilahubclawhub

Requirements

Required:

  • ffmpeg / ffprobe — core audio processing

Optional (for advanced features):

  • sox — additional noise reduction
  • whisper — local transcription (or use API)
  • demucs — stem separation

Quick Reference

| Situation | Load | |-----------|------| | FFmpeg commands by task | commands.md | | Loudness standards by platform | loudness.md | | Podcast production workflow | podcast.md | | Transcription workflow | transcription.md |

Core Capabilities

| Task | Method | |------|--------| | Convert formats | FFmpeg (-acodec) | | Remove noise | FFmpeg filters or SoX | | Normalize loudness | ffmpeg-normalize or -af loudnorm | | Transcribe | Whisper → text, SRT, VTT | | Separate stems | Demucs (vocals, drums, bass, other) |

Execution Pattern

  1. Clarify goal — What format? What loudness? What platform?
  2. Analyze sourceffprobe for codec, sample rate, channels, duration
  3. Process — FFmpeg/SoX for transformation
  4. Verify — Check output plays, meets specs, sounds correct
  5. Deliver — Provide file to user

Common Requests → Actions

| User says | Agent does | |-----------|------------| | "Convert to MP3" | -acodec libmp3lame -q:a 2 | | "Remove background noise" | Apply highpass/lowpass or dedicated denoiser | | "Normalize for podcast" | -af loudnorm=I=-16:TP=-1.5:LRA=11 | | "Transcribe this" | Whisper → output SRT/VTT/TXT | | "Extract audio from video" | -vn -acodec copy or re-encode | | "Make it smaller" | Lower bitrate: -b:a 128k or -b:a 96k | | "Speed up 1.5x" | -af atempo=1.5 |

Format Quick Reference

| Format | Use Case | Quality | |--------|----------|---------| | WAV | Master, editing | Lossless | | FLAC | Archive, audiophile | Lossless compressed | | MP3 | Universal sharing | Lossy, 128-320 kbps | | AAC/M4A | Apple, podcasts | Lossy, efficient | | OGG/Opus | WhatsApp, Discord | Lossy, very efficient |

Quality Defaults

  • Podcast: -16 LUFS (Spotify), -19 LUFS (Apple)
  • Music: -14 LUFS (Spotify), -16 LUFS (Apple Music)
  • MP3 quality: VBR -q:a 2 (~190 kbps) or CBR -b:a 192k
  • Sample rate: 44.1kHz for music, 48kHz for video sync

Scope

This skill:

  • Processes audio files user explicitly provides
  • Runs FFmpeg commands on user request
  • Does NOT access cloud services without user knowing
  • Does NOT store files persistently (user manages their files)