MiniMax CLI — Agent Skill Guide

Use mmx to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform.

Prerequisites

# Install
npm install -g mmx-cli

# Auth (persisted to ~/.mmx/credentials.json)
mmx auth login --api-key sk-xxxxx

# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"

Region is auto-detected. Override with --region global or --region cn.

Agent Flags

Always use these flags in non-interactive (agent/CI) contexts:

| Flag | Purpose | |---|---| | --non-interactive | Fail fast on missing args instead of prompting | | --quiet | Suppress spinners/progress; stdout is pure data | | --output json | Machine-readable JSON output | | --async | Return task ID immediately (video generation) | | --dry-run | Preview the API request without executing | | --yes | Skip confirmation prompts |

Commands

text chat

Chat completion. Default model: MiniMax-M2.7.

mmx text chat --message <text> [flags]

| Flag | Type | Description | |---|---|---| | --message <text> | string, required, repeatable | Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello") | | --messages-file <path> | string | JSON file with messages array. Use - for stdin | | --system <text> | string | System prompt | | --model <model> | string | Model ID (default: MiniMax-M2.7) | | --max-tokens <n> | number | Max tokens (default: 4096) | | --temperature <n> | number | Sampling temperature (0.0, 1.0] | | --top-p <n> | number | Nucleus sampling threshold | | --stream | boolean | Stream tokens (default: on in TTY) | | --tool <json-or-path> | string, repeatable | Tool definition JSON or file path |

# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet

# Multi-turn
mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

# From file
cat conversation.json | mmx text chat --messages-file - --output json

stdout: response text (text mode) or full response object (json mode).

image generate

Generate images. Model: image-01.

mmx image generate --prompt <text> [flags]

| Flag | Type | Description | |---|---|---| | --prompt <text> | string, required | Image description | | --aspect-ratio <ratio> | string | e.g. 16:9, 1:1 | | --n <count> | number | Number of images (default: 1) | | --subject-ref <params> | string | Subject reference: type=character,image=path-or-url | | --out-dir <dir> | string | Download images to directory | | --out-prefix <prefix> | string | Filename prefix (default: image) |

mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)

video generate

Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task — by default it polls until completion.

mmx video generate --prompt <text> [flags]

| Flag | Type | Description | |---|---|---| | --prompt <text> | string, required | Video description | | --model <model> | string | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast | | --first-frame <path-or-url> | string | First frame image | | --callback-url <url> | string | Webhook URL for completion | | --download <path> | string | Save video to specific file | | --async | boolean | Return task ID immediately | | --no-wait | boolean | Same as --async | | --poll-interval <seconds> | number | Polling interval (default: 5) |

# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}

# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4

video task get

Query status of a video generation task.

mmx video task get --task-id <id> [--output json]

video download

Download a completed video by task ID.

mmx video download --file-id <id> [--out <path>]

speech synthesize

Text-to-speech. Default model: speech-2.8-hd. Max 10k chars.

mmx speech synthesize --text <text> [flags]

| Flag | Type | Description | |---|---|---| | --text <text> | string | Text to synthesize | | --text-file <path> | string | Read text from file. Use - for stdin | | --model <model> | string | speech-2.8-hd (default), speech-2.6, speech-02 | | --voice <id> | string | Voice ID (default: English_expressive_narrator) | | --speed <n> | number | Speed multiplier | | --volume <n> | number | Volume level | | --pitch <n> | number | Pitch adjustment | | --format <fmt> | string | Audio format (default: mp3) | | --sample-rate <hz> | number | Sample rate (default: 32000) | | --bitrate <bps> | number | Bitrate (default: 128000) | | --channels <n> | number | Audio channels (default: 1) | | --language <code> | string | Language boost | | --subtitles | boolean | Include subtitle timing data | | --pronunciation <from/to> | string, repeatable | Custom pronunciation | | --sound-effect <effect> | string | Add sound effect | | --out <path> | string | Save audio to file | | --stream | boolean | Stream raw audio to stdout |

mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3

echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3

music generate

Generate music. Model: music-2.5. Responds well to rich, structured descriptions.

mmx music generate --prompt <text> [--lyrics <text>] [flags]

| Flag | Type | Description | |---|---|---| | --prompt <text> | string | Music style description (can be detailed) | | --lyrics <text> | string | Song lyrics with structure tags. Use "\u65e0\u6b4c\u8bcd" for instrumental. Cannot be used with --instrumental | | --lyrics-file <path> | string | Read lyrics from file. Use - for stdin | | --vocals <text> | string | Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies" | | --genre <text> | string | Music genre, e.g. folk, pop, jazz | | --mood <text> | string | Mood or emotion, e.g. warm, melancholic, uplifting | | --instruments <text> | string | Instruments to feature, e.g. "acoustic guitar, piano" | | --tempo <text> | string | Tempo description, e.g. fast, slow, moderate | | --bpm <number> | number | Exact tempo in beats per minute | | --key <text> | string | Musical key, e.g. C major, A minor, G sharp | | --avoid <text> | string | Elements to avoid in the generated music | | --use-case <text> | string | Use case context, e.g. "background music for video", "theme song" | | --structure <text> | string | Song structure, e.g. "verse-chorus-verse-bridge-chorus" | | --references <text> | string | Reference tracks or artists, e.g. "similar to Ed Sheeran" | | --extra <text> | string | Additional fine-grained requirements | | --instrumental | boolean | Generate instrumental music (no vocals). Cannot be used with --lyrics or --lyrics-file | | --aigc-watermark | boolean | Embed AI-generated content watermark | | --format <fmt> | string | Audio format (default: mp3) | | --sample-rate <hz> | number | Sample rate (default: 44100) | | --bitrate <bps> | number | Bitrate (default: 256000) | | --out <path> | string | Save audio to file | | --stream | boolean | Stream raw audio to stdout |

At least one of --prompt or --lyrics is required.

# Simple usage
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3

# Instrumental (use --instrumental flag)
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3

vision describe

Image understanding via VLM. Provide either --image or --file-id, not both.

mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]

| Flag | Type | Description | |---|---|---| | --image <path-or-url> | string | Local path or URL (auto base64-encoded) | | --file-id <id> | string | Pre-uploaded file ID (skips base64) | | --prompt <text> | string | Question about the image (default: "Describe the image.") |

mmx vision describe --image photo.jpg --prompt "What breed?" --output json

stdout: description text (text mode) or full response (json mode).

search query

Web search via MiniMax.

mmx search query --q <query>

| Flag | Type | Description | |---|---|---| | --q <query> | string, required | Search query |

mmx search query --q "MiniMax AI" --output json --quiet

quota show

Display Token Plan usage and remaining quotas.

mmx quota show [--output json]

Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema

# Single command
mmx config export-schema --command "video generate"

Use this to dynamically register mmx commands as tools in your agent framework.

Exit Codes

| Code | Meaning | |---|---| | 0 | Success | | 1 | General error | | 2 | Usage error (bad flags, missing args) | | 3 | Authentication error | | 4 | Quota exceeded | | 5 | Timeout | | 10 | Content filter triggered |

Piping Patterns

# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'

# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null

# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet

# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
mmx video download --task-id "$TASK" --out robot.mp4

Configuration Precedence

CLI flags → environment variables → ~/.mmx/config.json → defaults.

# Persistent config
mmx config set --key region --value cn
mmx config show

# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn