Step 1V 8K

Step 1V 8K for visual Q&A, multimodal analysis and image-text understanding

Published

Step 1V 8K is a StepFun model for visual Q&A, multimodal analysis and image-text understanding, commonly evaluated for Chinese assistants, document and multimodal workflows.

starsCapabilities

visibilityVision understandingstreamStreaming output

paymentsContext and pricing

Context limit—

Max output—

Input price¥5/ 1M tokens

Output price¥20/ 1M tokens

Cached input price¥1/ 1M tokens

descriptionOverview

Overview

Step 1V 8K is listed in StepFun's official platform documentation, with model ID step-1v-8k. Step models cover general chat, long context, lightweight high-volume usage and vision understanding.

Best for

Use Step 1V 8K for Chinese assistants, document Q&A, long-text analysis, low-latency services or image-text understanding. Test Chinese accuracy, context length, vision quality, cost and latency before production.

lightbulbUse cases

Chinese assistants
Document understanding and summarization
Knowledge-base Q&A
Image-text and multimodal analysis

thumb_upStrengths

Covers general, long-context, vision and lightweight tiers
Good fit for Chinese business scenarios
Easy to tier by context length
Useful for enterprise evaluation

infoLimitations

Capabilities vary by tier
Vision and long-context tasks need testing
Cost and latency depend on workload
Limits depend on StepFun documentation

linkReferences

open_in_newhttps://platform.stepfun.com/docs/overview/concept