Category: Other
No API key required

Image Reader

Extract text from images using OCR (Optical Character Recognition). Use this skill when you need to read text content from images, screenshots, photos, or an...

Author: rendaixue-bytehubclawhub

Image Reader - OCR Text Extraction

A high-performance OCR skill for extracting text from images. Powered by RapidOCR with PP-OCRv4 models, supporting Chinese and English text recognition.

Features

  • Multi-language: Chinese (simplified/traditional), English, and mixed text
  • High accuracy: PP-OCRv4 model with >95% accuracy on typical screenshots
  • Structured output: Text with confidence scores and bounding boxes
  • Image info: Dimensions, format, and color mode included
  • Fast: about 1-2 seconds per image on CPU; no GPU required

Quick Start

python scripts/read_image.py /path/to/image.jpg

Usage Examples

Extract text from a screenshot

python scripts/read_image.py screenshot.png
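
To call the script from other Python code, one option is to run it as a subprocess and parse what it prints. This is a minimal sketch, not part of the skill itself: the helper name `read_image` is hypothetical, and it assumes the script writes a single JSON document to stdout and exits with a non-zero code on failure.

```python
import json
import subprocess
import sys


def read_image(image_path, script="scripts/read_image.py"):
    """Run the OCR script on one image and return its parsed JSON output.

    Assumption: the script prints exactly one JSON document to stdout.
    """
    completed = subprocess.run(
        [sys.executable, script, image_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

A call such as `read_image("screenshot.png")` would then return a dictionary in the shape described under JSON Output.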

JSON Output

The script outputs structured JSON:

{
  "success": true,
  "text": "Full extracted text",
  "lines": [
    {
      "text": "Individual line",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ],
  "line_count": 5,
  "image_info": {
    "format": "PNG",
    "size": [1920, 1080],
    "mode": "RGB"
  }
}
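
Since every line carries a confidence score, a caller can drop uncertain lines before using the text. The sketch below works on an already-parsed result; the function name `confident_lines`, the 0.9 threshold, and the sample values are illustrative assumptions rather than anything the skill defines.

```python
def confident_lines(result, min_confidence=0.9):
    """Return the text of each line whose confidence meets the threshold."""
    if not result.get("success"):
        return []
    return [
        line["text"]
        for line in result.get("lines", [])
        if line["confidence"] >= min_confidence
    ]


# Hypothetical result in the shape shown above; the values are made up.
sample = {
    "success": True,
    "text": "Total: 42.00\nTh4nk y0u",
    "lines": [
        {"text": "Total: 42.00", "confidence": 0.98,
         "box": [[10, 10], [200, 10], [200, 40], [10, 40]]},
        {"text": "Th4nk y0u", "confidence": 0.61,
         "box": [[10, 50], [180, 50], [180, 80], [10, 80]]},
    ],
    "line_count": 2,
    "image_info": {"format": "PNG", "size": [1920, 1080], "mode": "RGB"},
}

print(confident_lines(sample))  # ['Total: 42.00']
```

A suitable threshold depends on the images; noisy photos may need a lower value than clean screenshots.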

Requirements

pip install rapidocr onnxruntime pillow

The first run downloads the OCR models (~50 MB) automatically.
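
Before the first run, it can help to confirm that the three packages are importable. The following sketch uses only the standard library; the helper name `missing_modules` is hypothetical, and the import names are assumptions based on common usage (in particular, `pillow` is imported as `PIL`).

```python
from importlib.util import find_spec

# Assumed import names for the three pip packages; pillow is imported as PIL.
REQUIRED_MODULES = ("rapidocr", "onnxruntime", "PIL")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

An empty list from `missing_modules()` suggests the install succeeded; any names returned point at packages to reinstall.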

Common Use Cases

  • UI Screenshots: Extract text from app/website screenshots
  • Document Photos: Read text from photographed documents
  • Diagrams: Extract labels and annotations
  • Receipts: Parse receipt/invoice data
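
For receipts and similar layouts, the bounding boxes can be used to put an item and its price back on the same row. This is a rough sketch under stated assumptions: each box point is an `[x, y]` pixel coordinate with y increasing downward, the order of lines in the output is not relied on, and the name `reading_order` and the 10-pixel row tolerance are illustrative choices.

```python
def reading_order(lines, row_tolerance=10):
    """Group lines into rows (top to bottom), each sorted left to right."""

    def top_left(line):
        xs = [point[0] for point in line["box"]]
        ys = [point[1] for point in line["box"]]
        return min(ys), min(xs)

    rows = []  # each entry is (top of the row's first line, lines in the row)
    for line in sorted(lines, key=top_left):
        top, _ = top_left(line)
        if rows and abs(top - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append(line)
        else:
            rows.append((top, [line]))

    return [
        [line["text"] for line in sorted(row, key=lambda l: top_left(l)[1])]
        for _, row in rows
    ]


# Made-up receipt lines, deliberately out of order.
receipt_lines = [
    {"text": "3.50", "box": [[300, 103], [360, 103], [360, 125], [300, 125]]},
    {"text": "Total", "box": [[10, 150], [90, 150], [90, 172], [10, 172]]},
    {"text": "Coffee", "box": [[10, 100], [110, 100], [110, 122], [10, 122]]},
    {"text": "3.50", "box": [[300, 152], [360, 152], [360, 174], [300, 174]]},
]

print(reading_order(receipt_lines))  # [['Coffee', '3.50'], ['Total', '3.50']]
```

Skewed or rotated photos would need a larger tolerance or a deskewing step first, which this sketch does not attempt.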

Output Fields

| Field | Type | Description |
|-------|------|-------------|
| success | bool | Whether OCR succeeded |
| text | string | All extracted text |
| lines | array | Individual text lines with metadata |
| line_count | int | Number of text lines detected |
| image_info | object | Image metadata |
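
The field table translates directly into a small shape check that a caller might run before trusting a result. This is a sketch only: `schema_problems` is a hypothetical helper, and it assumes a successful result carries all five fields (the page does not say what a failed result contains).

```python
# Python types corresponding to the JSON types in the table above.
EXPECTED_TYPES = {
    "success": bool,
    "text": str,
    "lines": list,
    "line_count": int,
    "image_info": dict,
}


def schema_problems(result):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

The check is shallow by design: it looks only at top-level fields and does not inspect the entries inside `lines` or `image_info`.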

Technical Details

  • Engine: RapidOCR (ONNX Runtime backend)
  • Models: PP-OCRv4 (detection + recognition)
  • Languages: Chinese, English (auto-detected)
  • Performance: ~1-2 seconds per image on CPU

License

MIT License

Third-party dependencies:

  • RapidOCR - Apache 2.0 License
  • ONNX Runtime - MIT License