Image Reader - OCR Text Extraction

A high-performance OCR skill for extracting text from images. Powered by RapidOCR with PP-OCRv4 models, supporting Chinese and English text recognition.

Features

Multi-language: Chinese (simplified/traditional), English, and mixed text
High accuracy: PP-OCRv4 model with >95% accuracy on typical screenshots
Structured output: Text with confidence scores and bounding boxes
Image info: Dimensions, format, and color mode included
Fast: CPU-only, no GPU required

Quick Start

python scripts/read_image.py /path/to/image.jpg

Usage Examples

Extract text from a screenshot

python scripts/read_image.py screenshot.png
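To call the script from other Python code, a small wrapper can run it as a subprocess and parse what it prints. This is a minimal sketch. `run_ocr` is a hypothetical helper name, and it assumes that `scripts/read_image.py` prints exactly one JSON document to stdout, as described under JSON Output. The script's exit-code behaviour is not documented here, so the sketch relies on the `success` field rather than the return code.

```python
import json
import subprocess
import sys


def run_ocr(image_path, script="scripts/read_image.py"):
    """Run the OCR script on one image and return its parsed JSON output."""
    completed = subprocess.run(
        # Use the current interpreter so the script sees the same installed packages
        [sys.executable, script, str(image_path)],
        capture_output=True,
        text=True,
        check=False,  # exit-code behaviour is undocumented; rely on "success"
    )
    # Assumes the script prints exactly one JSON document to stdout
    return json.loads(completed.stdout)
```

Usage: `result = run_ocr("screenshot.png")`, then read `result["text"]` when `result["success"]` is true.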

JSON Output

The script outputs structured JSON:

{
  "success": true,
  "text": "Full extracted text",
  "lines": [
    {
      "text": "Individual line",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ],
  "line_count": 5,
  "image_info": {
    "format": "PNG",
    "size": [1920, 1080],
    "mode": "RGB"
  }
}
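Each entry in `lines` carries a confidence score and a four-point `box`. The sketch below shows one way to consume them: it drops low-confidence lines and collapses each box into an axis-aligned rectangle. The 0.9 threshold and the helper names `box_to_rect` and `confident_lines` are illustrative choices, not part of the script, and the sample data is hand-written in the documented shape rather than real OCR output. The rectangle conversion takes the min/max over all four corners, so it does not depend on the order in which the corners are listed.

```python
def box_to_rect(box):
    """Collapse a 4-point box [[x1,y1], ..., [x4,y4]] into an axis-aligned
    (left, top, right, bottom) rectangle."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def confident_lines(result, min_confidence=0.9):
    """Return (text, rect) for every line whose confidence is at least
    min_confidence; 0.9 is an arbitrary example threshold."""
    if not result.get("success"):
        return []
    return [
        (line["text"], box_to_rect(line["box"]))
        for line in result.get("lines", [])
        if line["confidence"] >= min_confidence
    ]


# Hand-written sample in the documented shape (not real OCR output)
sample = {
    "success": True,
    "text": "Settings\nSave",
    "lines": [
        {"text": "Settings", "confidence": 0.98,
         "box": [[10, 5], [90, 5], [90, 25], [10, 25]]},
        {"text": "Save", "confidence": 0.62,
         "box": [[12, 40], [50, 41], [49, 60], [11, 59]]},
    ],
    "line_count": 2,
    "image_info": {"format": "PNG", "size": [200, 100], "mode": "RGB"},
}

print(confident_lines(sample))  # [('Settings', (10, 5, 90, 25))]
```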

Requirements

pip install rapidocr onnxruntime pillow

First run will download OCR models (~50MB) automatically.
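Before running the script, a quick check can report which of the three packages are not importable. This is a sketch with a hypothetical helper name, `missing_packages`. Note that the pip package `pillow` is imported as `PIL`, so the check maps pip names to module names. It only tests importability; it does not check versions or whether the OCR models have been downloaded yet.

```python
import importlib.util

# pip package name -> module name it provides (pillow is imported as PIL)
REQUIRED_MODULES = {
    "rapidocr": "rapidocr",
    "onnxruntime": "onnxruntime",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Usage: if `missing_packages()` returns a non-empty list, pass those names to `pip install`.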

Common Use Cases

UI Screenshots: Extract text from app/website screenshots
Document Photos: Read text from photographed documents
Diagrams: Extract labels and annotations
Receipts: Parse receipt/invoice data
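For receipts, the OCR output is raw text, so pulling out prices takes a post-processing step of your own. The sketch below is one hypothetical approach, not a feature of the script. It scans each entry in `lines` for amounts with two decimal places and picks the last amount on the last line that mentions a total keyword. The regular expression and the keyword list (`total`, `合计`, `总计`) are assumptions that would need tuning for real receipts and locales.

```python
import re
from decimal import Decimal

# Hypothetical pattern: amounts with exactly two decimals, e.g. "12.50" or
# "1,299.00". Real receipts may need locale-specific rules.
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}")

# Hypothetical keywords that usually mark the grand total
TOTAL_KEYWORDS = ("total", "合计", "总计")


def line_amounts(result):
    """Return (line_text, [amounts]) for each OCR line containing an amount."""
    found = []
    for line in result.get("lines", []):
        matches = AMOUNT_RE.findall(line["text"])
        if matches:
            # Decimal avoids binary floating-point rounding for money values
            found.append((line["text"], [Decimal(m.replace(",", "")) for m in matches]))
    return found


def guess_total(result):
    """Return the last amount on the last line mentioning a total keyword."""
    for text, amounts in reversed(line_amounts(result)):
        if any(keyword in text.lower() for keyword in TOTAL_KEYWORDS):
            return amounts[-1]
    return None
```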

Output Fields

| Field | Type | Description |
|-------|------|-------------|
| success | bool | Whether OCR succeeded |
| text | string | All extracted text |
| lines | array | Individual text lines with metadata |
| line_count | int | Number of text lines detected |
| image_info | object | Image metadata |
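The field table above can double as a lightweight schema check for downstream code. This is a sketch, and `schema_errors` is a hypothetical helper. Two assumptions are built in. First, a successful result carries every listed field, while the shape of a failed result is not documented, so only `success` is checked in that case. Second, `line_count` is expected to equal the number of entries in `lines`.

```python
# Field names and types taken from the Output Fields table
EXPECTED_FIELDS = {
    "success": bool,
    "text": str,
    "lines": list,
    "line_count": int,
    "image_info": dict,
}


def schema_errors(result):
    """Return a list of problems found in an OCR result dict (empty if OK)."""
    if not isinstance(result.get("success"), bool):
        return ["success: expected bool"]
    if not result["success"]:
        return []  # failure payload shape is undocumented; nothing more to check
    errors = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in result:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly for ints
        elif not isinstance(result[field], expected) or (
            expected is int and isinstance(result[field], bool)
        ):
            errors.append(f"{field}: expected {expected.__name__}")
    # Assumption: line_count should match the number of returned lines
    if not errors and result["line_count"] != len(result["lines"]):
        errors.append("line_count does not match len(lines)")
    return errors
```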

Technical Details

Engine: RapidOCR (ONNX Runtime backend)
Models: PP-OCRv4 (detection + recognition)
Languages: Chinese, English (auto-detected)
Performance: ~1-2 seconds per image on CPU
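The detection and recognition passes yield, per text line, a polygon, a string, and a score; the JSON output is essentially those results plus image metadata. The sketch below illustrates that final assembly step under stated assumptions. It is not the script's actual code. `build_output` and `to_json` are hypothetical names, and the input is assumed to have already been unpacked into `(box, text, score)` triples, because the exact result object returned by RapidOCR differs between versions. Joining lines with a newline for `text` is also an assumption. The `image_info` keys match what Pillow exposes as `Image.format`, `Image.size` (width, height), and `Image.mode`, which is presumably where they come from, but that is likewise assumed.

```python
import json


def build_output(detections, image_format, size, mode):
    """Assemble the documented JSON structure.

    detections: iterable of (box, text, score) triples, one per recognized
    line, where box holds four [x, y] corner points. Producing these
    triples from RapidOCR is assumed to happen elsewhere.
    """
    lines = [
        {
            "text": text,
            # float() turns numpy scalars into plain floats so json can encode them
            "confidence": float(score),
            "box": [[float(x), float(y)] for x, y in box],
        }
        for box, text, score in detections
    ]
    return {
        "success": True,
        "text": "\n".join(line["text"] for line in lines),  # assumed separator
        "lines": lines,
        "line_count": len(lines),
        "image_info": {"format": image_format, "size": list(size), "mode": mode},
    }


def to_json(result):
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    return json.dumps(result, ensure_ascii=False, indent=2)
```

Setting `ensure_ascii=False` matters for this tool in particular, since Chinese output would otherwise be escaped and hard to read in the printed JSON.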

License

MIT License

Third-party dependencies:

RapidOCR - Apache 2.0 License
ONNX Runtime - MIT License