Invoice Recognition Skill

Overview

This skill helps you deploy a complete invoice recognition system based on the Qwen3-VL vision-language model, quantized to INT4 with OpenVINO. The system extracts invoice fields from PDF files and exports them as structured data to Excel.

Key Features

Smart PDF Parsing: Automatically detects embedded images in a PDF and falls back to a page screenshot when a page has none
VLM-based Invoice Recognition: Uses Qwen3-VL multimodal understanding for OCR, supporting 32 languages
Batch Excel Export: One-click export of multiple invoice recognition results to a structured Excel file
Custom Headers: Export fields can be adjusted to match business needs
Streaming Output: Real-time display of recognition progress and results

Supported Invoice Fields

| Field | Description |
|-------|-------------|
| 发票号码 | Invoice number |
| 开票日期 | Invoice date |
| 购买方名称 | Buyer name |
| 销售方名称 | Seller name |
| 金额 | Amount (excluding tax) |
| 税额 | Tax amount |
| 价税合计 | Total amount (including tax) |
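Because the three monetary fields are related (amount plus tax should equal the total), a recognized record can be sanity-checked before export. The sketch below is illustrative only: `to_number`, `check_totals`, and the `tolerance` parameter are hypothetical names that are not part of the skill, and it assumes the model returns either numbers or the placeholder string "未识别" requested by the prompt in Step 5.

```python
def to_number(value):
    """Return value as a float, or None if it is missing or not numeric."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Strip thousands separators and yen/yuan signs before converting
        cleaned = value.replace(",", "").replace("¥", "").replace("¥", "").strip()
        try:
            return float(cleaned)
        except ValueError:
            return None  # e.g. "未识别" (not recognized)
    return None


def check_totals(record, tolerance=0.01):
    """Return True/False when all three amounts are numeric, None when the check cannot run."""
    amount = to_number(record.get("金额"))
    tax = to_number(record.get("税额"))
    total = to_number(record.get("价税合计"))
    if amount is None or tax is None or total is None:
        return None
    return abs(amount + tax - total) <= tolerance
```

A result of False does not prove the invoice is wrong; it only flags a record worth reviewing by hand.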

Deployment Steps

Step 1: Install Dependencies

pip install openvino nncf modelscope gradio pymupdf openpyxl pandas qwen-vl-utils Pillow huggingface_hub accelerate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install git+https://github.com/openvino-dev-samples/optimum-intel.git@2f62e5aee74b4acba3836e1f26678c0db0a09c00
pip install ipywidgets
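After installation, it can help to confirm that every package is importable before moving on. The following is a minimal sketch, not part of the original skill: `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list assumes that `transformers` (imported in Step 4) arrives as a dependency of optimum-intel.

```python
import importlib.util

# Import names differ from some pip names:
# pymupdf -> fitz, Pillow -> PIL, qwen-vl-utils -> qwen_vl_utils.
REQUIRED_MODULES = [
    "openvino", "nncf", "modelscope", "gradio", "fitz", "openpyxl",
    "pandas", "qwen_vl_utils", "PIL", "huggingface_hub", "accelerate",
    "torch", "torchvision", "optimum", "transformers", "ipywidgets",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
```

The check only locates the modules; it does not import them, so it runs quickly even for heavy packages such as torch.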

Step 2: Download Model

Download the quantized Qwen3-VL-4B OpenVINO INT4 model from ModelScope:

from pathlib import Path

model_dir = Path("Qwen3-VL-4B-Instruct-int4-ov")

if not model_dir.exists():
    from modelscope import snapshot_download
    snapshot_download("snake7gun/Qwen3-VL-4B-Instruct-int4-ov", local_dir=str(model_dir))
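Note that the `model_dir.exists()` guard above skips the download even when a previous attempt was interrupted. A stricter check is sketched below under one assumption: OpenVINO IR models are stored as `.xml` files with matching `.bin` weight files, and the exact file names inside this model repository are not listed here, so only the extensions are inspected. `find_incomplete_ir` and `looks_downloaded` are hypothetical helper names.

```python
from pathlib import Path


def find_incomplete_ir(model_dir):
    """Return names of .xml files in model_dir that lack a matching .bin file."""
    model_dir = Path(model_dir)
    return sorted(
        xml.name for xml in model_dir.glob("*.xml")
        if not xml.with_suffix(".bin").exists()
    )


def looks_downloaded(model_dir):
    """Return True if the directory holds at least one .xml file and every .xml has its .bin."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return False
    has_xml = any(model_dir.glob("*.xml"))
    return has_xml and not find_incomplete_ir(model_dir)
```

One way to use it is to replace the `if not model_dir.exists():` condition with `if not looks_downloaded(model_dir):`.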

Step 3: Setup Device Selection

Important: Place notebook_utils.py in the same directory as your code. It provides the device_widget helper imported below.

from notebook_utils import device_widget

device = device_widget(default="AUTO", exclude=["NPU"])
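Outside a notebook, where the widget cannot be displayed, the device string can be chosen in plain Python instead. The sketch below is an assumption about how one might do that and does not describe what notebook_utils.py does internally; `selectable_devices` and `list_openvino_devices` are hypothetical helpers. It relies on the `available_devices` property of `openvino.Core`.

```python
def selectable_devices(available, default="AUTO", exclude=("NPU",)):
    """Return the device names a user may choose from, with the default listed first."""
    choices = [default]
    for name in available:
        # Device names can carry an index suffix such as "GPU.0", so match by prefix
        if any(name.startswith(prefix) for prefix in exclude):
            continue
        if name not in choices:
            choices.append(name)
    return choices


def list_openvino_devices():
    """Query OpenVINO for the devices present on this machine (requires openvino)."""
    import openvino as ov  # imported lazily so selectable_devices works without it
    return ov.Core().available_devices
```

For example, `selectable_devices(list_openvino_devices())` yields the candidate names, and the chosen string can be passed as `device=` in Step 4 in place of `device.value`.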

Step 4: Load Model

from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model = OVModelForVisualCausalLM.from_pretrained(model_dir, device=device.value)

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_dir, 
    min_pixels=min_pixels, 
    max_pixels=max_pixels
)
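The two pixel limits work out to 200,704 and 1,003,520 pixels. To get a feel for what that means for scanned pages, the sketch below estimates how much a page image would have to shrink to fit under `max_pixels`. It assumes the processor rescales each image so that its pixel count stays within the limits while roughly preserving the aspect ratio; the processor's actual rounding rules are not modeled, and `downscale_factor` is a hypothetical helper.

```python
import math

MIN_PIXELS = 256 * 28 * 28    # 200,704 pixels
MAX_PIXELS = 1280 * 28 * 28   # 1,003,520 pixels


def downscale_factor(width, height, max_pixels=MAX_PIXELS):
    """Linear scale factor that brings width*height within max_pixels (1.0 if it already fits)."""
    area = width * height
    if area <= max_pixels:
        return 1.0
    return math.sqrt(max_pixels / area)


# An A4 page (8.27 x 11.69 in) rendered at 200 dpi, as in the screenshot fallback of Step 6
width, height = round(8.27 * 200), round(11.69 * 200)
factor = downscale_factor(width, height)
print(f"{width}x{height} px -> scale factor {factor:.2f}")
```

If small print on dense invoices is misread, raising `max_pixels` is one knob to try, at the cost of more memory and slower inference.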

Step 5: Define Invoice Recognition Function

from PIL import Image
from transformers import TextStreamer
import json
import re

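# English gloss of the prompt: "You are a professional invoice recognition assistant.
# Extract the following fields from the invoice image and return them as JSON: {...}.
# If a field cannot be recognized, fill in "未识别" (not recognized). Return JSON only,
# with no additional explanation."
INVOICE_PROMPT = """你是一个专业的发票识别助手。请从发票图像中提取以下信息,并以JSON格式返回: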
{
    "发票号码": "xxx",
    "开票日期": "xxxx年xx月xx日",
    "购买方名称": "xxx",
    "销售方名称": "xxx",
    "金额": xx.xx,
    "税额": xx.xx,
    "价税合计": xx.xx
}
如果某项信息无法识别,请填写"未识别"。只返回JSON格式,不要添加其他文字说明。"""

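def recognize_invoice(image_path, prompt=INVOICE_PROMPT):
    """Recognize one invoice image and return the extracted fields as a dict."""
    # Open the file once so a missing or unreadable image fails early;
    # the processor below loads the image again from its path.
    with Image.open(image_path):
        pass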
    
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": str(image_path)},
                {"type": "text", "text": prompt}
            ]
        }
    ]
    
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt"
    )
    
    generated_ids = model.generate(
        **inputs, 
        max_new_tokens=512,
        streamer=TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
    )
    
    generated_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    
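    # The prompt asks for a flat JSON object, so take the first {...} span
    json_match = re.search(r'\{[^}]+\}', generated_text)
    if json_match:
        try:
            return json.loads(json_match.group())
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through and return the raw text
    return {"raw_response": generated_text}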
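The parsing step at the end of recognize_invoice can be exercised without loading the model. The sketch below repeats the same regular-expression approach as a standalone helper, with a fallback to the raw text when the JSON does not parse. `parse_invoice_json` is a hypothetical name, and the sample response is made up for illustration.

```python
import json
import re


def parse_invoice_json(text):
    """Extract the first flat JSON object from model output, or return the raw text."""
    match = re.search(r"\{[^{}]*\}", text)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass  # malformed JSON: fall back to the raw response
    return {"raw_response": text}


# A made-up response with extra text around the JSON object
sample = (
    "Here is the result:\n"
    '{"发票号码": "12345678", "金额": 100.00, "税额": 13.00, "价税合计": 113.00}\n'
    "Done."
)
fields = parse_invoice_json(sample)
print(fields["发票号码"], fields["价税合计"])
```

The pattern only matches objects without nested braces, which fits the flat structure requested by the prompt; a prompt that asks for nested JSON would need a different extraction strategy.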

Step 6: PDF Extraction

import fitz  # PyMuPDF
import tempfile
from pathlib import Path

class PDFExtractor:
    @staticmethod
    def extract_images_from_pdf(pdf_path, output_dir=None):
        if output_dir is None:
            output_dir = tempfile.mkdtemp()
        
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        
        doc = fitz.open(pdf_path)
        image_paths = []
        page_info = []
        
        for page_num in range(len(doc)):
            page = doc[page_num]
            image_list = page.get_images(full=True)
            
            if image_list:
                for img_index, img in enumerate(image_list):
                    xref = img[0]
                    base_image = doc.extract_image(xref)
                    image_bytes = base_image["image"]
                    
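                    # Embedded images keep their original encoding (often JPEG),
                    # so use the extension reported by PyMuPDF instead of assuming PNG
                    image_path = Path(output_dir) / f"page{page_num + 1}_img{img_index + 1}.{base_image['ext']}"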
                    with open(image_path, "wb") as f:
                        f.write(image_bytes)
                    
                    image_paths.append(str(image_path))
                    page_info.append({"page": page_num + 1, "type": "embedded", "index": img_index + 1})
            else:
                pix = page.get_pixmap(dpi=200)
                image_path = Path(output_dir) / f"page_{page_num + 1}.png"
                pix.save(str(image_path))
                
                image_paths.append(str(image_path))
                page_info.append({"page": page_num + 1, "type": "screenshot", "index": 1})
        
        doc.close()
        return image_paths, page_info
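extract_images_from_pdf returns two parallel lists, and a page with several embedded images contributes several entries. To relate results back to pages, the lists can be grouped by page number. This is a small sketch with a hypothetical helper name, `group_images_by_page`, and the sample data below only mimics the shape of the return value.

```python
from collections import defaultdict


def group_images_by_page(image_paths, page_info):
    """Map each page number to the image files extracted from it, in extraction order."""
    if len(image_paths) != len(page_info):
        raise ValueError("image_paths and page_info must have the same length")
    pages = defaultdict(list)
    for path, info in zip(image_paths, page_info):
        pages[info["page"]].append({"path": path, "type": info["type"]})
    return dict(pages)


# Sample data in the same shape as the extractor's return value
paths = ["page1_img1.png", "page1_img2.png", "page_2.png"]
info = [
    {"page": 1, "type": "embedded", "index": 1},
    {"page": 1, "type": "embedded", "index": 2},
    {"page": 2, "type": "screenshot", "index": 1},
]
grouped = group_images_by_page(paths, info)
print({page: len(items) for page, items in grouped.items()})
```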

Step 7: Excel Export

import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

DEFAULT_HEADERS = ["发票号码", "开票日期", "购买方名称", "销售方名称", "金额", "税额", "价税合计"]

class ExcelExporter:
    @staticmethod
    def export_to_excel(data, headers=None, output_path=None):
        if headers is None:
            headers = DEFAULT_HEADERS
        
        if output_path is None:
            output_path = Path(tempfile.gettempdir()) / f"invoices_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx"
        
        df = pd.DataFrame(data, columns=headers)
        df.to_excel(output_path, index=False, engine='openpyxl')
        
        return str(output_path)
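When recognize_invoice cannot find JSON in the model output, it returns a dict with only a "raw_response" key, which pd.DataFrame(data, columns=headers) turns into a row of empty cells. One way to handle this is to separate such records before export. The sketch below is an assumption about a reasonable pre-processing step, not part of the original exporter; `split_results` is a hypothetical name.

```python
def split_results(results, headers):
    """Separate usable records from failed ones before export.

    Returns (rows, failures): rows contain only the requested headers, with
    "未识别" (not recognized) filled in for missing fields; failures lists the
    positions of records that carry no recognized field at all.
    """
    rows, failures = [], []
    for index, record in enumerate(results):
        if "raw_response" in record or not any(h in record for h in headers):
            failures.append(index)
            continue
        rows.append({h: record.get(h, "未识别") for h in headers})
    return rows, failures
```

The `rows` list can be passed to ExcelExporter.export_to_excel as `data`, while `failures` tells the user which images need another look.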

Step 8: Launch Gradio Interface

import gradio as gr

def create_invoice_demo():
    with gr.Blocks(title="发票识别系统 - 基于Qwen3-VL") as demo:
        gr.Markdown("""<center><font size=6>📄 发票识别系统</font></center>
        <center><font size=3>基于Qwen3-VL视觉语言模型 · 智能PDF解析 · Excel导出</font></center>""")
        
        page_info_state = gr.State(value=None)
        results_cache = gr.State(value=[])
        
        with gr.Row():
            with gr.Column(scale=1):
                pdf_input = gr.File(label="选择PDF文件", file_types=[".pdf"])
                pdf_preview = gr.Gallery(label="PDF图片预览", columns=3, height=400)
                process_btn = gr.Button("智能解析PDF", variant="primary")
                status_text = gr.Textbox(label="处理状态", lines=3)
            
            with gr.Column(scale=1):
                invoice_gallery = gr.Gallery(label="提取的发票", columns=2, height=400)
                recognize_all_btn = gr.Button("识别全部发票", variant="primary")
                recognize_status = gr.Textbox(label="识别状态", lines=2)
        
        with gr.Row():
            results_table = gr.DataFrame(
                label="发票信息",
                headers=DEFAULT_HEADERS,
                datatype=["str", "str", "str", "str", "number", "number", "number"]
            )
            export_btn = gr.Button("导出Excel", variant="primary")
            export_file = gr.File(label="下载Excel")
        
        # Bind event handlers (see full implementation in notebook)
        # process_btn.click(...)
        # recognize_all_btn.click(...)
        # export_btn.click(...)
    
    return demo

demo = create_invoice_demo()
demo.launch(debug=True)
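The event handlers are left out of the listing above. As an illustration of how the "recognize all" handler could report progress, the sketch below uses a generator, which Gradio can consume to update outputs step by step. This is a hypothetical design rather than the notebook's implementation: `recognize_all`, `on_recognize`, and `image_paths_state` are assumed names, and the recognition function is injected so the logic can be tried without the model.

```python
def recognize_all(image_paths, recognize_fn, headers):
    """Yield (status, rows) after each image so the UI can update progressively."""
    rows = []
    total = len(image_paths)
    if total == 0:
        yield "No images to recognize", rows
        return
    for number, path in enumerate(image_paths, start=1):
        try:
            record = recognize_fn(path)
        except Exception as exc:  # keep going if a single image fails
            record = {"raw_response": f"error: {exc}"}
        # Missing fields become None so the table shows an empty cell
        rows.append([record.get(h) for h in headers])
        # Yield a copy so earlier snapshots are not mutated by later appends
        yield f"Recognized {number}/{total}", [row[:] for row in rows]


# Hypothetical wiring inside create_invoice_demo(), assuming image_paths_state
# is a gr.State that holds the paths produced in Step 6:
#
# def on_recognize(paths):
#     yield from recognize_all(paths, recognize_invoice, DEFAULT_HEADERS)
#
# recognize_all_btn.click(
#     on_recognize,
#     inputs=[image_paths_state],
#     outputs=[recognize_status, results_table],
# )
```

The wrapper is written as a generator function with `yield from` rather than a lambda so that it is itself a generator function, not a plain function that returns a generator object.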

Important Notes

notebook_utils.py: This file must be in the same directory for device selection to work. It is distributed with the OpenVINO Notebooks repository (openvinotoolkit/openvino_notebooks, under utils/).
Network Requirements: Downloading the model requires network access to ModelScope.
Memory Requirements: The Qwen3-VL-4B INT4 model requires approximately 4 GB of RAM for inference.
Device Selection: Use the AUTO device by default, and exclude NPU from the choices if it is not available on your machine.

Technology Stack

| Module | Technology |
|--------|------------|
| PDF Parsing | PyMuPDF (fitz) |
| Vision Model | Qwen3-VL-4B (OpenVINO INT4) |
| Frontend | Gradio |
| Data Export | pandas + openpyxl |
| Model Source | ModelScope (snake7gun/Qwen3-VL-4B-Instruct-int4-ov) |

Usage Scenarios

Batch Invoice Processing: Upload a PDF containing multiple invoices, then extract the fields and export them to Excel automatically
Invoice Verification: Quickly verify invoice information against database records
Financial Automation: Integrate with financial systems for automated data entry
Custom Fields: Modify the headers and the prompt to extract different invoice fields as needed
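For the custom-fields scenario, the prompt and the export headers need to stay in sync. One way to do that is to generate the prompt from the header list. The sketch below is a hypothetical helper (`build_prompt` and `numeric_fields` are assumed names) that reuses the wording of INVOICE_PROMPT from Step 5; whether the model follows a modified prompt equally well has not been verified here.

```python
import json


def build_prompt(headers, numeric_fields=()):
    """Build an extraction prompt whose JSON template lists the given headers."""
    lines = []
    for name in headers:
        # Numeric fields get an unquoted placeholder, text fields a quoted one
        placeholder = "xx.xx" if name in numeric_fields else '"xxx"'
        lines.append(f"    {json.dumps(name, ensure_ascii=False)}: {placeholder}")
    template = "{\n" + ",\n".join(lines) + "\n}"
    # Same instructions as INVOICE_PROMPT: extract the fields as JSON, use
    # "未识别" for unrecognized values, and return JSON only
    return (
        "你是一个专业的发票识别助手。请从发票图像中提取以下信息,并以JSON格式返回:\n"
        f"{template}\n"
        '如果某项信息无法识别,请填写"未识别"。只返回JSON格式,不要添加其他文字说明。'
    )


custom_headers = ["发票号码", "开票日期", "价税合计"]
custom_prompt = build_prompt(custom_headers, numeric_fields=("价税合计",))
print(custom_prompt)
```

The same `custom_headers` list can then be passed as `headers` to ExcelExporter.export_to_excel and `custom_prompt` as `prompt` to recognize_invoice.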