Invoice Recognition Skill
Overview
This skill helps you deploy a complete invoice recognition system built on the Qwen3-VL vision-language model, running as an INT4-quantized OpenVINO model. The system automatically extracts invoice information from PDF files and exports it as structured data to Excel.
Key Features
- Smart PDF Parsing: Automatically detects images embedded in the PDF; falls back to rendering the page as an image when a page has no embedded images
- VLM-based Invoice Recognition: Uses Qwen3-VL multimodal understanding for OCR, supporting 32 languages
- Batch Excel Export: One-click export of multiple invoice recognition results to structured Excel
- Custom Headers: Users can flexibly adjust export fields according to business needs
- Streaming Output: Real-time display of recognition progress and results
Supported Invoice Fields
| Field | Description |
|-------|-------------|
| 发票号码 | Invoice number |
| 开票日期 | Invoice date |
| 购买方名称 | Buyer name |
| 销售方名称 | Seller name |
| 金额 | Amount (excluding tax) |
| 税额 | Tax amount |
| 价税合计 | Total amount (including tax) |
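The three amount fields are related: 价税合计 (total including tax) should equal 金额 (amount excluding tax) plus 税额 (tax amount). That gives a cheap sanity check on a recognized invoice. Below is a minimal sketch, assuming the amounts have already been parsed into numbers; the function name `totals_consistent` and the one-cent tolerance are illustrative choices, not part of the original system.

```python
from decimal import Decimal


def totals_consistent(amount, tax, total, tolerance="0.01"):
    """Return True if amount + tax equals total within `tolerance` (currency units).

    Values go through str() so that floats such as 0.1 are compared as the
    decimal numbers printed on the invoice, not as binary approximations.
    """
    a, t, s = (Decimal(str(v)) for v in (amount, tax, total))
    return abs((a + t) - s) <= Decimal(tolerance)


result = {"金额": 1000.00, "税额": 130.00, "价税合计": 1130.00}
print(totals_consistent(result["金额"], result["税额"], result["价税合计"]))  # True
```

Using `Decimal` avoids false alarms from float rounding (for example, 0.1 + 0.2 versus 0.3).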
Deployment Steps
Step 1: Install Dependencies
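```bash
pip install openvino nncf modelscope gradio pymupdf openpyxl pandas qwen-vl-utils Pillow huggingface_hub accelerate
# CPU-only PyTorch wheels
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# optimum-intel pinned to a specific commit of the openvino-dev-samples fork
pip install git+https://github.com/openvino-dev-samples/optimum-intel.git@2f62e5aee74b4acba3836e1f26678c0db0a09c00
# Needed by the Jupyter device-selection widget (Step 3)
pip install ipywidgets
```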
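The pip package names above do not always match the module names you import (for example, `pymupdf` is imported as `fitz` and `Pillow` as `PIL`). A minimal standard-library sketch that reports which modules are still missing after installation; the mapping is written for this page's package list and is an assumption to adjust if your setup differs.

```python
import importlib.util

# pip package -> module it is imported as (mapping assumed for the packages on this page)
REQUIRED_MODULES = {
    "openvino": "openvino",
    "nncf": "nncf",
    "modelscope": "modelscope",
    "gradio": "gradio",
    "pymupdf": "fitz",
    "openpyxl": "openpyxl",
    "pandas": "pandas",
    "qwen-vl-utils": "qwen_vl_utils",
    "Pillow": "PIL",
    "huggingface_hub": "huggingface_hub",
    "accelerate": "accelerate",
    "torch": "torch",
    "torchvision": "torchvision",
    "optimum-intel": "optimum",  # only the top-level "optimum" namespace is checked
    "transformers": "transformers",  # installed as a dependency of optimum-intel
    "ipywidgets": "ipywidgets",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the pip names whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as `torch`.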
Step 2: Download Model
Download the quantized Qwen3-VL-4B OpenVINO INT4 model from ModelScope:
```python
from pathlib import Path

model_dir = Path("Qwen3-VL-4B-Instruct-int4-ov")

# Download only if the model is not already present locally
if not model_dir.exists():
    from modelscope import snapshot_download

    snapshot_download("snake7gun/Qwen3-VL-4B-Instruct-int4-ov", local_dir=str(model_dir))
```
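Because `model_dir.exists()` is the only guard, an interrupted download leaves a directory that is skipped on the next run. OpenVINO IR models are stored as `.xml` (topology) and `.bin` (weights) pairs, so a partial download often shows up as an `.xml` file without its `.bin`. A minimal standard-library sketch that checks for complete pairs; it does not assume any specific file names inside `Qwen3-VL-4B-Instruct-int4-ov`, and the helper name is hypothetical.

```python
from pathlib import Path


def incomplete_ir_files(model_dir):
    """Return names of .xml files in model_dir that have no matching .bin file.

    Raises FileNotFoundError if the directory contains no .xml files at all,
    which usually means the download never produced an OpenVINO IR.
    """
    model_dir = Path(model_dir)
    xml_files = sorted(model_dir.glob("*.xml"))
    if not xml_files:
        raise FileNotFoundError(f"No OpenVINO IR (.xml) files found in {model_dir}")
    return [p.name for p in xml_files if not p.with_suffix(".bin").exists()]
```

For example, `incomplete_ir_files(model_dir)` returning `[]` means every `.xml` has its `.bin`; otherwise, delete the directory and download again.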
Step 3: Set Up Device Selection
Important: `notebook_utils.py` must be in the same directory as your code, because it provides the device selection widget.
```python
from notebook_utils import device_widget

# Dropdown of available OpenVINO devices; defaults to AUTO and hides NPU
device = device_widget(default="AUTO", exclude=["NPU"])
```
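`device_widget` renders a Jupyter dropdown, which is not available when the code runs as a plain script. Below is a hypothetical headless substitute that applies the same default and exclusion (`AUTO`, no `NPU`) to a list of device names. In practice the list would come from `openvino.Core().available_devices`; the function only does the filtering, so it can be tested without hardware.

```python
def select_device(available, default="AUTO", exclude=("NPU",)):
    """Pick an OpenVINO device name without a widget (hypothetical helper).

    `available` is a list such as openvino.Core().available_devices.
    "AUTO" is always offered; excluded devices are never returned.
    """
    options = [d for d in available if d not in exclude]
    if "AUTO" not in options:
        options.append("AUTO")
    # Use the requested default if allowed, otherwise the first remaining option
    return default if default in options else options[0]
```

For example, with `import openvino as ov`, call `device_name = select_device(ov.Core().available_devices)`, then pass `device=device_name` instead of `device.value` in Step 4.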
Step 4: Load Model
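```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

# Load (and compile) the INT4 OpenVINO model on the device chosen in Step 3
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device=device.value)

# Lower/upper bounds on the resized image area (in pixels) requested from the processor
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_dir,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
```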
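`min_pixels` and `max_pixels` bound the image area passed to the vision encoder, which in turn bounds the number of visual tokens per image. With a 28 × 28 pixel area per visual token (the unit these constants are written in), 256 × 28 × 28 = 200,704 pixels and 1280 × 28 × 28 = 1,003,520 pixels, or roughly 256 to 1,280 tokens. The sketch below follows the `smart_resize` rule published with Qwen2-VL's `qwen-vl-utils`. Treat it as an approximation: Qwen3-VL's processor may use a different patch factor and rounding.

```python
import math


def smart_resize(height, width, factor=28, min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28):
    """Return (new_height, new_width) as multiples of `factor`, aiming for an area in [min_pixels, max_pixels].

    Modeled on the Qwen2-VL smart_resize rule; Qwen3-VL's processor may differ.
    """
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down to the grid
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up to the grid
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def visual_tokens(height, width, factor=28):
    """Approximate visual-token count: one token per factor x factor block after resizing."""
    h, w = smart_resize(height, width, factor=factor)
    return (h // factor) * (w // factor)
```

For example, a full-page render produced at 200 dpi in Step 6 is far above `max_pixels`, so it is scaled down to fit the budget. Small details such as the invoice number therefore lose resolution. Raising `max_pixels` trades speed and memory for legibility.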
Step 5: Define Invoice Recognition Function
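```python
import json
import re

from transformers import TextStreamer

# The prompt stays in Chinese because it targets Chinese invoices. Translation:
# "You are a professional invoice recognition assistant. Extract the following
# information from the invoice image and return it in JSON format: {...}
# If an item cannot be recognized, fill in "未识别" (not recognized).
# Return only JSON, without any other explanatory text."
INVOICE_PROMPT = """你是一个专业的发票识别助手。请从发票图像中提取以下信息,并以JSON格式返回:
{
"发票号码": "xxx",
"开票日期": "xxxx年xx月xx日",
"购买方名称": "xxx",
"销售方名称": "xxx",
"金额": xx.xx,
"税额": xx.xx,
"价税合计": xx.xx
}
如果某项信息无法识别,请填写"未识别"。只返回JSON格式,不要添加其他文字说明。"""


def recognize_invoice(image_path, prompt=INVOICE_PROMPT):
    # One user turn with the image and the extraction instruction;
    # the processor loads the image from the given path
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": str(image_path)},
                {"type": "text", "text": prompt},
            ],
        }
    ]
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    # Stream tokens to stdout while generating
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        streamer=TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True),
    )
    # Decode only the newly generated tokens (drop the prompt part)
    generated_text = processor.decode(
        generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # Take the first flat {...} object from the answer
    json_match = re.search(r"\{[^}]+\}", generated_text)
    if json_match:
        try:
            return json.loads(json_match.group())
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through and return the raw text
    return {"raw_response": generated_text}
```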
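The regular expression above assumes the model returns a single flat JSON object. In practice, a VLM may wrap its answer in extra text, write amounts as strings such as `"¥1,130.00"`, or omit a key. Below is a minimal standard-library sketch of a more defensive post-processing step, assuming the seven field names from the table above. The helper names and the amount-cleaning rule are illustrative, not part of the original system.

```python
import json
import re

FIELDS = ["发票号码", "开票日期", "购买方名称", "销售方名称", "金额", "税额", "价税合计"]
AMOUNT_FIELDS = {"金额", "税额", "价税合计"}
UNRECOGNIZED = "未识别"  # "not recognized", the placeholder the prompt asks for


def parse_amount(value):
    """Convert values like 1130, "1130.00" or "¥1,130.00" to float; keep anything else as-is."""
    if isinstance(value, (int, float)):
        return float(value)
    cleaned = re.sub(r"[¥￥,\s]", "", str(value))  # strip currency signs, commas, spaces
    try:
        return float(cleaned)
    except ValueError:
        return value


def parse_invoice_response(text):
    """Extract the first flat JSON object from model output and normalize it to FIELDS."""
    match = re.search(r"\{.*?\}", text, re.DOTALL)  # flat objects only, like the original regex
    if not match:
        return {"raw_response": text}
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return {"raw_response": text}
    result = {}
    for field in FIELDS:
        value = data.get(field, UNRECOGNIZED)
        if field in AMOUNT_FIELDS and value != UNRECOGNIZED:
            value = parse_amount(value)
        result[field] = value
    return result
```

It could replace the regex block at the end of `recognize_invoice`, for example with `return parse_invoice_response(generated_text)`, so that every result has the same keys before export.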
Step 6: PDF Extraction
```python
import tempfile
from pathlib import Path

import fitz  # PyMuPDF


class PDFExtractor:
    @staticmethod
    def extract_images_from_pdf(pdf_path, output_dir=None):
        """Save each page's embedded images, or a 200-dpi render of pages that have none."""
        if output_dir is None:
            output_dir = tempfile.mkdtemp()
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        doc = fitz.open(pdf_path)
        image_paths = []
        page_info = []
        for page_num in range(len(doc)):
            page = doc[page_num]
            image_list = page.get_images(full=True)
            if image_list:
                # Page has embedded images: save each one with its real file extension
                for img_index, img in enumerate(image_list):
                    xref = img[0]
                    base_image = doc.extract_image(xref)
                    image_path = Path(output_dir) / f"page{page_num + 1}_img{img_index + 1}.{base_image['ext']}"
                    with open(image_path, "wb") as f:
                        f.write(base_image["image"])
                    image_paths.append(str(image_path))
                    page_info.append({"page": page_num + 1, "type": "embedded", "index": img_index + 1})
            else:
                # No embedded images: render the whole page as a PNG
                pix = page.get_pixmap(dpi=200)
                image_path = Path(output_dir) / f"page_{page_num + 1}.png"
                pix.save(str(image_path))
                image_paths.append(str(image_path))
                page_info.append({"page": page_num + 1, "type": "screenshot", "index": 1})
        doc.close()
        return image_paths, page_info
```
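`extract_images_from_pdf` treats every embedded image as a candidate invoice. Some PDF invoices, however, are text-based and contain only small embedded images (for example, a QR code or a seal). In that case, rendering the whole page is the better input. Below is a minimal sketch of a per-page decision rule, assuming a minimum side length below which an embedded image is ignored. The 500 px threshold and the function name are hypothetical and would need tuning on real files. The `(width, height)` pairs correspond to the `"width"` and `"height"` entries returned by PyMuPDF's `Document.extract_image`.

```python
def plan_page(image_sizes, min_side=500):
    """Decide how to turn one PDF page into recognition input (hypothetical rule).

    image_sizes: list of (width, height) for the page's embedded images, in order.
    Returns ("embedded", [indices of images large enough]) or ("screenshot", []).
    """
    large = [i for i, (w, h) in enumerate(image_sizes) if min(w, h) >= min_side]
    if large:
        return "embedded", large
    # Nothing big enough to be an invoice scan: render the page instead
    return "screenshot", []
```

Inside the page loop, `doc.extract_image(xref)` already provides these sizes, so the rule can be applied before any files are written.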
Step 7: Excel Export
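```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

DEFAULT_HEADERS = ["发票号码", "开票日期", "购买方名称", "销售方名称", "金额", "税额", "价税合计"]


class ExcelExporter:
    @staticmethod
    def export_to_excel(data, headers=None, output_path=None):
        """Write a list of result dicts to .xlsx (one column per header) and return the file path."""
        if headers is None:
            headers = DEFAULT_HEADERS
        if output_path is None:
            # Timestamped file in the system temp directory
            output_path = Path(tempfile.gettempdir()) / f"invoices_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx"
        # Keys not listed in headers are dropped; missing keys become empty cells
        df = pd.DataFrame(data, columns=headers)
        df.to_excel(output_path, index=False, engine="openpyxl")
        return str(output_path)
```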
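When `data` is a list of dicts, `pd.DataFrame(data, columns=headers)` keeps only the keys named in `headers` and leaves missing ones blank (NaN). A failed result that only contains `raw_response` therefore becomes an empty row without any warning. Below is a minimal sketch that makes this explicit before export, assuming the `"未识别"` placeholder from the prompt. The function name is hypothetical.

```python
def results_to_rows(results, headers, missing="未识别"):
    """Align recognition results (dicts) with the export headers.

    Returns (rows, failed): rows is a list of lists ordered like `headers`,
    with `missing` for absent fields; failed counts results that only hold
    the model's raw text (the {"raw_response": ...} fallback).
    """
    rows, failed = [], 0
    for result in results:
        if set(result) == {"raw_response"}:
            failed += 1
        rows.append([result.get(h, missing) for h in headers])
    return rows, failed
```

For example, `rows, failed = results_to_rows(data, headers)` followed by `pd.DataFrame(rows, columns=headers)` exports the same columns. The `failed` count can then be shown in the UI status box.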
Step 8: Launch Gradio Interface
```python
import gradio as gr


def create_invoice_demo():
    # Title "发票识别系统" = "Invoice Recognition System"; subtitle: "Based on the
    # Qwen3-VL vision-language model · Smart PDF parsing · Excel export"
    with gr.Blocks(title="发票识别系统 - 基于Qwen3-VL") as demo:
        gr.Markdown("""<center><font size=6>📄 发票识别系统</font></center>
<center><font size=3>基于Qwen3-VL视觉语言模型 · 智能PDF解析 · Excel导出</font></center>""")
        page_info_state = gr.State(value=None)
        results_cache = gr.State(value=[])
        with gr.Row():
            # Left column: upload a PDF, preview it, and split it into images
            with gr.Column(scale=1):
                pdf_input = gr.File(label="选择PDF文件", file_types=[".pdf"])  # "Choose PDF file"
                pdf_preview = gr.Gallery(label="PDF图片预览", columns=3, height=400)  # "PDF image preview"
                process_btn = gr.Button("智能解析PDF", variant="primary")  # "Parse PDF"
                status_text = gr.Textbox(label="处理状态", lines=3)  # "Processing status"
            # Right column: extracted invoices and batch recognition
            with gr.Column(scale=1):
                invoice_gallery = gr.Gallery(label="提取的发票", columns=2, height=400)  # "Extracted invoices"
                recognize_all_btn = gr.Button("识别全部发票", variant="primary")  # "Recognize all invoices"
                recognize_status = gr.Textbox(label="识别状态", lines=2)  # "Recognition status"
        with gr.Row():
            results_table = gr.DataFrame(
                label="发票信息",  # "Invoice information"
                headers=DEFAULT_HEADERS,
                datatype=["str", "str", "str", "str", "number", "number", "number"],
            )
        export_btn = gr.Button("导出Excel", variant="primary")  # "Export to Excel"
        export_file = gr.File(label="下载Excel")  # "Download Excel"
        # Bind event handlers (see full implementation in notebook)
        # process_btn.click(...)
        # recognize_all_btn.click(...)
        # export_btn.click(...)
    return demo


demo = create_invoice_demo()
demo.launch(debug=True)
```
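The click bindings are left out of the snippet above. As a hypothetical sketch of what the "识别全部发票" (recognize all invoices) handler could do, the generator below runs a recognition function over each extracted image. After every invoice it yields a status message and the table rows so far; Gradio streams intermediate updates when a handler is a generator function. The function and argument names are assumptions, and the recognizer is passed in so the logic can be tested without a model.

```python
def recognize_all(image_paths, recognize_fn, headers, missing="未识别"):
    """Yield (status_text, rows) after each image (hypothetical handler logic).

    recognize_fn(path) -> dict, for example recognize_invoice from Step 5.
    """
    rows = []
    total = len(image_paths)
    if total == 0:
        yield "No images to recognize.", rows
        return
    for i, path in enumerate(image_paths, start=1):
        try:
            result = recognize_fn(path)
        except Exception as exc:  # keep going if a single image fails
            result = {"raw_response": f"error: {exc}"}
        rows.append([result.get(h, missing) for h in headers])
        yield f"Recognized {i}/{total}", list(rows)  # copy so earlier updates stay unchanged
```

To bind it, wrap it in a named generator function, for example `def on_recognize(paths): yield from recognize_all(paths, recognize_invoice, DEFAULT_HEADERS)`, and pass that function to `recognize_all_btn.click` with `outputs=[recognize_status, results_table]`. A lambda that returns a generator would not be treated as a streaming handler.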
Important Notes
- notebook_utils.py: This file must be in the same directory as your code for device selection to work. Download it from the OpenVINO Notebooks repository (`openvinotoolkit/openvino_notebooks`, `utils/notebook_utils.py`).
- Network Requirements: Downloading the model requires access to ModelScope. Ensure network connectivity.
- Memory Requirements: The Qwen3-VL-4B INT4 model needs approximately 4 GB of RAM for inference.
- Device Selection: `AUTO` is the default device; the widget excludes `NPU` from the list of choices.
Technology Stack
| Module | Technology |
|--------|------------|
| PDF Parsing | PyMuPDF (fitz) |
| Vision Model | Qwen3-VL-4B (OpenVINO INT4) |
| Frontend | Gradio |
| Data Export | pandas + openpyxl |
| Model Source | ModelScope (snake7gun/Qwen3-VL-4B-Instruct-int4-ov) |
Usage Scenarios
- Batch Invoice Processing: Upload PDF containing multiple invoices, automatically extract and export to Excel
- Invoice Verification: Quickly verify invoice information against database records
- Financial Automation: Integrate with financial systems for automated data entry
- Custom Fields: Modify headers and prompts to extract different invoice fields as needed
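To change the extracted fields, the prompt and the export headers must list the same keys. Otherwise the model is asked for fields the table never shows, or the other way around. Below is a minimal sketch that builds the prompt's JSON template from the header list so the two stay in sync. The function name is hypothetical. The Chinese instruction text is copied from `INVOICE_PROMPT`, and example values default to `"xxx"`.

```python
import json

# Instruction text taken from INVOICE_PROMPT (Step 5). Translation:
# "You are a professional invoice recognition assistant. Extract the following
#  information from the invoice image and return it in JSON format:"
PROMPT_HEAD = "你是一个专业的发票识别助手。请从发票图像中提取以下信息,并以JSON格式返回:"
# "If an item cannot be recognized, fill in "未识别". Return only JSON, without other text."
PROMPT_TAIL = '如果某项信息无法识别,请填写"未识别"。只返回JSON格式,不要添加其他文字说明。'


def build_prompt(headers, examples=None):
    """Build an extraction prompt whose JSON template lists exactly `headers`.

    `examples` maps a header to a sample value; other headers default to "xxx".
    """
    examples = examples or {}
    template = {h: examples.get(h, "xxx") for h in headers}
    body = json.dumps(template, ensure_ascii=False, indent=4)  # keep Chinese keys readable
    return f"{PROMPT_HEAD}\n{body}\n{PROMPT_TAIL}"
```

For example, with a hypothetical extra field 税号 (tax ID), `headers = DEFAULT_HEADERS + ["税号"]` can be passed as `prompt=build_prompt(headers)` to `recognize_invoice` and as `headers=headers` to `ExcelExporter.export_to_excel`.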