Category: Data & Analysis · No API key required

Master Data Matching

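A production-grade intelligent master data matching system. Use cases include matching supplier, customer, and employee records, deduplicating master data, and parsing OCR extraction results.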

Author: woaim65hubclawhub

Master Data Intelligent Matching System

Overview

A production-ready skill for intelligent entity resolution across business domains. It combines exact-match and vector-semantic retrieval, OCR field mapping with confidence coloring, and human-in-the-loop verification with active learning.

Usage

import mdm from './index.js';

// ocrFields, ocrEntity, and dbRecords below are placeholders for the caller's own data.

// 1. Get supported domains
mdm.getSupportedDomains(); // ['procurement', 'finance', 'sales', 'hr']

// 2. Build OCR-to-schema mapping with confidence colors
const mapping = mdm.buildOcrSchemaMapping(ocrFields, 'procurement');

// 3. Run full matching pipeline
const result = mdm.runMatchingPipeline(ocrEntity, 'procurement', dbRecords);

// 4. Format result as summary
console.log(mdm.formatMatchingSummary(result));

Key Features

Business Domain Isolation

Four isolated schemas:

  • procurement — vendor records (vendor_name, vendor_code, tax_id, contact, etc.)
  • finance — company records (company_name, registration_number, fiscal_year_end, etc.)
  • sales — customer records (customer_name, customer_code, industry, credit_limit, etc.)
  • hr — employee records (employee_name, employee_id, id_number, department, etc.)
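
One way to picture the isolation is a registry keyed by domain, where a lookup never falls back to another domain's fields. The sketch below is an assumption about the structure, not the skill's actual code: `DOMAIN_SCHEMAS` and `schemaFor` are hypothetical names, and each list holds only the fields named above (the real schemas contain more).

```javascript
// Hypothetical registry: only the fields named in the list above are included.
const DOMAIN_SCHEMAS = Object.freeze({
  procurement: ['vendor_name', 'vendor_code', 'tax_id', 'contact'],
  finance: ['company_name', 'registration_number', 'fiscal_year_end'],
  sales: ['customer_name', 'customer_code', 'industry', 'credit_limit'],
  hr: ['employee_name', 'employee_id', 'id_number', 'department'],
});

// Returns a copy of one domain's field list; unknown domains are rejected
// instead of silently borrowing another domain's schema.
function schemaFor(domain) {
  if (!Object.prototype.hasOwnProperty.call(DOMAIN_SCHEMAS, domain)) {
    throw new Error(`Unsupported domain: ${domain}`);
  }
  return [...DOMAIN_SCHEMAS[domain]];
}

console.log(schemaFor('hr'));
```

In the skill itself, `getDomainSchema(domain)` is the documented way to read a schema.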

OCR Field to Schema Visual Line Mapping

buildOcrSchemaMapping(ocrFields, domain) maps raw OCR field names to schema fields with confidence colors:

| Color     | Score          | Meaning                      |
|-----------|----------------|------------------------------|
| 🟢 green  | ≥ 0.92         | High confidence mapping      |
| 🟡 yellow | 0.70 to < 0.92 | Medium confidence mapping    |
| 🔴 red    | < 0.70         | Low confidence / unmapped    |
| 🔵 blue   | db-only        | Database field, no OCR data  |
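
The color bands can be expressed as a small lookup. This is a minimal sketch rather than the skill's implementation: `confidenceColor` is a hypothetical helper, and passing `null` to represent a database-only field is an assumption made for illustration.

```javascript
// Hypothetical helper: maps a mapping score in [0, 1] to the color bands above.
// null or undefined stands for a database field with no OCR counterpart.
function confidenceColor(score) {
  if (score === null || score === undefined) return 'blue'; // db-only
  if (score >= 0.92) return 'green';  // high confidence
  if (score >= 0.70) return 'yellow'; // medium confidence
  return 'red';                       // low confidence or unmapped
}

console.log(confidenceColor(0.95)); // green
console.log(confidenceColor(0.80)); // yellow
console.log(confidenceColor(0.40)); // red
console.log(confidenceColor(null)); // blue
```

A score of exactly 0.92 lands in the green band here, following the "≥ 0.92" row.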

Dual-Path Entity Retrieval

dualPathEntityRetrieval(entity, domain, dbRecords) runs two parallel paths:

  1. Exact Match (threshold 0.92) — ALL critical fields must match exactly
  2. Vector Semantic (threshold 0.70) — weighted similarity across all fields

Results set needsHumanReview: true when confidence is below 0.92 or no match is found.
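
The two paths can be sketched as follows. Everything here is illustrative and assumed: the critical-field list, the weights, the confidence of 1.0 for an exact hit, and the character-bigram Dice coefficient, which stands in for whatever vector similarity the skill actually uses.

```javascript
// Assumed configuration for the procurement domain (not taken from the skill).
const CRITICAL_FIELDS = ['vendor_code', 'tax_id'];
const FIELD_WEIGHTS = { vendor_name: 0.5, contact_person: 0.25, email: 0.25 };

// Character bigrams of a lowercased string.
function bigrams(text) {
  const s = String(text).toLowerCase();
  const grams = [];
  for (let i = 0; i < s.length - 1; i++) grams.push(s.slice(i, i + 2));
  return grams;
}

// Dice coefficient over bigram multisets: a simple stand-in for vector similarity.
function similarity(a, b) {
  if (a === b) return 1;
  const ga = bigrams(a);
  const pool = bigrams(b);
  const total = ga.length + pool.length;
  if (ga.length === 0 || pool.length === 0) return 0;
  let shared = 0;
  for (const g of ga) {
    const idx = pool.indexOf(g);
    if (idx !== -1) {
      shared += 1;
      pool.splice(idx, 1); // consume the matched bigram
    }
  }
  return (2 * shared) / total;
}

// Path 1: every critical field must be present and identical.
function exactPath(entity, record) {
  return CRITICAL_FIELDS.every(
    (f) => entity[f] !== undefined && entity[f] === record[f]
  );
}

// Path 2: weighted average similarity over the fields both sides provide.
function semanticScore(entity, record) {
  let total = 0;
  let weightSum = 0;
  for (const [field, weight] of Object.entries(FIELD_WEIGHTS)) {
    if (entity[field] === undefined || record[field] === undefined) continue;
    total += weight * similarity(entity[field], record[field]);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

function retrieve(entity, records) {
  let best = null;
  for (const record of records) {
    const exact = exactPath(entity, record);
    const confidence = exact ? 1 : semanticScore(entity, record);
    if (confidence >= 0.70 && (best === null || confidence > best.confidence)) {
      best = { record, path: exact ? 'exact' : 'semantic', confidence };
    }
  }
  return {
    match: best,
    needsHumanReview: best === null || best.confidence < 0.92,
  };
}

const sampleRecords = [
  { id: 'rec_001', vendor_name: 'Acme Corporation Ltd', vendor_code: 'V-5001', tax_id: '91110000123456789X' },
];
const hit = retrieve(
  { vendor_name: 'Acme Corporation Ltd', vendor_code: 'V-5001', tax_id: '91110000123456789X' },
  sampleRecords
);
console.log(hit.match.path, hit.needsHumanReview); // exact false
```

Candidates scoring below 0.70 are dropped, so an entity with no plausible counterpart comes back with `match: null` and is routed to human review.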

Field Value Verification

verifyFieldValues(ocrEntity, dbRecord, domain) returns one of four verification states for each field:

| State    | Meaning                                   |
|----------|-------------------------------------------|
| match    | OCR and DB values agree                   |
| mismatch | Values differ (requires human resolution) |
| new_info | Field only in OCR (new information)       |
| db_only  | Field only in DB (not in OCR document)    |
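
The four states follow from whether each side has a value and whether the values agree. The sketch below assumes a simplified comparison: `verifyFields` is a hypothetical name, it walks the union of keys instead of the domain schema, and it compares trimmed strings, which may be stricter or looser than the skill's own rules.

```javascript
// Treat undefined, null, and blank strings as "no value".
function hasValue(v) {
  return v !== undefined && v !== null && String(v).trim() !== '';
}

// Hypothetical stand-in for verifyFieldValues: returns { field: state }.
function verifyFields(ocrEntity, dbRecord) {
  const fields = new Set([...Object.keys(ocrEntity), ...Object.keys(dbRecord)]);
  const report = {};
  for (const field of fields) {
    const inOcr = hasValue(ocrEntity[field]);
    const inDb = hasValue(dbRecord[field]);
    if (inOcr && inDb) {
      const same = String(ocrEntity[field]).trim() === String(dbRecord[field]).trim();
      report[field] = same ? 'match' : 'mismatch';
    } else if (inOcr) {
      report[field] = 'new_info';
    } else if (inDb) {
      report[field] = 'db_only';
    }
    // Fields empty on both sides are left out of the report.
  }
  return report;
}

const report = verifyFields(
  { vendor_code: 'V-5001', email: 'john.smith@acme.com', fax: '+86-10-000' },
  { vendor_code: 'V-5001', email: 'j.smith@acme.com', phone: '+86-10-12345678' }
);
console.log(report);
// { vendor_code: 'match', email: 'mismatch', fax: 'new_info', phone: 'db_only' }
```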

Human-in-the-Loop

Every pipeline result generates a hitlRequest with:

  • Mismatched fields highlighted
  • New info fields listed
  • Available review actions: confirm_match, reject_match, create_new, update_fields

Use processHumanDecision(decision, state) to process human feedback and generate learning payloads.
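
As a rough picture of the review loop, the sketch below builds a request from per-field verification states and turns a reviewer's decision into a learning payload. The object shapes are assumptions: `buildReviewRequest`, `handleDecision`, and property names such as `mismatchedFields` are hypothetical and may not match what the skill returns. Only the four action names and the `{ status, learningPayload }` return pair come from the text above.

```javascript
const REVIEW_ACTIONS = ['confirm_match', 'reject_match', 'create_new', 'update_fields'];

// Build a review request from a { field: state } verification report.
function buildReviewRequest(verification) {
  const fieldsIn = (state) =>
    Object.keys(verification).filter((f) => verification[f] === state);
  return {
    mismatchedFields: fieldsIn('mismatch'),
    newInfoFields: fieldsIn('new_info'),
    availableActions: [...REVIEW_ACTIONS],
  };
}

// Validate the reviewer's action and package it for active learning.
function handleDecision(decision, request) {
  if (!REVIEW_ACTIONS.includes(decision.action)) {
    throw new Error(`Unknown review action: ${decision.action}`);
  }
  return {
    status: 'processed', // assumed status value
    learningPayload: {
      action: decision.action,
      mismatchedFields: request.mismatchedFields,
      notes: decision.notes || '',
    },
  };
}

const request = buildReviewRequest({ vendor_code: 'match', email: 'mismatch', fax: 'new_info' });
const outcome = handleDecision(
  { action: 'confirm_match', notes: 'Email mismatch acceptable' },
  request
);
console.log(outcome.learningPayload.mismatchedFields); // [ 'email' ]
```

Rejecting unknown actions early keeps malformed reviewer input out of the learning statistics.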

Active Learning

updateActiveLearning(payloads, stats) does the following:

  • Tracks per-domain confirmation, rejection, and new-record rates
  • Tracks per-field error rates
  • Auto-adjusts thresholds when a field's error rate exceeds 30%
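
The per-field part of this bookkeeping might look like the sketch below. The source states only the 30% trigger, so the rest is assumed: the payload shape (`reviewedFields`, `mismatchedFields`), counting a reviewed mismatch as an error, and raising the field's threshold by a fixed 0.02 step are all hypothetical choices, and per-domain rates are omitted.

```javascript
const ERROR_RATE_TRIGGER = 0.30; // from the text above
const DEFAULT_THRESHOLD = 0.92;  // assumed starting threshold
const STEP = 0.02;               // hypothetical adjustment size
const MAX_THRESHOLD = 0.99;      // hypothetical cap

function updateFieldStats(payloads, stats = {}) {
  // Deep-copy so the caller's stats object is not mutated.
  const next = JSON.parse(JSON.stringify(stats));
  next.fields = next.fields || {};

  for (const payload of payloads) {
    for (const field of payload.reviewedFields) {
      const entry = next.fields[field] ||
        { reviews: 0, errors: 0, threshold: DEFAULT_THRESHOLD };
      entry.reviews += 1;
      if (payload.mismatchedFields.includes(field)) entry.errors += 1;
      next.fields[field] = entry;
    }
  }

  for (const entry of Object.values(next.fields)) {
    entry.errorRate = entry.reviews === 0 ? 0 : entry.errors / entry.reviews;
    if (entry.errorRate > ERROR_RATE_TRIGGER) {
      // Round to two decimals to avoid floating-point drift across updates.
      const raised = Math.round((entry.threshold + STEP) * 100) / 100;
      entry.threshold = Math.min(MAX_THRESHOLD, raised);
    }
  }
  return next;
}

const stats = updateFieldStats([
  { reviewedFields: ['vendor_name', 'email'], mismatchedFields: ['email'] },
  { reviewedFields: ['vendor_name', 'email'], mismatchedFields: ['email'] },
  { reviewedFields: ['vendor_name', 'email'], mismatchedFields: [] },
]);
console.log(stats.fields.email.threshold > stats.fields.vendor_name.threshold); // true
```

Here `email` is mismatched in two of three reviews, which exceeds the 30% trigger, while `vendor_name` keeps its default threshold.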

Example

import mdm from './index.js';

// Sample OCR entity from a vendor invoice
const ocrVendor = {
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'john.smith@acme.com',
};

// Existing database records
const dbRecords = [
  {
    id: 'rec_001',
    vendor_name: 'Acme Corporation Ltd',
    vendor_code: 'V-5001',
    tax_id: '91110000123456789X',
    contact_person: 'John Smith',
    email: 'j.smith@acme.com',  // slight email mismatch
    phone: '+86-10-12345678',
    address: 'Beijing Chaoyang District',
    bank_account: '6222021234567890',
  },
];

// Run pipeline
const result = mdm.runMatchingPipeline(ocrVendor, 'procurement', dbRecords);
console.log(mdm.formatMatchingSummary(result));

// Process human decision
const decision = { action: 'confirm_match', notes: 'Email mismatch acceptable' };
const { status, learningPayload } = mdm.processHumanDecision(decision, {
  domain: 'procurement',
  ocrEntity: ocrVendor,
  matchResult: result.matchResult,
});

// Update active learning
const newStats = mdm.updateActiveLearning([learningPayload], {});

API Reference

| Function                        | Description                              |
|---------------------------------|------------------------------------------|
| getSupportedDomains()           | List all supported business domains      |
| getDomainSchema(domain)         | Get field schema for a domain            |
| buildOcrSchemaMapping(ocr, dom) | Map OCR fields to schema with confidence |
| dualPathEntityRetrieval(...)    | Run exact + semantic matching            |
| verifyFieldValues(...)          | 4-state field verification               |
| runMatchingPipeline(...)        | Full orchestration pipeline              |
| generateHitlReviewRequest(...)  | Build human review request payload       |
| processHumanDecision(...)       | Handle human feedback                    |
| updateActiveLearning(...)       | Update learning stats from decisions     |
| formatMatchingSummary(...)      | Human-readable result summary            |