Class Diagram to Neo4j Extraction Skill
Overview
This skill extracts structured data from UML class diagrams (images) and populates Neo4j graph databases. It's designed for:
- TMF (TM Forum) API specification diagrams
- UML class diagrams
- Entity-relationship diagrams
- Schema diagrams
Workflow
1. Image Analysis
- Use vision models (GPT-4 Vision, Claude Vision, etc.) to analyze diagram images
- Extract text, boxes, lines, and relationships
- Identify entities, properties, and relationships
2. Structured Extraction
- Parse entities (classes) with their properties
- Extract relationships (associations, inheritance, etc.)
- Capture cardinality and relationship metadata
- Handle color coding and visual indicators
3. Data Normalization
- Convert to structured format (YAML/JSON)
- Normalize entity names and types
- Standardize relationship types
- Handle references and aliases
4. Neo4j Population
- Generate Cypher queries
- Create nodes with properties
- Create relationships with metadata
- Handle constraints and indexes
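The normalization step can be illustrated with a small sketch. The two helper names below are hypothetical and not part of the skill's documented API. The sketch assumes entity names are normalized to PascalCase and relationship types to lower snake_case, which matches the examples in this document (ProductOffering, has_specification) but is not stated as a rule.

```python
import re


def normalize_entity_name(raw: str) -> str:
    """Collapse spaces and punctuation in a class name into PascalCase."""
    words = re.split(r"[^0-9A-Za-z]+", raw.strip())
    # Upper-case only the first character so inner capitals are preserved.
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def normalize_relationship_type(raw: str) -> str:
    """Convert a relationship label into lower snake_case."""
    # Insert an underscore at lower-to-upper boundaries (camelCase input).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", raw.strip())
    # Replace any run of separators with a single underscore.
    return re.sub(r"[^0-9A-Za-z]+", "_", spaced).strip("_").lower()
```

Normalizing before population keeps MERGE keys stable when the vision model reads the same class as "Product Offering" in one diagram and "ProductOffering" in another.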
Usage Patterns
Pattern 1: Direct Image → Neo4j
from classdiagram_to_neo4j import extract_and_populate
# Extract from image and populate Neo4j
extract_and_populate(
image_path="diagrams/product_offering.png",
neo4j_uri="bolt://localhost:7687",
neo4j_user="neo4j",
neo4j_password="password"
)
Pattern 2: Extract → Review → Populate
from classdiagram_to_neo4j import extract_diagram, populate_neo4j
# Step 1: Extract to JSON/YAML
data = extract_diagram(
image_path="diagrams/product_offering.png",
output_format="json",
output_path="extracted.json"
)
# Step 2: Review/edit JSON if needed
# ... manual review ...
# Step 3: Populate Neo4j
populate_neo4j(
data=data,
neo4j_uri="bolt://localhost:7687",
neo4j_user="neo4j",
neo4j_password="password"
)
Pattern 3: Batch Processing
from classdiagram_to_neo4j import extract_diagram, populate_neo4j
# Process multiple diagrams
diagrams = [
"diagrams/product_offering.png",
"diagrams/category.png",
"diagrams/pricing.png"
]
for diagram_path in diagrams:
    data = extract_diagram(diagram_path, output_format="json")
    populate_neo4j(
        data=data,
        neo4j_uri="bolt://localhost:7687",
        neo4j_user="neo4j",
        neo4j_password="password"
    )
Diagram Types Supported
TMF-Style Diagrams
- ProductOffering hub diagrams
- Category relationships
- Specification diagrams
- Reference entity diagrams
UML Class Diagrams
- Classes with attributes
- Associations with multiplicities
- Inheritance hierarchies
- Aggregations and compositions
Schema Diagrams
- Database schemas
- API schemas
- Domain models
Extraction Process
Step 1: Vision Analysis
The vision model analyzes the image and extracts:
- Entities: Boxes/classes with names
- Properties: Attributes within entities
- Relationships: Lines/arrows between entities
- Metadata: Cardinality, roles, types
- Visual Indicators: Colors, borders, dashed lines
Step 2: Structured Output
Extracted data is normalized into:
meta:
  source: "diagrams/product_offering.png"
  extracted_at: "2024-01-01T00:00:00Z"
  diagram_type: "uml_class"
entities:
  ProductOffering:
    label: "ProductOffering"
    properties:
      - name: "id"
        type: "string"
        required: true
      - name: "name"
        type: "string"
        required: true
      - name: "isBundle"
        type: "boolean"
        required: false
relationships:
  - from: "ProductOffering"
    to: "ProductSpecification"
    type: "has_specification"
    cardinality: "0..1"
    direction: "out"
    properties:
      role: null
Step 3: Neo4j Population
Generates Cypher queries:
// Create schema block
MERGE (sb:SchemaBlock {id: 'tmf620_productoffering'})
SET sb.title = 'ProductOffering Diagram',
sb.artifact = 'diagrams/productoffering.png';
// Create entities with FQN
MERGE (e:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
SET e.name = 'ProductOffering',
e.specId = 'tmf620_productoffering',
e.kind = 'Entity';
// Create fields
MERGE (f:Field {fqn: 'tmf620_productoffering#ProductOffering.name'})
SET f.name = 'name',
f.type = 'string',
f.required = true;
// Link field to entity
MATCH (e:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
MATCH (f:Field {fqn: 'tmf620_productoffering#ProductOffering.name'})
MERGE (e)-[:HAS_FIELD]->(f);
// Create relationships
MATCH (from:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
MATCH (to:Entity {fqn: 'tmf620_productoffering#ProductSpecification'})
MERGE (from)-[r:RELATES_TO {
type: 'has_specification',
fromCardinality: '0..1',
toCardinality: '1',
direction: 'out'
}]->(to);
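The Cypher above embeds literal values. When queries are generated from extracted data, passing values as query parameters avoids quoting problems. The following is a minimal sketch under assumptions: the function name, the parameter names, and the choice to MERGE on the type property alone (setting the remaining properties afterwards) are illustrative and not the skill's actual generator.

```python
from typing import Any, Dict

# Parameterized variant of the relationship query. Merging on `type` only
# means a re-extraction with a different cardinality updates the existing
# relationship instead of creating a second one (an assumed design choice).
RELATES_TO_QUERY = """
MATCH (src:Entity {fqn: $from_fqn})
MATCH (dst:Entity {fqn: $to_fqn})
MERGE (src)-[r:RELATES_TO {type: $type}]->(dst)
SET r.cardinality = $cardinality, r.direction = $direction
""".strip()


def relationship_params(spec_id: str, rel: Dict[str, Any]) -> Dict[str, Any]:
    """Map one entry of the extracted `relationships` list to query parameters."""
    return {
        "from_fqn": f"{spec_id}#{rel['from']}",
        "to_fqn": f"{spec_id}#{rel['to']}",
        "type": rel["type"],
        "cardinality": rel.get("cardinality"),
        "direction": rel.get("direction", "out"),
    }
```

With the official Python driver, the pair would typically be executed as session.run(RELATES_TO_QUERY, relationship_params(spec_id, rel)).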
Key Features
1. Scalable Data Model
- Uses stable labels (:Entity, :RefType, :SchemaBlock) instead of per-class labels
- Uses FQN (Fully Qualified Name) for entity identity: <specId>#<entityName>
- Uses a generic RELATES_TO relationship type with a type property
- Avoids label explosion and supports namespacing
- See references/SCALABLE_RELATIONSHIP_MODEL.md
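As an illustration of the FQN scheme, here is a small sketch of helpers that build and split identifiers. The function names are hypothetical. The field form <specId>#<entityName>.<fieldName> is inferred from the Cypher example in this document, and the sketch assumes entity names contain neither "#" nor ".".

```python
from typing import Optional, Tuple


def entity_fqn(spec_id: str, entity_name: str) -> str:
    """Build an entity FQN of the form <specId>#<entityName>."""
    return f"{spec_id}#{entity_name}"


def field_fqn(spec_id: str, entity_name: str, field_name: str) -> str:
    """Build a field FQN of the form <specId>#<entityName>.<fieldName>."""
    return f"{entity_fqn(spec_id, entity_name)}.{field_name}"


def split_fqn(fqn: str) -> Tuple[str, str, Optional[str]]:
    """Split an FQN into (spec_id, entity_name, field_name or None)."""
    spec_id, sep, rest = fqn.partition("#")
    if not sep or not spec_id or not rest:
        raise ValueError(f"not a valid FQN: {fqn!r}")
    entity_name, _, field_name = rest.partition(".")
    return spec_id, entity_name, field_name or None
```

Because the spec ID is part of the key, two specifications can each define a ProductOffering entity without colliding on the same :Entity node.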
2. Provenance Tracking
- Tracks source diagram via SchemaBlock nodes
- Uses FQN for entity identity (supports multiple versions)
- Maintains extraction metadata (specId, extracted_at)
- Links entities to schema blocks via CONTAINS_ENTITY
3. Conflict Resolution
- Handles duplicate entities
- Merges properties intelligently
- Resolves relationship conflicts
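The document does not spell out the merge policy, so the following is only one possible reading of "merges properties intelligently": properties are matched by name, values already present win, and gaps are filled from the incoming extraction. The function name and the policy are both assumptions.

```python
from typing import Any, Dict, List

Property = Dict[str, Any]


def merge_properties(existing: List[Property], incoming: List[Property]) -> List[Property]:
    """Merge two property lists by name, keeping the first non-null value per key."""
    merged: Dict[str, Property] = {p["name"]: dict(p) for p in existing}
    for prop in incoming:
        current = merged.setdefault(prop["name"], {})
        for key, value in prop.items():
            # Only fill keys that are missing or null; never overwrite.
            if current.get(key) is None:
                current[key] = value
    # Dicts preserve insertion order, so existing properties come first.
    return list(merged.values())
```

A stricter policy might instead report a conflict when two diagrams disagree on a property's type; this sketch silently keeps the earlier value.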
4. Validation
- Validates extracted data structure before population
- Checks for missing required fields
- Verifies relationship consistency
- Validates cardinality formats
- Can be disabled with the --no-validate flag
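The checks listed above could look roughly like the sketch below. It works on the dictionary form of the structured output (entities keyed by name, relationships as a list). The function name, the error messages, and the accepted cardinality grammar (N, *, N..M, N..*) are assumptions, not the skill's actual validator.

```python
import re
from typing import Any, Dict, List

# Accepts "1", "*", "0..1", "0..*", "1..*" and similar forms (assumed grammar).
CARDINALITY_RE = re.compile(r"^(\d+|\*)(\.\.(\d+|\*))?$")


def validate_extraction(data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    entities = data.get("entities", {})
    for name, entity in entities.items():
        for prop in entity.get("properties", []):
            if not prop.get("name"):
                errors.append(f"{name}: property without a name")
    for i, rel in enumerate(data.get("relationships", [])):
        # Relationship consistency: both ends must be known entities.
        for end in ("from", "to"):
            if rel.get(end) not in entities:
                errors.append(f"relationships[{i}]: unknown '{end}' entity {rel.get(end)!r}")
        card = rel.get("cardinality")
        if card is not None and not (isinstance(card, str) and CARDINALITY_RE.match(card)):
            errors.append(f"relationships[{i}]: invalid cardinality {card!r}")
    return errors
```

Returning all problems at once, rather than raising on the first, makes the review step in Pattern 2 easier because the extracted file can be corrected in one pass.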
5. Property Persistence
- Properties are stored as :Field nodes
- Fields linked to entities via HAS_FIELD relationships
- Property metadata (type, required, default) fully persisted
Configuration
Vision Model Settings
vision:
  provider: "openai"  # or "anthropic"
  model: "gpt-4o"  # or "claude-3-5-sonnet-20241022"
  max_tokens: 8000
  temperature: 0.1
  use_structured_output: true  # Uses JSON mode when available
Neo4j Settings
neo4j:
  uri: "bolt://localhost:7687"
  user: "neo4j"
  password: "password"
  database: "neo4j"
  create_constraints: true
  create_indexes: true
Extraction Settings
extraction:
  include_properties: true
  include_methods: false
  normalize_names: true
  handle_references: true
  extract_cardinality: true
Output Formats
YAML Format
See schema_examples/tmf620/productoffering_hub.core.example.yaml for example.
JSON Format
{
"meta": {
"source": "diagrams/product_offering.png",
"extracted_at": "2024-01-01T00:00:00Z"
},
"entities": {
"ProductOffering": {
"label": "ProductOffering",
"properties": [...]
}
},
"relationships": [...]
}
Cypher Format
See schema_examples/neo4j/tmf620_productoffering_scalable_model.cypher for example.
Integration with Existing Tools
With TMF MCP Builder
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent / "scripts"))
from extract_and_populate import extract_and_populate
from neo4j import GraphDatabase
# Extract and populate
extract_and_populate(
image_path="diagrams/tmf620_productoffering.png",
neo4j_password="password"
)
# Query for relevant subgraph
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
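with driver.session() as session:
    # In a variable-length pattern, r is a list of relationships,
    # so the type filter has to be applied to every hop.
    result = session.run("""
        MATCH (e:Entity {name: 'ProductOffering'})-[r:RELATES_TO*1..2]->(related)
        WHERE all(rel IN r WHERE rel.type IN ['has_specification', 'has_price'])
        RETURN e, r, related
    """)
    # Process results...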
driver.close()
Best Practices
- Pre-process Images
  - Ensure high resolution
  - Remove noise and artifacts
  - Standardize format (PNG preferred)
- Validate Extraction
  - Review extracted YAML/JSON
  - Verify entity names
  - Check relationship cardinalities
- Incremental Updates
  - Use merge strategies
  - Track changes
  - Maintain provenance
- Query Optimization
  - Create indexes on common properties
  - Use relationship type filters
  - Limit hop depth
- Error Handling
  - Handle missing entities
  - Validate relationships
  - Log extraction issues
Examples
See examples/ directory for:
- Simple UML class diagram extraction
- TMF ProductOffering diagram extraction
- Batch processing example
- Custom extraction rules
References
- references/SCALABLE_RELATIONSHIP_MODEL.md - Relationship modeling approach
- references/VISION_EXTRACTION_PROMPTS.md - Vision model prompts
- NEO4J_REQUIREMENTS.md - Neo4j server version requirements
- schema_examples/neo4j/ - Example Cypher scripts
Neo4j Server Requirements
Important: Relationship property indexes require Neo4j server version 4.3+.
- The requirements.txt specifies the Python driver version, not the server version
- Check your Neo4j server version: neo4j version or CALL dbms.components()
- See NEO4J_REQUIREMENTS.md for full compatibility details
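To make the version dependency concrete, the sketch below lists schema statements that would suit the data model in this document, including a relationship property index on RELATES_TO.type. The constraint and index names are made up, and the exact statements the skill issues may differ. The REQUIRE keyword assumes Neo4j 4.4 or later; older servers use ASSERT for constraints instead.

```python
from typing import Callable, List

# Labels and property names are taken from the Cypher in this document;
# the constraint and index names are hypothetical.
SCHEMA_STATEMENTS: List[str] = [
    "CREATE CONSTRAINT entity_fqn_unique IF NOT EXISTS "
    "FOR (e:Entity) REQUIRE e.fqn IS UNIQUE",
    "CREATE CONSTRAINT field_fqn_unique IF NOT EXISTS "
    "FOR (f:Field) REQUIRE f.fqn IS UNIQUE",
    # Relationship property index: needs a 4.3+ server.
    "CREATE INDEX relates_to_type IF NOT EXISTS "
    "FOR ()-[r:RELATES_TO]-() ON (r.type)",
]


def apply_schema(run: Callable[[str], object]) -> int:
    """Send each statement to `run` (for example session.run) and return the count."""
    for statement in SCHEMA_STATEMENTS:
        run(statement)
    return len(SCHEMA_STATEMENTS)
```

Taking a callable rather than a driver object keeps the sketch free of third-party imports; in practice apply_schema(session.run) inside a driver session would be the typical call.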
Troubleshooting
Common Issues
- Low Extraction Quality
  - Increase image resolution
  - Use a more capable vision model
  - Provide more context in prompts
- Missing Relationships
  - Check diagram clarity
  - Verify relationship detection logic
  - Review extraction output
- Neo4j Population Errors
  - Check constraints
  - Verify relationship types
  - Review Cypher syntax
- Performance Issues
  - Batch operations
  - Use transactions
  - Create indexes
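For the performance items, a common way to batch writes is to send rows in chunks and let a single UNWIND query upsert each chunk. The sketch below assumes that approach. The query text, the row keys (fqn, name, specId), and the helper name are illustrative, not the skill's actual implementation.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List

# One round trip upserts a whole batch of entity rows.
UPSERT_ENTITIES = """
UNWIND $rows AS row
MERGE (e:Entity {fqn: row.fqn})
SET e.name = row.name, e.specId = row.specId, e.kind = 'Entity'
""".strip()


def chunked(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


# Typical use with the Python driver (not executed here):
#   for batch in chunked(entity_rows, 500):
#       session.run(UPSERT_ENTITIES, rows=batch)
```

The batch size of 500 in the comment is an arbitrary starting point; the right value depends on row size and server memory.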
Future Enhancements
- Support for sequence diagrams
- Support for activity diagrams
- Multi-page diagram handling
- Automatic relationship inference
- Diagram versioning and diff