Category: Content & Media. No API key required.

entity-resolution

Identifies and merges duplicate entities in Knowledge Graph projects.

Author: jakexiaohub (GitHub)

Entity Resolution Skill


When to Use

  • Proactively after extraction: "I extracted 15 entities. Let me check for potential duplicates..."
  • On user request: "Can you check for duplicates?" or "These seem like the same person"
  • When the graph seems noisy: multiple similar-looking nodes may refer to the same entity

Workflow

1. Scan for Duplicates

Call find_duplicate_entities with the project_id (and optionally min_confidence; see Tool Parameters).

The tool uses multiple signals to detect duplicates:

  • String similarity (Jaro-Winkler on labels)
  • Alias overlap (Jaccard similarity)
  • Type matching (same entity type bonus)
  • Graph context (shared neighbors)
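The sketch below shows one way these four signals could be folded into a single duplicate score. The signal list comes from the description above. Everything else is an assumption, because the tool's internals are not documented here: the weights, the lowercasing, the `Entity` shape, and neighbor overlap measured as a Jaccard ratio.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" if they are equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions = half the matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the shared prefix (capped at 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Entity:
    label: str
    entity_type: str
    aliases: set[str] = field(default_factory=set)
    neighbors: set[str] = field(default_factory=set)  # ids of connected nodes


# Hypothetical weights (they sum to 1); the real tool's weighting is unknown.
WEIGHTS = {"name": 0.5, "alias": 0.2, "type": 0.1, "context": 0.2}


def duplicate_score(a: Entity, b: Entity) -> float:
    """Weighted combination of the four signals, in [0, 1]."""
    name = jaro_winkler(a.label.lower(), b.label.lower())
    alias = jaccard({x.lower() for x in a.aliases}, {x.lower() for x in b.aliases})
    same_type = 1.0 if a.entity_type == b.entity_type else 0.0
    context = jaccard(a.neighbors, b.neighbors)
    return (WEIGHTS["name"] * name + WEIGHTS["alias"] * alias
            + WEIGHTS["type"] * same_type + WEIGHTS["context"] * context)
```

With a weighted sum like this, a strong name match alone can never exceed 0.5. To reach the HIGH band, the name signal needs support from alias, type, or neighbor overlap.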

2. Apply Confidence-Based Automation

| Confidence | Action |
|------------|--------|
| >= 0.9 (HIGH) | Auto-merge with merge_entities_tool. Inform user: "I merged X and Y (95% confident they're the same)" |
| >= 0.7 and < 0.9 (MEDIUM) | Ask user: "I found potential duplicates: X and Y (82% match). Should I merge them?" |
| < 0.7 (LOW) | Mention if relevant: "X and Y might be related but confidence is low (65%)" |
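The banding rule above can be expressed as a small function. This is a minimal sketch: `Action` and `action_for` are hypothetical names, and only the thresholds come from the table.

```python
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "auto_merge"  # call merge_entities_tool, then tell the user
    ASK_USER = "ask_user"      # present the pair and wait for approval
    MENTION = "mention"        # mention only if relevant


HIGH_THRESHOLD = 0.9    # inclusive lower bound of HIGH
MEDIUM_THRESHOLD = 0.7  # inclusive lower bound of MEDIUM


def action_for(confidence: float) -> Action:
    """Map a duplicate confidence in [0, 1] to the action from the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= HIGH_THRESHOLD:
        return Action.AUTO_MERGE
    if confidence >= MEDIUM_THRESHOLD:
        return Action.ASK_USER
    return Action.MENTION
```

Applied to the scores in the duplicate-scan example (0.91, 0.85, 0.68), this gives auto-merge, ask, and mention, in that order.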

3. Handle User Decisions

  • If the user approves: approve_merge for a pending candidate, or merge_entities_tool to merge directly
  • If the user rejects: reject_merge
  • If the user wants more info: compare_entities_semantic for a detailed analysis
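The decision-to-tool mapping could be written as a small dispatcher. This is a sketch only: `tool_for_decision` and the decision strings are hypothetical. The pending/direct split is inferred from the Available Tools table, where approve_merge acts on a pending candidate and merge_entities_tool merges directly.

```python
def tool_for_decision(decision: str, has_pending_candidate: bool) -> str:
    """Return the tool name to call for a user decision (hypothetical helper)."""
    if decision == "approve":
        # A candidate already queued for review is approved; otherwise merge directly.
        return "approve_merge" if has_pending_candidate else "merge_entities_tool"
    if decision == "reject":
        return "reject_merge"
    if decision == "more_info":
        return "compare_entities_semantic"
    raise ValueError(f"unknown decision: {decision!r}")
```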

Proactive Triggers

After any extract_to_kg operation, automatically:

  1. Call find_duplicate_entities
  2. Auto-merge HIGH confidence matches without asking, then tell the user what was merged
  3. Report MEDIUM confidence matches to user
  4. Mention LOW confidence only if user asks
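A sketch of this post-extraction flow follows. The tool calls are passed in as plain callables so the logic can run without the real tools. `Candidate`, `after_extraction`, and the callable signatures are hypothetical. The stale-candidate check is also an added assumption: once an entity has been merged away, later candidates that mention it would otherwise hit "Entity not found".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    survivor_id: str
    merged_id: str
    confidence: float


def after_extraction(
    project_id: str,
    find_duplicates: Callable[[str], list[Candidate]],
    merge: Callable[[str, str, str], None],
) -> tuple[list[Candidate], list[Candidate], list[Candidate]]:
    """Auto-merge HIGH matches; return (merged, MEDIUM to report, LOW held back)."""
    merged: list[Candidate] = []
    to_report: list[Candidate] = []
    low: list[Candidate] = []
    removed: set[str] = set()  # ids that no longer exist after an earlier merge
    # Highest confidence first, so the strongest merges happen before weaker ones.
    for c in sorted(find_duplicates(project_id), key=lambda c: c.confidence, reverse=True):
        if c.survivor_id in removed or c.merged_id in removed:
            continue  # stale: one side was already merged into another entity
        if c.confidence >= 0.9:
            merge(project_id, c.survivor_id, c.merged_id)
            removed.add(c.merged_id)
            merged.append(c)
        elif c.confidence >= 0.7:
            to_report.append(c)
        else:
            low.append(c)
    return merged, to_report, low
```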

Example Responses

After extraction:

"I extracted 12 entities and 8 relationships. I noticed 'Elon Musk' and 'Musk' appear to be the same person (94% confidence), so I merged them. I also found 'SpaceX' and 'Space X' might be duplicates (78% confidence) - should I merge these too?"

On duplicate scan:

"I scanned for duplicates and found 3 potential matches:

  1. 'OpenAI' and 'Open AI' (91% - auto-merged)
  2. 'Sam Altman' and 'Samuel Altman' (85% - awaiting your approval)
  3. 'Microsoft' and 'MS' (68% - low confidence, skipped)"

When comparing entities:

"Comparing 'Dr. John Smith' and 'J. Smith':

| Signal | Score |
|--------|-------|
| Name similarity | 72% |
| Same type (Person) | Yes |
| Shared connections | 3 |

Overall: 78% match. They share connections to MIT and OpenAI. Would you like to merge them?"

Available Tools

| Tool | Description |
|------|-------------|
| find_duplicate_entities | Scan for duplicates in a project |
| merge_entities_tool | Execute a merge directly (for high confidence) |
| review_pending_merges | See pending candidates awaiting approval |
| approve_merge | Approve a pending candidate |
| reject_merge | Reject a pending candidate |
| compare_entities_semantic | Deep comparison of two specific entities |

Tool Parameters

find_duplicate_entities

```json
{
  "project_id": "abc123",
  "min_confidence": 0.7
}
```

merge_entities_tool

```json
{
  "project_id": "abc123",
  "survivor_id": "node_to_keep",
  "merged_id": "node_to_remove"
}
```

compare_entities_semantic

```json
{
  "project_id": "abc123",
  "node_a_id": "first_entity_id",
  "node_b_id": "second_entity_id"
}
```
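To catch malformed arguments before a tool call, the three payloads could be modeled as small typed records. This is a sketch under assumptions: the class names are hypothetical, 0.7 is treated as a default only because it appears in the example above, and both validation rules (confidence in [0, 1], no self-merge) are illustrative rather than documented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FindDuplicatesArgs:
    project_id: str
    min_confidence: float = 0.7  # assumed default, taken from the example payload

    def __post_init__(self) -> None:
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")


@dataclass(frozen=True)
class MergeArgs:
    project_id: str
    survivor_id: str  # node to keep
    merged_id: str    # node to remove

    def __post_init__(self) -> None:
        if self.survivor_id == self.merged_id:
            raise ValueError("cannot merge an entity into itself")


@dataclass(frozen=True)
class CompareArgs:
    project_id: str
    node_a_id: str
    node_b_id: str


def to_json(args: object) -> str:
    """Serialize any of the argument records to the JSON shape shown above."""
    return json.dumps(asdict(args))
```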

Merge Behavior

When entities are merged:

  1. Survivor keeps its primary label
  2. Merged entity's label becomes an alias of survivor
  3. All of the merged entity's aliases transfer to survivor
  4. All relationships redirect to survivor
  5. Properties merge (survivor wins on conflict)
  6. Source IDs combine for provenance tracking
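The six merge rules can be sketched over a simple in-memory model. The `Entity` fields and the `(source, relation, target)` edge tuples are assumptions, not the tool's actual data model. Two further behaviors are also assumptions: edges that directly connect the two merged nodes are dropped, and redirected edges that duplicate an existing edge are removed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Tuple

Edge = Tuple[str, str, str]  # (source_id, relation, target_id)


@dataclass
class Entity:
    id: str
    label: str
    aliases: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)
    source_ids: set[str] = field(default_factory=set)


def merge_entities(survivor: Entity, merged: Entity, edges: list[Edge]) -> tuple[Entity, list[Edge]]:
    # Rules 1-3: survivor keeps its label; merged label and aliases become aliases.
    aliases = survivor.aliases | merged.aliases | {merged.label}
    aliases.discard(survivor.label)
    # Rule 5: properties merge, survivor wins on conflict (later dict entries override).
    properties = {**merged.properties, **survivor.properties}
    # Rule 6: combine source IDs for provenance.
    sources = survivor.source_ids | merged.source_ids
    result = Entity(survivor.id, survivor.label, aliases, properties, sources)

    # Rule 4: redirect every relationship touching the merged node to the survivor.
    pair = {survivor.id, merged.id}
    redirected: list[Edge] = []
    seen: set[Edge] = set()
    for src, rel, dst in edges:
        if {src, dst} == pair:
            continue  # assumption: an edge between the two merged nodes is dropped
        src = survivor.id if src == merged.id else src
        dst = survivor.id if dst == merged.id else dst
        edge = (src, rel, dst)
        if edge not in seen:  # assumption: identical edges are collapsed
            seen.add(edge)
            redirected.append(edge)
    return result, redirected
```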

Error Handling

| Issue | Response |
|-------|----------|
| No project selected | "Please select a Knowledge Graph project first" |
| Empty graph | "Your graph doesn't have any entities yet. Extract content first" |
| No duplicates found | "No potential duplicates found above the confidence threshold" |
| Entity not found | "Entity 'X' was not found. It may have been merged or deleted" |
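The error table could be wired into a scan as an ordered series of precondition checks. This is a minimal sketch: `scan_error`, `entity_not_found`, and the order of the checks are hypothetical. The message strings are copied from the table.

```python
from typing import Optional

MESSAGES = {
    "no_project": "Please select a Knowledge Graph project first",
    "empty_graph": "Your graph doesn't have any entities yet. Extract content first",
    "no_duplicates": "No potential duplicates found above the confidence threshold",
    "entity_not_found": "Entity '{label}' was not found. It may have been merged or deleted",
}


def scan_error(project_id: Optional[str], entity_count: int, candidate_count: int) -> Optional[str]:
    """Return the message for the first failed check, or None if results can be shown."""
    if not project_id:
        return MESSAGES["no_project"]
    if entity_count == 0:
        return MESSAGES["empty_graph"]
    if candidate_count == 0:
        return MESSAGES["no_duplicates"]
    return None


def entity_not_found(label: str) -> str:
    """Message for a merge or compare call that references a missing entity."""
    return MESSAGES["entity_not_found"].format(label=label)
```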

Follow-Up Suggestions Format

After presenting duplicate scan results, offer interactive follow-ups:

### Explore Further

- "Merge Sam Altman and Samuel Altman" - Merge these entities
- "Compare Sam Altman and Samuel Altman" - See detailed similarity analysis
- "Reject the Sam Altman merge" - Keep them as separate entities
- "Show me all pending merges" - Review all candidates

Integration with KG Insights

After merging entities, the graph may reveal new insights:

  • "After merging, [Entity] is now connected to 5 more entities"
  • "The merge resolved an isolated topic - [Entity] now links to the main graph"
  • "Consider running ask_about_graph with question_type: key_entities to see updated rankings"
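A message like "now connected to N more entities" needs the number of neighbors the survivor gains from the merge. Below is a sketch of that count over hypothetical `(source, relation, target)` edge tuples, with edges treated as undirected.

```python
from __future__ import annotations


def neighbors(node: str, edges: list[tuple[str, str, str]]) -> set[str]:
    """Nodes linked to `node` in either direction, excluding itself."""
    return {dst if src == node else src for src, _, dst in edges if node in (src, dst)} - {node}


def new_connections(survivor_id: str, merged_id: str, edges: list[tuple[str, str, str]]) -> int:
    """How many neighbors the survivor gains by absorbing the merged entity."""
    before = neighbors(survivor_id, edges)
    # After the merge, the two nodes are one, so neither counts as a neighbor.
    after = (before | neighbors(merged_id, edges)) - {survivor_id, merged_id}
    return len(after - before)
```

For instance, if the survivor links to p1 and the merged node links to p1, p2, and p3, the survivor gains two connections (p2 and p3). The shared neighbor p1 is not counted twice.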

Best Practices

  1. Be transparent - Always explain what was merged and why
  2. Preserve information - Merged labels become aliases and source IDs are combined, so no names or provenance are lost
  3. Ask when uncertain - Only auto-merge at or above 90% confidence
  4. Show evidence - Include signal breakdown for user decisions
  5. Suggest next steps - Offer to scan again or explore the updated graph