Category: Development & Engineering · No API key required

mcp-local-rag

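Provides score interpretation (< 0.3 is good, > 0.5 is skipped), query optimization, and source naming for the query_documents, ingest_file, and ingest_data tools. Use this skill when working with RAG, searching documents, ingesting files, saving web page content, or handling PDF, HTML, DOCX, TXT, or Markdown.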

Author: jakexiaohub · GitHub

MCP Local RAG Skills

Tools

| Tool | Use When |
|------|----------|
| ingest_file | Local files (PDF, DOCX, TXT, MD) |
| ingest_data | Raw content (HTML, text) with source URL |
| query_documents | Semantic + keyword hybrid search |
| delete_file / list_files / status | Management |

Search: Core Rules

Hybrid search combines vector (semantic) and keyword (BM25).

Score Interpretation

Lower = better match. Use this to filter noise.

| Score | Action |
|-------|--------|
| < 0.3 | Use directly |
| 0.3-0.5 | Include if mentions same concept/entity |
| > 0.5 | Skip unless no better results |
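The thresholds above can be sketched as a small helper. This is a minimal illustration, not part of mcp-local-rag: the function name `score_action` and the returned labels are hypothetical, and the boundary values 0.3 and 0.5 are assumed to fall in the middle band.

```python
def score_action(score: float) -> str:
    """Map a search score (lower = better) to the action from the table.

    Hypothetical helper: the labels are illustrative, not tool output.
    """
    if score < 0.3:
        return "use"                      # strong match, use directly
    if score <= 0.5:
        return "include_if_related"       # only if same concept/entity
    return "skip_unless_nothing_better"   # likely noise


for s in (0.12, 0.41, 0.62):
    print(s, score_action(s))
```

The middle band still needs a judgment call about whether the chunk discusses the same concept, which a score alone cannot decide.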

Limit Selection

| Intent | Limit |
|--------|-------|
| Specific answer (function, error) | 5 |
| General understanding | 10 |
| Comprehensive survey | 20 |
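One way to apply this table is a lookup that fills in the limit when building the query_documents arguments. The sketch below is an assumption-laden illustration: the intent labels (`specific`, `general`, `survey`) are invented here, and the argument names `query` and `limit` are assumed rather than taken from this document.

```python
# Hypothetical intent labels mapped to the limits from the table.
LIMIT_BY_INTENT = {"specific": 5, "general": 10, "survey": 20}


def build_query_args(query: str, intent: str = "general") -> dict:
    """Build an argument dict for query_documents.

    Assumes the tool accepts `query` and `limit`; check the tool schema.
    """
    if intent not in LIMIT_BY_INTENT:
        raise ValueError(f"unknown intent: {intent!r}")
    return {"query": query, "limit": LIMIT_BY_INTENT[intent]}


print(build_query_args("parseConfig TypeError", intent="specific"))
```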

Query Formulation

| Situation | Why Transform | Action |
|-----------|---------------|--------|
| Specific term mentioned | Keyword search needs exact match | KEEP term |
| Vague query | Vector search needs semantic signal | ADD context |
| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
| Multiple distinct topics | Single query conflates results | SPLIT queries |
| Few/poor results | Term mismatch | EXPAND (see below) |
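As an illustration of the EXTRACT row, the sketch below reduces a pasted stack trace to its final line. It assumes a Python-style traceback, where the last non-empty line names the exception and message; other languages put that line elsewhere, so treat the helper (a hypothetical name) as a starting heuristic only.

```python
def core_query_from_traceback(trace: str) -> str:
    """Return the last non-empty line of a traceback as the search query.

    Heuristic sketch: assumes the final line carries the exception type
    and message, as in Python tracebacks.
    """
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


trace = """Traceback (most recent call last):
  File "app.py", line 12, in <module>
    main()
  File "app.py", line 8, in main
    user = data["user_id"]
KeyError: 'user_id'
"""
print(core_query_from_traceback(trace))
```

Querying with the short exception line keeps the exact term for keyword matching while dropping the file paths and frames that dilute relevance.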

Query Expansion

When results are few or all score > 0.5, expand query terms:

  • Keep original term first, add 2-4 variants
  • Types: synonyms, abbreviations, related terms, word forms
  • Example: "config" → "config configuration settings configure"

Avoid over-expansion (causes topic drift).
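The expansion rules can be sketched as follows. The helper name `expand_query` is hypothetical; the variants themselves must still be chosen by the caller, and the cap of four mirrors the "2-4 variants" guidance to limit topic drift.

```python
def expand_query(term: str, variants: list[str], max_variants: int = 4) -> str:
    """Keep the original term first, then append up to `max_variants`
    distinct variants (case-insensitive de-duplication)."""
    seen = {term.lower()}
    kept = []
    for variant in variants:
        key = variant.lower()
        if key in seen:
            continue  # skip repeats of the original term or earlier variants
        seen.add(key)
        kept.append(variant)
        if len(kept) == max_variants:
            break  # stop early to avoid over-expansion
    return " ".join([term, *kept])


print(expand_query("config", ["configuration", "settings", "configure"]))
```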

Result Selection

Decide whether to include or skip a result based on answer quality, not just score.

INCLUDE if:

  • Directly answers the question
  • Provides necessary context
  • Score < 0.5

SKIP if:

  • Same keyword, unrelated context
  • Score > 0.7
  • Mentions term without explanation
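The score-based part of this selection can be sketched as a filter with a fallback. It assumes each result is a dict with a numeric `score` key, which is an illustrative shape and not a documented response format. The content checks (unrelated context, a term mentioned without explanation) need a reader's judgment and are not modelled here.

```python
def select_results(results: list[dict], keep_below: float = 0.5, fallback: int = 1) -> list[dict]:
    """Keep results scoring under `keep_below`, best (lowest) first.

    If nothing qualifies, return the best `fallback` results so the
    caller still has something to inspect.
    """
    ranked = sorted(results, key=lambda r: r["score"])
    kept = [r for r in ranked if r["score"] < keep_below]
    return kept if kept else ranked[:fallback]


hits = [
    {"text": "unrelated mention", "score": 0.74},
    {"text": "direct answer", "score": 0.21},
    {"text": "useful context", "score": 0.44},
]
print([h["text"] for h in select_results(hits)])
```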

Ingestion

ingest_file

ingest_file({ filePath: "/absolute/path/to/document.pdf" })

ingest_data

ingest_data({
  content: "<html>...</html>",
  metadata: { source: "https://example.com/page", format: "html" }
})

Format selection — match the data you have:

  • HTML string → format: "html"
  • Markdown string → format: "markdown"
  • Other → format: "text"
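A small builder can keep the ingest_data payload consistent with the three formats listed above. This is a sketch that mirrors the argument shape shown in the example call; the helper name `build_ingest_data_args` is hypothetical.

```python
ALLOWED_FORMATS = {"html", "markdown", "text"}


def build_ingest_data_args(content: str, source: str, fmt: str = "text") -> dict:
    """Build the ingest_data arguments, rejecting unlisted formats."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    return {"content": content, "metadata": {"source": source, "format": fmt}}


args = build_ingest_data_args(
    "<html>...</html>", "https://example.com/page", fmt="html"
)
print(args)
```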

Source format:

  • Web page → Use URL: https://example.com/page
  • Other content → Use scheme: {type}://{date} or {type}://{date}/{detail}
    • Examples: clipboard://2024-12-30, chat://2024-12-30/project-discussion
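For non-web content, the source scheme above can be generated with a helper like the one below. The name `make_source` is hypothetical, and ISO dates (YYYY-MM-DD) are assumed because the examples use that form.

```python
from datetime import date
from typing import Optional


def make_source(kind: str, day: date, detail: Optional[str] = None) -> str:
    """Build a source string as {type}://{date} or {type}://{date}/{detail}."""
    base = f"{kind}://{day.isoformat()}"
    return f"{base}/{detail}" if detail else base


print(make_source("clipboard", date(2024, 12, 30)))
print(make_source("chat", date(2024, 12, 30), "project-discussion"))
```

Generating the string the same way every time matters because the same source value is what later re-ingests or deletes the entry.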

HTML source options:

  • Static page → LLM fetch
  • SPA/JS-rendered → Browser MCP
  • Auth required → Manual paste

Re-ingest same source to update. Use same source in delete_file to remove.

References

For edge cases and examples: