When to Use
User wants to implement, improve, or troubleshoot Retrieval-Augmented Generation systems.
Quick Reference
| Topic | File |
|-------|------|
| Pipeline components & architecture | architecture.md |
| Implementation patterns & code | implementation.md |
| Evaluation metrics & debugging | evaluation.md |
| Security & compliance | security.md |
Core Capabilities
- Architecture design — Select embedding models, vector DBs, and chunking strategies based on requirements
- Implementation — Write ingestion pipelines, query handlers, and update logic
- Retrieval optimization — Tune top-k, reranking, hybrid search parameters
- Evaluation — Build test datasets, measure recall/precision, diagnose failures
- Production ops — Monitor quality drift, set up alerts, debug degradation
- Security — PII detection, access control, compliance requirements
Decision Checklist
Before recommending architecture, ask:
- [ ] What document types and volume?
- [ ] Latency requirements (real-time chat vs batch)?
- [ ] Update frequency (how often do docs change)?
- [ ] Access control needs (who can see what)?
- [ ] Compliance constraints (GDPR, HIPAA, SOC2)?
- [ ] Budget (managed vs self-hosted, embedding costs)?
Critical Rules
- Never skip access control — Filter at retrieval time, not after
- Always overlap chunks — 10-20% prevents context loss at boundaries
- Evaluate before optimizing — Build eval dataset first, then tune
- Same embedding model — Query and documents must use identical model
- Monitor similarity scores — Dropping averages signal drift or issues
- Plan for deletion — GDPR erasure requires re-embedding capability
Common Failure Patterns
| Symptom | Likely Cause | Fix | |---------|--------------|-----| | Wrong docs retrieved | Query too vague, poor chunks | Query expansion, smaller chunks | | Relevant doc missed | Not indexed, low similarity | Check ingestion, hybrid search | | Hallucinated answers | Context too short | Increase top-k, better reranking | | Slow responses | Large chunks, no caching | Optimize chunk size, cache embeddings | | Inconsistent results | Non-deterministic reranking | Set seeds, use stable sorting |
扫码联系在线客服