LlamaIndex Patterns
Comprehensive implementation patterns, templates, and examples for building production-ready RAG (Retrieval-Augmented Generation) applications with LlamaIndex.
Overview
This skill provides complete, functional implementations for:
- RAG Pipeline Setup: End-to-end document indexing and querying
- Custom Retrievers: Advanced retrieval strategies with filtering and reranking
- LlamaCloud Integration: Managed parsing and hosted indices
- Chat Engines: Conversational AI with memory management
- Multi-Document RAG: Cross-document reasoning and source attribution
All scripts, templates, and examples are production-ready and fully functional.
Scripts
1. setup-llamaindex.sh
Automated LlamaIndex installation with dependency management and environment setup.
bash scripts/setup-llamaindex.sh
Features:
- Detects Python version and virtual environment
- Installs LlamaIndex core and common integrations
- Sets up vector stores (Chroma, Pinecone, Qdrant)
- Configures embedding models (OpenAI, HuggingFace)
- Creates .env template with all required API keys
- Generates requirements.txt for reproducibility
- Creates data and storage directories
- Validates installation
Output:
- .env file with API key templates
- requirements.txt with pinned versions
- data/ directory for documents
- storage/ directory for persisted indices
- Installation verification report
2. create-index.sh
Create a VectorStoreIndex from documents with progress tracking.
bash scripts/create-index.sh [data_dir] [storage_dir] [index_name]
Arguments:
- data_dir: Directory containing documents (default: ./data)
- storage_dir: Where to persist the index (default: ./storage)
- index_name: Name for the index (default: default_index)
Features:
- Loads documents from multiple formats (.txt, .pdf, .md, .csv, .json, .html)
- Configures optimal embedding model
- Shows progress during indexing
- Displays document statistics
- Persists index to disk
- Tests index with sample query
- Provides usage instructions
Example:
bash scripts/create-index.sh ./documents ./indices my_knowledge_base
3. test-llamaindex.sh
Comprehensive validation tests for LlamaIndex installation and configuration.
bash scripts/test-llamaindex.sh
Tests:
- Python 3 installation and version
- LlamaIndex core package and version
- Core imports (VectorStoreIndex, Settings, etc.)
- Environment file existence and configuration
- API key setup (OpenAI, Anthropic, etc.)
- Vector store integrations (Chroma, Pinecone, Qdrant)
- Embedding models (OpenAI, HuggingFace)
- LLM integrations (OpenAI, Anthropic)
- Basic functionality (index creation)
- Data and storage directories
Output:
- ✓ for passing tests
- ✗ for critical failures
- ⚠ for warnings
- Exit code 0 for success, 1 for failures
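The reporting convention above (a symbol per check, then an exit code) can be sketched in Python. This is only an illustration of the pattern, not the contents of test-llamaindex.sh; the helper names `check_python`, `check_package`, `check_env_var`, and `run_checks` are hypothetical.

```python
import importlib.util
import os
import sys
from typing import Callable, List, Tuple

# Each check returns "pass", "warn", or "fail". All names here are illustrative.
SYMBOLS = {"pass": "✓", "warn": "⚠", "fail": "✗"}


def check_python() -> str:
    """Critical check: the interpreter must be Python 3."""
    return "pass" if sys.version_info.major == 3 else "fail"


def check_package(module_name: str) -> str:
    """Critical check: a top-level module must be importable."""
    # find_spec returns None when a top-level module is not installed.
    return "pass" if importlib.util.find_spec(module_name) else "fail"


def check_env_var(name: str) -> str:
    """Non-critical check: a missing key is reported as a warning."""
    return "pass" if os.environ.get(name) else "warn"


def run_checks(checks: List[Tuple[str, Callable[[], str]]]) -> int:
    """Print one line per check and return 0 on success, 1 if any check failed."""
    failures = 0
    for label, check in checks:
        status = check()
        print(f"{SYMBOLS[status]} {label}")
        if status == "fail":
            failures += 1
    return 1 if failures else 0
```

A caller would pass the result to `sys.exit(...)` so that warnings leave the exit code at 0 and only critical failures produce 1.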
Templates
1. basic-rag-pipeline.py
Complete RAG pipeline implementation with best practices.
Features:
- Document loading from directory
- Index creation and persistence
- Query with source attribution
- Interactive chat interface
- Configurable LLM and embedding models
- Error handling and validation
Key Components:
class BasicRAGPipeline:
    def load_or_create_index(self): ...  # Smart index loading/creation
    def query(self): ...                 # Simple question answering
    def query_with_sources(self): ...    # Answers with citations
    def chat(self): ...                  # Interactive chat mode
Usage:
from basic_rag_pipeline import BasicRAGPipeline
pipeline = BasicRAGPipeline(
    data_dir="./data",
    storage_dir="./storage",
    model="gpt-4o-mini"
)
pipeline.load_or_create_index()
response = pipeline.query("What is LlamaIndex?")
Use Cases:
- Document Q&A systems
- Knowledge base queries
- Research assistants
- Documentation search
2. custom-retriever.py
Advanced retrieval strategies with filtering, reranking, and hybrid search.
Retrievers Included:
MetadataFilteredRetriever:
- Filter results by metadata (category, author, date, etc.)
- Multi-tenant applications
- Document versioning
- Access control
HybridRetriever:
- Combines semantic search with keyword matching
- Configurable weights for vector vs keyword scores
- Better results for specific terminology
- Handles exact phrase matching
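The weighted combination of vector and keyword scores described above can be sketched without any LlamaIndex dependency. This is a minimal illustration under stated assumptions, not the template's HybridRetriever: the vector scores are supplied as plain numbers, the keyword score is a simple term-overlap fraction, and the names `keyword_score` and `hybrid_rank` are hypothetical.

```python
from typing import Dict, List, Tuple


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear as words in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def hybrid_rank(
    query: str,
    docs: Dict[str, str],
    vector_scores: Dict[str, float],
    vector_weight: float = 0.7,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of vector and keyword scores."""
    keyword_weight = 1.0 - vector_weight
    scored = []
    for doc_id, text in docs.items():
        score = (
            vector_weight * vector_scores.get(doc_id, 0.0)
            + keyword_weight * keyword_score(query, text)
        )
        scored.append((doc_id, score))
    # Highest combined score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Lowering `vector_weight` favors exact terminology matches; setting it to 1.0 reduces the ranking to pure semantic similarity.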
RerankedRetriever:
- Two-stage retrieval (broad then narrow)
- Custom scoring with multiple factors
- Recency weighting
- Document quality scoring
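The second (narrowing) stage of such a two-stage retriever can be sketched as a re-scoring pass over the broad candidate set. The weights, the half-life decay for recency, and the `Candidate` fields below are assumptions chosen for illustration, not values taken from the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    doc_id: str
    similarity: float  # score from the first, broad retrieval stage
    age_days: float    # document age, used for recency weighting
    quality: float     # hypothetical quality score in [0, 1]


def rerank(
    candidates: List[Candidate],
    top_n: int = 3,
    half_life_days: float = 180.0,
    w_similarity: float = 0.6,
    w_recency: float = 0.25,
    w_quality: float = 0.15,
) -> List[Candidate]:
    """Re-score broad candidates with several factors and keep the best top_n."""

    def combined(c: Candidate) -> float:
        # Recency halves every half_life_days: 1.0 for new, approaching 0 for old.
        recency = 0.5 ** (c.age_days / half_life_days)
        return (
            w_similarity * c.similarity
            + w_recency * recency
            + w_quality * c.quality
        )

    return sorted(candidates, key=combined, reverse=True)[:top_n]
```

With these example weights, a fresh document with slightly lower similarity can outrank a two-year-old one, which is the intended effect of recency weighting.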
Example:
from custom_retriever import MetadataFilteredRetriever
retriever = MetadataFilteredRetriever(
    index=index,
    similarity_top_k=10,
    metadata_filters={"category": "technical", "year": 2024}
)
nodes = retriever.retrieve("How to deploy?")
Use Cases:
- Filtered search (by category, date, author)
- Improved accuracy with hybrid search
- Production systems requiring precise results
- Applications with diverse document types
3. llamacloud-integration.py
LlamaCloud managed services integration template.
Features:
- LlamaParse for complex document parsing
- Managed index hosting
- Production deployment patterns
- Enterprise-ready architecture
Components:
LlamaParse Integration:
- Parse complex PDFs with tables/charts
- Multi-column layout handling
- OCR for scanned documents
- Academic paper processing
Managed Indices:
- Automatic scaling
- High availability
- Built-in monitoring
- Version control
Example:
from llamacloud_integration import LlamaCloudRAG
rag = LlamaCloudRAG(api_key="your_key")
documents = rag.parse_with_llamaparse("complex.pdf")
rag.create_managed_index(documents, "prod-index")
Use Cases:
- Complex document parsing (PDFs with tables)
- Production RAG applications
- Enterprise deployments
- Scalable knowledge bases
Note: Requires LlamaCloud account and llama-parse package for full functionality. Template includes fallbacks for development.
Examples
1. question-answering.py
Complete Q&A system with citations and interactive mode.
Run:
python examples/question-answering.py
Features:
- Automatic sample document creation
- Pre-configured example queries
- Source attribution with relevance scores
- Interactive chat mode
- Streaming responses (advanced)
- Metadata-aware queries
Demonstrates:
- Document loading and indexing
- Query engine configuration
- Source node extraction
- Response synthesis modes
- Interactive user interfaces
2. chatbot-with-memory.py
Conversational AI with memory management and context awareness.
Run:
python examples/chatbot-with-memory.py
Features:
- Conversation history tracking
- Context-aware multi-turn dialogues
- Memory summarization
- Session management
- Custom system prompts
- Sample knowledge base creation
Components:
class ConversationalChatbot:
    def load_knowledge_base(self): ...       # Setup knowledge
    def initialize_chat_engine(self): ...    # Configure memory
    def chat(self): ...                      # Send/receive messages
    def reset_conversation(self): ...        # Clear memory
    def get_conversation_summary(self): ...  # History summary
    def interactive_mode(self): ...          # CLI interface
Commands:
- /help - Show available commands
- /reset - Reset conversation memory
- /summary - View conversation history
- /exit - Exit chatbot
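The memory behavior behind the reset and summary commands can be illustrated with a bounded buffer that keeps only the most recent messages. This is a stdlib-only sketch; the class `BoundedMemory` is hypothetical and stands in for whatever memory the example configures (LlamaIndex ships its own memory classes, such as ChatMemoryBuffer, which limit by tokens rather than message count).

```python
from collections import deque
from typing import Deque, Tuple


class BoundedMemory:
    """Keep the most recent messages verbatim and count the ones dropped."""

    def __init__(self, max_messages: int = 6) -> None:
        # deque(maxlen=...) silently evicts the oldest entry when full.
        self.messages: Deque[Tuple[str, str]] = deque(maxlen=max_messages)
        self.dropped = 0

    def add(self, role: str, content: str) -> None:
        if len(self.messages) == self.messages.maxlen:
            self.dropped += 1
        self.messages.append((role, content))

    def reset(self) -> None:
        """Clear all history, as a /reset command would."""
        self.messages.clear()
        self.dropped = 0

    def summary(self) -> str:
        """Short description of the history, as a /summary command might print."""
        return (
            f"{len(self.messages)} recent messages kept, "
            f"{self.dropped} older messages dropped"
        )
```

A real chatbot would typically summarize the dropped messages with the LLM instead of only counting them; the counter keeps this sketch free of API calls.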
Use Cases:
- Customer service chatbots
- Technical support assistants
- Interactive documentation
- Educational tutoring systems
3. multi-document-rag.py
Advanced RAG with cross-document reasoning and filtering.
Run:
python examples/multi-document-rag.py
Features:
- Multiple document handling
- Metadata-based filtering
- Cross-document queries
- Source attribution by document
- Document comparison
- Category-based search
Components:
class MultiDocumentRAG:
    def build_index(self): ...           # Index multiple docs
    def query_by_category(self): ...     # Filtered queries
    def cross_document_query(self): ...  # Search all docs
    def compare_documents(self): ...     # Compare specific docs
Demonstrates:
- Document metadata enrichment
- Category-based filtering
- Cross-document reasoning
- Source tracking by document
- Comparative analysis
- Interactive multi-doc search
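Source tracking by document amounts to grouping retrieved snippets under the document they came from. The sketch below assumes the hits have already been reduced to plain (document name, score, snippet) tuples; in a LlamaIndex response these values would typically be read from `response.source_nodes`. The function name `group_sources_by_document` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (document name, relevance score, text snippet)
Hit = Tuple[str, float, str]


def group_sources_by_document(hits: List[Hit]) -> Dict[str, List[Tuple[float, str]]]:
    """Group retrieved snippets by source document, best score first."""
    grouped: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for doc_name, score, snippet in hits:
        grouped[doc_name].append((score, snippet))
    for entries in grouped.values():
        # Sort each document's snippets by descending relevance.
        entries.sort(key=lambda entry: entry[0], reverse=True)
    return dict(grouped)
```

The grouped result makes it straightforward to print one citation block per document or to compare what two documents contribute to the same answer.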
Use Cases:
- Multi-source research
- Document comparison
- Categorized knowledge bases
- Enterprise document search
Usage Instructions
Initial Setup
Step 1: Install Dependencies
cd plugins/rag-pipeline/skills/llamaindex-patterns
bash scripts/setup-llamaindex.sh
Step 2: Configure API Keys
Edit .env file:
OPENAI_API_KEY=sk-your-actual-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here # Optional
Step 3: Validate Installation
bash scripts/test-llamaindex.sh
Building Your First RAG Application
Option 1: Using Scripts
# 1. Add your documents to ./data directory
mkdir -p data
cp /path/to/your/docs/* data/
# 2. Create index
bash scripts/create-index.sh data storage my_index
# 3. Use the index in your code
python -c "
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir='storage/my_index')
index = load_index_from_storage(storage_context)
response = index.as_query_engine().query('Your question?')
print(response)
"
Option 2: Using Templates
# Copy template to your project
cp templates/basic-rag-pipeline.py my_rag_app.py
# Customize and run
python my_rag_app.py
Option 3: Using Examples
# Run examples directly
python examples/question-answering.py
python examples/chatbot-with-memory.py
python examples/multi-document-rag.py
Integration Patterns
For Next.js Applications:
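# Python backend with FastAPI; call POST /query from the Next.js API route (app/api/chat/route.ts)
from fastapi import FastAPI
from basic_rag_pipeline import BasicRAGPipeline
app = FastAPI()
pipeline = BasicRAGPipeline()
pipeline.load_or_create_index()
@app.post("/query")
def query(question: str):
    # question is read from the query string (?question=...).
    # A plain def runs in FastAPI's threadpool, so the blocking call does not stall the event loop.
    response = pipeline.query(question)
    # Convert to str so the response object is JSON-serializable
    return {"answer": str(response)}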
For FastAPI Projects:
# Integrate into existing FastAPI app
from contextlib import asynccontextmanager
from fastapi import FastAPI
from basic_rag_pipeline import BasicRAGPipeline
rag_pipeline = None
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build or load the index once at startup, before requests are served
    global rag_pipeline
    rag_pipeline = BasicRAGPipeline()
    rag_pipeline.load_or_create_index()
    yield
app = FastAPI(lifespan=lifespan)
For Standalone Python Applications:
# Use directly in your Python code
from basic_rag_pipeline import BasicRAGPipeline
def main():
    pipeline = BasicRAGPipeline(
        data_dir="./knowledge_base",
        storage_dir="./indices"
    )
    pipeline.load_or_create_index()
    while True:
        question = input("Ask: ")
        answer = pipeline.query(question)
        print(f"Answer: {answer}")
if __name__ == "__main__":
    main()
Advanced Patterns
Custom Node Parsing
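from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter(
    chunk_size=512,    # Smaller chunks for precise retrieval
    chunk_overlap=50,  # Overlap for context continuity
)
# documents is assumed to be a list of already loaded Document objects
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[node_parser]  # Node parsers are passed as transformations
)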
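To make the effect of chunk_size and chunk_overlap concrete, here is a simplified, dependency-free sketch of overlapping chunking. It counts whitespace-separated words, whereas SentenceSplitter measures tokens and tries to respect sentence boundaries, so treat it as an illustration of the two parameters only. The function name `chunk_words` is hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Split text into word chunks where consecutive chunks share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

For a ten-word text with chunk_size=4 and chunk_overlap=1, each chunk starts with the last word of the previous one, which is what preserves context across chunk boundaries.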
Multi-Index Routing
# Use custom retriever template for routing between indices
tech_index = VectorStoreIndex.from_documents(tech_docs)
business_index = VectorStoreIndex.from_documents(business_docs)
# Route queries based on content
if "technical" in query.lower():
    response = tech_index.as_query_engine().query(query)
else:
    response = business_index.as_query_engine().query(query)
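The same if/else routing generalizes to any number of indices with a small keyword router. This sketch uses plain callables as handlers so it runs without LlamaIndex; in practice each handler would wrap something like an index's query engine. The class `KeywordRouter` and its method names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Handler = Callable[[str], str]


class KeywordRouter:
    """Send a query to the first handler whose keywords appear in it."""

    def __init__(self, default: Handler) -> None:
        self._routes: List[Tuple[Sequence[str], Handler]] = []
        self._default = default

    def add_route(self, keywords: Sequence[str], handler: Handler) -> None:
        # Keywords are matched case-insensitively as substrings.
        self._routes.append(([k.lower() for k in keywords], handler))

    def route(self, query: str) -> str:
        lowered = query.lower()
        for keywords, handler in self._routes:
            if any(keyword in lowered for keyword in keywords):
                return handler(query)
        # No keyword matched: fall back to the default handler.
        return self._default(query)
```

Routes are checked in the order they were added, so more specific keyword sets should be registered first. Substring matching is crude; an LLM-based or embedding-based selector would be more robust for ambiguous queries.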
Streaming Responses
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Your question")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
Persisting and Loading Indices
# Persist
index.storage_context.persist(persist_dir="./storage")
# Load
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
Production Deployment
Environment-Specific Configuration
# Development
export OPENAI_API_KEY=sk-dev-key
export ENVIRONMENT=development
# Production
export OPENAI_API_KEY=sk-prod-key
export ENVIRONMENT=production
export REDIS_URL=redis://prod-cache:6379 # For caching
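Inside the application, these variables can be read once at startup and validated before any request is served. The variable names below come from the exports above; the `AppConfig` dataclass, the `load_config` helper, and the rule that production requires REDIS_URL are assumptions made for this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    environment: str
    openai_api_key: str
    redis_url: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read and validate settings from environment variables."""
    environment = env.get("ENVIRONMENT", "development")
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    redis_url = env.get("REDIS_URL")
    # Assumption for this sketch: production deployments must configure a cache.
    if environment == "production" and not redis_url:
        raise RuntimeError("REDIS_URL is required in production")
    return AppConfig(environment, api_key, redis_url)
```

Passing a plain dict as `env` makes the function easy to unit test without touching the real process environment.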
Monitoring and Observability
# Enable logging
import logging
logging.basicConfig(level=logging.INFO)
# Trace LLM calls (the "simple" handler prints prompts and responses to stdout)
from llama_index.core import set_global_handler
set_global_handler("simple")
Error Handling
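import logging
logger = logging.getLogger(__name__)
try:
    response = pipeline.query(question)
except Exception as e:
    # Log the failure and degrade gracefully instead of crashing
    logger.error(f"Query failed: {e}")
    response = "I encountered an error. Please try again."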
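For transient API failures, a fallback message is often combined with a few retries. The following sketch wraps any query callable with exponential backoff; the function name `query_with_retry`, the retry count, and the delays are illustrative assumptions, and a production version would usually catch only the provider's retryable error types.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)

FALLBACK = "I encountered an error. Please try again."


def query_with_retry(
    query_fn: Callable[[str], object],
    question: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call query_fn, retrying with exponential backoff, then fall back."""
    for attempt in range(1, max_attempts + 1):
        try:
            return str(query_fn(question))
        except Exception as exc:  # broad on purpose in this sketch
            logger.warning("Query attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                return FALLBACK
            # Delays double each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return FALLBACK
```

The `sleep` parameter exists so tests can inject a no-op and avoid real waiting.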
Rate Limiting
from ratelimit import limits, sleep_and_retry
@sleep_and_retry
@limits(calls=10, period=60) # 10 calls per minute
def query_with_rate_limit(question: str):
    return pipeline.query(question)
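If adding the third-party ratelimit package is not an option, the same "N calls per period" policy can be approximated with the standard library. This is a sketch of a sliding-window limiter; unlike sleep_and_retry it rejects excess calls instead of blocking, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `calls` events within any window of `period` seconds."""

    def __init__(
        self,
        calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._calls = calls
        self._period = period
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Forget timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) < self._calls:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `limiter.allow()` before querying and return an HTTP 429 or a "please wait" message when it is False. The limiter is not thread-safe as written; guard `allow()` with a lock if it is shared across threads.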
Performance Optimization
Caching
# Enable response caching
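# Verify that this import path and the Settings.cache attribute exist in your
# installed llama-index-core version before relying on them; they may not be available.
from llama_index.core import Settings
from llama_index.core.storage.cache import SimpleCache
Settings.cache = SimpleCache()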
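Independently of any library-level cache, repeated questions can be cached at the application layer. The sketch below memoizes a query callable with functools.lru_cache after normalizing case and whitespace; `make_cached_query` is a hypothetical helper, and it assumes that answers do not need to change while the process is running (for example, the index is not updated in between).

```python
from functools import lru_cache
from typing import Callable


def make_cached_query(query_fn: Callable[[str], object], maxsize: int = 256):
    """Wrap query_fn so that repeated (normalized) questions hit an in-memory cache."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_question: str) -> str:
        # Store plain strings so cached values are cheap and immutable.
        return str(query_fn(normalized_question))

    def cached_query(question: str) -> str:
        # Normalize case and whitespace so trivial variants share one entry.
        normalized = " ".join(question.lower().split())
        return _cached(normalized)

    # Expose hit/miss statistics for monitoring.
    cached_query.cache_info = _cached.cache_info
    return cached_query
```

Note that the normalized (lowercased) question is what reaches `query_fn`; drop the lowercasing if case matters for your documents.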
Batch Processing
# Process multiple queries efficiently
questions = ["Q1", "Q2", "Q3"]
responses = [pipeline.query(q) for q in questions]
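The list comprehension above runs queries one after another. Because each query mostly waits on network I/O, a thread pool can overlap them. This sketch assumes the query callable is safe to call from multiple threads and that the provider's rate limits tolerate the chosen parallelism; `batch_query` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def batch_query(
    query_fn: Callable[[str], T],
    questions: Sequence[str],
    max_workers: int = 4,
) -> List[T]:
    """Run query_fn over all questions concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(query_fn, questions))
```

If any single query raises, the exception surfaces when its result is collected, so wrap `query_fn` with error handling when partial results are acceptable.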
Index Optimization
# Use appropriate similarity_top_k
query_engine = index.as_query_engine(
    similarity_top_k=3  # Lower for speed, higher for accuracy
)
Troubleshooting
API Key Issues
# Validate environment
bash scripts/test-llamaindex.sh
# Check .env file
cat .env | grep OPENAI_API_KEY
Import Errors
# Reinstall dependencies
bash scripts/setup-llamaindex.sh
# Verify installation
python -c "from importlib.metadata import version; print(version('llama-index-core'))"
Index Not Loading
# Check storage directory exists
import os
assert os.path.exists("./storage"), "Storage directory not found"
# Verify index files
assert os.path.exists("./storage/docstore.json"), "Index not persisted"
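The two assertions can be folded into a small helper that reports exactly what is missing instead of stopping at the first failure. Only docstore.json is taken from the check above; the helper name `missing_index_files` and the idea of passing further file names through `required` are assumptions for this sketch.

```python
from pathlib import Path
from typing import List, Sequence


def missing_index_files(
    persist_dir: str,
    required: Sequence[str] = ("docstore.json",),
) -> List[str]:
    """Return a list of problems found in persist_dir; empty means it looks loadable."""
    root = Path(persist_dir)
    if not root.is_dir():
        return [f"storage directory not found: {root}"]
    # Report every required file that is absent.
    return [f"missing file: {root / name}" for name in required if not (root / name).is_file()]
```

Calling it before load_index_from_storage allows the application to rebuild the index, or print a clear message, when the persisted files are incomplete.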
Out of Memory
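from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
# Reduce chunk size
node_parser = SentenceSplitter(chunk_size=256)  # Smaller chunks
# Process documents in batches, inserting into one index instead of merging several
# (document_batches is assumed to be an iterable of lists of Document objects)
index = VectorStoreIndex([], transformations=[node_parser])
for batch in document_batches:
    for doc in batch:
        index.insert(doc)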
References
Official Documentation
- LlamaIndex Framework: https://developers.llamaindex.ai/python/framework/
- VectorStoreIndex: https://developers.llamaindex.ai/python/framework/understanding/
- Query Engines: https://developers.llamaindex.ai/python/framework/use_cases/q_and_a
- LlamaCloud: https://docs.cloud.llamaindex.ai/
GitHub Resources
- Examples: https://github.com/run-llama/llama_index/tree/main/docs/examples
- Cookbooks: https://github.com/run-llama/llama_index/tree/main/docs/cookbooks
Community
- Discord: https://discord.gg/dGcwcsnxhU
- Twitter: @llama_index
- Blog: https://blog.llamaindex.ai/
Best Practices
- Always persist indices - Avoid rebuilding on every run
- Use appropriate chunk sizes - Balance between context and precision
- Add metadata - Enables filtering and better organization
- Monitor token usage - Track costs in production
- Implement error handling - Graceful degradation for API failures
- Cache responses - Reduce API calls for common queries
- Version your indices - Track changes over time
- Test with real data - Validate with actual use cases
- Configure embeddings wisely - Match model to use case
- Document your setup - Record configurations and decisions