PDF RAG Knowledge Base Skill
This skill enables GitHub Copilot to search a locally-indexed knowledge base of PDF documentation (IC datasheets, FPGA manuals, technical specifications) using semantic search.
🎯 Fully Portable & Self-Contained
This skill is 100% self-contained in the .github/skills/pdf-rag-knowledge/ directory:
- ✅ Portable Python search script (
rag_search.py) - ✅ Repo-specific vector database (
vector_store.json) - ✅ Bash helper script (
search_rag.sh) - ✅ No external dependencies on project structure
Copy the entire folder to any repo to use it!
When to Use This Skill
Use this skill when users ask about:
- IC specifications (STM32, ESP32, microcontroller datasheets)
- FPGA documentation and configurations
- Hardware pin configurations and GPIO settings
- Register addresses and bit fields
- Timing specifications and electrical characteristics
- Communication protocols (I2C, SPI, UART, etc.) as documented in datasheets
- Power consumption and thermal specifications
- Any technical details that would be found in PDF datasheets
How It Works
- The user asks a question about hardware or technical specifications
- Copilot recognizes this matches the skill description
- The skill searches the indexed PDF knowledge base using semantic search
- Relevant content from datasheets is retrieved with source citations
- Copilot uses this context to provide accurate, sourced answers
Usage
Search the Knowledge Base
# Using the helper script
./search_rag.sh "your search query"
# Or directly with Python
python3 rag_search.py --search "GPIO configuration"
# Limit results
./search_rag.sh "FPGA power" 3
Index New PDFs
# Index a PDF
python3 rag_search.py --index path/to/datasheet.pdf
# Check status
python3 rag_search.py --stats
# Clear database
python3 rag_search.py --clear
Requirements
Python Dependencies:
requests- For Ollama API callsPyPDF2- For PDF indexing (only needed when adding PDFs)
External Service:
- Ollama running locally at
http://localhost:11434 - With model
mxbai-embed-largeinstalled
# Install dependencies
pip install requests PyPDF2
# Install Ollama and pull model
ollama pull mxbai-embed-large
File Structure
.github/skills/pdf-rag-knowledge/
├── SKILL.md # This file (skill definition)
├── rag_search.py # Portable search script
├── search_rag.sh # Bash helper script
└── vector_store.json # Repo-specific indexed PDFs
Examples
Example 1: GPIO Configuration
User: "How do I configure GPIO pins on STM32F407?"
Skill searches: ./search_rag.sh "GPIO configuration STM32F407"
Returns: Relevant sections from STM32F407 datasheet with page numbers
Example 2: FPGA Specifications
User: "What are the specifications for Artix-7 FPGAs?"
Skill searches: ./search_rag.sh "Artix-7 specifications"
Returns: Device specifications, logic resources, I/O counts
Example 3: Power Requirements
User: "What are the power requirements?"
Skill searches: ./search_rag.sh "power supply voltage requirements"
Returns: Voltage ranges, current consumption, power modes
Configuration
Environment variables (optional):
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=mxbai-embed-large
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=400
Making It Portable to Other Repos
Option 1: Copy the Entire Folder
# In your target repo
mkdir -p .github/skills
cp -r /path/to/source-repo/.github/skills/pdf-rag-knowledge .github/skills/
# Enable in VS Code
# Add to .vscode/settings.json:
{
"chat.useAgentSkills": true
}
Option 2: Fresh Start in New Repo
# In your new repo
mkdir -p .github/skills/pdf-rag-knowledge
cd .github/skills/pdf-rag-knowledge
# Copy just the scripts (not the vector store)
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/rag_search.py .
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/search_rag.sh .
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/SKILL.md .
# Index your repo-specific PDFs
python3 rag_search.py --index /path/to/your/pdfs/*.pdf
Each repo maintains its own vector_store.json with repo-specific documentation!
Technical Details
Search Process
- Query converted to 1024-dimension embedding via Ollama
- Cosine similarity calculated against all stored embeddings
- Top K most relevant chunks returned
- Results include similarity scores and source citations
Vector Store Format
JSON file with documents and embeddings:
{
"doc_id": {
"id": "unique_hash",
"content": "text chunk",
"embedding": [0.123, ...],
"source": "filename.pdf",
"page": 42,
"metadata": {...}
}
}
PDF Chunking
- Chunk Size: 2000 characters
- Overlap: 400 characters (preserves context)
- Min Size: 100 characters (filters noise)
Troubleshooting
Check Status
python3 rag_search.py --stats
Test Search
./search_rag.sh "test query"
Verify Ollama
curl http://localhost:11434/api/tags
Common Issues
No results found:
- Check if PDFs are indexed:
python3 rag_search.py --stats - Verify Ollama is running:
curl http://localhost:11434
Import errors:
- Install requirements:
pip install requests PyPDF2
Permission denied:
- Make scripts executable:
chmod +x *.sh *.py
Integration with VS Code Copilot
This skill integrates with GitHub Copilot through Agent Skills:
- Copilot detects hardware/datasheet questions
- Skill loads automatically (progressive disclosure)
- Search executes against repo-specific knowledge base
- Results seamlessly integrated into Copilot responses
- You don't manually invoke - just ask natural questions
Related Resources
- Ollama - Local embedding service
- Agent Skills Standard
- VS Code Agent Skills Docs
Examples
Example 1: GPIO Configuration
User: "How do I configure GPIO pins on STM32F407?"
Skill searches: ./search_rag.sh "GPIO configuration STM32F407"
Returns: Relevant sections from STM32F407 datasheet with page numbers
Example 2: FPGA Specifications
User: "What are the specifications for Artix-7 FPGAs?"
Skill searches: ./search_rag.sh "Artix-7 specifications"
Returns: Device specifications, logic resources, I/O counts
Example 3: Power Requirements
User: "What are the power requirements?"
Skill searches: ./search_rag.sh "power supply voltage requirements"
Returns: Voltage ranges, current consumption, power modes
Knowledge Base Management
Check Status
To see what's currently indexed:
python3 rag_search.py --stats
Index New PDFs
To add new documentation to the knowledge base:
python3 rag_search.py --index path/to/datasheet.pdf
Clear Database
To remove all indexed documents:
python3 rag_search.py --clear
Interactive Testing
Test searches directly:
./search_rag.sh "your query"
python3 rag_search.py --search "GPIO" --top-k 3
Technical Details
Search Process
- Query converted to 1024-dimension embedding via Ollama
- Cosine similarity calculated against all stored embeddings
- Top K most relevant chunks returned
- Results include similarity scores and source citations
Vector Store Format
JSON file with documents and embeddings:
{
"doc_id": {
"id": "unique_hash",
"content": "text chunk",
"embedding": [0.123, ...],
"source": "filename.pdf",
"page": 42,
"metadata": {...}
}
}
PDF Chunking
- Chunk Size: 2000 characters
- Overlap: 400 characters (preserves context)
- Min Size: 100 characters (filters noise)
Configuration
Environment variables (optional):
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=mxbai-embed-large
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=400
Important Notes
-
Repo-Specific: Each repository has its own
vector_store.jsonwith repo-specific documentation. -
Ollama Must Be Running: Ensure Ollama is running locally:
curl http://localhost:11434/api/tags -
Source Citations: Always reference the source document and page number when providing information from the knowledge base.
-
Context Limitations: The skill returns the most relevant chunks. For comprehensive answers, it may help to search multiple times with related queries.
Troubleshooting
Check Status
python3 rag_search.py --stats
Test Search
./search_rag.sh "test query"
Verify Ollama
curl http://localhost:11434/api/tags
Common Issues
No results found:
- Check if PDFs are indexed:
python3 rag_search.py --stats - Verify Ollama is running:
curl http://localhost:11434
Import errors:
- Install requirements:
pip install requests PyPDF2
Permission denied:
- Make scripts executable:
chmod +x *.sh *.py
Integration with VS Code Copilot
This skill integrates with GitHub Copilot through Agent Skills:
- Copilot detects hardware/datasheet questions
- Skill loads automatically (progressive disclosure)
- Search executes against repo-specific knowledge base
- Results seamlessly integrated into Copilot responses
- You don't manually invoke - just ask natural questions
Related Resources
- Ollama - Local embedding service
- Agent Skills Standard
- VS Code Agent Skills Docs
微信扫一扫