PDF RAG Knowledge Base Skill

This skill enables GitHub Copilot to search a locally-indexed knowledge base of PDF documentation (IC datasheets, FPGA manuals, technical specifications) using semantic search.

🎯 Fully Portable & Self-Contained

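Everything the skill needs from the repository lives in the .github/skills/pdf-rag-knowledge/ directory (the Python packages and the Ollama service listed under Requirements are installed separately):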

✅ Portable Python search script (rag_search.py)
✅ Repo-specific vector database (vector_store.json)
✅ Bash helper script (search_rag.sh)
✅ No external dependencies on project structure

Copy the entire folder to any repo to use it!

When to Use This Skill

Use this skill when users ask about:

IC specifications (STM32, ESP32, microcontroller datasheets)
FPGA documentation and configurations
Hardware pin configurations and GPIO settings
Register addresses and bit fields
Timing specifications and electrical characteristics
Communication protocols (I2C, SPI, UART, etc.) as documented in datasheets
Power consumption and thermal specifications
Any technical details that would be found in PDF datasheets

How It Works

The user asks a question about hardware or technical specifications
Copilot recognizes this matches the skill description
The skill searches the indexed PDF knowledge base using semantic search
Relevant content from datasheets is retrieved with source citations
Copilot uses this context to provide accurate, sourced answers

Usage

Search the Knowledge Base

# Using the helper script
./search_rag.sh "your search query"

# Or directly with Python
python3 rag_search.py --search "GPIO configuration"

# Limit the number of results (here, the top 3)
./search_rag.sh "FPGA power" 3

Index New PDFs

# Index a PDF
python3 rag_search.py --index path/to/datasheet.pdf

# Check status
python3 rag_search.py --stats

# Clear database
python3 rag_search.py --clear

Requirements

Python Dependencies:

requests - For Ollama API calls
PyPDF2 - For PDF indexing (only needed when adding PDFs)

External Service:

Ollama running locally at http://localhost:11434
With model mxbai-embed-large installed

# Install dependencies
pip install requests PyPDF2

# With Ollama installed and running, pull the embedding model
ollama pull mxbai-embed-large

File Structure

.github/skills/pdf-rag-knowledge/
├── SKILL.md              # This file (skill definition)
├── rag_search.py         # Portable search script
├── search_rag.sh         # Bash helper script
└── vector_store.json     # Repo-specific indexed PDFs

Examples

Example 1: GPIO Configuration

User: "How do I configure GPIO pins on STM32F407?"

Skill searches: ./search_rag.sh "GPIO configuration STM32F407"

Returns: Relevant sections from STM32F407 datasheet with page numbers

Example 2: FPGA Specifications

User: "What are the specifications for Artix-7 FPGAs?"

Skill searches: ./search_rag.sh "Artix-7 specifications"

Returns: Device specifications, logic resources, I/O counts

Example 3: Power Requirements

User: "What are the power requirements?"

Skill searches: ./search_rag.sh "power supply voltage requirements"

Returns: Voltage ranges, current consumption, power modes

Configuration

Environment variables (optional):

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=mxbai-embed-large
export CHUNK_SIZE=2000
export CHUNK_OVERLAP=400
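
As a rough illustration of how these variables might be consumed, the sketch below reads them with the defaults listed above and validates the chunking values. The `load_config` helper and the returned key names are hypothetical; how rag_search.py actually reads its configuration is not shown here.

```python
import os

# Defaults mirror the values listed above. The helper itself is an
# illustrative assumption, not the code used by rag_search.py.
DEFAULTS = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "OLLAMA_MODEL": "mxbai-embed-large",
    "CHUNK_SIZE": "2000",
    "CHUNK_OVERLAP": "400",
}


def load_config(env=None):
    """Build a config dict from environment variables, falling back to DEFAULTS."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    config = {
        "base_url": raw["OLLAMA_BASE_URL"].rstrip("/"),
        "model": raw["OLLAMA_MODEL"],
        "chunk_size": int(raw["CHUNK_SIZE"]),
        "chunk_overlap": int(raw["CHUNK_OVERLAP"]),
    }
    # An overlap equal to or larger than the chunk size would never advance.
    if not 0 <= config["chunk_overlap"] < config["chunk_size"]:
        raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
    return config


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.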

Making It Portable to Other Repos

Option 1: Copy the Entire Folder

# In your target repo
mkdir -p .github/skills
cp -r /path/to/source-repo/.github/skills/pdf-rag-knowledge .github/skills/

# Enable in VS Code
# Add to .vscode/settings.json:
{
  "chat.useAgentSkills": true
}

Option 2: Fresh Start in New Repo

# In your new repo
mkdir -p .github/skills/pdf-rag-knowledge
cd .github/skills/pdf-rag-knowledge

# Copy just the scripts (not the vector store)
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/rag_search.py .
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/search_rag.sh .
cp /path/to/source-repo/.github/skills/pdf-rag-knowledge/SKILL.md .

# Index your repo-specific PDFs (one --index call per file)
for pdf in /path/to/your/pdfs/*.pdf; do python3 rag_search.py --index "$pdf"; done

Each repo maintains its own vector_store.json with repo-specific documentation!

Technical Details

Search Process

Query converted to 1024-dimension embedding via Ollama
Cosine similarity calculated against all stored embeddings
Top K most relevant chunks returned
Results include similarity scores and source citations
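
The ranking steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in rag_search.py: the function names are hypothetical, and the toy store uses 3-dimensional vectors and made-up sources in place of real 1024-dimension embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, store, k=5):
    """Score every stored chunk against the query and return the k best (score, doc) pairs."""
    scored = [
        (cosine_similarity(query_embedding, doc["embedding"]), doc)
        for doc in store.values()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: hypothetical chunks with 3-dimensional embeddings.
store = {
    "a": {"content": "GPIO mode register", "embedding": [1.0, 0.0, 0.0], "source": "mcu.pdf", "page": 12},
    "b": {"content": "Thermal limits", "embedding": [0.0, 1.0, 0.0], "source": "mcu.pdf", "page": 88},
    "c": {"content": "GPIO pull-up settings", "embedding": [0.9, 0.1, 0.0], "source": "mcu.pdf", "page": 13},
}

results = top_k([1.0, 0.0, 0.0], store, k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc['source']} p.{doc['page']}  {doc['content']}")
```

Because every stored embedding is compared against the query, search time grows linearly with the number of indexed chunks, which is usually acceptable for a per-repo collection of datasheets.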

Vector Store Format

JSON file with documents and embeddings:

{
  "doc_id": {
    "id": "unique_hash",
    "content": "text chunk",
    "embedding": [0.123, ...],
    "source": "filename.pdf",
    "page": 42,
    "metadata": {...}
  }
}
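
Assuming the store is a single JSON object keyed by document ID, as in the example above, it can be inspected with the standard library alone. The helper names below are hypothetical, and the sample entries are invented for illustration; this is not how `--stats` is necessarily implemented.

```python
import json
from collections import Counter


def load_store(path="vector_store.json"):
    """Read the whole vector store into memory as a dict keyed by document ID."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def summarize_store(store):
    """Count total chunks and chunks per source PDF."""
    per_source = Counter(doc["source"] for doc in store.values())
    return {"total_chunks": len(store), "per_source": dict(per_source)}


# Invented sample entries that follow the field names shown above.
sample = json.loads("""
{
  "id1": {"id": "id1", "content": "chunk one", "embedding": [0.1, 0.2],
          "source": "datasheet_a.pdf", "page": 1, "metadata": {}},
  "id2": {"id": "id2", "content": "chunk two", "embedding": [0.3, 0.4],
          "source": "datasheet_a.pdf", "page": 2, "metadata": {}},
  "id3": {"id": "id3", "content": "chunk three", "embedding": [0.5, 0.6],
          "source": "datasheet_b.pdf", "page": 7, "metadata": {}}
}
""")

summary = summarize_store(sample)
print(summary)
```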

PDF Chunking

Chunk Size: 2000 characters
Overlap: 400 characters (preserves context)
Min Size: 100 characters (filters noise)
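
One way to implement chunking with these parameters is a sliding character window. The sketch below is an assumption about the general approach, with a hypothetical `chunk_text` function; the real script may split on page or sentence boundaries instead.

```python
def chunk_text(text, chunk_size=2000, overlap=400, min_size=100):
    """Split text into overlapping character windows, dropping chunks shorter than min_size."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if len(chunk) >= min_size:
            chunks.append(chunk)
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 5000 characters of repeating letters stand in for extracted page text.
text = "".join(chr(97 + i % 26) for i in range(5000))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

With the default values, each window starts 1600 characters after the previous one, so the last 400 characters of one chunk reappear at the start of the next.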

Troubleshooting

Check Status

python3 rag_search.py --stats

Test Search

./search_rag.sh "test query"

Verify Ollama

curl http://localhost:11434/api/tags
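
The same check can be scripted. The sketch below queries Ollama's `/api/tags` endpoint and looks for the embedding model in the returned `models` list; the helper names are hypothetical, and it uses the standard library's `urllib` rather than `requests` so that it runs without extra packages.

```python
import json
import urllib.error
import urllib.request


def model_installed(tags_payload, model="mxbai-embed-large"):
    """Return True if the model appears in an /api/tags response, with or without a tag suffix."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return any(name == model or name.split(":")[0] == model for name in names)


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        payload = fetch_tags()
    except (urllib.error.URLError, OSError) as exc:
        print(f"Ollama is not reachable: {exc}")
    else:
        print("Embedding model installed:", model_installed(payload))
```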

Common Issues

No results found:

Check if PDFs are indexed: python3 rag_search.py --stats
Verify Ollama is running: curl http://localhost:11434

Import errors:

Install requirements: pip install requests PyPDF2

Permission denied:

Make scripts executable: chmod +x *.sh *.py

Integration with VS Code Copilot

This skill integrates with GitHub Copilot through Agent Skills:

Copilot detects hardware and datasheet questions
The skill loads automatically (progressive disclosure)
The search runs against the repo-specific knowledge base
Results are integrated into Copilot's responses
You don't invoke the skill manually; just ask questions in natural language

Knowledge Base Management

Check Status

To see what's currently indexed:

python3 rag_search.py --stats

Index New PDFs

To add new documentation to the knowledge base:

python3 rag_search.py --index path/to/datasheet.pdf

Clear Database

To remove all indexed documents:

python3 rag_search.py --clear

Interactive Testing

Test searches directly:

./search_rag.sh "your query"
python3 rag_search.py --search "GPIO" --top-k 3

Important Notes

Repo-Specific: Each repository has its own vector_store.json with repo-specific documentation.
Ollama Must Be Running: Ensure Ollama is running locally:
```
curl http://localhost:11434/api/tags
```
Source Citations: Always reference the source document and page number when providing information from the knowledge base.
Context Limitations: The skill returns the most relevant chunks. For comprehensive answers, it may help to search multiple times with related queries.
