Zotero Literature Verification
Complete workflow for verifying academic literature citations using Zotero MCP with 100% word-by-word PDF reading and token budget management.
Quick Start
When invoked with /zotero-literature-verification, guide the user through:
- Token Budget Estimation - Calculate required tokens based on page count
- PDF Extraction - Extract complete PDF text using PyMuPDF
- Sequential Reading - Read every word from first to last page in chunks
- Citation Verification - Verify citations with line numbers and exact quotes
- Reference Generation - Generate ACM-formatted reference list from Zotero
Zero-Tolerance Protocol ⚠️
CRITICAL: When verifying citations, you MUST:
- ✅ Extract COMPLETE PDF text to /tmp
- ✅ Read EVERY word from first page to last page
- ✅ Record line numbers for all verified citations
- ✅ NEVER use keyword search as substitute for complete reading
- ✅ Monitor token usage after each paper
FORBIDDEN:
- ❌ Reading only Abstract and Conclusion
- ❌ Using grep/search without full text reading
- ❌ Skipping middle sections of papers
- ❌ Assuming content without reading
Token Budget Estimation
Formula: Pages × 700 + 1000 = Estimated Tokens
| Papers | Pages | Est. Tokens | Safe? | |--------|-------|-------------|-------| | 1-2 | 10-20 | ~20k | ✅ Safe | | 3-4 | 30-40 | ~50k | ✅ Safe | | 5-6 | 50-90 | ~90k | ✅ Safe | | 7+ | 100+ | ~140k+ | ⚠️ Split sessions |
Token Zones:
- Safe: < 100k used (100k+ remaining)
- Caution: 100-150k used (50-100k remaining)
- Danger: > 150k used (< 50k remaining) → Pause and wait for user
Workflow
Step 1: Extract PDF
import fitz
doc = fitz.open('/Users/Zhuanz/Zotero/storage/XXXXXXXX/Paper.pdf')
text = '\n'.join([p.get_text() for p in doc])
with open('/tmp/paper_full.txt', 'w') as f:
f.write(text)
print(f"✅ {doc.page_count} pages, {len(text)} chars")
Step 2: Read Sequentially
Read in chunks of 250-300 lines:
# Chunk 1: Lines 0-250 (Title, Abstract, Introduction)
Read("/tmp/paper_full.txt", offset=0, limit=250)
# Chunk 2: Lines 250-500 (Methods, early Results)
Read("/tmp/paper_full.txt", offset=250, limit=250)
# Chunk 3: Lines 500-750 (Results, Discussion)
Read("/tmp/paper_full.txt", offset=500, limit=250)
# Chunk 4: Lines 750-1000 (Conclusion, References)
Read("/tmp/paper_full.txt", offset=750, limit=250)
Step 3: Verify Citations
After complete reading, locate exact quotes:
# Find exact line number
grep -n "exact quoted phrase" /tmp/paper_full.txt
# Read context (±20 lines)
Read("/tmp/paper_full.txt", offset=436, limit=40)
Step 4: Get Metadata from Zotero
Use Zotero MCP tools:
mcp__zotero__zotero_get_item_metadata- Get complete metadatamcp__zotero__zotero_search_items- Search by author/year/titlemcp__zotero__zotero_get_item_fulltext- Get PDF text
Step 5: Generate Report
## Verification Report
| Paper | Pages | Status | Issues |
|-------|-------|--------|--------|
| Author 2025 | 15 | ✅ | Corrected: false attribution |
| Author 2024 | 19 | ✅ | None - accurate |
## References (ACM Format)
[Author Year] FirstName LastName, FirstName LastName. Year.
Title of Paper. In Proceedings of CONF (CONF 'YY), Vol. 19.
AAAI Press, Pages. DOI: https://doi.org/XX.XXXX/XXXXXXX
Token Management
Monitor after each paper:
if tokens_remaining < 50000:
print("⚠️ WARNING: Less than 50k tokens remaining")
print("Recommend: Save progress and continue in new session")
Real Performance (from 2026-02-05):
- 6 papers, 91 pages, 369,478 characters
- Token used: 113,000 / 200,000 (56.5%)
- Time: ~2 hours
- Result: 100% accurate verification
Emergency Procedures
Token Budget Exhausted (< 30k remaining)
# Save progress
cat > /tmp/verification_progress.txt << EOF
Completed: Paper1, Paper2, Paper3
Current: Paper4 (line 500/1200)
Pending: Paper5, Paper6
Token used: 150,000
EOF
# Report to user and STOP
Quality Checklist
Before claiming "verification complete":
- [ ] Read complete text (first page → last page)
- [ ] Located References section (proves completeness)
- [ ] Recorded line numbers for all citations
- [ ] Verified numerical data (N=X, p<0.05)
- [ ] Checked author's evaluation words
- [ ] Collected complete metadata from Zotero
- [ ] Generated ACM-formatted reference list
- [ ] Stayed within token budget
Dependencies
# Install PyMuPDF
pip install PyMuPDF
# Ensure Zotero is running with local API enabled
# Settings → Advanced → "Allow other applications to communicate with Zotero"
Documentation
instructions.md- Complete workflow detailsQUICK_REFERENCE.md- Quick reference cardexample_workflow.py- Working exampleREADME.md- Project overview
Version History
- v2.0.0 (2026-02-05): Added token management, 6-paper verification workflow
- v1.0.0 (2026-02-02): Initial release
微信扫一扫