academic-retrieval
Sciverse academic paper retrieval: structured metadata search, semantic chunk retrieval for RAG, and byte-range content reading. For agent workflows that need citation-grade scientific literature.
When to use
Trigger this skill when the user's request involves any of:
- Locating academic papers by structured criteria (authors, year, journal, subjects)
- Grounding answers in paper excerpts (RAG / citations)
- Expanding the original text around a known doc_id (more bytes before/after a chunk)
Authentication
This skill requires the SCIVERSE_API_TOKEN environment variable
(obtain from https://sciverse.space). Optionally set SCIVERSE_BASE_URL
to override the default API base URL.
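A caller might check these variables before launching any script. The following is a minimal sketch; `resolveConfig` is a hypothetical helper and not part of the skill's scripts. It leaves the base URL undefined when no override is set, since the default URL is not stated here.

```javascript
// Sketch: resolve Sciverse settings from an environment-like object.
// `resolveConfig` is a hypothetical helper, not one of the skill's scripts.
function resolveConfig(env) {
  const token = (env.SCIVERSE_API_TOKEN || "").trim();
  if (!token) {
    // The scripts treat a missing token as an argument error (exit code 2).
    const err = new Error("SCIVERSE_API_TOKEN is not set");
    err.exitCode = 2;
    throw err;
  }
  // Keep baseUrl undefined when there is no override so that the scripts'
  // own default applies.
  const baseUrl = (env.SCIVERSE_BASE_URL || "").trim() || undefined;
  return { token, baseUrl };
}
```

Typical use would be `resolveConfig(process.env)` once at startup.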
Tools
search_papers
Search academic papers by structured filters (title, authors, journal, year, subjects, etc.). Use when: "find Hinton's papers from 2020-2023", "Nature papers on CRISPR". Not for: natural-language Q&A retrieval (use semantic_search) or full-text snippets (use read_content). Returns: list of papers; each entry has doc_id, title, author, abstract, publication_venue_name, publication_published_year.
Invoke: node scripts/search_papers.mjs '<JSON args>'
semantic_search
Natural-language semantic search returning relevant paper chunks for RAG-style answering. Use when: "How does Transformer attention work?", "What are recent methods for protein structure prediction?". Not for: precise field filtering (use search_papers) or fetching full original text (use read_content). Returns: list of chunks; each entry has chunk_id, doc_id, abstract, chunk, score, title, offset. Typical chain: semantic_search → pick chunk → read_content(doc_id, offset).
Invoke: node scripts/semantic_search.mjs '<JSON args>'
list_catalog
Returns the schema catalog for search_papers: every field name, type, whether it's filterable / sortable, default-return status, human description, and applicable FilterOperators. Use when: "Which field do I filter by DOI?", "What values can access_oa_status take?", "What's the right enum for metadata_type?". Not for: actually searching papers (use search_papers / semantic_search). Typical pattern: call once when first encountering Sciverse or facing an ambiguous field need, then construct precise search_papers filters from the returned schema. Pass include_sample_values=true to also fetch top-20 values for enum-like fields (OpenSearch terms aggregation, 24h cached).
Invoke: node scripts/list_catalog.mjs '<JSON args>'
read_content
Read a UTF-8 byte range of a paper's original text. Typically used with a doc_id/offset returned by semantic_search to expand context (read more bytes before or after a chunk). Returns: text fragment, bytes_returned, next_offset, more (boolean).
Invoke: node scripts/read_content.mjs '<JSON args>'
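The `next_offset` and `more` fields suggest a simple paging loop. The sketch below assumes the text fragment is returned in a field named `text` (the field name is not stated above) and takes the actual tool call as an injected synchronous function `readContent`, so it makes no claim about how the script is launched. `readUntil` and `maxBytes` are hypothetical names.

```javascript
// Sketch: keep reading from next_offset until `more` is false or a byte
// budget is reached. `readContent` stands in for the read_content call and
// the `text` field name is an assumption.
function readUntil(readContent, docId, startOffset, maxBytes) {
  let offset = startOffset;
  let total = 0;
  let more = true;
  const parts = [];
  while (more && total < maxBytes) {
    const page = readContent({ doc_id: docId, offset });
    parts.push(page.text);
    total += page.bytes_returned;
    offset = page.next_offset;
    more = page.more;
    // Guard against a page that makes no progress.
    if (page.bytes_returned === 0) break;
  }
  return { text: parts.join(""), bytes: total, nextOffset: offset, more };
}
```

The returned `nextOffset` can be passed back in later to resume where the budget ran out.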
get_resource
Returns the binary bytes of a paper figure / table image referenced
inside read_content's Markdown via image placeholders.
Use when the user asks to see / display / describe a figure and
read_content output contains an image reference.
Input file_name comes from the Markdown URL part (relative path,
no \\ or ..).
Returns: raw image stream + image/* Content-Type. The SDK / MCP
server wraps the bytes as base64 + mimeType so Claude (multimodal)
can read the image directly.
Invoke: node scripts/get_resource.mjs '<JSON args>'
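The file_name constraints above (relative path, no backslash, no `..`) can be checked client-side before calling the script. This is a minimal sketch; `isSafeFileName` is a hypothetical helper, and the server may apply additional rules that are not described here.

```javascript
// Sketch: reject file_name values that break the stated constraints.
// Hypothetical client-side check; the server's own validation may differ.
function isSafeFileName(fileName) {
  if (typeof fileName !== "string" || fileName.length === 0) return false;
  if (fileName.startsWith("/")) return false;   // must be a relative path
  if (fileName.includes("\\")) return false;    // no backslashes
  if (fileName.includes("..")) return false;    // no ".." anywhere
  return true;
}
```

Rejecting any occurrence of `..`, rather than only whole path segments, is the stricter reading of the rule.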
Bootstrap: learn the schema first
If you're unsure which fields exist or what values an enum takes
(e.g. metadata_type, language, access_oa_status), call
list_catalog once at the start. Sample values are returned for
low-cardinality fields. Use it instead of guessing field names,
which wastes turns.
list_catalog(include_sample_values=true)
└─▶ fields[].name + sample_values → precise filter construction
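One way to use the catalog is to check a candidate filter against it before searching. The sketch below assumes a response shaped like `{ fields: [{ name, filterable, sample_values }] }` with `sample_values` as a plain array of values. Only `fields[].name` and `sample_values` appear above; `filterable` as a boolean property and the helper `checkFilter` are assumptions.

```javascript
// Sketch: validate one filter against a list_catalog response.
// Assumed shape: { fields: [{ name, filterable, sample_values }] }.
function checkFilter(catalog, filter) {
  const field = catalog.fields.find((f) => f.name === filter.field);
  if (!field) return { ok: false, reason: "unknown field" };
  if (field.filterable === false) return { ok: false, reason: "not filterable" };
  const samples = field.sample_values;
  if (Array.isArray(samples) && samples.length > 0 && !samples.includes(filter.value)) {
    // Sample values are only the most frequent ones, so an unseen value is
    // reported as a warning and not as an error.
    return { ok: true, warning: "value not among sample values" };
  }
  return { ok: true };
}
```

An unknown field is the case worth stopping on; an unseen value may still be valid because the samples are truncated.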
Recipes
RAG flow (natural-language Q&A):
semantic_search(query=...) → hits[i].doc_id, hits[i].offset
└─▶ read_content(doc_id, offset)
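The "pick chunk" step of this flow can be sketched as a small selection function. It assumes that a higher `score` means a more relevant chunk, which is not stated above, and keeps at most one chunk per paper. `toReadRequests` is a hypothetical helper.

```javascript
// Sketch: choose the k best-scoring chunks, at most one per doc_id, and
// turn each into read_content arguments.
// Assumption: a higher `score` means a more relevant chunk.
function toReadRequests(hits, k) {
  const seen = new Set();
  const requests = [];
  const ranked = [...hits].sort((a, b) => b.score - a.score);
  for (const hit of ranked) {
    if (requests.length >= k) break;
    if (seen.has(hit.doc_id)) continue;
    seen.add(hit.doc_id);
    requests.push({ doc_id: hit.doc_id, offset: hit.offset });
  }
  return requests;
}
```

Each returned object can be serialized as the JSON argument for `scripts/read_content.mjs`.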
Lookup by DOI:
search_papers(filters_advanced=[{field: "doi", value: "10.1038/..."}])
OA + year filter:
search_papers(
  year_from=2024,
  filters_advanced=[{field: "access_is_oa", value: "true"}]
)
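In script form, the recipe above becomes a single JSON argument. The sketch below builds the argument vector for any of the five scripts; `buildCommand` is a hypothetical helper, and the argument names are the ones shown in the recipe.

```javascript
// Sketch: build the argv for one of the scripts/*.mjs tools.
const TOOLS = ["search_papers", "semantic_search", "list_catalog", "read_content", "get_resource"];

function buildCommand(tool, args) {
  if (!TOOLS.includes(tool)) throw new Error(`unknown tool: ${tool}`);
  // Each script takes one JSON string as its only argument.
  return ["node", `scripts/${tool}.mjs`, JSON.stringify(args)];
}

// The OA + year recipe expressed as script arguments.
const argv = buildCommand("search_papers", {
  year_from: 2024,
  filters_advanced: [{ field: "access_is_oa", value: "true" }],
});
```

Passing `argv[0]` and `argv.slice(1)` to Node's `child_process.execFileSync` avoids shell quoting of the JSON string.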
Structured + semantic hybrid:
search_papers(authors=[...], year_from=2020) → doc_ids
semantic_search(query=...) → filter hits client-side by doc_ids
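The client-side filtering step can be sketched as a set lookup on `doc_id`, the one field both result lists share. `filterHitsByDocIds` is a hypothetical helper name.

```javascript
// Sketch: keep only the semantic_search hits whose doc_id also appears in
// the search_papers results.
function filterHitsByDocIds(papers, hits) {
  const allowed = new Set(papers.map((p) => p.doc_id));
  return hits.filter((h) => allowed.has(h.doc_id));
}
```

The hits keep their original order, so any ranking from semantic_search is preserved.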
Fetch a paper figure / image:
When read_content Markdown contains an image reference, call
get_resource with the file_name to fetch the image binary.
read_content(doc_id, offset) → markdown with an image reference
└─▶ get_resource(file_name="dt=xxx/p/f3.png")
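Finding the file_name values can be sketched with a regular expression, assuming the references use standard Markdown image syntax, `![alt](target)`. The exact placeholder format in read_content output is not shown above, and `imageFileNames` is a hypothetical helper.

```javascript
// Sketch: collect unique image targets from Markdown text.
// Assumption: references use standard ![alt](target) image syntax.
function imageFileNames(markdown) {
  // Capture the target up to the first space or ")" so that an optional
  // quoted title after the target is ignored.
  const pattern = /!\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  const names = [];
  for (const match of markdown.matchAll(pattern)) {
    if (!names.includes(match[1])) names.push(match[1]);
  }
  return names;
}
```

Each name can then be sent as `{"file_name": ...}` to `scripts/get_resource.mjs`.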
Exit codes
- 0: success; stdout is the JSON response
- 1: HTTP 4xx/5xx; stderr contains status code and response body
- 2: argument error (missing token, malformed JSON, required field absent)
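A caller can branch on these codes as sketched below. `interpretResult` is a hypothetical helper that takes the exit status and captured output as plain values. The content of stderr for code 2 is not specified above, so it is passed through unchanged.

```javascript
// Sketch: map a script's exit status and output to a result object.
function interpretResult(status, stdout, stderr) {
  switch (status) {
    case 0:
      // Success: stdout is the JSON response.
      return { ok: true, data: JSON.parse(stdout) };
    case 1:
      // HTTP 4xx/5xx: stderr carries the status code and response body.
      return { ok: false, kind: "http", detail: stderr.trim() };
    case 2:
      // Argument error: missing token, malformed JSON, or absent field.
      return { ok: false, kind: "argument", detail: stderr.trim() };
    default:
      // Any other status is not documented above.
      return { ok: false, kind: "unknown", detail: stderr.trim() };
  }
}
```

The three inputs correspond to the `status`, `stdout`, and `stderr` properties returned by Node's `child_process.spawnSync` when called with `{ encoding: "utf8" }`.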