academic-retrieval

Sciverse academic paper retrieval: structured metadata search, semantic chunk retrieval for RAG, and byte-range content reading. For agent workflows that need citation-grade scientific literature.

When to use

Trigger this skill when the user's request involves any of:

Locating academic papers by structured criteria (authors, year, journal, subjects)
Grounding answers in paper excerpts (RAG / citations)
Expanding the original text around a known doc_id (more bytes before/after a chunk)

Authentication

This skill requires the SCIVERSE_API_TOKEN environment variable (obtain from https://sciverse.space). Optionally set SCIVERSE_BASE_URL to override the default API base URL.

Tools

search_papers

Search academic papers by structured filters (title, authors, journal, year, subjects, etc.). Use when: "find Hinton's papers from 2020-2023", "Nature papers on CRISPR". Not for: natural-language Q&A retrieval (use semantic_search) or full-text snippets (use read_content). Returns: list of papers; each entry has doc_id, title, author, abstract, publication_venue_name, publication_published_year.

Invoke: node scripts/search_papers.mjs '<JSON args>'

semantic_search

Natural-language semantic search returning relevant paper chunks for RAG-style answering. Use when: "How does Transformer attention work?", "What are recent methods for protein structure prediction?". Not for: precise field filtering (use search_papers) or fetching full original text (use read_content). Returns: list of chunks; each entry has chunk_id, doc_id, abstract, chunk, score, title, offset. Typical chain: semantic_search → pick chunk → read_content(doc_id, offset).

Invoke: node scripts/semantic_search.mjs '<JSON args>'

list_catalog

Returns the schema catalog for search_papers: every field name, type, whether it's filterable / sortable, default-return status, human description, and applicable FilterOperators. Use when: "Which field do I filter by DOI?", "What values can access_oa_status take?", "What's the right enum for metadata_type?". Not for: actually searching papers (use search_papers / semantic_search). Typical pattern: call once when first encountering Sciverse or facing an ambiguous field need, then construct precise search_papers filters from the returned schema. Pass include_sample_values=true to also fetch top-20 values for enum-like fields (OpenSearch terms aggregation, 24h cached).

Invoke: node scripts/list_catalog.mjs '<JSON args>'

read_content

Read a UTF-8 byte range of a paper's original text. Typically used with a doc_id/offset returned by semantic_search to expand context (read more bytes before or after a chunk). Returns: text fragment, bytes_returned, next_offset, more (boolean).

Invoke: node scripts/read_content.mjs '<JSON args>'

get_resource

Returns the binary bytes of a paper figure / table image referenced inside read_content's Markdown via ![alt](file_name) placeholders. Use when the user asks to see / display / describe a figure and read_content output contains an image reference. Input file_name comes from the Markdown URL part (relative path, no \\ or ..). Returns: raw image stream + image/* Content-Type. The SDK / MCP server wraps the bytes as base64 + mimeType so Claude (multimodal) can read the image directly.

Invoke: node scripts/get_resource.mjs '<JSON args>'

Bootstrap: learn the schema first

If you're unsure which fields exist or what values an enum takes (e.g. metadata_type, language, access_oa_status), call list_catalog once at the start. Sample values are returned for low-cardinality fields. Use it instead of guessing field names — guessing wastes turns.

list_catalog(include_sample_values=true)
    └─▶ fields[].name + sample_values  →  precise filter construction

Recipes

RAG flow (natural-language Q&A):

semantic_search(query=...) → hits[i].doc_id, hits[i].offset
    └─▶ read_content(doc_id, offset)

Lookup by DOI:

search_papers(filters_advanced=[{field: "doi", value: "10.1038/..."}])

OA + year filter:

search_papers(
    year_from=2024,
    filters_advanced=[{field: "access_is_oa", value: "true"}]
)

Structured + semantic hybrid:

search_papers(authors=[...], year_from=2020) → doc_ids
semantic_search(query=...) → filter hits client-side by doc_ids

Fetch a paper figure / image:

When read_content Markdown contains ![alt](file_name), call get_resource with the file_name to fetch image binary.

read_content(doc_id, offset) → markdown ![Figure 3](dt=xxx/p/f3.png)
    └─▶ get_resource(file_name="dt=xxx/p/f3.png")

Exit codes

0 — success; stdout is the JSON response
1 — HTTP 4xx/5xx; stderr contains status code and response body
2 — argument error (missing token, malformed JSON, required field absent)