Category: Development & Engineering. Requires an API key.

PDF to markdown converter

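Converts PDF and image documents to clean Markdown via the PDF2Markdown CLI. Suitable for extracting text from PDFs, converting document formats, and similar tasks.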

Author: qthanshubclawhub

PDF2Markdown CLI

Convert PDF and image documents to Markdown. Supports both pdf2markdown and pdf2md commands.

Run pdf2markdown --help or pdf2md <command> --help for options.

Prerequisites

Install and authenticate. Check with pdf2markdown --status.

pdf2markdown login
# or set PDF2MARKDOWN_API_KEY

If not ready, see rules/install.md. For output handling, see rules/security.md.
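Before calling the CLI from a script, it can help to confirm that one of the two command names is on PATH and whether an API key is present in the environment. The following is a minimal sketch of such a check; `pdf2markdown_preflight` and its three status strings are hypothetical names introduced here, not part of the CLI, and the check does not replace `pdf2markdown --status`.

```shell
# Pre-flight sketch: reports whether the CLI looks usable from this shell.
# pdf2markdown_preflight is a hypothetical helper, not a CLI command.
pdf2markdown_preflight() {
  # Either documented command name (pdf2markdown or pdf2md) is acceptable.
  if ! command -v pdf2markdown >/dev/null 2>&1 &&
     ! command -v pdf2md >/dev/null 2>&1; then
    echo "not-installed"
    return 1
  fi
  # An API key in the environment is one documented way to authenticate.
  # Without it, authentication relies on an earlier `pdf2markdown login`,
  # which this sketch cannot verify on its own.
  if [ -z "${PDF2MARKDOWN_API_KEY:-}" ]; then
    echo "no-api-key-in-env"
    return 0
  fi
  echo "api-key-in-env"
  return 0
}
```

A result of `not-installed` points to rules/install.md; `no-api-key-in-env` means `pdf2markdown --status` is still needed to confirm that a login session exists.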

Workflow

| Need | Command | When |
| --- | --- | --- |
| Convert PDF/image | parse | File under ~30MB, have path or URL |
| Large file (async) | parse-async | File over ~30MB, or sync returns file_too_large error |
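The size rule in the table can be applied to a local file before submitting it. The sketch below picks a subcommand name from the file size; `choose_parse_command` is a hypothetical helper, and treating "~30MB" as 30 × 1024 × 1024 bytes is an assumption, since the exact server-side limit is not stated.

```shell
# choose_parse_command FILE [LIMIT_BYTES]
# Hypothetical helper: prints "parse" or "parse-async" based on file size.
# The default limit (30 MiB) is an assumed reading of the "~30MB" threshold.
choose_parse_command() {
  cpc_file=$1
  cpc_limit=${2:-$((30 * 1024 * 1024))}
  if [ ! -f "$cpc_file" ]; then
    echo "not a file: $cpc_file" >&2
    return 1
  fi
  # Arithmetic expansion strips the padding some wc implementations emit.
  cpc_size=$(($(wc -c < "$cpc_file")))
  if [ "$cpc_size" -lt "$cpc_limit" ]; then
    echo parse
  else
    echo parse-async
  fi
}
```

Because the threshold is approximate, a file just under the limit may still be rejected by the sync path; in that case the table's second rule applies and the file should be resubmitted with parse-async.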

Quick start

Parse (sync, files under ~30MB):

pdf2markdown document.pdf -o .pdf2markdown/output.md
pdf2markdown parse --url "https://example.com/doc.pdf" -o .pdf2markdown/doc.md
pdf2markdown parse file1.pdf file2.png -o .pdf2markdown/

# JSON output
pdf2markdown parse document.pdf --format json -o .pdf2markdown/result.json

Parse-async (large files, up to 100MB):

# Submit and wait
pdf2markdown parse-async large.pdf --wait -o .pdf2markdown/output.md
pdf2markdown parse-async --url "https://cdn.example.com/big.pdf" --wait -o .pdf2markdown/doc.md

# Submit only (poll later)
pdf2markdown parse-async large.pdf  # returns task_id
pdf2markdown parse-async <task_id> --status
pdf2markdown parse-async <task_id> --result -o .pdf2markdown/output.md

Options

| Command | Key options |
| --- | --- |
| parse | -u, --url, -o, --output, -f, --format (markdown, json, all), --page-images, --json, --pretty |
| parse-async | -u, --url, -o, --output, --wait, --status, --result, --poll-interval, --timeout |

Run pdf2markdown <command> --help for full details.

Output & Organization

Write results to .pdf2markdown/ with -o. Add .pdf2markdown/ to .gitignore.

pdf2markdown document.pdf -o .pdf2markdown/doc.md
pdf2markdown parse file1.pdf file2.pdf -o .pdf2markdown/
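Adding the output directory to .gitignore is easy to script so that repeated runs do not duplicate the entry. This is a minimal sketch; `ensure_gitignore_entry` is a hypothetical helper name, and the defaults assume the script runs from the repository root.

```shell
# ensure_gitignore_entry [ENTRY] [GITIGNORE_PATH]
# Hypothetical helper: appends ENTRY to the ignore file only if it is missing.
ensure_gitignore_entry() {
  egi_entry=${1:-.pdf2markdown/}
  egi_file=${2:-.gitignore}
  # -x matches whole lines, -F treats the entry as a literal string.
  if [ -f "$egi_file" ] && grep -qxF -- "$egi_entry" "$egi_file"; then
    return 0
  fi
  # If the file ends without a newline, add one so the entry gets its own line.
  if [ -s "$egi_file" ] && [ -n "$(tail -c 1 "$egi_file")" ]; then
    printf '\n' >> "$egi_file"
  fi
  printf '%s\n' "$egi_entry" >> "$egi_file"
}
```

Calling `ensure_gitignore_entry` with no arguments before the first conversion keeps generated Markdown out of version control.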

Naming: .pdf2markdown/{name}.md. For large outputs, use grep, head, or incremental reads rather than loading the whole file. Always quote URLs, because the shell treats ? and & as special characters.
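The naming pattern can be computed in a script when building the `-o` argument. The sketch below assumes that `{name}` is the input's base name without its extension, and that a URL's query string is not part of the name; both are assumptions, and `output_path_for` is a hypothetical helper.

```shell
# output_path_for INPUT
# Hypothetical helper: maps a file path or URL to .pdf2markdown/{name}.md.
# Assumes {name} is the last path segment minus its extension.
output_path_for() {
  opf_name=${1##*/}            # drop directories or the URL host and path
  opf_name=${opf_name%%\?*}    # drop a URL query string, if any
  opf_name=${opf_name%.*}      # drop the final extension
  printf '.pdf2markdown/%s.md\n' "$opf_name"
}

# The URL is single-quoted so the shell does not interpret ? or &.
output_path_for 'https://example.com/doc.pdf?version=2&lang=en'
```

Left unquoted, the `&` in that URL would end the command and send it to the background, so the CLI would receive only the text before it.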

Documentation