Source to Markdown

Use this skill only for format conversion into Markdown. Do not perform requirement analysis, product scoping, hypothesis extraction, summaries, recommendations, handoff generation, or team coordination.

Core Contract

Convert source files into Markdown text.
Preserve original visible or spoken content as faithfully as possible.
Do not summarize, explain, answer, classify, or infer business meaning.
Do not create structured analysis files, source indexes, demo paths, hypotheses, or project deliverables.
Write outputs to the user-requested path; if no path is specified, use a local converted/ folder next to the source file or inside the project deliverable folder.
Use .raw.md for OCR/ASR/LLM transcription outputs.

Supported Inputs

| Input | Default Handling |
|---|---|
| .md, .txt | Read directly and normalize if needed |
| .docx | Convert with MarkItDown |
| .pdf | Convert with MarkItDown; use Tencent OCR for scanned pages |
| .pptx | Convert slide text with MarkItDown |
| .xlsx, .xls | Convert tables with MarkItDown |
| .csv | Convert table export with MarkItDown |
| .html, .htm | Convert saved web/API documentation with MarkItDown |
| .json | Convert structured JSON dump with MarkItDown |
| .xml | Convert structured XML/config dump with MarkItDown |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Default Tencent OCR; visual LLM fallback |
| .wav, .pcm, .ogg, .speex, .silk, .mp3, .m4a, .aac, .amr | Default Tencent ASR |
| .mp3, .wav, .m4a, .flac, .ogg | LLM audio fallback when Tencent ASR is unsuitable |
| .zip | Batch route contained files and write manifest.md |

Unsupported by default: video, YouTube URLs, EPUB, and arbitrary binary dumps. Ask for a common-format export when needed.

Provider Configuration

API credentials, provider options, and local runtime paths are configured in providers.json at the skill root. Provider value precedence is:

providers.json > environment variable > script default

Read references/providers.md only when you need the full JSON template, environment fallback table, or provider fields. providers.json may contain real local secrets; never paste its values into chat, converted Markdown, logs, or user-visible output.
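The precedence rule above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual loader: the function names, the `secret_id` field, and the `EXAMPLE_SECRET_ID` variable are hypothetical, and treating an empty string as "unset" is an assumption.

```python
import json
import os
from pathlib import Path


def load_providers(path):
    """Load providers.json if it exists; a missing file means 'fall through'."""
    p = Path(path)
    if not p.is_file():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def resolve_setting(providers, provider, field, env_var, default=None):
    """Resolve one value: providers.json > environment variable > script default."""
    value = providers.get(provider, {}).get(field)
    if value not in (None, ""):  # assumption: empty string counts as unset
        return value
    env_value = os.environ.get(env_var)
    if env_value:
        return env_value
    return default


# Hypothetical field and variable names, for illustration only.
providers = load_providers("providers.json")
secret_id = resolve_setting(providers, "tencent_asr", "secret_id", "EXAMPLE_SECRET_ID")
```

Whatever the real loader looks like, the resolved value should stay in memory only and never be printed or written into converted output.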

Route Selection

| Source | Default | Fallback |
|---|---|---|
| Office/text documents | MarkItDown | Ask for cleaner .md, .pdf, .docx, or .csv export |
| CSV/HTML/JSON/XML structured exports | MarkItDown | Ask for cleaner .csv, .md, or source-system export |
| Image text OCR | scripts/tencent_ocr_to_markdown.py | scripts/vision_to_markdown.py |
| Visual layout/context transcription | scripts/vision_to_markdown.py | Ask for text/PDF/DOCX export |
| Audio transcript | scripts/tencent_asr_to_markdown.py | scripts/llm_audio_to_markdown.py |
| ZIP material package | scripts/source_to_markdown.py batch router | Convert files one by one and record failures |

Unified Router

Prefer the unified router for normal use. It routes by file extension, writes single-file outputs, expands ZIP packages safely, and writes a batch manifest.md.
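To make "routes by file extension" concrete, here is a rough sketch of how such a lookup could work, using the extension lists from Supported Inputs. The function name and the route labels `direct`, `markitdown`, `zip`, and `unsupported` are illustrative assumptions; the real router may be organized differently. Sending a file to the other audio route when the requested one does not list its extension is also an assumption.

```python
from pathlib import Path

DIRECT_EXTS = {".md", ".txt"}
MARKITDOWN_EXTS = {".docx", ".pdf", ".pptx", ".xlsx", ".xls",
                   ".csv", ".html", ".htm", ".json", ".xml"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"}
ASR_EXTS = {".wav", ".pcm", ".ogg", ".speex", ".silk",
            ".mp3", ".m4a", ".aac", ".amr"}
LLM_AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def pick_route(source, image_route="ocr", audio_route="asr"):
    """Return a route label for one file, based only on its extension."""
    ext = Path(source).suffix.lower()
    if ext in DIRECT_EXTS:
        return "direct"
    if ext in MARKITDOWN_EXTS:
        return "markitdown"
    if ext in IMAGE_EXTS:
        return image_route  # "ocr" or "vision"
    if ext == ".zip":
        return "zip"
    if ext in ASR_EXTS | LLM_AUDIO_EXTS:
        # Assumption: use the other audio route when the requested one
        # does not list this extension (e.g. .flac is LLM-only).
        if audio_route == "asr" and ext not in ASR_EXTS:
            return "llm"
        if audio_route == "llm" and ext not in LLM_AUDIO_EXTS:
            return "asr"
        return audio_route
    return "unsupported"
```

For example, `pick_route("input.docx")` yields `"markitdown"`, while a video file such as `clip.mp4` falls through to `"unsupported"`.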

Single file:

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "input.docx" `
  "converted/input.md"

Structured exports supported through MarkItDown:

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "api-response.json" `
  "converted/api-response.md"

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "export.csv" `
  "converted/export.md"

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "saved-page.html" `
  "converted/saved-page.md"

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "device-config.xml" `
  "converted/device-config.md"

Image route selection:

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "input.png" `
  "converted/input.raw.md" `
  --image-route ocr

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "input.png" `
  "converted/input.raw.md" `
  --image-route vision

Audio route selection:

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "meeting.m4a" `
  "converted/meeting.raw.md" `
  --audio-route asr

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "meeting.mp3" `
  "converted/meeting.raw.md" `
  --audio-route llm `
  --request-timeout 600

ZIP material package:

python skills/source-to-markdown/scripts/source_to_markdown.py `
  "materials.zip" `
  "converted/materials"

ZIP output layout:

converted/materials/
├── manifest.md
├── source-a.md
├── table-export.md
├── screenshot.raw.md
└── nested/path/spec.md

manifest.md records every contained file, route, output path, status, and failure/skipped reason. Do not treat failed or skipped files as converted evidence.
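As an illustration of what the manifest carries, the sketch below renders one table row per contained file and filters out anything that did not convert. The column headings, the status labels (`converted`, `failed`, `skipped`), and all names here are assumptions, not the router's actual output format.

```python
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    source: str
    route: str
    output: str
    status: str  # assumed labels: "converted", "failed", "skipped"
    reason: str = ""


def render_manifest(entries):
    """Render a Markdown table with one row per contained file."""
    lines = [
        "| Source | Route | Output | Status | Reason |",
        "|---|---|---|---|---|",
    ]
    for e in entries:
        cells = [e.source, e.route, e.output, e.status, e.reason]
        # Escape pipes so a file name cannot break the table row.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines) + "\n"


def usable_outputs(entries):
    """Only converted files count as evidence; failed and skipped do not."""
    return [e.output for e in entries if e.status == "converted"]
```

Downstream steps would then read only the paths returned by `usable_outputs`, which keeps failed or skipped files out of any later work.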

Document Conversion

Use MarkItDown for normal document sources:

markitdown "input.docx" -o "converted/input.md"
markitdown "input.pdf" -o "converted/input.md"
markitdown "input.pptx" -o "converted/input.md"
markitdown "input.xlsx" -o "converted/input.md"
markitdown "input.csv" -o "converted/input.md"
markitdown "input.html" -o "converted/input.md"
markitdown "input.json" -o "converted/input.md"
markitdown "input.xml" -o "converted/input.md"

For .md or .txt, read directly and preserve the original text unless normalization is explicitly requested.

For scanned PDFs, use Tencent OCR page by page when MarkItDown cannot extract text:

python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
  "scanned.pdf" `
  "converted/scanned-page-1.raw.md" `
  --pdf-page-number 1
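Converting a multi-page scan means repeating that command once per page. A minimal sketch of such a loop follows, assuming the caller already knows the page count (how the script itself discovers pages is not stated here) and assuming the script exits non-zero on failure. The helper names are hypothetical; only the script path and the `--pdf-page-number` flag come from the command above.

```python
import subprocess
import sys
from pathlib import Path

OCR_SCRIPT = "skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py"


def build_page_commands(pdf_path, out_dir, page_count):
    """Build one OCR command per page, numbered from 1."""
    stem = Path(pdf_path).stem
    commands = []
    for page in range(1, page_count + 1):
        output = Path(out_dir) / f"{stem}-page-{page}.raw.md"
        commands.append([
            sys.executable, OCR_SCRIPT, str(pdf_path), str(output),
            "--pdf-page-number", str(page),
        ])
    return commands


def run_pages(commands):
    """Run each page separately; return (output path, stderr) for failures."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:  # assumption: non-zero exit means failure
            failed.append((cmd[3], result.stderr.strip()))
    return failed
```

Running pages one at a time keeps a single bad page from discarding the pages that did convert.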

On Windows, set UTF-8 output if Chinese text prints incorrectly:

$env:PYTHONIOENCODING='utf-8'

Image Conversion

Use Tencent OCR first for images, screenshots, scanned notes, and scanned PDF pages:

python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
  "input.png" `
  "converted/input.raw.md" `
  --request-timeout 300

For large images with small text, add:

--enable-detect-split

Use the visual LLM route only when deterministic OCR is insufficient and the image needs visual layout/context transcription:

python skills/source-to-markdown/scripts/vision_to_markdown.py `
  "input.png" `
  "converted/input.raw.md" `
  --request-timeout 300

Image output must be raw visible text or faithful visual transcription. Do not add interpretation, analysis, or conclusions.

Audio Conversion

Use Tencent ASR first for recordings:

python skills/source-to-markdown/scripts/tencent_asr_to_markdown.py `
  "meeting.m4a" `
  "converted/meeting.raw.md" `
  --request-timeout 300

Tencent ASR uses tencent_asr in providers.json, supports common recording formats, and outputs only recognized transcript text from flash_result.

Use the LLM audio route only when Tencent ASR is unavailable, unsuitable, or explicitly requested:

python skills/source-to-markdown/scripts/llm_audio_to_markdown.py `
  "meeting.mp3" `
  "converted/meeting.raw.md" `
  --request-timeout 300

For long recordings, increase timeout:

--request-timeout 600

For known-good audio on the LLM fallback route, bypass normalization only when needed:

--normalize-audio never

Audio output must be raw transcript text only. Do not add generated headings, summaries, action items, analysis, a “识别不确定处” ("uncertain recognition points") section, or invented “无” ("none") sections.

Output Rules

Use .md for document conversions.
Use .md for MarkItDown-routed structured exports such as CSV, HTML, JSON, and XML.
Use .raw.md for OCR, ASR, and LLM transcription outputs.
For ZIP packages, write a manifest.md and one output file per converted source file.
Preserve original order, wording, numbers, timestamps, labels, and table structure as much as possible.
Mark unclear OCR as [无法识别] ("unrecognizable") or [不确定: ...] ("uncertain: ...").
Mark unclear audio as [听不清] ("inaudible") or [不确定: ...] ("uncertain: ...").
Do not add metadata headers unless a script option explicitly requests them.
Do not include API keys, provider config values, request signatures, or credentials in outputs.
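The suffix rules above reduce to a small helper. This is only a sketch: the function name, the default `converted` folder argument, and the `markitdown` route label are assumptions, while `ocr`, `vision`, `asr`, and `llm` mirror the `--image-route` and `--audio-route` values shown earlier.

```python
from pathlib import Path

# Routes whose output is a transcription rather than a document conversion.
RAW_ROUTES = {"ocr", "vision", "asr", "llm"}


def output_path(source, route, out_dir="converted"):
    """Pick .raw.md for OCR/ASR/LLM transcription routes, .md otherwise."""
    suffix = ".raw.md" if route in RAW_ROUTES else ".md"
    return (Path(out_dir) / (Path(source).stem + suffix)).as_posix()
```

Under these assumptions, `output_path("meeting.m4a", "asr")` gives `converted/meeting.raw.md` and `output_path("input.docx", "markitdown")` gives `converted/input.md`.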

Failure Handling

If conversion fails:

Record the failed file and the error message.
Try a simpler conversion route:
- .docx -> ask for .pdf or .md
- .pdf -> ask for text PDF or Word source if scanned/poorly extracted
- .pptx -> ask for speaker notes or exported .pdf
- .xlsx -> ask for .csv only if spreadsheet parsing fails
- image -> clearer image, text/PDF/DOCX export, or visual LLM fallback
- audio -> supported format, shorter audio, clearer recording, or LLM audio fallback
Do not invent missing text.
Report which files converted and which files failed.
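The record-and-report steps can be sketched as a batch loop that never stops on a single failure. The names below are hypothetical, and `convert` stands in for whichever route function is actually used.

```python
def convert_all(sources, convert):
    """Call convert(source) for each file and keep going after failures."""
    converted, failed = [], []
    for source in sources:
        try:
            output = convert(source)
        except Exception as exc:
            # Record the error as-is; never substitute invented text.
            failed.append((source, f"{type(exc).__name__}: {exc}"))
        else:
            converted.append((source, output))
    return converted, failed


def format_report(converted, failed):
    """List converted and failed files separately."""
    lines = ["Converted:"]
    lines += [f"- {src} -> {out}" for src, out in converted] or ["- (none)"]
    lines.append("Failed:")
    lines += [f"- {src}: {err}" for src, err in failed] or ["- (none)"]
    return "\n".join(lines)
```

Keeping the two lists separate makes it hard to mistake a failed file for a converted one when the report is read later.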