Source to Markdown
Use this skill only for format conversion into Markdown. Do not perform requirement analysis, product scoping, hypothesis extraction, summaries, recommendations, handoff generation, or team coordination.
Core Contract
- Convert source files into Markdown text.
- Preserve original visible or spoken content as faithfully as possible.
- Do not summarize, explain, answer, classify, or infer business meaning.
- Do not create structured analysis files, source indexes, demo paths, hypotheses, or project deliverables.
- Write outputs to the user-requested path; if no path is specified, use a local converted/ folder near the source or project deliverable folder.
- Use .raw.md for OCR/ASR/LLM transcription outputs.
Supported Inputs
| Input | Default Handling |
|---|---|
| .md, .txt | Read directly and normalize if needed |
| .docx | Convert with MarkItDown |
| .pdf | Convert with MarkItDown; use Tencent OCR for scanned pages |
| .pptx | Convert slide text with MarkItDown |
| .xlsx, .xls | Convert tables with MarkItDown |
| .csv | Convert table export with MarkItDown |
| .html, .htm | Convert saved web/API documentation with MarkItDown |
| .json | Convert structured JSON dump with MarkItDown |
| .xml | Convert structured XML/config dump with MarkItDown |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Default Tencent OCR; visual LLM fallback |
| .wav, .pcm, .ogg, .speex, .silk, .mp3, .m4a, .aac, .amr | Default Tencent ASR |
| .mp3, .wav, .m4a, .flac, .ogg | LLM audio fallback when Tencent ASR is unsuitable |
| .zip | Batch route contained files and write manifest.md |
Unsupported by default: video, YouTube URLs, EPUB, and arbitrary binary dumps. Ask for a common-format export when needed.
Provider Configuration
API credentials, provider options, and local runtime paths are configured in providers.json at the skill root. Provider value precedence is:
providers.json > environment variable > script default
Read references/providers.md only when you need the full JSON template, environment fallback table, or provider fields. providers.json may contain real local secrets; never paste its values into chat, converted Markdown, logs, or user-visible output.
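The precedence rule above can be sketched as a small lookup helper. This is an illustration only, not the skill's actual loader: the function names, the `region` field, and the `DEMO_REGION` environment variable are hypothetical, and treating an empty string as "not set" is an assumption.

```python
import json
import os
from pathlib import Path


def load_providers(path):
    """Load providers.json if it exists; otherwise return an empty config."""
    p = Path(path)
    if not p.is_file():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def resolve_provider_value(config, provider, field, env_var, default=None):
    """Apply the order: providers.json > environment variable > script default."""
    file_value = config.get(provider, {}).get(field)
    if file_value not in (None, ""):  # assumption: empty string counts as unset
        return file_value
    env_value = os.environ.get(env_var)
    if env_value not in (None, ""):
        return env_value
    return default


if __name__ == "__main__":
    cfg = load_providers("providers.json")
    # Hypothetical field and variable names; never print real secret values.
    region = resolve_provider_value(cfg, "tencent_asr", "region", "DEMO_REGION", "default-region")
    print("region resolved:", region is not None)
```

The demo prints only whether a value was resolved, in line with the rule that provider values must not reach logs or user-visible output.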
Route Selection
| Source | Default | Fallback |
|---|---|---|
| Office/text documents | MarkItDown | Ask for cleaner .md, .pdf, .docx, or .csv export |
| CSV/HTML/JSON/XML structured exports | MarkItDown | Ask for cleaner .csv, .md, or source-system export |
| Image text OCR | scripts/tencent_ocr_to_markdown.py | scripts/vision_to_markdown.py |
| Visual layout/context transcription | scripts/vision_to_markdown.py | Ask for text/PDF/DOCX export |
| Audio transcript | scripts/tencent_asr_to_markdown.py | scripts/llm_audio_to_markdown.py |
| ZIP material package | scripts/source_to_markdown.py batch router | Convert files one by one and record failures |
Unified Router
Prefer the unified router for normal use. It routes by file extension, writes single-file outputs, expands ZIP packages safely, and writes a batch manifest.md.
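As a rough picture of extension-based routing, the sketch below maps the extensions from the Supported Inputs table to route names. It is not the router's real code: the route labels and the `pick_route` function are hypothetical, and sending `.flac` to the LLM audio route by default is an assumption drawn from the fact that `.flac` appears only in the LLM fallback row.

```python
from pathlib import Path

# Extension sets copied from the Supported Inputs table.
TEXT_EXTS = {".md", ".txt"}
MARKITDOWN_EXTS = {".docx", ".pdf", ".pptx", ".xlsx", ".xls",
                   ".csv", ".html", ".htm", ".json", ".xml"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"}
ASR_EXTS = {".wav", ".pcm", ".ogg", ".speex", ".silk",
            ".mp3", ".m4a", ".aac", ".amr"}
LLM_AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def pick_route(path, image_route="ocr", audio_route="asr"):
    """Return a hypothetical route label for a source file, based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return "direct"
    if ext in MARKITDOWN_EXTS:
        return "markitdown"
    if ext in IMAGE_EXTS:
        return "vision" if image_route == "vision" else "ocr"
    if ext == ".zip":
        return "zip"
    # Honor an explicit LLM request only for formats the LLM route lists.
    if audio_route == "llm" and ext in LLM_AUDIO_EXTS:
        return "llm_audio"
    if ext in ASR_EXTS:
        return "asr"
    if ext in LLM_AUDIO_EXTS:  # assumption: .flac has no ASR route
        return "llm_audio"
    return "unsupported"


if __name__ == "__main__":
    for name in ["input.docx", "input.png", "meeting.m4a", "clip.mp4"]:
        print(name, "->", pick_route(name))
```

Unknown extensions fall through to `unsupported`, which matches the guidance to ask for a common-format export rather than guess.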
Single file:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.docx" `
"converted/input.md"
Structured exports supported through MarkItDown:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"api-response.json" `
"converted/api-response.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"export.csv" `
"converted/export.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"saved-page.html" `
"converted/saved-page.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"device-config.xml" `
"converted/device-config.md"
Image route selection:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--image-route ocr
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--image-route vision
Audio route selection:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"meeting.m4a" `
"converted/meeting.raw.md" `
--audio-route asr
python skills/source-to-markdown/scripts/source_to_markdown.py `
"meeting.mp3" `
"converted/meeting.raw.md" `
--audio-route llm `
--request-timeout 600
ZIP material package:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"materials.zip" `
"converted/materials"
ZIP output layout:
converted/materials/
├── manifest.md
├── source-a.md
├── table-export.md
├── screenshot.raw.md
└── nested/path/spec.md
manifest.md records every contained file, route, output path, status, and failure/skipped reason. Do not treat failed or skipped files as converted evidence.
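One way to think about safe ZIP expansion and the manifest is sketched below. The source does not describe how the router implements either, so everything here is an assumption: the path check rejects absolute paths, `..` segments, and drive-letter prefixes, and the manifest columns simply mirror the fields listed above. The helper names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def is_safe_member(name):
    """Reject entries that could escape the output folder when extracted."""
    normalized = name.replace("\\", "/")
    parts = PurePosixPath(normalized).parts
    if not parts or normalized.startswith("/"):
        return False
    if ".." in parts:
        return False
    if ":" in parts[0]:  # e.g. a Windows drive prefix such as C:
        return False
    return True


def safe_members(zip_path):
    """Yield (entry name, is_safe) for every file entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            yield info.filename, is_safe_member(info.filename)


def render_manifest(rows):
    """Render manifest rows as a Markdown table; the column layout is assumed."""
    lines = ["| File | Route | Output | Status | Reason |",
             "|---|---|---|---|---|"]
    for row in rows:
        cells = [str(row.get(key, "")).replace("|", "\\|")
                 for key in ("file", "route", "output", "status", "reason")]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

An unsafe entry would be listed in the manifest with a skipped status and a reason rather than silently dropped, so that skipped files are never mistaken for converted evidence.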
Document Conversion
Use MarkItDown for normal document sources:
markitdown "input.docx" -o "converted/input.md"
markitdown "input.pdf" -o "converted/input.md"
markitdown "input.pptx" -o "converted/input.md"
markitdown "input.xlsx" -o "converted/input.md"
markitdown "input.csv" -o "converted/input.md"
markitdown "input.html" -o "converted/input.md"
markitdown "input.json" -o "converted/input.md"
markitdown "input.xml" -o "converted/input.md"
For .md or .txt, read directly and preserve the original text unless normalization is explicitly requested.
For scanned PDFs, use Tencent OCR page by page when MarkItDown cannot extract text:
python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
"scanned.pdf" `
"converted/scanned-page-1.raw.md" `
--pdf-page-number 1
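For a multi-page scan, the per-page command above has to be repeated for each page. The sketch below only builds the argument lists; it assumes the caller already knows the page count and that one `.raw.md` file per page is wanted, following the `scanned-page-1.raw.md` naming in the example. The function name is hypothetical.

```python
from pathlib import Path

OCR_SCRIPT = "skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py"


def build_ocr_page_commands(pdf_path, out_dir, page_count, script=OCR_SCRIPT):
    """Return one argument list per page, numbered from 1."""
    stem = Path(pdf_path).stem
    commands = []
    for page in range(1, page_count + 1):
        out_file = (Path(out_dir) / f"{stem}-page-{page}.raw.md").as_posix()
        commands.append([
            "python", script,
            str(pdf_path), out_file,
            "--pdf-page-number", str(page),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_ocr_page_commands("scanned.pdf", "converted", page_count=3):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Catching the failure per page lets one bad page be recorded without aborting the remaining pages.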
On Windows, set UTF-8 output if Chinese text prints incorrectly:
$env:PYTHONIOENCODING='utf-8'
Image Conversion
Use Tencent OCR first for images, screenshots, scanned notes, and scanned PDF pages:
python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--request-timeout 300
For large images with small text, add:
--enable-detect-split
Use the visual LLM route only when deterministic OCR is insufficient and the image needs visual layout/context transcription:
python skills/source-to-markdown/scripts/vision_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--request-timeout 300
Image output must be raw visible text or faithful visual transcription. Do not add interpretation, analysis, or conclusions.
Audio Conversion
Use Tencent ASR first for recordings:
python skills/source-to-markdown/scripts/tencent_asr_to_markdown.py `
"meeting.m4a" `
"converted/meeting.raw.md" `
--request-timeout 300
Tencent ASR uses tencent_asr in providers.json, supports common recording formats, and outputs only recognized transcript text from flash_result.
Use the LLM audio route only when Tencent ASR is unavailable, unsuitable, or explicitly requested:
python skills/source-to-markdown/scripts/llm_audio_to_markdown.py `
"meeting.mp3" `
"converted/meeting.raw.md" `
--request-timeout 300
For long recordings, increase timeout:
--request-timeout 600
For known-good audio on the LLM fallback route, bypass normalization only when needed:
--normalize-audio never
Audio output must be raw transcript text only. Do not add generated headings, summaries, action items, analysis, “识别不确定处” ("uncertain recognition points") sections, or invented “无” ("none") sections.
Output Rules
- Use .md for document conversions.
- Use .md for MarkItDown-routed structured exports such as CSV, HTML, JSON, and XML.
- Use .raw.md for OCR, ASR, and LLM transcription outputs.
- For ZIP packages, write a manifest.md and one output file per converted source file.
- Preserve original order, wording, numbers, timestamps, labels, and table structure as much as possible.
- Mark unclear OCR as [无法识别] or [不确定: ...].
- Mark unclear audio as [听不清] or [不确定: ...].
- Do not add metadata headers unless a script option explicitly requests them.
- Do not include API keys, provider config values, request signatures, or credentials in outputs.
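The naming rules above, together with the default converted/ folder from the Core Contract, can be expressed as a small path helper. This is a sketch under assumptions: the route labels are hypothetical, and "near the source" is read here as a converted/ folder next to the source file.

```python
from pathlib import Path

# Hypothetical labels for the transcription routes that produce .raw.md output.
RAW_ROUTES = {"ocr", "vision", "asr", "llm_audio"}


def default_output_path(source, route, out_dir=None):
    """Pick an output path: .raw.md for OCR/ASR/LLM transcription, .md otherwise."""
    src = Path(source)
    folder = Path(out_dir) if out_dir else src.parent / "converted"
    suffix = ".raw.md" if route in RAW_ROUTES else ".md"
    return folder / f"{src.stem}{suffix}"


if __name__ == "__main__":
    print(default_output_path("docs/input.docx", "markitdown").as_posix())
    print(default_output_path("meeting.m4a", "asr").as_posix())
```

A user-requested path always wins; a helper like this would apply only when no output path is given.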
Failure Handling
If conversion fails:
- Record the failed file and the error message.
- Try a simpler conversion route:
  - .docx -> ask for .pdf or .md
  - .pdf -> ask for text PDF or Word source if scanned/poorly extracted
  - .pptx -> ask for speaker notes or exported .pdf
  - .xlsx -> ask for .csv only if spreadsheet parsing fails
  - image -> clearer image, text/PDF/DOCX export, or visual LLM fallback
  - audio -> supported format, shorter audio, clearer recording, or LLM audio fallback
- Do not invent missing text.
- Report which files converted and which files failed.
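The record-and-report steps above might look like the following sketch. The `convert` callable stands in for whichever route is used and is hypothetical, as are the result field names. The point is that a failure produces a recorded error message and no output text.

```python
def convert_all(sources, convert):
    """Run convert(source) for each file and record success or the error message."""
    results = []
    for src in sources:
        try:
            output = convert(src)
            results.append({"source": src, "ok": True, "detail": str(output)})
        except Exception as exc:  # record the failure; never substitute invented text
            results.append({"source": src, "ok": False,
                            "detail": f"{type(exc).__name__}: {exc}"})
    return results


def format_report(results):
    """List converted files and failed files separately."""
    converted = [r for r in results if r["ok"]]
    failed = [r for r in results if not r["ok"]]
    lines = [f"Converted ({len(converted)}):"]
    lines += [f"- {r['source']} -> {r['detail']}" for r in converted]
    lines.append(f"Failed ({len(failed)}):")
    lines += [f"- {r['source']}: {r['detail']}" for r in failed]
    return "\n".join(lines)


if __name__ == "__main__":
    def fake_convert(src):
        # Stand-in converter for the demo: pretends scanned PDFs cannot be read.
        if src.endswith(".pdf"):
            raise ValueError("no extractable text")
        return "converted/" + src.rsplit(".", 1)[0] + ".md"

    print(format_report(convert_all(["a.docx", "scan.pdf"], fake_convert)))
```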