Markdown to DOCX with Images
Convert Markdown (.md) documents to Word (.docx) with embedded images, proper heading styles, tables, and formatting preservation.
When to Use
- User asks to convert a .md file to .docx / Word
- User wants images from the markdown embedded in the Word document
- User needs heading styles, tables, and formatting preserved
Prerequisites
- pandoc (required for conversion)
- python (for helper scripts; the structure check in the Verify step also needs the python-docx package)
Scripts
All scripts are in the scripts/ directory relative to this skill file. Always use these scripts instead of writing inline code.
| Script | Purpose |
|--------|---------|
| scripts/install_pandoc.py | Auto-install pandoc on Windows/macOS/Linux |
| scripts/check_images.py | Verify all image paths resolve correctly before conversion |
| scripts/md_to_docx.py | Convert markdown to docx with image embedding and caption fixing |
| scripts/fix_captions.py | Fix image captions in an existing docx: center align, 8pt, no italic |
Conversion Workflow
Step 1: Check for Pandoc
pandoc --version
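The same check can be scripted when it needs to run inside another tool. This is a minimal sketch using only the standard library; `pandoc_version` is a hypothetical helper name and is not part of the bundled scripts.

```python
import shutil
import subprocess


def pandoc_version(executable="pandoc"):
    """Return the first line of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # executable is not on PATH
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    version = pandoc_version()
    print(version if version else "pandoc not found on PATH")
```

A `None` result is the signal to move on to the auto-install step.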
Step 1b: Auto-Install Pandoc if Missing
If pandoc is not found, run the install script:
py scripts/install_pandoc.py
After installation, verify:
pandoc --version
IMPORTANT: After successfully installing pandoc, you MUST tell the user explicitly:
"Pandoc has been automatically installed. You may need to restart your terminal for the PATH changes to take effect. If pandoc --version still fails, please restart your terminal and try again."
If auto-install fails, show this message:
"Could not automatically install pandoc. Please install it manually:
- Windows: Download from https://github.com/jgm/pandoc/releases/latest
- macOS: brew install pandoc
- Linux: sudo apt install pandoc"
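If only the hint for the current operating system should be shown, the lookup could be sketched as below. The hints are copied from the message above; `MANUAL_INSTALL_HINTS` and `manual_install_hint` are hypothetical names, and the keys assume the values returned by `platform.system()`.

```python
import platform

# Manual-install hints copied from the fallback message above.
MANUAL_INSTALL_HINTS = {
    "Windows": "Download from https://github.com/jgm/pandoc/releases/latest",
    "Darwin": "brew install pandoc",
    "Linux": "sudo apt install pandoc",
}


def manual_install_hint(system=None):
    """Return the hint for one OS, or all hints when the OS is unrecognised."""
    system = system or platform.system()
    if system in MANUAL_INSTALL_HINTS:
        return MANUAL_INSTALL_HINTS[system]
    return "; ".join(f"{name}: {hint}" for name, hint in MANUAL_INSTALL_HINTS.items())
```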
Step 2: Check Image Paths
Before converting, verify all image references resolve correctly:
py scripts/check_images.py input.md
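This checks that every ![alt](path) image reference points to an existing file. By default pandoc resolves relative paths against its working directory (or the directories given with --resource-path), so run the conversion from the folder that contains the markdown file.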
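To illustrate the kind of check involved, here is a minimal sketch. It is not the bundled check_images.py, whose implementation is not shown here. `missing_images` is a hypothetical helper that handles only inline image syntax and resolves each path against the markdown file's own folder.

```python
import re
from pathlib import Path

# Matches inline images such as ![alt](path) or ![alt](path "title").
# Reference-style images and paths containing spaces are not handled.
IMAGE_REF = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)')


def missing_images(md_path):
    """Return local image targets in md_path that do not exist on disk."""
    md_path = Path(md_path)
    text = md_path.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_REF.findall(text):
        if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", target):
            continue  # remote URL, nothing to check on disk
        if not (md_path.parent / target).is_file():
            missing.append(target)
    return missing
```

An empty list means every local reference was found; anything else is worth fixing before running pandoc.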
Step 3: Convert Markdown to DOCX
Recommended: use the bundled script (handles conversion + caption fixing automatically):
py scripts/md_to_docx.py input.md output.docx
This runs pandoc and then automatically fixes image captions (center aligned, 8pt font, no italic).
Manual conversion with pandoc:
pandoc input.md -o output.docx
With reference document (preserves styling from an existing .docx):
pandoc input.md -o output.docx --reference-doc=template.docx
If you used pandoc directly (not the script), fix image captions afterwards:
py scripts/fix_captions.py output.docx
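When the manual pandoc call has to be driven from Python, the command lines above can be assembled as follows. This is a sketch under the assumption that pandoc is on PATH; `build_pandoc_command` and `convert` are hypothetical names, not functions from the bundled scripts, and caption fixing is not included.

```python
import subprocess


def build_pandoc_command(source, output, reference_doc=None):
    """Assemble the pandoc argument list for a markdown-to-docx run."""
    command = ["pandoc", str(source), "-o", str(output)]
    if reference_doc is not None:
        command.append(f"--reference-doc={reference_doc}")
    return command


def convert(source, output, reference_doc=None):
    """Run pandoc and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pandoc_command(source, output, reference_doc), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.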
Step 3b: Fix Image Alt Text (Optional)
If your markdown has image alt text like ![image2.png] (with file extension), pandoc will use the full filename including .png as the caption text. Fix this before conversion:
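import re

# Rewrite alt text such as ![image2.png]( to ![image2]( so the caption omits the extension.
with open('input.md', 'r', encoding='utf-8') as f:
    content = f.read()
content = re.sub(r'!\[(image\d+)\.\w+\]\(', r'![\1](', content)
with open('input.md', 'w', encoding='utf-8') as f:
    f.write(content)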
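Because the snippet above rewrites input.md in place, it can help to try the substitution on a string first. The sketch below wraps the same search pattern in a hypothetical `strip_alt_extension` function; the sample text is made up for illustration.

```python
import re

# Same search pattern as the snippet above: alt text of the form imageN.ext before "(".
ALT_WITH_EXTENSION = re.compile(r'!\[(image\d+)\.\w+\]\(')


def strip_alt_extension(markdown):
    """Drop the file extension from alt text such as ![image2.png](...)."""
    return ALT_WITH_EXTENSION.sub(r'![\1](', markdown)


sample = "![image2.png](./img/image2.png) and ![Chart](./img/chart.png)"
print(strip_alt_extension(sample))
# prints: ![image2](./img/image2.png) and ![Chart](./img/chart.png)
```

Only alt text that matches the imageN.ext shape is touched; descriptive alt text is left as written.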
Step 4: Fix Image Captions
Pandoc inserts image alt text as a plain Normal style paragraph after each image. The bundled scripts fix this to:
- Center aligned
- 8pt font (smaller than body text)
- No italic, no bold
If using scripts/md_to_docx.py, this is done automatically. Otherwise run:
py scripts/fix_captions.py output.docx
Step 5: Verify
Check that images were properly embedded:
python -c "
import zipfile
with zipfile.ZipFile('output.docx', 'r') as z:
    images = [f for f in z.namelist() if 'word/media/' in f]
print(f'Embedded images: {len(images)}')
for img in sorted(images):
    print(f' {img}')
"
Check document structure:
python -c "
from docx import Document
doc = Document('output.docx')
headings = [p for p in doc.paragraphs if p.style and p.style.name.startswith('Heading')]
print(f'Headings: {len(headings)}')
print(f'Tables: {len(doc.tables)}')
print(f'Paragraphs: {len(doc.paragraphs)}')
"
Key Learnings from Real-World Conversion
Round-trip conversion (docx → md → docx)
When converting docx → md → docx, be aware of:
- Tables: Pandoc's markdown table formats handle only simple tables. Complex tables (merged cells, nested tables) in the original docx may not survive the round-trip. The md → docx step converts every markdown table correctly, but complex structures from the original are already lost after the first conversion.
- Images: Pandoc embeds images correctly from ./img/ relative paths. In the real-world document behind these notes, all 32 images converted successfully.
- Headings: Heading levels are preserved through the round-trip. ## Heading in markdown becomes Heading 2 style in docx.
- Formatting: Bold (**text**) and italic (*text*) are preserved.
- TOC: Table of contents entries in the original docx become regular links in markdown, then plain text in the reconverted docx.
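The heading mapping noted above can be turned into a list of expected styles to compare against the headings reported in the verification step. This is a rough sketch: `expected_heading_styles` is a hypothetical helper, it only recognises ATX headings (lines starting with #), and its fence detection is deliberately simple.

```python
import re

ATX_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = re.compile(r"^\s*(```|~~~)")


def expected_heading_styles(markdown):
    """Map each ATX heading to the Word style name it should receive."""
    styles = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # skip '#' comments inside code blocks
            continue
        match = None if in_fence else ATX_HEADING.match(line)
        if match:
            styles.append((f"Heading {len(match.group(1))}", match.group(2)))
    return styles
```

If the number of entries differs from the heading count in the converted docx, the markdown is worth a second look before blaming pandoc.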
Image path resolution
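- Pandoc resolves relative image paths against its working directory by default (or the directories listed in --resource-path), so run it from the folder that contains the markdown file
- A relative path into img/ works when img/ is a sibling of the .md file
- Remote URLs (http://...) are downloaded and embedded by pandoc automatically
- If an image path is broken, pandoc prints a warning and replaces the image with its alt text instead of stopping, so always run check_images.py first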
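When pandoc is launched from a directory other than the one holding the markdown file, its --resource-path option can point it at that folder. A minimal sketch, assuming the images live alongside the .md file; `pandoc_command_for` is a hypothetical name.

```python
from pathlib import Path


def pandoc_command_for(md_path, output_path):
    """Build a pandoc call that looks for images next to the markdown file."""
    md_path = Path(md_path)
    return [
        "pandoc",
        str(md_path),
        "-o",
        str(output_path),
        # Search the markdown file's own folder for relative image paths.
        f"--resource-path={md_path.parent}",
    ]
```

The returned list can be passed straight to `subprocess.run(..., check=True)`.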
Reference document styling
Using --reference-doc allows you to control:
- Heading font sizes and colors
- Table styles
- Paragraph spacing
- Page margins
- Header/footer content
Create a reference docx by:
- Converting a sample markdown to docx without --reference-doc
- Editing the styles in Word
- Saving as template.docx
- Using --reference-doc=template.docx for future conversions
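As an alternative starting point for the first step, pandoc can write out its built-in reference document with --print-default-data-file. The sketch below wraps that call; the function names are hypothetical and template.docx is just the file name used in this section.

```python
import subprocess


def default_reference_command(template="template.docx"):
    """Build the pandoc call that writes its built-in reference.docx to `template`."""
    return ["pandoc", "-o", template, "--print-default-data-file", "reference.docx"]


def write_default_reference(template="template.docx"):
    """Run the command; raises CalledProcessError if pandoc reports an error."""
    subprocess.run(default_reference_command(template), check=True)
```

The resulting file can then be opened in Word, restyled, and saved as described above.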
Output Structure
project/
├── input.md # Source markdown
├── output.docx # Converted Word document
└── img/ # Images referenced by markdown
├── image1.png
├── image2.jpeg
└── ...
Notes
- Pandoc is required — there is no python-only fallback that handles images as well
- On Windows, run Python scripts via py script.py, not inline in PowerShell (escaping issues)
- .emf images (Windows metafile) are embedded but may not render on non-Windows systems
- For best results, ensure all image paths use relative paths (./img/) before conversion
- The generated docx uses pandoc's default styles unless a --reference-doc is provided