Markdown to DOCX with Images

Convert Markdown (.md) documents to Word (.docx) with embedded images, proper heading styles, tables, and formatting preservation.

When to Use

User asks to convert a .md file to .docx / Word
User wants images from the markdown embedded in the Word document
User needs heading styles, tables, and formatting preserved

Prerequisites

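pandoc (required for conversion)
python (for helper scripts; the commands below use the Windows py launcher, so substitute python3 or python on macOS/Linux)
python-docx (only for the document structure check in the Verify step, which imports docx)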

Scripts

All scripts are in the scripts/ directory relative to this skill file. Always use these scripts instead of writing inline code.

| Script | Purpose |
|--------|---------|
| scripts/install_pandoc.py | Auto-install pandoc on Windows/macOS/Linux |
| scripts/check_images.py | Verify all image paths resolve correctly before conversion |
| scripts/md_to_docx.py | Convert markdown to docx with image embedding and caption fixing |
| scripts/fix_captions.py | Fix image captions in an existing docx: center align, 8pt, no italic |

Conversion Workflow

Step 1: Check for Pandoc

pandoc --version
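For cases where the check has to happen from Python rather than the shell, a minimal sketch follows. The function name pandoc_version is hypothetical and is not part of the bundled scripts; it only uses the standard library and the pandoc --version command shown above.

```python
import shutil
import subprocess


def pandoc_version():
    """Return the first line of `pandoc --version`, or None if pandoc is not on PATH.

    Hypothetical helper for illustration; not part of the bundled scripts.
    """
    exe = shutil.which("pandoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    # Fall back to an empty string if pandoc printed nothing.
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = pandoc_version()
    if version is None:
        print("pandoc not found; run scripts/install_pandoc.py")
    else:
        print(version)
```

A None result corresponds to the "pandoc is not found" case handled in Step 1b.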

Step 1b: Auto-Install Pandoc if Missing

If pandoc is not found, run the install script:

py scripts/install_pandoc.py

After installation, verify:

pandoc --version

IMPORTANT: After successfully installing pandoc, you MUST tell the user explicitly:

"Pandoc has been automatically installed. You may need to restart your terminal for the PATH changes to take effect. If pandoc --version still fails, please restart your terminal and try again."

If auto-install fails, show this message:

"Could not automatically install pandoc. Please install it manually:

Windows: Download from https://github.com/jgm/pandoc/releases/latest

macOS: brew install pandoc

Linux (Debian/Ubuntu): sudo apt install pandoc"

Step 2: Check Image Paths

Before converting, verify all image references resolve correctly:

py scripts/check_images.py input.md

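This checks that all ![alt](path) references point to existing files. Pandoc resolves relative image paths against the working directory it is run from (or the directories listed in --resource-path), so run the conversion from the markdown file's directory.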
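The contents of scripts/check_images.py are not shown here, so the following is only a sketch of what such a check might look like, not the script itself. It assumes inline ![alt](path) references only (no reference-style images), skips http/https URLs, and resolves relative paths against the markdown file's directory. The name find_missing_images is hypothetical.

```python
import re
from pathlib import Path

# Inline image syntax: ![alt](target) or ![alt](target "title").
IMAGE_REF = re.compile(r'!\[[^\]]*\]\(([^)\s]+)[^)]*\)')


def find_missing_images(md_path):
    """Return the image targets in md_path that do not exist on disk.

    Hypothetical helper: relative paths are resolved against the
    directory containing the markdown file; remote URLs are skipped.
    """
    md_file = Path(md_path)
    text = md_file.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_REF.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # remote images are not checked here
        if not (md_file.parent / target).is_file():
            missing.append(target)
    return missing


if __name__ == "__main__":
    import sys
    for path in find_missing_images(sys.argv[1]):
        print(f"Missing image: {path}")
```

An empty list means every local image reference was found.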

Step 3: Convert Markdown to DOCX

Recommended: use the bundled script (handles conversion + caption fixing automatically):

py scripts/md_to_docx.py input.md output.docx

This runs pandoc and then automatically fixes image captions (center aligned, 8pt font, no italic).

Manual conversion with pandoc:

pandoc input.md -o output.docx

With reference document (preserves styling from an existing .docx):

pandoc input.md -o output.docx --reference-doc=template.docx

If you used pandoc directly (not the script), fix image captions afterwards:

py scripts/fix_captions.py output.docx
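If the manual pandoc invocations above need to be driven from Python, they could be wrapped roughly as follows. This is a sketch, not the contents of scripts/md_to_docx.py: the function names are hypothetical, and passing --resource-path (a standard pandoc option) is an added assumption so that relative image paths resolve from the markdown file's directory regardless of where the command is run.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(md_path, docx_path, reference_doc=None):
    """Build the pandoc argument list for a markdown -> docx conversion.

    Hypothetical helper for illustration only.
    """
    md_file = Path(md_path)
    cmd = [
        "pandoc",
        str(md_file),
        "-o",
        str(docx_path),
        # Look for images next to the markdown file.
        f"--resource-path={md_file.parent}",
    ]
    if reference_doc is not None:
        # Reuse the styles of an existing .docx.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def convert(md_path, docx_path, reference_doc=None):
    """Run pandoc; raises CalledProcessError if pandoc exits non-zero."""
    subprocess.run(build_pandoc_command(md_path, docx_path, reference_doc), check=True)
```

Keeping the command construction separate from the subprocess call makes the argument list easy to inspect or log before anything is executed.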

Step 3b: Fix Image Alt Text (Optional)

If your markdown has image alt text like ![image2.png] (with file extension), pandoc will use the full filename including .png as the caption text. Fix this before conversion:

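import re
# Read the markdown source.
with open('input.md', 'r', encoding='utf-8') as f:
    content = f.read()
# Drop the file extension from alt text such as ![image2.png](...) so it becomes ![image2](...).
content = re.sub(r'!\[(image\d+)\.\w+\]\(', r'![\1](', content)
# Overwrite input.md in place; keep a backup if the original matters.
with open('input.md', 'w', encoding='utf-8') as f:
    f.write(content)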
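To preview what the substitution does before touching a file, the same pattern can be applied to a string. The function name strip_alt_extension is hypothetical; the regular expression is the one used above.

```python
import re

# Same pattern as above: alt text of the form imageN.ext directly before "(".
ALT_WITH_EXT = re.compile(r'!\[(image\d+)\.\w+\]\(')


def strip_alt_extension(markdown):
    """Remove the file extension from imageN.ext alt text (hypothetical helper)."""
    return ALT_WITH_EXT.sub(r'![\1](', markdown)


sample = "![image2.png](./img/image2.png)"
print(strip_alt_extension(sample))  # ![image2](./img/image2.png)
```

Only alt text matching image followed by digits is changed; the image path and any other alt text are left alone.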

Step 4: Fix Image Captions

Pandoc inserts image alt text as a plain Normal style paragraph after each image. The bundled scripts fix this to:

Center aligned
8pt font (smaller than body text)
No italic, no bold

If using scripts/md_to_docx.py, this is done automatically. Otherwise run:

py scripts/fix_captions.py output.docx

Step 5: Verify

Check that images were properly embedded:

python -c "
import zipfile
with zipfile.ZipFile('output.docx', 'r') as z:
    images = [f for f in z.namelist() if 'word/media/' in f]
    print(f'Embedded images: {len(images)}')
    for img in sorted(images):
        print(f'  {img}')
"

Check document structure:

python -c "
from docx import Document
doc = Document('output.docx')
headings = [p for p in doc.paragraphs if p.style and p.style.name.startswith('Heading')]
print(f'Headings: {len(headings)}')
print(f'Tables: {len(doc.tables)}')
print(f'Paragraphs: {len(doc.paragraphs)}')
"
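Where python-docx is not installed, a rough equivalent of both checks can be sketched with the standard library alone by reading word/document.xml from the docx archive. The function name summarize_docx is hypothetical. It assumes heading paragraphs carry a style ID starting with "Heading" (for example Heading1), which may not hold for localized or custom templates.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def summarize_docx(docx_path):
    """Count embedded images, heading paragraphs, and tables in a docx.

    Hypothetical helper. Tables nested inside other tables are counted too,
    so the table figure can be higher than python-docx's len(doc.tables).
    """
    with zipfile.ZipFile(docx_path) as z:
        images = [n for n in z.namelist() if n.startswith("word/media/")]
        root = ET.fromstring(z.read("word/document.xml"))
    style_ids = [el.get(f"{{{W}}}val", "") for el in root.iter(f"{{{W}}}pStyle")]
    headings = [s for s in style_ids if s.startswith("Heading")]
    tables = sum(1 for _ in root.iter(f"{{{W}}}tbl"))
    return {"images": len(images), "headings": len(headings), "tables": tables}


if __name__ == "__main__":
    import sys
    print(summarize_docx(sys.argv[1]))
```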

Key Learnings from Real-World Conversion

Round-trip conversion (docx → md → docx)

When converting docx → md → docx, be aware of:

Tables: Pandoc's markdown table format only handles simple tables. Complex tables (merged cells, nested tables) in the original docx may not survive the round-trip. The md → docx step will correctly convert all markdown tables, but complex table structures from the original are lost after the first conversion.
Images: Pandoc embeds images correctly from ./img/ relative paths. In the conversion these notes are based on, all 32 images were embedded successfully.
Headings: Heading levels are preserved through the round-trip. ## Heading in markdown becomes Heading 2 style in docx.
Formatting: Bold (**text**) and italic (*text*) are preserved.
TOC: Table of contents entries in the original docx become regular links in markdown, then plain text in the reconverted docx.

Image path resolution

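Pandoc resolves relative image paths against its working directory, or against the directories given with --resource-path; it uses the markdown file's location only when that is the working directory
![alt](./img/image1.png) works when img/ sits next to the .md file and pandoc is run from that directory
Remote URLs (http://...) are downloaded and embedded by pandoc automatically
If an image path is broken, pandoc prints a warning and writes the docx without that image rather than failing, so always run check_images.py first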

Reference document styling

Using --reference-doc allows you to control:

Heading font sizes and colors
Table styles
Paragraph spacing
Page margins
Header/footer content

Create a reference docx by:

Converting a sample markdown to docx without --reference-doc
Editing the styles in Word
Saving as template.docx
Using --reference-doc=template.docx for future conversions
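As an alternative to the first step, pandoc can write out its built-in default reference document directly with --print-default-data-file reference.docx, which can then be edited in Word. A small sketch of that call from Python follows; the function names and the template.docx default are illustrative only.

```python
import subprocess


def default_reference_command(out_path="template.docx"):
    """Argument list that asks pandoc to write its default reference.docx to out_path.

    Hypothetical helper for illustration only.
    """
    return ["pandoc", "-o", str(out_path), "--print-default-data-file", "reference.docx"]


def write_default_reference(out_path="template.docx"):
    """Run pandoc; raises CalledProcessError if pandoc exits non-zero."""
    subprocess.run(default_reference_command(out_path), check=True)
```

Starting from pandoc's own reference document means the template contains the style names pandoc applies during conversion.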

Output Structure

project/
├── input.md            # Source markdown
├── output.docx         # Converted Word document
└── img/                # Images referenced by markdown
    ├── image1.png
    ├── image2.jpeg
    └── ...

Notes

Pandoc is required; there is no Python-only fallback that handles images as well
On Windows, run Python code from a script file (py script.py) rather than inline in PowerShell, where quoting and escaping cause problems
.emf images (Windows metafile) are embedded but may not render on non-Windows systems
For best results, ensure all image paths use relative paths (./img/) before conversion
The generated docx uses pandoc's default styles unless a --reference-doc is provided