Category: Other · No API key required

md-to-docx-with-images

Convert Markdown (.md) to Word (.docx) with embedded images, heading styles, and tables. Supports a reference docx for consistent styling. Auto-installs pandoc when missing.

Author: user_29e2fa9e (community)

Markdown to DOCX with Images

Convert Markdown (.md) documents to Word (.docx) with embedded images, proper heading styles, tables, and formatting preservation.

When to Use

  • User asks to convert a .md file to .docx / Word
  • User wants images from the markdown embedded in the Word document
  • User needs heading styles, tables, and formatting preserved

Prerequisites

  • pandoc (required for conversion)
  • Python 3 (for helper scripts); the document structure check in the Verify step also needs the python-docx package

Scripts

All scripts are in the scripts/ directory relative to this skill file. Always use these scripts instead of writing inline code.

| Script | Purpose |
|--------|---------|
| scripts/install_pandoc.py | Auto-install pandoc on Windows/macOS/Linux |
| scripts/check_images.py | Verify all image paths resolve correctly before conversion |
| scripts/md_to_docx.py | Convert markdown to docx with image embedding and caption fixing |
| scripts/fix_captions.py | Fix image captions in an existing docx: center align, 8pt, no italic |

Conversion Workflow

Step 1: Check for Pandoc

pandoc --version
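The same check can be done from Python before any conversion is attempted. This is a minimal sketch, not the bundled script: the function names find_executable and pandoc_version are hypothetical, and it assumes only the standard library.

```python
import shutil
import subprocess


def find_executable(name="pandoc"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def pandoc_version(executable):
    """Return the first line printed by `<executable> --version`."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits with an error
    )
    return result.stdout.splitlines()[0]
```

If find_executable() returns None, continue with Step 1b; otherwise pandoc_version(path) gives the installed version line.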

Step 1b: Auto-Install Pandoc if Missing

If pandoc is not found, run the install script (py is the Windows Python launcher; on macOS/Linux use python3 instead):

py scripts/install_pandoc.py

After installation, verify:

pandoc --version

IMPORTANT: After successfully installing pandoc, you MUST tell the user explicitly:

"Pandoc has been automatically installed. You may need to restart your terminal for the PATH changes to take effect. If pandoc --version still fails, please restart your terminal and try again."

If auto-install fails, show this message:

"Could not automatically install pandoc. Please install it manually:

  • Windows: Download from https://github.com/jgm/pandoc/releases/latest
  • macOS: brew install pandoc
  • Linux (Debian/Ubuntu): sudo apt install pandoc"

Step 2: Check Image Paths

Before converting, verify all image references resolve correctly:

py scripts/check_images.py input.md

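This checks that all ![alt](path) references point to existing files. By default pandoc resolves relative image paths against the current working directory, so run pandoc from the markdown file's directory or pass --resource-path pointing at it.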
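To illustrate the kind of check involved, here is a rough sketch of how local image references could be validated. It is not the contents of scripts/check_images.py: the name find_missing_images and the regular expression are assumptions. It handles only inline ![alt](path) images without spaces in the path, and it resolves paths against the markdown file's directory.

```python
import re
from pathlib import Path

# Matches ![alt](target) and ![alt](target "title").
# Reference-style images and paths containing spaces are not handled.
IMAGE_RE = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')
URL_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def find_missing_images(md_path):
    """Return the local image targets in md_path that do not exist on disk."""
    md_path = Path(md_path)
    text = md_path.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_RE.findall(text):
        if URL_RE.match(target) or target.startswith("data:"):
            continue  # remote or inline image, nothing to check on disk
        if not (md_path.parent / target).is_file():
            missing.append(target)
    return missing
```

An empty list means every local reference resolved. Any entries are the paths to fix before converting.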

Step 3: Convert Markdown to DOCX

Recommended: use the bundled script (handles conversion + caption fixing automatically):

py scripts/md_to_docx.py input.md output.docx

This runs pandoc and then automatically fixes image captions (center aligned, 8pt font, no italic).

Manual conversion with pandoc:

pandoc input.md -o output.docx

With reference document (preserves styling from an existing .docx):

pandoc input.md -o output.docx --reference-doc=template.docx

If you used pandoc directly (not the script), fix image captions afterwards:

py scripts/fix_captions.py output.docx
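The manual pandoc invocations in this step can also be assembled from Python. The sketch below is an illustration under assumptions, not the logic of scripts/md_to_docx.py: build_pandoc_command is a hypothetical helper. It adds pandoc's --resource-path option so that images are looked up next to the markdown file even when pandoc is started from another directory.

```python
from pathlib import Path


def build_pandoc_command(md_path, docx_path, reference_doc=None):
    """Build the argument list for a markdown -> docx pandoc run."""
    md_path = Path(md_path)
    cmd = [
        "pandoc",
        str(md_path),
        "-o",
        str(docx_path),
        # Search for images in the markdown file's own directory.
        f"--resource-path={md_path.parent}",
    ]
    if reference_doc is not None:
        # Take styles from an existing .docx template.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Captions still need fixing afterwards with scripts/fix_captions.py.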

Step 3b: Fix Image Alt Text (Optional)

If your markdown has image alt text like ![image2.png] (with file extension), pandoc will use the full filename including .png as the caption text. Fix this before conversion:

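import re

# Strip the file extension from alt text such as ![image2.png](...) -> ![image2](...)
with open('input.md', 'r', encoding='utf-8') as f:
    content = f.read()
# Only alt text of the form image<N>.<ext> is rewritten; the link target is untouched
content = re.sub(r'!\[(image\d+)\.\w+\]\(', r'![\1](', content)
# Overwrites input.md in place, so keep a backup if the original is needed
with open('input.md', 'w', encoding='utf-8') as f:
    f.write(content)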

Step 4: Fix Image Captions

Pandoc inserts image alt text as a plain Normal style paragraph after each image. The bundled scripts fix this to:

  • Center aligned
  • 8pt font (smaller than body text)
  • No italic, no bold

If using scripts/md_to_docx.py, this is done automatically. Otherwise run:

py scripts/fix_captions.py output.docx

Step 5: Verify

Check that images were properly embedded:

python -c "
import zipfile
with zipfile.ZipFile('output.docx', 'r') as z:
    images = [f for f in z.namelist() if 'word/media/' in f]
    print(f'Embedded images: {len(images)}')
    for img in sorted(images):
        print(f'  {img}')
"

Check document structure (requires the python-docx package: pip install python-docx):

python -c "
from docx import Document
doc = Document('output.docx')
headings = [p for p in doc.paragraphs if p.style and p.style.name.startswith('Heading')]
print(f'Headings: {len(headings)}')
print(f'Tables: {len(doc.tables)}')
print(f'Paragraphs: {len(doc.paragraphs)}')
"

Key Learnings from Real-World Conversion

Round-trip conversion (docx → md → docx)

When converting docx → md → docx, be aware of:

  1. Tables: Pandoc's markdown table format only handles simple tables. Complex tables (merged cells, nested tables) in the original docx may not survive the round-trip. The md → docx step will correctly convert all markdown tables, but complex table structures from the original are lost after the first conversion.

  2. Images: Pandoc embeds images referenced by relative paths such as ./img/. In the real-world conversion behind these notes, all 32 images converted successfully.

  3. Headings: Heading levels are preserved through the round-trip. ## Heading in markdown becomes Heading 2 style in docx.

  4. Formatting: Bold (**text**) and italic (*text*) are preserved.

  5. TOC: Table of contents entries in the original docx become regular links in markdown, then plain text in the reconverted docx.
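One way to sanity-check the heading claim is to count headings in the markdown and compare the totals with the Heading styles found in the docx. The sketch below is an approximation under assumptions: count_headings is a hypothetical helper, it recognizes only ATX-style (#) headings, and it skips fenced code blocks with a simple toggle that does not track fence length or fence character.

```python
import re
from collections import Counter

# Up to three spaces of indentation, 1-6 '#' characters, then whitespace or end of line.
ATX_HEADING_RE = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)")
FENCE_RE = re.compile(r"^ {0,3}(```|~~~)")


def count_headings(markdown_text):
    """Count ATX headings per level, ignoring lines inside fenced code blocks."""
    counts = Counter()
    in_fence = False
    for line in markdown_text.splitlines():
        if FENCE_RE.match(line):
            in_fence = not in_fence  # entering or leaving a fenced block
            continue
        if in_fence:
            continue
        match = ATX_HEADING_RE.match(line)
        if match:
            counts[len(match.group(1))] += 1
    return dict(counts)
```

The keys are heading levels, so the count for key 2 should match the number of Heading 2 paragraphs in the converted document.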

Image path resolution

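  • By default pandoc resolves relative image paths against the current working directory (override with --resource-path), so run it from the markdown file's directory
  • ![alt](./img/image1.png) works when img/ is a sibling of the .md file and pandoc is run from that directory
  • Remote URLs (http://...) are downloaded and embedded by pandoc automatically
  • If an image path is broken, pandoc prints a warning and replaces the image with its description instead of failing, so a missing image is easy to overlook; always run check_images.py first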

Reference document styling

Using --reference-doc allows you to control:

  • Heading font sizes and colors
  • Table styles
  • Paragraph spacing
  • Page margins
  • Header/footer content

Create a reference docx by:

  1. Converting a sample markdown to docx without --reference-doc
  2. Editing the styles in Word
  3. Saving as template.docx
  4. Using --reference-doc=template.docx for future conversions
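As an alternative to step 1, pandoc can write out its built-in default reference document directly, using the --print-default-data-file option described in the pandoc manual. The sketch below only builds that command: default_reference_command is a hypothetical helper, and template.docx is the file name used in the steps above.

```python
def default_reference_command(output_path="template.docx"):
    """Build the pandoc command that writes the default reference.docx to output_path."""
    return [
        "pandoc",
        "-o",
        output_path,
        "--print-default-data-file",
        "reference.docx",
    ]
```

Run it with subprocess.run(default_reference_command(), check=True), then edit the styles of the resulting file in Word as in steps 2 to 4.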

Output Structure

project/
├── input.md            # Source markdown
├── output.docx         # Converted Word document
└── img/                # Images referenced by markdown
    ├── image1.png
    ├── image2.jpeg
    └── ...

Notes

  • Pandoc is required — there is no python-only fallback that handles images as well
  • On Windows, run Python scripts via py script.py rather than as inline code in PowerShell, which runs into escaping issues
  • .emf images (Windows metafile) are embedded but may not render on non-Windows systems
  • For best results, use relative image paths (./img/) throughout the markdown before conversion
  • The generated docx uses pandoc's default styles unless a --reference-doc is provided