Back to skills
extension
Category: Productivity & OfficeNo API key required

Read Microsoft Word documents (.docx and .doc) with Chinese support

Read Microsoft Word documents (.docx and .doc) with Chinese support. Extract text, search keywords, and save as UTF-8 text files. No Microsoft Word installat...

personAuthor: vincent-big-fishhubclawhub

Read Word Document

A professional tool for reading Microsoft Word documents, supporting both modern .docx and legacy .doc formats with full Chinese language support.

Features

  • Read .docx files - Word 2007 and later format
  • Read .doc files - Word 97-2003 format via OLE parsing
  • Auto format detection - Automatically identifies file type
  • Full Chinese support - Handles Chinese encoding correctly
  • Keyword search - Search for keywords across all paragraphs
  • Export to text - Save as UTF-8 text files
  • Document analysis - Get document statistics and info
  • No Word required - Works without Microsoft Word installation

Installation

Prerequisites

pip install python-docx olefile

Install Skill

# Copy to your OpenClaw skills directory
cp -r read-word ~/.openclaw/skills/

Usage

Command Line

# Basic reading (shows first 100 paragraphs)
python ~/.openclaw/skills/read-word/read_word.py "document.docx"

# Show more content
python ~/.openclaw/skills/read-word/read_word.py "document.docx" --limit 200

# Search for keywords
python ~/.openclaw/skills/read-word/read_word.py "document.docx" --search "keyword1,keyword2"

# Save as text file
python ~/.openclaw/skills/read-word/read_word.py "document.docx" --output "output.txt"

# Show document info only
python ~/.openclaw/skills/read-word/read_word.py "document.docx" --info

Python API

# Method 1: Import functions
import sys
sys.path.insert(0, '~/.openclaw/skills/read-word')
from read_word import read_word_document, search_in_document

# Read document
paragraphs = read_word_document("document.docx")
for para in paragraphs:
    print(para)

# Search keywords
results = search_in_document("document.docx", ["keyword1", "keyword2"])

Examples

Example 1: Read and Analyze

from read_word import read_word_document

paragraphs = read_word_document("report.docx")
print(f"Document has {len(paragraphs)} paragraphs")

# Show first 10 paragraphs
for i, p in enumerate(paragraphs[:10]):
    print(f"{i+1}. {p}")

Example 2: Search Keywords

from read_word import search_in_document

# Find paragraphs containing "kitchen" or "feng shui"
results = search_in_document("book.docx", ["kitchen", "feng shui"])
for r in results:
    print(r)

Example 3: Batch Processing

from pathlib import Path
from read_word import read_word_document

desktop = Path.home() / "Desktop"
for doc_file in desktop.glob("*.docx"):
    paragraphs = read_word_document(doc_file)
    print(f"{doc_file.name}: {len(paragraphs)} paragraphs")

API Reference

read_word_document(filepath)

Read a Word document and return a list of paragraphs.

Parameters:

  • filepath (str|Path): Path to the Word document

Returns:

  • list: List of paragraph strings

Raises:

  • FileNotFoundError: If file doesn't exist
  • ValueError: If file format is not supported

search_in_document(filepath, keywords)

Search for keywords in a Word document.

Parameters:

  • filepath (str|Path): Path to the Word document
  • keywords (list): List of keywords to search for

Returns:

  • list: Matching paragraphs with format "[Paragraph N] content"

save_as_text(paragraphs, output_path)

Save paragraphs to a UTF-8 text file.

Parameters:

  • paragraphs (list): List of paragraph strings
  • output_path (str|Path): Output file path

analyze_document(filepath)

Analyze document and return statistics.

Returns:

  • dict: Contains filename, size, paragraphs count, total characters

Troubleshooting

Error: ModuleNotFoundError: No module named 'docx'

Solution: pip install python-docx

Error: Legacy .doc file shows garbled text

Reason: OLE parsing has limitations with complex formatting Solution: Convert .doc to .docx using Microsoft Word, then read

Error: Chinese characters display incorrectly

Reason: Terminal encoding issue Solution: Use --output to save to file, then open with editor

File Support

| Format | Extension | Support Level | |--------|-----------|---------------| | Word 2007+ | .docx | Full | | Word 97-2003 | .doc | Partial (text only) | | Word 95/6.0 | .doc | Not supported | | Rich Text | .rtf | Not supported |

Permissions

  • Read: User-specified Word documents
  • Write (optional): Output .txt files when using --output
  • Network: None

Security

Risk Level: LOW - Local file operations only, no network access, original files are never modified.

Changelog

v1.0.0 (2026-03-20)

  • Initial release
  • Support .docx and .doc formats
  • Keyword search functionality
  • Text export capability
  • Chinese encoding support

Author

叶文洁 (Ye Wenjie) - Created for reading Feng Shui books and Word documents

License

MIT License