Smart Screenshot

Screen capture with OCR and Markdown conversion. Capture screen regions, extract text, convert images and PDFs to Markdown, and save the results with automatic formatting.

Quick Start

Trigger methods:

Keyboard shortcut: Press PrtSc (customizable)
Command line: python scripts/capture.py
Claude Code: Ask Claude to "take a screenshot"

Workflow:

Press PrtSc → Capture mode activates
Choose: Image or Text
Select region/window
If Image: Save with annotation options
If Text: OCR → MarkItDown → Save markdown

Prerequisites

System Requirements

Windows 10/11, macOS 10.14+, or Linux
Python 3.8+
Screen with display access

Install Dependencies

Core (required):

# Screenshot and OCR
pip install pillow pyautogui mss pytesseract pyscreenshot --break-system-packages

# MarkItDown (Microsoft's converter)
pip install markitdown --break-system-packages

# Keyboard hooks
pip install keyboard pynput --break-system-packages

# GUI for dialogs
# tkinter ships with most Python installers and cannot be installed with pip.
# On Debian/Ubuntu, if it is missing: sudo apt-get install python3-tk

OCR engine (Tesseract):

Windows:

# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Install to: C:\Program Files\Tesseract-OCR\
# Add to PATH

macOS:

brew install tesseract

Linux:

sudo apt-get install tesseract-ocr
# or
sudo dnf install tesseract

Optional enhancements:

# Better OCR (EasyOCR - slower but more accurate)
pip install easyocr --break-system-packages

# PDF handling (pdf2image also needs the Poppler utilities installed on the system)
pip install pdf2image pypdf2 --break-system-packages

# Image enhancement
pip install opencv-python --break-system-packages

# Clipboard integration
pip install pyperclip --break-system-packages

See reference/setup-guide.md for detailed installation.
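After installing, it can help to confirm that the core modules are importable. The following is a minimal sketch, not part of the shipped scripts: the `CORE_MODULES` list and the `find_missing` helper are hypothetical names, and the list uses import names, which differ from pip names in some cases (`pillow` is imported as `PIL`).

```python
import importlib.util
import shutil

# Import names for the core packages listed above (assumed mapping:
# the pip package "pillow" is imported as "PIL").
CORE_MODULES = [
    "PIL", "pyautogui", "mss", "pytesseract", "pyscreenshot",
    "markitdown", "keyboard", "pynput", "tkinter",
]


def find_missing(modules):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(CORE_MODULES)
    print("Missing Python modules:", ", ".join(missing) or "none")
    print("Tesseract on PATH:", shutil.which("tesseract") is not None)
```

Running the file directly prints the missing modules and whether the `tesseract` binary is on PATH.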

Features

Capture Modes

1. Region Selection

Click and drag to select area
Real-time preview
Pixel-perfect selection

2. Window Capture

Automatically detect windows
Capture specific application
Include or exclude window borders

3. Full Screen

Entire display
Multi-monitor support
All screens at once

4. Scrolling Capture

Capture long web pages
Auto-scroll and stitch
Perfect for documentation

Text Extraction

OCR Engines:

Tesseract - Fast, free, 100+ languages
EasyOCR - Slower, more accurate
Cloud OCR - Azure/Google (highest accuracy)

Smart text processing:

Automatic language detection
Text cleanup and formatting
Table recognition
Layout preservation

Markdown Conversion

Using MarkItDown (Microsoft):

Images → Markdown with alt text
PDFs → Clean markdown
Screenshots → Formatted text
Tables → Markdown tables
Code blocks → Syntax highlighting

Conversion features:

Smart heading detection
List preservation
Link extraction
Code formatting
Table structure recognition

Core Operations

Quick Capture

Keyboard shortcut:

# Run as background service
python scripts/screenshot_service.py

# Now press PrtSc anytime:
# 1. Screen freezes
# 2. Choose "Image" or "Text"
# 3. Select region
# 4. Auto-process and save

Command line:

# Capture with UI
python scripts/capture.py

# Capture full screen immediately
python scripts/capture.py --fullscreen --output screenshot.png

# Capture region with coordinates
python scripts/capture.py --region 100,100,800,600 --output region.png
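For readers who want to see what a region capture might involve, here is a rough sketch using `mss`. It assumes the four `--region` values mean x, y, width, height; the source does not say whether they are a size or a second corner, so treat that as a guess. `parse_region` and `capture_region` are illustrative names, not functions from `capture.py`.

```python
def parse_region(spec):
    """Parse "x,y,width,height" into the dict format that mss grab() accepts."""
    parts = [int(part) for part in spec.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated integers, got {spec!r}")
    left, top, width, height = parts
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"left": left, "top": top, "width": width, "height": height}


def capture_region(spec, output):
    """Grab the region from the screen and write it to `output` as a PNG."""
    # Imported lazily so parse_region can be used without a display.
    import mss
    import mss.tools

    with mss.mss() as sct:
        shot = sct.grab(parse_region(spec))
        mss.tools.to_png(shot.rgb, shot.size, output=output)
```

Calling `capture_region("100,100,800,600", "region.png")` would then mirror the command above, under the stated assumption about the value order.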

Text Mode (OCR → Markdown)

Interactive:

# Start capture
python scripts/capture.py --mode text

# Process:
# 1. Select region
# 2. OCR extracts text
# 3. MarkItDown formats
# 4. Save dialog opens
# 5. Save as .md file

Automatic:

# Capture and OCR
python scripts/capture_text.py --output extracted.md

# With specific language
python scripts/capture_text.py --lang eng+fra --output text.md

# With enhancement
python scripts/capture_text.py --enhance --output clean.md
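The OCR step can be approximated with `pytesseract`, whose `image_to_string` accepts the same `eng+fra` language syntax as Tesseract. The sketch below is an assumption about how such a step could look, not the code in `capture_text.py`; `tidy_ocr_text` is a hypothetical cleanup helper that merges wrapped lines into paragraphs before the text is saved as Markdown.

```python
import re


def tidy_ocr_text(raw):
    """Turn raw OCR output into Markdown-friendly paragraphs."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Collapse line breaks and repeated spaces inside one paragraph.
        line = " ".join(block.split())
        if line:
            paragraphs.append(line)
    return "\n\n".join(paragraphs) + "\n"


def ocr_image(path, lang="eng"):
    """Run Tesseract on an image file and return tidied text."""
    # Lazy imports: only needed when OCR is actually run.
    from PIL import Image
    import pytesseract

    return tidy_ocr_text(pytesseract.image_to_string(Image.open(path), lang=lang))
```

Note that joining hyphenated words this way can also merge genuine hyphenated compounds that happen to break at a line end, so it is a trade-off rather than a safe default.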

Image Mode

Interactive:

# Start capture
python scripts/capture.py --mode image

# Process:
# 1. Select region
# 2. Annotation tools appear
# 3. Add arrows, boxes, text
# 4. Save dialog opens

With annotations:

# Capture and annotate
python scripts/capture_annotate.py --output annotated.png

# Annotation tools:
# - Arrow
# - Rectangle
# - Circle
# - Text
# - Highlight
# - Blur (redact sensitive info)

PDF to Markdown

Convert PDF to markdown:

# Using MarkItDown
python scripts/pdf_to_markdown.py --input document.pdf --output document.md

# With OCR for scanned PDFs
python scripts/pdf_to_markdown.py --input scanned.pdf --ocr --output text.md

# Batch convert folder
python scripts/batch_pdf_convert.py --input ./pdfs/ --output ./markdown/
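As an illustration of the batch conversion, the sketch below walks a folder and converts each PDF with the `MarkItDown` class, using its `convert()` method and the `text_content` attribute of the result. The helper names `target_for` and `convert_all` are made up for this example, and the sketch does not cover the `--ocr` path for scanned PDFs.

```python
from pathlib import Path


def target_for(pdf, input_dir, output_dir):
    """Mirror a PDF's path under input_dir into output_dir with a .md suffix."""
    return Path(output_dir) / Path(pdf).relative_to(input_dir).with_suffix(".md")


def convert_all(input_dir, output_dir):
    """Convert every PDF under input_dir and write the Markdown files."""
    # Lazy import: only needed when a conversion is actually run.
    from markitdown import MarkItDown

    converter = MarkItDown()
    for pdf in sorted(Path(input_dir).rglob("*.pdf")):
        target = target_for(pdf, input_dir, output_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        result = converter.convert(str(pdf))
        target.write_text(result.text_content, encoding="utf-8")
        print(f"{pdf} -> {target}")
```

Keeping the path mapping in its own function preserves the subfolder layout of the input directory in the output directory.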

Screenshot from Image

Process existing image:

# Extract text to markdown
python scripts/image_to_markdown.py --input screenshot.png --output text.md

# Clean up image first
python scripts/enhance_and_extract.py --input noisy.png --output clean.md

Configuration

Settings file: config.yaml

# Keyboard shortcut
hotkey: "Print"  # or "ctrl+shift+s", "cmd+shift+5", etc.

# Default capture mode
default_mode: "prompt"  # "image", "text", or "prompt"

# OCR settings
ocr:
  engine: "tesseract"  # "tesseract", "easyocr", or "cloud"
  language: "eng"
  enhance: true  # Pre-process image for better OCR

# Output settings
output:
  directory: "~/Screenshots"
  filename_pattern: "Screenshot-{date}-{time}"
  auto_save: false  # true = skip save dialog
  clipboard: true   # Copy to clipboard

# Markdown settings
markdown:
  format_code_blocks: true
  detect_tables: true
  preserve_formatting: true
  
# Annotation defaults
annotation:
  arrow_color: "#FF0000"
  box_color: "#0000FF"
  text_color: "#000000"
  text_size: 12
  line_width: 2
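One plausible way to consume this file is to overlay it on built-in defaults and expand the filename pattern at save time. The sketch below assumes PyYAML for parsing (it is not in the dependency list above) and guesses at the `{date}` and `{time}` formats; `merge_config`, `build_filename`, and `load_config` are illustrative names only.

```python
import copy
from datetime import datetime

DEFAULTS = {
    "hotkey": "Print",
    "default_mode": "prompt",
    "ocr": {"engine": "tesseract", "language": "eng", "enhance": True},
    "output": {
        "directory": "~/Screenshots",
        "filename_pattern": "Screenshot-{date}-{time}",
        "auto_save": False,
        "clipboard": True,
    },
}


def merge_config(defaults, overrides):
    """Recursively overlay user settings on top of the defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def build_filename(pattern, when, extension):
    """Expand {date} and {time}; the formats used here are assumptions."""
    name = pattern.format(date=when.strftime("%Y-%m-%d"),
                          time=when.strftime("%H%M%S"))
    return name + extension


def load_config(path="config.yaml"):
    """Read config.yaml and fill in any keys the user left out."""
    import yaml  # assumption: PyYAML is installed

    with open(path, encoding="utf-8") as handle:
        return merge_config(DEFAULTS, yaml.safe_load(handle) or {})
```

With this approach a user file only needs the keys that differ from the defaults, for example just `output.auto_save`.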

Common Workflows

Workflow 1: Code Documentation

Scenario: Capture code from screen → Markdown documentation

# 1. Run screenshot service
python scripts/screenshot_service.py &

# 2. Press PrtSc on your keyboard

# 3. Select "Text" mode

# 4. Select code region on screen

# 5. OCR extracts code

# 6. MarkItDown formats the text as a Python code block:
#      def example_function():
#          return "formatted code"

# 7. Save dialog opens → Save as code-snippet.md

Workflow 2: Meeting Notes from Slides

Scenario: Capture presentation slides → Formatted notes

# Capture multiple slides
python scripts/capture_sequence.py \
  --count 5 \
  --delay 3 \
  --mode text \
  --output slides.md

# Result: All slides as markdown in one file
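The `--count` and `--delay` options suggest a simple loop: capture, wait, repeat, then join the results. A minimal sketch of that idea follows. The `capture` argument is a placeholder for whatever function returns the OCR text of one slide, and the `## Slide N` heading format is an assumption rather than the script's documented output.

```python
import time


def capture_sequence(capture, count, delay, sleep=time.sleep):
    """Call capture() `count` times, pausing `delay` seconds between calls."""
    sections = []
    for index in range(1, count + 1):
        sections.append(f"## Slide {index}\n\n{capture().strip()}\n")
        if index < count:
            sleep(delay)  # time to advance to the next slide
    return "\n".join(sections)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.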

Workflow 3: Email/Document Processing

Scenario: Screenshot email → Extract and format text

# Capture email
python scripts/capture.py --mode text --enhance

# Text extracted, formatted, and saved
# Perfect for archiving or processing

Workflow 4: Research Paper Annotation

Scenario: Screenshot paper → Annotate → Save

# Capture and annotate
python scripts/capture_annotate.py --output paper-notes.png

# Add arrows, highlights, notes
# Save annotated version

Workflow 5: Batch PDF Conversion

Scenario: Convert all PDFs to markdown

# Convert folder of PDFs
python scripts/batch_pdf_convert.py \
  --input ~/Documents/PDFs/ \
  --output ~/Documents/Markdown/ \
  --ocr  # Enable OCR for scanned docs
  
# Progress shown for each file
# All PDFs → Clean markdown

MarkItDown Features

Microsoft's MarkItDown converts:

Images:

Screenshots → Extracted text
Diagrams → Alt text descriptions
Charts → Data tables

PDFs:

Native PDFs → Clean markdown
Scanned PDFs → OCR + markdown
Preserve structure and formatting

Documents:

Word docs → Markdown
PowerPoint → Slide content
Excel → Markdown tables

Code:

Syntax highlighted code blocks
Language detection
Proper indentation

Tables:

Visual tables → Markdown tables
Preserved alignment
Header detection

Keyboard Shortcuts

During capture:

Esc - Cancel capture
Space - Toggle crosshair/selection
Enter - Confirm selection
Ctrl+Z - Undo annotation
Ctrl+C - Copy to clipboard
Ctrl+S - Save

Annotation mode:

A - Arrow tool
R - Rectangle tool
C - Circle tool
T - Text tool
H - Highlight tool
B - Blur tool
Delete - Remove last annotation

OCR Accuracy Tips

Better results:

Enhance image first - Increase contrast, denoise
Correct language - Specify language(s)
Proper DPI - Higher resolution = better OCR
Clean background - Remove clutter
Good lighting - For camera captures
Straight text - Rotate if needed

Pre-processing:

# Enhance before OCR
python scripts/enhance_image.py \
  --input screenshot.png \
  --output enhanced.png \
  --operations "grayscale,contrast,denoise"

# Then OCR
python scripts/image_to_markdown.py --input enhanced.png
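The `--operations` string reads like an ordered pipeline. One way such a pipeline could be built with Pillow is sketched below. The mapping from operation names to Pillow calls (autocontrast for `contrast`, a 3x3 median filter for `denoise`) is a guess, not a description of what `enhance_image.py` does.

```python
def parse_operations(spec, known):
    """Split "grayscale,contrast,denoise" and reject unknown step names."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"unknown operations: {', '.join(unknown)}")
    return names


def enhance(input_path, output_path, spec="grayscale,contrast,denoise"):
    """Apply the named operations in order and save the result."""
    # Lazy import: Pillow is only needed when an image is processed.
    from PIL import Image, ImageFilter, ImageOps

    steps = {
        "grayscale": ImageOps.grayscale,
        "contrast": ImageOps.autocontrast,
        "denoise": lambda img: img.filter(ImageFilter.MedianFilter(size=3)),
    }
    # Convert to RGB first so every step receives a mode it supports.
    image = Image.open(input_path).convert("RGB")
    for name in parse_operations(spec, steps):
        image = steps[name](image)
    image.save(output_path)
```

Order matters here: converting to grayscale first means the later steps work on a single channel.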

Multi-Monitor Support

Capture from specific monitor:

# List monitors
python scripts/list_monitors.py

# Capture from monitor 2
python scripts/capture.py --monitor 2

# Capture all monitors
python scripts/capture.py --all-monitors
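If the scripts build on `mss`, monitor numbering probably follows its convention: `sct.monitors[0]` is the combined area of all displays, and individual monitors start at index 1. The sketch below formats that list. `describe_monitors` and `list_monitors` are hypothetical helpers, and whether `--monitor 2` maps to index 2 is an assumption.

```python
def describe_monitors(monitors):
    """Format mss-style monitor dicts; index 0 is the combined virtual screen."""
    lines = []
    for index, mon in enumerate(monitors):
        label = "all monitors" if index == 0 else f"monitor {index}"
        lines.append(f"{index}: {label} {mon['width']}x{mon['height']} "
                     f"at ({mon['left']}, {mon['top']})")
    return lines


def list_monitors():
    """Query the real displays through mss and describe them."""
    import mss  # lazy import: requires a display

    with mss.mss() as sct:
        return describe_monitors(sct.monitors)
```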

Save Dialog Options

When save dialog appears:

Filename: Auto-generated or custom
Location: Last used or default directory
Format: .md, .txt, .png, .jpg
Options:
- Copy to clipboard
- Open in editor
- Share

Skip dialog (auto-save):

# config.yaml
output:
  auto_save: true
  directory: "~/Screenshots"

Integration with Clipboard

Copy to clipboard automatically:

# Text mode - copies markdown
python scripts/capture.py --mode text --clipboard

# Image mode - copies image
python scripts/capture.py --mode image --clipboard

# Both
python scripts/capture.py --clipboard --save

Paste from clipboard:

# Process clipboard image
python scripts/process_clipboard.py --output result.md

Running as Service

Windows

Install as Windows service:

# Install
python scripts/install_windows_service.py

# Service runs on startup
# PrtSc always available

Or use Task Scheduler:

1. Open Task Scheduler
2. Create Basic Task
3. Trigger: At log on
4. Action: Start program
5. Program: python
6. Arguments: path\to\screenshot_service.py

macOS

LaunchAgent setup:

# Install service
python scripts/install_macos_service.py

# Creates ~/Library/LaunchAgents/com.screenshot.service.plist
# Runs on login

Manual:

# Create LaunchAgent plist
# Load with launchctl
launchctl load ~/Library/LaunchAgents/com.screenshot.service.plist
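For the manual route, the plist can be generated with Python's standard `plistlib` instead of being written by hand. This is a sketch of one reasonable LaunchAgent definition, not the file that `install_macos_service.py` produces. The keys shown (`Label`, `ProgramArguments`, `RunAtLoad`) are standard launchd keys, while `build_plist` and `install` are illustrative names.

```python
import plistlib
import sys
from pathlib import Path

LABEL = "com.screenshot.service"


def build_plist(script_path, python=sys.executable):
    """Return LaunchAgent plist bytes that start the service at login."""
    return plistlib.dumps({
        "Label": LABEL,
        "ProgramArguments": [python, str(script_path)],
        "RunAtLoad": True,
    })


def install(script_path):
    """Write the plist into ~/Library/LaunchAgents and return its path."""
    target = Path.home() / "Library" / "LaunchAgents" / f"{LABEL}.plist"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(build_plist(Path(script_path).resolve()))
    return target
```

After `install("scripts/screenshot_service.py")`, the `launchctl load` command above would pick up the generated file.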

Linux

Systemd service:

# Install
python scripts/install_linux_service.py

# Creates ~/.config/systemd/user/screenshot.service
# Enable and start:
systemctl --user enable screenshot
systemctl --user start screenshot
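The user unit itself is a short text file. The sketch below shows what a minimal one could contain. The description text, the `Restart=on-failure` policy, and the `default.target` install target are assumptions, and `build_unit` is a hypothetical helper rather than part of `install_linux_service.py`.

```python
def build_unit(python, script_path):
    """Return the text of a minimal systemd user unit for the service."""
    return "\n".join([
        "[Unit]",
        "Description=Smart Screenshot hotkey service",
        "",
        "[Service]",
        f"ExecStart={python} {script_path}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=default.target",
        "",
    ])
```

Writing the returned text to `~/.config/systemd/user/screenshot.service` and running `systemctl --user daemon-reload` would make the unit available to the enable and start commands above.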

Or use autostart:

# Copy desktop entry
cp screenshot.desktop ~/.config/autostart/

Cloud OCR (Optional)

For highest accuracy:

Azure Computer Vision:

export AZURE_CV_KEY="your-key"
export AZURE_CV_ENDPOINT="https://your-region.api.cognitive.microsoft.com/"

python scripts/capture.py --mode text --ocr-engine azure

Google Vision:

export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"

python scripts/capture.py --mode text --ocr-engine google

Costs:

Azure: 5,000 transactions/month free (F0 tier), then about $1/1000
Google: 1000 units/month free, then $1.50/1000
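To estimate a monthly bill from figures like these, subtract the free allowance and price the remainder per thousand. The helper below is a simple worked example with a hypothetical name (`monthly_cost`). It ignores volume discounts and rounding rules, which vary by provider.

```python
def monthly_cost(units, free_units, price_per_thousand):
    """Estimate the monthly charge after the free allowance is used up."""
    billable = max(0, units - free_units)
    return billable / 1000 * price_per_thousand
```

For example, 5,000 captures in a month with a 1,000-unit free tier at $1.50 per 1,000 gives `monthly_cost(5000, 1000, 1.50)`, which is 4,000 billable units, or $6.00.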

Scripts Reference

Capture:

capture.py - Main interactive capture
capture_text.py - Text mode only
capture_annotate.py - Image with annotations
capture_sequence.py - Multiple captures

Service:

screenshot_service.py - Background service
install_windows_service.py - Windows installer
install_macos_service.py - macOS installer
install_linux_service.py - Linux installer

Conversion:

image_to_markdown.py - Image → Markdown
pdf_to_markdown.py - PDF → Markdown
batch_pdf_convert.py - Batch conversion

Processing:

enhance_image.py - Image enhancement
process_clipboard.py - Clipboard processing
extract_tables.py - Table extraction

Utilities:

list_monitors.py - List displays
test_ocr.py - Test OCR accuracy
configure.py - Interactive config

Best Practices

Run as service - Always available with hotkey
Configure hotkey - Choose comfortable shortcut
Enable clipboard - Quick copy-paste workflow
Enhance first - Better OCR results
Use appropriate OCR - Tesseract for speed, Cloud for accuracy
Organize output - Set default directory
Backup settings - Save config.yaml
Test thoroughly - Verify OCR accuracy for your use case

Troubleshooting

"Tesseract not found"

# Install Tesseract
# Windows: Download installer
# macOS: brew install tesseract
# Linux: apt install tesseract-ocr

# Check installation
tesseract --version
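If Tesseract is installed but not on PATH (common on Windows), `pytesseract` can be pointed at the binary through its `pytesseract.pytesseract.tesseract_cmd` setting. The sketch below assumes the Windows install directory mentioned earlier in this document; `find_tesseract` and `configure_pytesseract` are illustrative helper names.

```python
import os
import shutil

# Binary inside the Windows install directory suggested in this document.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(WINDOWS_DEFAULT,)):
    """Return a path to the tesseract binary, or None if it cannot be found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Tell pytesseract where the binary lives, or fail with a clear message."""
    import pytesseract  # lazy import: only needed when OCR is configured

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract is not installed or not on PATH")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```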

"Permission denied" (screenshot)

Windows: Run as Administrator
macOS: System Preferences → Security & Privacy → Privacy → Screen Recording (on macOS 13 and later: System Settings → Privacy & Security → Screen Recording)
Linux: Check X11 permissions

"Keyboard hook failed"

# Requires administrator/root privileges
# Windows: Run as Administrator
# macOS: Grant Accessibility permissions
# Linux: Run with sudo or add user to input group

"Poor OCR quality"

# Enhance image first
python scripts/enhance_image.py --input screenshot.png

# Try different OCR engine
python scripts/capture.py --ocr-engine easyocr

# Specify language
python scripts/capture.py --lang eng+fra

"MarkItDown not working"

pip install --upgrade markitdown --break-system-packages

# Check version
python -c "import markitdown; print(markitdown.__version__)"

Platform-Specific Notes

Windows

PrtSc key native support
Windows Ink integration available
OneDrive sync compatible
Notification system integration

macOS

cmd+shift+5 alternative
Quick Look preview
iCloud Drive sync
Notification Center integration

Linux

Wayland/X11 support
Various hotkey daemons
Desktop environment integration
Screenshot directories vary

Integration Examples

See examples/ for complete workflows:

examples/documentation-workflow.md - Code docs
examples/research-notes.md - Paper processing
examples/meeting-capture.md - Meeting slides
examples/email-archival.md - Email processing

Reference Documentation

reference/setup-guide.md - Complete setup
reference/ocr-engines.md - OCR comparison
reference/markitdown-guide.md - MarkItDown features
reference/hotkey-config.md - Keyboard shortcuts
reference/service-install.md - Service setup
