sql-splitter Skill
This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.
When to Use This Skill
Use sql-splitter when:
- The user mentions SQL dump files (
.sql,.sql.gz,.sql.bz2,.sql.xz,.sql.zst) - The user wants to migrate, restore, or work with database dump files
- The user needs to validate, analyze, split, merge, convert, sample, shard, or query dumps
- Working with MySQL, PostgreSQL, SQLite, or MSSQL dump formats
- The user wants to run SQL analytics on a dump file without loading it into a database
When NOT to Use This Skill
Do not use sql-splitter when:
- Running complex ad-hoc SQL queries against a live database (use
psql/mysql/sqlcmddirectly) - No dump file exists; only a running database is available
- The user needs interactive data editing rather than dump manipulation
- Working with dialects beyond MySQL/PostgreSQL/SQLite/MSSQL
- Working with MSSQL binary backup files (.bak) or DACPAC/BACPAC formats (only script-based .sql dumps are supported)
Command Reference
split
Split a dump into per-table files.
sql-splitter split dump.sql --output tables/ --progress
sql-splitter split dump.sql --tables users,orders --output tables/
sql-splitter split dump.sql --schema-only --output schema/
sql-splitter split dump.sql --data-only --output data/
merge
Merge per-table files back into a single dump.
sql-splitter merge tables/ --output restored.sql
sql-splitter merge tables/ --output restored.sql --transaction
sql-splitter merge tables/ --exclude logs,cache --output restored.sql
analyze
Get statistics about a dump (read-only).
sql-splitter analyze dump.sql --progress
sql-splitter analyze "dumps/*.sql" --fail-fast
convert
Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).
sql-splitter convert mysql.sql --to postgres --output pg.sql
sql-splitter convert pg_dump.sql --to mysql --output mysql.sql
sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
sql-splitter convert mysql.sql --to mssql --output mssql.sql
sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"
validate
Check dump integrity (syntax, encoding, PK/FK).
sql-splitter validate dump.sql --strict --progress
sql-splitter validate "dumps/*.sql" --json --fail-fast
sql-splitter validate dump.sql --no-fk-checks --progress
sample
Create reduced datasets with FK preservation.
sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders
sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42
shard
Extract tenant-specific data.
sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql
sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/
diff
Compare two SQL dumps for schema and data changes.
sql-splitter diff old.sql new.sql --progress
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only
sql-splitter diff old.sql new.sql --format json --output diff.json
sql-splitter diff old.sql new.sql --format sql --output migration.sql
sql-splitter diff old.sql new.sql --tables users,orders --progress
sql-splitter diff old.sql new.sql --verbose # Show sample PKs
sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at" # Ignore columns
sql-splitter diff old.sql new.sql --primary-key logs:ts+msg # Override PK
sql-splitter diff old.sql new.sql --allow-no-pk # Tables without PK
redact
Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.
# Using YAML config file
sql-splitter redact dump.sql --output safe.sql --config redact.yaml
# Using CLI flags
sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"
# Mask credit cards (keep last 4 digits)
sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card"
# Generate config by analyzing input file
sql-splitter redact dump.sql --generate-config --output redact.yaml
# Reproducible with seed
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42
# Validate config only
sql-splitter redact dump.sql --config redact.yaml --validate
# With specific locale for fake data
sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de
Strategies:
--null "pattern": Replace with NULL--hash "pattern": SHA256 hash (deterministic, preserves FK integrity)--fake "pattern": Generate realistic fake data--mask "pattern=column": Partial masking--constant "column=value": Fixed value replacement
Fake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.
graph
Generate Entity-Relationship Diagrams (ERD) from SQL dumps.
# Interactive HTML ERD with dark/light mode and panzoom
sql-splitter graph dump.sql --output schema.html
# Graphviz DOT format with ERD-style tables
sql-splitter graph dump.sql --output schema.dot
# Mermaid erDiagram syntax (paste into GitHub/GitLab)
sql-splitter graph dump.sql --output schema.mmd --format mermaid
# JSON with full schema details
sql-splitter graph dump.sql --json
# Filter tables
sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*"
# Show only circular dependencies
sql-splitter graph dump.sql --cycles-only
# Focus on specific table and its dependencies
sql-splitter graph dump.sql --table orders --transitive
# Show tables that depend on users
sql-splitter graph dump.sql --table users --reverse
order
Reorder SQL dump in topological FK order for safe imports.
# Rewrite in safe import order
sql-splitter order dump.sql --output ordered.sql
# Check for cycles without rewriting
sql-splitter order dump.sql --check
# Reverse order (for DROP operations)
sql-splitter order dump.sql --reverse --output drop_order.sql
query
Run SQL analytics on dump files using embedded DuckDB (no database required).
# Single query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
# Interactive REPL
sql-splitter query dump.sql --interactive
# Export to JSON/CSV
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json
sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv
# With caching (400x faster on repeated queries)
sql-splitter query dump.sql "SELECT ..." --cache
# Disk mode for large dumps (>2GB auto-enabled)
sql-splitter query huge.sql "SELECT ..." --disk
# Filter tables to import (faster startup)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
# Memory limit
sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB
# Cache management
sql-splitter query --list-cache
sql-splitter query --clear-cache
REPL commands:
.tables- List all tables.schema [table]- Show schema.describe <table>- Describe table.count <table>- Count rows.sample <table> [n]- Sample rows.format <fmt>- Set output format (table, json, csv, tsv).export <file> <query>- Export query results.exit- Exit REPL
Step-by-Step Patterns
Pattern 1: Validate Before Use
Before using any dump from an external source:
-
Validate integrity
sql-splitter validate path/to/dump.sql.gz --strict --progress -
If validation fails, check:
- Incorrect dialect? Try
--dialect=postgres,--dialect=mysql, or--dialect=mssql - Encoding issues? Report specific errors to user
- Truncated file? Check file size and completeness
- For MSSQL: Ensure GO batch separators are on their own lines
- Incorrect dialect? Try
-
Analyze structure
sql-splitter analyze path/to/dump.sql.gz --progress
Pattern 2: Database Migration
For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):
-
Validate source
sql-splitter validate source.sql.gz --strict --progress -
Convert dialect
sql-splitter convert source.sql.gz --to postgres --output target.sql --strict # or for MSSQL sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql -
Validate converted output
sql-splitter validate target.sql --dialect=postgres --strict -
Or stream directly
sql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"
Pattern 3: Create Dev Dataset
For creating smaller realistic data for development:
-
Analyze to understand sizes
sql-splitter analyze prod.sql.zst --progress -
Sample with FK preservation
sql-splitter sample prod.sql.zst \ --output dev_seed.sql \ --percent 10 \ --preserve-relations \ --progress -
Restore to dev database
psql "$DEV_DB" < dev_seed.sql
Pattern 4: CI Validation Gate
For validating dumps in CI pipelines:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict
Parse with jq:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \
| jq '.results[] | select(.passed == false)'
Pattern 5: Per-Table Editing
When the user needs to edit specific tables:
-
Split
sql-splitter split dump.sql --output tables/ --progress -
Edit the per-table files (
tables/users.sql, etc.) -
Merge back
sql-splitter merge tables/ --output updated.sql --transaction
Pattern 6: Tenant Extraction
For multi-tenant databases:
-
Identify tenant column (often
tenant_id,account_id,company_id) -
Extract tenant data
sql-splitter shard dump.sql \ --tenant-value 12345 \ --tenant-column tenant_id \ --output tenant_12345.sql \ --progress
Pattern 7: Comparing Dumps for Changes
For detecting schema or data changes between two versions:
-
Full comparison (schema + data)
sql-splitter diff old_dump.sql new_dump.sql --progress -
Schema-only comparison (fast, no data parsing)
sql-splitter diff old.sql new.sql --schema-only -
Generate migration script
sql-splitter diff old.sql new.sql --format sql --output migration.sql -
JSON output for automation
sql-splitter diff old.sql new.sql --format json | jq '.summary'
Pattern 8: Data Anonymization
For creating safe development/testing datasets:
-
Generate redaction config by analyzing dump
sql-splitter redact dump.sql --generate-config --output redact.yaml -
Review and customize the generated config
-
Apply redaction
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress -
Or use inline patterns for quick redaction
sql-splitter redact dump.sql --output safe.sql \ --null "*.ssn,*.tax_id" \ --hash "*.email" \ --fake "*.name,*.phone" -
Validate the redacted output
sql-splitter validate safe.sql --strict
Pattern 9: Schema Visualization
For understanding complex database schemas:
-
Generate interactive ERD
sql-splitter graph dump.sql --output schema.html # Opens in browser with dark/light mode, zoom/pan -
For documentation (Mermaid)
sql-splitter graph dump.sql --output docs/schema.mmd --format mermaid # Paste into GitHub/GitLab/Notion -
Focus on specific area
# What does orders depend on? sql-splitter graph dump.sql --table orders --transitive --output orders.html # What depends on users? sql-splitter graph dump.sql --table users --reverse --output users_deps.html -
Find circular dependencies
sql-splitter graph dump.sql --cycles-only
Pattern 10: Safe Import Order
For ensuring FK constraints don't fail during restore:
-
Check for cycles
sql-splitter order dump.sql --check -
Reorder if needed
sql-splitter order dump.sql --output ordered.sql -
For DROP operations (reverse order)
sql-splitter order dump.sql --reverse --output drop_order.sql
Pattern 11: Ad-hoc SQL Analytics
For running SQL queries on dump files without loading into a database:
-
Quick exploratory query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" -
Interactive exploration (REPL)
sql-splitter query dump.sql --interactive # sql> .tables # sql> SELECT * FROM orders LIMIT 10 # sql> .count users -
Export analysis results
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv -
Repeated queries with caching (400x speedup)
# First run imports and caches sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache # Subsequent runs use cache sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache -
Complex analytics
sql-splitter query dump.sql " SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name ORDER BY total_spent DESC LIMIT 10 " -f json
Common Flag Combinations
| Goal | Flags |
|------|-------|
| CI validation | --strict --fail-fast --json |
| Safe exploration | --dry-run --progress |
| Reproducible sampling | --seed 42 --preserve-relations |
| Fast progress feedback | --progress |
| Compressed output | Pipe to gzip -c or zstd -c |
Error Handling
Dialect Detection Issues
If auto-detection fails, specify explicitly:
sql-splitter validate dump.sql --dialect=postgres
sql-splitter validate mssql_dump.sql --dialect=mssql
sql-splitter convert dump.sql --from mysql --to postgres --output out.sql
sql-splitter convert dump.sql --from mssql --to mysql --output out.sql
Validation Failures
- Parse
--jsonoutput for specific errors - Check for encoding issues, missing FKs, duplicate PKs
- Use
--no-fk-checksto skip expensive integrity checks
Large Files
- sql-splitter uses constant ~50MB memory
- Downstream tools may be bottlenecks
- Test with
samplebefore full operations
Implementation Checklist
When using this skill:
- Detect applicability: Check for
.sqlfiles or dump-related tasks - Clarify intent: Validation? Conversion? Sampling? Splitting?
- Choose pattern: Map goal to one of the patterns above
- Propose plan: Explain steps before executing
- Use safe flags:
--dry-runfirst, then--progressfor feedback - Summarize results: Report success/failure with key stats
Scan to join WeChat group