ML Visualizer

A command-line data toolkit for ingesting, transforming, querying, and visualizing machine learning datasets. It records each step of your data pipeline, from raw ingestion through profiling and validation, as a timestamped entry that you can search, summarize, and export.

Commands

| Command | Description |
|---------|-------------|
| ml-visualizer ingest <input> | Ingest raw data or record a data source entry |
| ml-visualizer transform <input> | Log a data transformation step or operation |
| ml-visualizer query <input> | Record a query against your dataset |
| ml-visualizer filter <input> | Log a filter operation applied to data |
| ml-visualizer aggregate <input> | Record an aggregation or rollup operation |
| ml-visualizer visualize <input> | Log a visualization request or chart specification |
| ml-visualizer export <input> | Record an export operation or export all data |
| ml-visualizer sample <input> | Log a data sampling operation |
| ml-visualizer schema <input> | Record or describe a data schema |
| ml-visualizer validate <input> | Log a data validation check |
| ml-visualizer pipeline <input> | Record a full pipeline definition or step |
| ml-visualizer profile <input> | Log a data profiling run |
| ml-visualizer stats | Show summary statistics across all entry types |
| ml-visualizer export <fmt> | Export all data (formats: json, csv, txt) |
| ml-visualizer search <term> | Search across all entries by keyword |
| ml-visualizer recent | Show the 20 most recent activity log entries |
| ml-visualizer status | Health check: version, disk usage, last activity |
| ml-visualizer help | Show the built-in help message |
| ml-visualizer version | Print the current version (v2.0.0) |
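
The stats command summarizes entries across all types. The sketch below is hypothetical and is not the tool's own code. It assumes one <type>.log file per command, one entry per line, and that history.log should be left out of the per-type counts because it mirrors the other logs. The function name count_entries is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a "stats"-style summary (not ml-visualizer's code).
# Assumptions: one <type>.log per command, one entry per line, and history.log
# excluded because it duplicates entries already counted in the per-type logs.

# count_entries DIR: print "<type>: <n>" for each *.log file, then a total line.
count_entries() {
  local dir="$1" f name n total=0
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue               # glob matched nothing: no logs yet
    name="$(basename "$f" .log)"
    [ "$name" = "history" ] && continue   # skip the unified activity log
    n=$(wc -l < "$f")
    n=$((n))                              # normalize padding some wc versions add
    total=$((total + n))
    printf '%s: %d\n' "$name" "$n"
  done
  printf 'total: %d\n' "$total"
}
```

Counting lines with wc -l works only because each entry occupies a single line in the timestamp|value format.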

Each data command (ingest, transform, query, etc.) works in two modes:

Without arguments — displays the 20 most recent entries of that type
With arguments — saves the input as a new timestamped entry
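
The two modes can be expressed as a single dispatch on the argument count. This is a minimal, hypothetical sketch, not the tool's implementation. The helper name log_or_show, the DATA_DIR variable, the timestamp format, and the layout of history.log lines are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two-mode behavior. log_or_show, DATA_DIR, the
# timestamp format, and the history.log line layout are assumptions.

# Throwaway directory for this demo; the documented default is
# ~/.local/share/ml-visualizer.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"

# log_or_show TYPE [TEXT...]
#   no TEXT   -> print the 20 most recent entries of TYPE
#   with TEXT -> append "timestamp|text" to TYPE.log and record it in history.log
log_or_show() {
  local type="$1"; shift
  local file="$DATA_DIR/$type.log" ts
  if [ "$#" -eq 0 ]; then
    if [ -f "$file" ]; then tail -n 20 "$file"; fi
    return 0
  fi
  mkdir -p "$DATA_DIR"
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  printf '%s|%s\n' "$ts" "$*" >> "$file"
  printf '%s|%s: %s\n' "$ts" "$type" "$*" >> "$DATA_DIR/history.log"
}
```

For example, log_or_show ingest "Loaded train.csv" appends an entry, and log_or_show ingest with no text lists the latest 20. Joining the arguments with "$*" means unquoted multi-word input is still saved as one entry.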

Data Storage

All data is stored as plain-text log files in ~/.local/share/ml-visualizer/:

Each command type gets its own log file (e.g., ingest.log, transform.log, visualize.log)
Entries are stored in timestamp|value format for easy parsing
A unified history.log tracks all activity across command types
Export to JSON, CSV, or TXT at any time with the export command
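
Because each entry is a single timestamp|value line, converting a log to another format needs only a few lines of shell. The sketch below is hypothetical. log_to_csv and the timestamp,value header are illustrative and do not describe the export command's actual output. The sketch splits on the first | only, so values that themselves contain | survive intact, and it doubles embedded quotes as CSV requires.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: convert one "timestamp|value" log file to CSV.
# Not ml-visualizer's export code; the header names are assumptions.

# log_to_csv FILE: print a header row, then one quoted row per entry.
log_to_csv() {
  local file="$1" line ts val q='"'
  printf 'timestamp,value\n'
  while IFS= read -r line || [ -n "$line" ]; do   # also handle a missing final newline
    [ -z "$line" ] && continue                    # skip blank lines
    ts="${line%%|*}"                              # text before the first "|"
    val="${line#*|}"                              # everything after it, may contain "|"
    val=${val//$q/$q$q}                           # CSV escaping: " becomes ""
    printf '"%s","%s"\n' "$ts" "$val"
  done < "$file"
}
```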

Set the ML_VISUALIZER_DIR environment variable to override the default data directory.
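
A minimal sketch of how such an override is commonly resolved in Bash uses the ${VAR:-default} expansion. The function name data_dir is hypothetical. Whether the real tool also treats an empty ML_VISUALIZER_DIR as unset, as :- does, is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of data-directory resolution. ML_VISUALIZER_DIR and the
# default path come from this README; data_dir and the ":-" behavior (an empty
# value falls back to the default) are assumptions.

data_dir() {
  printf '%s\n' "${ML_VISUALIZER_DIR:-$HOME/.local/share/ml-visualizer}"
}
```

In practice you can scope the variable to a single invocation, for example ML_VISUALIZER_DIR=./.ml-log ml-visualizer stats, to keep one project's history separate from another's.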

Requirements

Bash 4.0+ (the script runs under set -euo pipefail, so it stops on the first failing command)
Standard Unix utilities: date, wc, du, tail, grep, sed, cat
No external dependencies or API keys required
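
You can confirm these requirements before first use with a preflight check. The helper below is hypothetical. ml-visualizer is not documented to ship it. It reads the major version from BASH_VERSINFO and uses command -v to look up each listed utility on PATH.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the requirements listed above.
# check_requirements is an illustrative name, not part of ml-visualizer.

check_requirements() {
  local missing=() cmd
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "Bash 4.0+ required (found $BASH_VERSION)" >&2
    return 1
  fi
  for cmd in date wc du tail grep sed cat; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")   # collect anything not found
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing utilities: ${missing[*]}" >&2
    return 1
  fi
  echo "All requirements satisfied (bash $BASH_VERSION)"
}
```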

When to Use

Building a data pipeline journal — use ingest, transform, and pipeline to document each step of your ML data preparation workflow
Tracking data quality — use validate and profile to log validation checks and profiling runs, ensuring data integrity before model training
Logging visualization requests — use visualize to record what charts and plots you've generated for model diagnostics (confusion matrices, ROC curves, feature importance)
Managing dataset schemas — use schema to document the structure of your datasets, track schema changes over time, and share definitions with your team
Auditing data operations — use search, recent, and stats to review your complete data processing history and find specific operations

Examples

```bash
# Ingest a new data source
ml-visualizer ingest "Loaded training set from s3://ml-data/train.csv — 50,000 rows, 24 features"

# Record a transformation step
ml-visualizer transform "Applied StandardScaler to numeric columns, one-hot encoded categoricals"

# Log a visualization
ml-visualizer visualize "Generated confusion matrix for RandomForest classifier — 94% accuracy"

# Define a schema entry
ml-visualizer schema "users table: id(int), age(int), income(float), segment(str), churn(bool)"

# Search past operations
ml-visualizer search "StandardScaler"
```
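
The search example above finds past entries by keyword. One way such a search could work over the per-type log files is sketched below. The sketch is hypothetical. Case-insensitive, fixed-string matching, skipping history.log, and the "<type>: <entry>" output layout are assumptions rather than documented behavior, and search_logs is an illustrative name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a keyword search over per-type logs. Matching rules,
# the history.log skip, and the output layout are assumptions.

# search_logs DIR TERM: print "<type>: <entry>" for every entry containing TERM.
search_logs() {
  local dir="$1" term="$2" f name line
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue                 # glob matched nothing
    name="$(basename "$f" .log)"
    [ "$name" = "history" ] && continue     # avoid duplicate hits from the unified log
    # -i: ignore case, -F: match TERM literally, --: TERM may start with "-"
    grep -iF -- "$term" "$f" | while IFS= read -r line; do
      printf '%s: %s\n' "$name" "$line"
    done
  done
}
```

Using grep -F means terms containing regex characters, such as "train.csv", match literally instead of treating "." as a wildcard.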

Output

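All commands print results to stdout, so any output can be redirected to a file. To export every entry at once, use the export command with one of the supported formats (json, csv, txt):

```bash
ml-visualizer stats > pipeline-report.txt
ml-visualizer export json
```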