ML Visualizer
A data toolkit for ingesting, transforming, querying, and visualizing machine learning datasets. Manage your entire data pipeline — from raw ingestion through profiling and validation — all from the command line.
Commands
| Command | Description |
|---------|-------------|
| ml-visualizer ingest <input> | Ingest raw data or record a data source entry |
| ml-visualizer transform <input> | Log a data transformation step or operation |
| ml-visualizer query <input> | Record a query against your dataset |
| ml-visualizer filter <input> | Log a filter operation applied to data |
| ml-visualizer aggregate <input> | Record an aggregation or rollup operation |
| ml-visualizer visualize <input> | Log a visualization request or chart specification |
| ml-visualizer export <input> | Record an export operation or export all data |
| ml-visualizer sample <input> | Log a data sampling operation |
| ml-visualizer schema <input> | Record or describe a data schema |
| ml-visualizer validate <input> | Log a data validation check |
| ml-visualizer pipeline <input> | Record a full pipeline definition or step |
| ml-visualizer profile <input> | Log a data profiling run |
| ml-visualizer stats | Show summary statistics across all entry types |
| ml-visualizer export <fmt> | Export all data (formats: json, csv, txt) |
| ml-visualizer search <term> | Search across all entries by keyword |
| ml-visualizer recent | Show the 20 most recent activity log entries |
| ml-visualizer status | Health check — version, disk usage, last activity |
| ml-visualizer help | Show the built-in help message |
| ml-visualizer version | Print the current version (v2.0.0) |
Each data command (ingest, transform, query, etc.) works in two modes:
- Without arguments — displays the 20 most recent entries of that type
- With arguments — saves the input as a new timestamped entry
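The two modes can be pictured with a small Bash sketch. This is not the tool's actual source: the function name `record_or_list`, the `DATA_DIR` variable, and the timestamp format are assumptions made for illustration, and the write to the unified history log is left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed layout: one <type>.log file per command type in the data directory.
DATA_DIR="${ML_VISUALIZER_DIR:-$HOME/.local/share/ml-visualizer}"

# record_or_list <type> [text...]
#   no text -> print the 20 most recent entries of that type
#   text    -> append a "timestamp|text" entry to <type>.log
record_or_list() {
  local type="$1"; shift
  local log="$DATA_DIR/$type.log"
  mkdir -p "$DATA_DIR"
  if [ "$#" -eq 0 ]; then
    # Listing mode: print nothing if no entries were recorded yet.
    [ -f "$log" ] && tail -n 20 "$log"
    return 0
  fi
  # Recording mode: the timestamp format here is an assumption.
  printf '%s|%s\n' "$(date '+%Y-%m-%d %H:%M')" "$*" >> "$log"
}
```

With this sketch, `record_or_list ingest "Loaded train.csv"` appends one entry, and a bare `record_or_list ingest` prints the most recent ones.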
Data Storage
All data is stored as plain-text log files in ~/.local/share/ml-visualizer/:
- Each command type gets its own log file (e.g., `ingest.log`, `transform.log`, `visualize.log`)
- Entries are stored in `timestamp|value` format for easy parsing
- A unified `history.log` tracks all activity across command types
- Export to JSON, CSV, or TXT at any time with the `export` command
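Because entries use the `timestamp|value` layout, they can be split with plain Bash parameter expansion. The sketch below is illustrative only: the helper name `split_entry` and the sample timestamp format are assumptions, not part of the tool. It splits on the first `|` so that a value containing its own `|` stays intact.

```shell
#!/usr/bin/env bash
set -euo pipefail

# split_entry <line>: print the timestamp and value of one "timestamp|value" entry.
split_entry() {
  local line="$1"
  local ts="${line%%|*}"     # everything before the first '|'
  local value="${line#*|}"   # everything after the first '|'
  printf 'timestamp=%s\nvalue=%s\n' "$ts" "$value"
}

# Illustrative entry; the timestamp format is assumed.
split_entry '2024-01-15 10:30|Loaded training set | 50,000 rows'
```

To process a whole log, the same helper could be called from a `while IFS= read -r line` loop over a file such as `ingest.log`.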
Set the ML_VISUALIZER_DIR environment variable to override the default data directory.
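One possible use of the override is keeping a separate journal per project. The directory name below is hypothetical, and whether the tool creates a missing directory on its own is not stated here, so treat this as a sketch rather than a guaranteed workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical per-project location; any writable directory should do.
export ML_VISUALIZER_DIR="$PWD/.ml-visualizer"

if command -v ml-visualizer >/dev/null 2>&1; then
  # These entries land in the project-local directory instead of the default.
  ml-visualizer ingest "Loaded validation split for project A"
  ml-visualizer status
else
  echo "ml-visualizer not found on PATH" >&2
fi
```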
Requirements
- Bash 4.0+ (uses `set -euo pipefail`)
- Standard Unix utilities: `date`, `wc`, `du`, `tail`, `grep`, `sed`, `cat`
- No external dependencies or API keys required
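The requirements above can be verified before installing with a short preflight script. The function name `check_requirements` is made up for this sketch, which only tests for the Bash major version and the utilities listed in this section.

```shell
#!/usr/bin/env bash

# check_requirements: return 0 if Bash 4+ and all listed utilities are available.
check_requirements() {
  local missing=0 tool
  if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
    echo "Bash 4.0+ required, found $BASH_VERSION" >&2
    missing=1
  fi
  for tool in date wc du tail grep sed cat; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing utility: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_requirements; then
  echo "all requirements met"
fi
```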
When to Use
- Building a data pipeline journal: use `ingest`, `transform`, and `pipeline` to document each step of your ML data preparation workflow
- Tracking data quality: use `validate` and `profile` to log validation checks and profiling runs, ensuring data integrity before model training
- Logging visualization requests: use `visualize` to record what charts and plots you've generated for model diagnostics (confusion matrices, ROC curves, feature importance)
- Managing dataset schemas: use `schema` to document the structure of your datasets, track schema changes over time, and share definitions with your team
- Auditing data operations: use `search`, `recent`, and `stats` to review your complete data processing history and find specific operations
Examples
# Ingest a new data source
ml-visualizer ingest "Loaded training set from s3://ml-data/train.csv — 50,000 rows, 24 features"
# Record a transformation step
ml-visualizer transform "Applied StandardScaler to numeric columns, one-hot encoded categoricals"
# Log a visualization
ml-visualizer visualize "Generated confusion matrix for RandomForest classifier — 94% accuracy"
# Define a schema entry
ml-visualizer schema "users table: id(int), age(int), income(float), segment(str), churn(bool)"
# Search past operations
ml-visualizer search "StandardScaler"
Output
All commands print results to stdout. Redirect to a file if needed:
ml-visualizer stats > pipeline-report.txt
ml-visualizer export json
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com