ML Batch Processing Pattern
Classification
- Domain: Computer Science, AI/ML
- Category: ML System Design Patterns
- Novelty: 6/10 (established pattern with modern evolution)
- Practitioner Evidence: 10/10 (Google, industry standard)
Mental Model
Batch processing decouples prediction from real-time requests by pre-computing predictions at scheduled intervals. It is like meal prep for the week instead of cooking each meal on demand: you compute predictions in bulk during off-peak hours, store the results, and serve them instantly when requested.
When to Use
- Predictions needed for all users/items at regular intervals (daily recommendations, weekly reports)
- Training data arrives in batches rather than continuously
- Cost optimization prioritized over real-time freshness (batch = cheaper compute)
- Predictions can tolerate staleness (hours/days old acceptable)
- High-throughput scenarios where latency isn't critical
Core Framework
1. Schedule Determination
Identify prediction cadence based on business requirements
- Daily batch: Nightly recommendation refresh for morning users
- Hourly batch: Stock predictions updating each trading hour
- Weekly batch: Churn predictions for monthly subscriptions, refreshed weekly
- Event-triggered: Batch after data warehouse ETL completion
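The cadence choices above can be pictured as a small trigger check. This is a minimal sketch, not a scheduler: `should_run_batch`, its parameters, and the `etl_completed_at` signal are hypothetical names, and in practice an orchestrator (cron, Airflow, or similar) would usually own this logic.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_batch(
    now: datetime,
    last_run: Optional[datetime],
    interval: timedelta,
    etl_completed_at: Optional[datetime] = None,
) -> bool:
    """Decide whether a batch prediction job is due.

    Time-based: run if the job never ran or `interval` has elapsed.
    Event-triggered: also run if upstream ETL finished after the last run.
    """
    if last_run is None:
        return True
    if now - last_run >= interval:
        return True
    # Hypothetical event trigger: fresh warehouse data since the last run.
    return etl_completed_at is not None and etl_completed_at > last_run
```

A daily job would pass `interval=timedelta(days=1)`; an hourly job `timedelta(hours=1)`. The event trigger lets a run start early when new data lands.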
2. Data Ingestion Setup
Configure batch data pipeline from sources to ML system
- Extract from data warehouse/data lake (BigQuery, Snowflake, S3)
- Apply feature transformations matching training pipeline
- Validate schema consistency with model expectations
- Handle missing values using same imputation as training
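As an illustration of the validation and imputation steps above, here is a minimal sketch using only the standard library. The schema, column names, and imputation constants are assumptions made up for the example; in a real pipeline they would be saved as artifacts of the training run and loaded at batch time.

```python
from typing import Any, Dict

# Hypothetical schema and imputation values (assumed, not from any real model).
EXPECTED_SCHEMA: Dict[str, type] = {
    "user_id": str,
    "age": float,
    "purchases_30d": float,
}
TRAINING_IMPUTE_VALUES: Dict[str, float] = {"age": 35.0, "purchases_30d": 0.0}


def validate_and_impute(record: Dict[str, Any]) -> Dict[str, Any]:
    """Check a record against the expected schema and fill missing values
    with the same constants that were used at training time."""
    unknown = set(record) - set(EXPECTED_SCHEMA)
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")

    cleaned: Dict[str, Any] = {}
    for column, expected_type in EXPECTED_SCHEMA.items():
        value = record.get(column)
        if value is None:
            if column not in TRAINING_IMPUTE_VALUES:
                raise ValueError(f"missing required column: {column}")
            value = TRAINING_IMPUTE_VALUES[column]
        try:
            cleaned[column] = expected_type(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"column {column!r} has incompatible value {value!r}"
            ) from exc
    return cleaned
```

Failing loudly on schema drift is a deliberate choice in this sketch: a batch job that silently scores malformed rows can publish bad predictions for every entity at once.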
3. Distributed Processing Architecture
Parallelize prediction computation across infrastructure
- Use MapReduce/Spark for horizontal scaling across datasets
- Partition data by entity (user_id, product_id) for independent processing
- Configure batch size based on memory constraints (1K-100K records/batch)
- Implement checkpointing for fault tolerance on long-running jobs
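The partitioning, batching, and checkpointing ideas above can be sketched on a single machine with the standard library. This is not Spark code; `partition_for`, `chunked`, and `run_with_checkpoint` are hypothetical helpers, and the JSON checkpoint file stands in for whatever durable state a real framework provides.

```python
import hashlib
import json
import os
from typing import Callable, Dict, Iterator, List, Sequence


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Stable hash partitioning. Python's built-in hash() is salted per
    process for strings, so a cryptographic digest is used instead."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_with_checkpoint(
    partitions: Dict[int, List[str]],
    process: Callable[[List[str]], None],
    checkpoint_path: str,
) -> List[int]:
    """Process each partition at most once across restarts.

    Returns the partition ids processed during this call.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    processed_now: List[int] = []
    for pid in sorted(partitions):
        if pid in done:
            continue  # finished in an earlier attempt
        process(partitions[pid])
        done.add(pid)
        processed_now.append(pid)
        # Persist progress after every partition so a crash loses little work.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return processed_now
```

Because entities in different partitions are independent, a failed run can resume from the checkpoint without recomputing completed partitions.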
4. Model Serving Configuration
Deploy model in batch-optimized inference mode
- Load model once per batch job (avoid reload overhead)
- Use batched inference calls instead of per-record calls (e.g., Keras model.predict with a batch_size, PyTorch inference over a DataLoader)
- Enable GPU batching for deep learning models (32-512 samples/batch)
- Leverage model compilation or optimized runtimes (TensorRT, ONNX Runtime) for throughput optimization
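A minimal sketch of the "load once, predict in batches" idea follows. `DummyModel` is a made-up stand-in for a real framework model (its `load` and `predict` methods are assumptions, not any library's API); the load counter exists only to show that the job loads the model a single time.

```python
from typing import Iterable, Iterator, List


class DummyModel:
    """Stand-in for a real model; counts loads to show the job loads once."""

    load_count = 0

    @classmethod
    def load(cls) -> "DummyModel":
        cls.load_count += 1
        return cls()

    def predict(self, batch: List[List[float]]) -> List[float]:
        # Placeholder scoring: sum of the features of each row.
        return [sum(row) for row in batch]


def batches(
    rows: Iterable[List[float]], batch_size: int
) -> Iterator[List[List[float]]]:
    """Group rows into lists of at most `batch_size`."""
    batch: List[List[float]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def run_batch_inference(
    rows: Iterable[List[float]], batch_size: int = 256
) -> List[float]:
    model = DummyModel.load()  # once per job, not per batch or per row
    scores: List[float] = []
    for batch in batches(rows, batch_size):
        scores.extend(model.predict(batch))
    return scores
```

With a real deep learning model, `batch_size` would be tuned to accelerator memory; the 32-512 range above is a common starting point.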
5. Prediction Storage Design
Store pre-computed predictions for fast lookup
- Key-value store for individual lookups (Redis, DynamoDB: user_id → prediction)
- Columnar storage for analytics (Parquet, BigQuery: all predictions for analysis)
- Include metadata (model_version, prediction_timestamp, confidence_score)
- Set TTL based on batch frequency (1.5x batch interval for overlap)
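The storage design above can be sketched with a toy in-memory store. `PredictionRecord`, `ttl_seconds`, and `InMemoryPredictionStore` are hypothetical names; the class only imitates the expiry behavior of a key-value store such as Redis or DynamoDB and is not a client for either.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PredictionRecord:
    prediction: float
    model_version: str
    prediction_timestamp: float  # Unix seconds
    confidence_score: float


def ttl_seconds(batch_interval_seconds: float, overlap_factor: float = 1.5) -> float:
    """TTL of 1.5x the batch interval, so old values outlive a late run."""
    return batch_interval_seconds * overlap_factor


class InMemoryPredictionStore:
    """Toy key-value store with per-key expiry (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[PredictionRecord, float]] = {}

    def put(self, key: str, record: PredictionRecord, ttl: float,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._data[key] = (record, now + ttl)

    def get(self, key: str, now: Optional[float] = None) -> Optional[PredictionRecord]:
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            return None  # missing or expired
        return entry[0]
```

For a daily batch, `ttl_seconds(24 * 3600)` gives 36 hours, which leaves a 12-hour window for a delayed job before lookups start returning nothing.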
6. Keyed Predictions Pattern
Enable distributed batch prediction with result matching
- Attach unique keys to input records (primary keys, composite keys)
- Preserve keys through prediction pipeline (input → features → predictions)
- Join predictions back to original entities using keys
- Handle missing predictions (timeouts, errors) with fallback logic
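A small sketch of the keyed predictions idea: the key travels with each input, so results can be joined back even when they arrive out of order or some are missing. The function names, the `user_id` key field, and the fallback value are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def predict_keyed(
    keyed_inputs: List[Tuple[str, List[float]]],
    score: Callable[[List[float]], Optional[float]],
) -> Dict[str, float]:
    """Score (key, features) pairs; the key rides along with each result.

    Inputs whose scoring fails or returns None are left out of the result.
    """
    results: Dict[str, float] = {}
    for key, features in keyed_inputs:
        try:
            value = score(features)
        except Exception:
            value = None  # treat any scoring error as a missing prediction
        if value is not None:
            results[key] = value
    return results


def join_predictions(
    entities: List[Dict],
    predictions: Dict[str, float],
    key_field: str = "user_id",
    fallback: float = 0.0,
) -> List[Dict]:
    """Attach predictions to entities by key, using a fallback when missing."""
    return [
        {
            **entity,
            "prediction": predictions.get(entity[key_field], fallback),
            "is_fallback": entity[key_field] not in predictions,
        }
        for entity in entities
    ]
```

Flagging fallback rows explicitly (`is_fallback`) keeps them distinguishable from real model output in downstream analysis.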
7. Monitoring & Alerting
Track batch job health and prediction quality
- Job completion metrics (duration, throughput, failure rate)
- Data quality checks (null rate, distribution shifts, schema violations)
- Model performance monitoring (prediction distribution, confidence intervals)
- Alerting on batch failures or stale predictions (SLA breaches)
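Three of the checks above, sketched as plain functions. The thresholds (a 1.5x grace factor for staleness, a 20% relative shift in the mean prediction) are illustrative assumptions, and the mean-shift test is a deliberately crude stand-in for a proper distribution-drift metric.

```python
from typing import List, Optional


def null_rate(values: List[Optional[float]]) -> float:
    """Fraction of values that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_stale(
    prediction_timestamp: float,
    now: float,
    batch_interval_seconds: float,
    grace_factor: float = 1.5,
) -> bool:
    """True when a prediction is older than the interval plus a grace period."""
    return now - prediction_timestamp > batch_interval_seconds * grace_factor


def mean_shift_exceeded(
    baseline: List[float],
    current: List[float],
    max_relative_shift: float = 0.2,
) -> bool:
    """Crude drift check: relative change in mean prediction vs. a baseline run."""
    base_mean = sum(baseline) / len(baseline)
    cur_mean = sum(current) / len(current)
    if base_mean == 0:
        return cur_mean != 0
    return abs(cur_mean - base_mean) / abs(base_mean) > max_relative_shift
```

Each function returns a value that an alerting system could compare against an SLA; wiring them to a pager or dashboard is left out of the sketch.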
Practical Application
E-commerce Recommendation System
Problem: Generate personalized product recommendations for 10M users
Batch Solution:
- Nightly job extracts user behavior (purchases, views, clicks) from data warehouse
- Spark cluster processes 10M users in parallel (10K users/partition × 1K partitions)
- Recommendation model generates top-100 products per user (batch size: 256 users)
- Predictions stored in Redis with 36-hour TTL (user_id → [product_ids + scores])
- Web app reads pre-computed recommendations in <5ms (vs. 200ms real-time inference)
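The sizing figures in this example can be checked with a few lines of arithmetic. The constants are taken from the bullets above; the variable names are only for illustration.

```python
import math

TOTAL_USERS = 10_000_000
USERS_PER_PARTITION = 10_000
MODEL_BATCH_SIZE = 256       # users per model call
TOP_N = 100                  # recommendations stored per user
BATCH_INTERVAL_HOURS = 24    # nightly job

num_partitions = math.ceil(TOTAL_USERS / USERS_PER_PARTITION)           # 1000
batches_per_partition = math.ceil(USERS_PER_PARTITION / MODEL_BATCH_SIZE)  # 40
ttl_hours = 1.5 * BATCH_INTERVAL_HOURS                                   # 36.0
stored_pairs = TOTAL_USERS * TOP_N                                       # 1_000_000_000
```

The 36-hour TTL is the 1.5x rule applied to a 24-hour cadence, and each partition needs 40 model calls because 10,000 / 256 rounds up from 39.06. One billion (product_id, score) pairs is the figure to size the Redis deployment against.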
Credit Card Fraud Detection (Batch Component)
Problem: Update fraud risk scores for all accounts daily
Batch Solution:
- Daily batch (3am) processes all 50M accounts using the last 30 days of transactions
- Feature engineering pipeline computes aggregates (transaction velocity, geography patterns)
- XGBoost model scores all accounts (1M accounts/minute on 100-node cluster)
- Risk scores stored in Aurora DB (account_id, risk_score, score_date)
- Real-time transactions query batch scores + apply real-time rules for final decision
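The last step, combining a precomputed score with real-time rules, might look like the sketch below. Everything specific here is an assumption: the thresholds, the single foreign-transaction rule, the 0.3 bump, and the neutral default of 0.5 for accounts without a batch score are invented for illustration and are not a real fraud policy.

```python
from typing import Optional


def batch_scoring_minutes(total_accounts: int, accounts_per_minute: int) -> float:
    """Wall-clock estimate for scoring all accounts at a given throughput."""
    return total_accounts / accounts_per_minute


def decide(
    batch_risk_score: Optional[float],
    amount: float,
    is_foreign: bool,
    block_threshold: float = 0.9,
    review_threshold: float = 0.6,
) -> str:
    """Combine the daily batch score with a simple real-time rule."""
    # Neutral default for accounts the batch job has not scored yet.
    score = 0.5 if batch_risk_score is None else batch_risk_score
    # Hypothetical real-time rule: large foreign transactions raise the risk.
    if is_foreign and amount > 1000:
        score = min(1.0, score + 0.3)
    if score >= block_threshold:
        return "block"
    if score >= review_threshold:
        return "review"
    return "approve"
```

At the stated throughput, `batch_scoring_minutes(50_000_000, 1_000_000)` gives 50 minutes, so a 3am start finishes well before morning traffic.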
Edge Cases & Nuances
Cold Start Problem: New users/items without predictions
- Fallback to popularity-based or demographic-based defaults
- Trigger on-demand prediction for high-value new entities
- Include new entities in next batch cycle with minimal features
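The fallback order described above can be expressed as a single lookup function. This is a sketch under assumptions: `get_recommendations`, the `on_demand` callable, and the `is_high_value` flag are hypothetical, and a plain dict stands in for the prediction store.

```python
from typing import Callable, Dict, List, Optional


def get_recommendations(
    user_id: str,
    batch_predictions: Dict[str, List[str]],
    popular_items: List[str],
    on_demand: Optional[Callable[[str], List[str]]] = None,
    is_high_value: bool = False,
) -> List[str]:
    """Serve batch results when present, otherwise fall back.

    Order: batch prediction, then on-demand inference for high-value
    users (if available), then a popularity-based default.
    """
    cached = batch_predictions.get(user_id)
    if cached is not None:
        return cached
    if is_high_value and on_demand is not None:
        return on_demand(user_id)
    return popular_items
```

Restricting on-demand inference to high-value entities keeps the expensive path rare while still covering the cases where a generic default would cost the most.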
Prediction Staleness: Batch predictions lag reality
- Hybrid approach: batch for stable predictions + real-time updates for high-velocity features
- Monitor staleness impact on business metrics (click-through rate decay over time)
- Decrease batch interval if staleness hurts performance (daily → hourly)
Batch Job Failures: Incomplete or failed batch runs
- Implement idempotent batch jobs (can safely re-run without duplicates)
- Use transactional writes to prediction store (all-or-nothing semantics)
- Maintain previous batch predictions as fallback until new batch succeeds
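One way to get all three properties is to stage a whole batch and swap it in only after it has been built completely. The class below is a toy illustration with hypothetical names, not a client for any real database; a production store would achieve the same effect with transactions or an atomic pointer/alias switch.

```python
from typing import Dict, Iterable, Optional, Tuple


class VersionedPredictionStore:
    """Toy store that publishes a batch all-or-nothing and keeps serving
    the previous batch until a new one succeeds."""

    def __init__(self) -> None:
        self._active: Dict[str, float] = {}
        self.active_batch_id: Optional[str] = None

    def publish(self, batch_id: str, rows: Iterable[Tuple[str, float]]) -> None:
        staged: Dict[str, float] = {}
        for key, value in rows:  # may raise partway through a failing job
            staged[key] = value  # keyed writes: a re-run creates no duplicates
        # Reached only if every row was staged; the swap is the commit point.
        self._active = staged
        self.active_batch_id = batch_id

    def get(self, key: str) -> Optional[float]:
        return self._active.get(key)
```

If `rows` raises midway, `publish` exits before the swap and readers keep seeing the previous batch. Publishing the same `batch_id` twice leaves the store in the same state, which is what makes a retry safe.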
Cost vs. Freshness Tradeoff: More frequent batches = higher cost
- Profile actual prediction change rate (how often do top-10 recommendations shift?)
- A/B test batch frequencies to measure impact on engagement metrics
- Use event-triggered batches for critical updates (product catalog changes)
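To profile how often recommendations actually shift, one option is to compare the top-k lists of two consecutive batch runs. The metric below (share of the previous top-k that dropped out) and the function names are assumptions chosen for simplicity; other overlap or rank-correlation measures would serve the same purpose.

```python
from typing import Dict, List


def top_k_overlap(previous: List[str], current: List[str], k: int = 10) -> float:
    """Fraction of the previous top-k still present in the current top-k
    (1.0 means the set is unchanged)."""
    prev_top, cur_top = set(previous[:k]), set(current[:k])
    if not prev_top:
        return 1.0
    return len(prev_top & cur_top) / len(prev_top)


def mean_change_rate(
    prev_batch: Dict[str, List[str]],
    cur_batch: Dict[str, List[str]],
    k: int = 10,
) -> float:
    """Average share of top-k items that changed, over users in both batches."""
    shared = prev_batch.keys() & cur_batch.keys()
    if not shared:
        return 0.0
    total = sum(1.0 - top_k_overlap(prev_batch[u], cur_batch[u], k) for u in shared)
    return total / len(shared)
```

A change rate near zero between daily runs suggests the cadence could be relaxed; a high rate is a signal to test a shorter interval.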
Anti-Patterns
- Batch for Latency-Critical Applications: Using batch for fraud detection that must block transactions in real time
- Over-Engineering Batch Infrastructure: Building a distributed system for 10K records that a single machine can process
- Ignoring Data Freshness Requirements: Running daily batches for inventory predictions when stock changes hourly
- No Fallback Strategy: System breaks when a batch job fails because no previous predictions are kept to serve in the meantime
Trade-offs
Batch vs. Online Inference:
- Batch: Lower cost (bulk processing), fast lookups at serving time but stale predictions, simpler ops (scheduled jobs)
- Online: Higher cost (per-request compute), fresh predictions but inference latency on the request path, complex ops (SLA-driven)
Batch Frequency:
- More frequent (hourly): Fresher predictions, higher compute cost, more operational complexity
- Less frequent (daily): Staler predictions, lower cost, simpler ops, longer retention (TTL) for stored predictions
Distributed vs. Single-Node:
- Distributed: Scales to billions of records, complex infrastructure, slower for small datasets
- Single-node: Simple, fast for <10M records, memory/compute constraints, no fault tolerance
Related Frameworks
- Streaming ML Pattern: Continuous prediction updates from streaming data (complements batch)
- Online Learning Pattern: Incremental model updates as new data arrives (batch retraining alternative)
- Lambda Architecture: Batch layer + speed layer for hybrid batch/streaming systems
- Feature Store Pattern: Centralized feature computation for batch and online consistency
Practitioner Sources
- Machine Learning Design Patterns (Lakshmanan et al., Google): Batch Serving pattern (#17), Keyed Predictions pattern
- ML System Design: Batch vs. online prediction serving tradeoffs, architecture patterns
- Apache Spark MLlib: Distributed batch prediction at scale, best practices
- AWS SageMaker Batch Transform: Managed batch inference service, cost optimization