ML Batch Processing Pattern

Classification

Domain: Computer Science, AI/ML
Category: ML System Design Patterns
Novelty: 6/10 (established pattern with modern evolution)
Practitioner Evidence: 10/10 (Google, industry standard)

Mental Model

Batch processing decouples prediction from real-time requests by pre-computing predictions at scheduled intervals. It resembles meal prep for the week instead of cooking each meal on demand: predictions are computed in bulk during off-peak hours, stored, and served instantly when requested.

When to Use

Predictions needed for all users/items at regular intervals (daily recommendations, weekly reports)
Training data arrives in batches rather than continuously
Cost optimization prioritized over real-time freshness (batch = cheaper compute)
Predictions can tolerate staleness (hours/days old acceptable)
High-throughput scenarios where latency isn't critical

Core Framework

1. Schedule Determination

Identify prediction cadence based on business requirements

Daily batch: Nightly recommendation refresh for morning users
Hourly batch: Stock predictions updating each trading hour
Weekly batch: Subscription churn predictions refreshed once a week
Event-triggered: Batch after data warehouse ETL completion
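
The cadence options above reduce to one check: has the interval elapsed, and is the upstream data ready? A minimal sketch, assuming a hypothetical helper named should_run_batch and an etl_done flag supplied by the orchestrator:

```python
from datetime import datetime, timedelta


def should_run_batch(last_run, now, interval, etl_done=True):
    """Return True when the interval has elapsed and upstream data is ready."""
    return etl_done and (now - last_run) >= interval


last = datetime(2024, 1, 1, 3, 0)
daily = timedelta(days=1)

due = should_run_batch(last, datetime(2024, 1, 2, 3, 0), daily)
early = should_run_batch(last, datetime(2024, 1, 1, 12, 0), daily)
# Event-triggered variant: the interval has elapsed but the ETL has not finished.
blocked = should_run_batch(last, datetime(2024, 1, 2, 3, 0), daily, etl_done=False)
```

In practice a scheduler or workflow engine would own this decision; the function only illustrates the logic.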

2. Data Ingestion Setup

Configure batch data pipeline from sources to ML system

Extract from data warehouse/data lake (BigQuery, Snowflake, S3)
Apply feature transformations matching training pipeline
Validate schema consistency with model expectations
Handle missing values using same imputation as training
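
The validation and imputation steps above can be sketched as follows. The schema, the field names, and the imputation defaults are hypothetical; in a real pipeline they would be exported from the training run rather than hard-coded.

```python
# Hypothetical schema and imputation values, assumed to be saved at training time.
EXPECTED_SCHEMA = {"user_id": str, "age": float, "purchases_30d": int}
TRAINING_IMPUTATION = {"age": 35.0, "purchases_30d": 0}


def prepare_record(raw):
    """Impute missing values with training-time defaults, then validate types."""
    record = dict(raw)
    for field, default in TRAINING_IMPUTATION.items():
        if record.get(field) is None:
            record[field] = default
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return record


clean = prepare_record({"user_id": "u1", "age": None, "purchases_30d": 4})
```

Failing fast on a schema violation keeps a malformed extract from silently producing a full batch of bad predictions.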

3. Distributed Processing Architecture

Parallelize prediction computation across infrastructure

Use MapReduce/Spark for horizontal scaling across datasets
Partition data by entity (user_id, product_id) for independent processing
Configure batch size based on memory constraints (1K-100K records/batch)
Implement checkpointing for fault tolerance on long-running jobs
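
A framework-free sketch of the ideas above: stable partitioning by entity key, fixed-size batches, and checkpointing so a re-run skips finished work. The in-memory dict stands in for durable checkpoint storage, and all function names are illustrative; Spark or a similar engine would provide these mechanics in production.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash partitioning: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def chunks(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_partition(records, batch_size, score_batch, checkpoint):
    """Process one partition in batches, reusing results already checkpointed."""
    results = []
    for index, batch in enumerate(chunks(records, batch_size)):
        if index in checkpoint:  # finished in a previous attempt
            results.extend(checkpoint[index])
            continue
        scored = score_batch(batch)
        checkpoint[index] = scored  # assumption: a durable store in a real job
        results.extend(scored)
    return results


calls = []


def score_batch(batch):
    """Toy scorer that records how many records each call received."""
    calls.append(len(batch))
    return [(user_id, len(user_id)) for user_id in batch]


users = [f"user_{i}" for i in range(5)]
checkpoint = {}
first = run_partition(users, 2, score_batch, checkpoint)
second = run_partition(users, 2, score_batch, checkpoint)  # simulated re-run
```

The second call performs no scoring because every batch index is already present in the checkpoint.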

4. Model Serving Configuration

Deploy model in batch-optimized inference mode

Load model once per batch job (avoid reload overhead)
Use batched inference calls rather than per-record calls (e.g., Keras model.predict with a batch_size, PyTorch inference over a DataLoader)
Enable GPU batching for deep learning models (32-512 samples/batch)
Leverage model compilation and optimized runtimes (TensorRT, ONNX Runtime) for throughput optimization
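
The "load once, predict in batches" advice can be illustrated without a specific framework. DummyModel below is a stand-in that counts loads; a real job would load a serialized model in its place.

```python
class DummyModel:
    """Stand-in for a real model; counts how many times it is instantiated."""

    loads = 0

    def __init__(self):
        DummyModel.loads += 1

    def predict(self, batch):
        # Toy prediction: the sum of each feature vector.
        return [sum(features) for features in batch]


def run_batch_job(rows, batch_size):
    model = DummyModel()  # loaded once per job, not once per batch
    predictions = []
    for start in range(0, len(rows), batch_size):
        predictions.extend(model.predict(rows[start:start + batch_size]))
    return predictions


rows = [[i, i + 1] for i in range(10)]
preds = run_batch_job(rows, batch_size=4)
```

Ten rows with a batch size of 4 yield three predict calls against a single model instance.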

5. Prediction Storage Design

Store pre-computed predictions for fast lookup

Key-value store for individual lookups (Redis, DynamoDB: user_id → prediction)
Columnar storage for analytics (Parquet, BigQuery: all predictions for analysis)
Include metadata (model_version, prediction_timestamp, confidence_score)
Set TTL based on batch frequency (1.5x batch interval for overlap)
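
A sketch of the storage design above, using an in-memory class as a stand-in for Redis or DynamoDB. PredictionStore and ttl_for are hypothetical names; the 1.5x factor is the one suggested in the list.

```python
from datetime import datetime, timedelta, timezone


def ttl_for(batch_interval, overlap_factor=1.5):
    """TTL that outlives one batch interval so a late batch does not leave gaps."""
    return batch_interval * overlap_factor


class PredictionStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self):
        self._data = {}

    def put(self, key, prediction, model_version, now, ttl):
        self._data[key] = {
            "prediction": prediction,
            "model_version": model_version,
            "prediction_timestamp": now,
            "expires_at": now + ttl,
        }

    def get(self, key, now):
        record = self._data.get(key)
        if record is None or now >= record["expires_at"]:
            return None
        return record


ttl = ttl_for(timedelta(hours=24))
store = PredictionStore()
t0 = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
store.put("user_42", [101, 202], "v3", t0, ttl)

fresh = store.get("user_42", t0 + timedelta(hours=30))
expired = store.get("user_42", t0 + timedelta(hours=37))
```

With a daily batch the TTL comes out to 36 hours, so a read 30 hours after the write still succeeds while a read at 37 hours misses.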

6. Keyed Predictions Pattern

Enable distributed batch prediction with result matching

Attach unique keys to input records (primary keys, composite keys)
Preserve keys through prediction pipeline (input → features → predictions)
Join predictions back to original entities using keys
Handle missing predictions (timeouts, errors) with fallback logic
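
The keyed flow above can be sketched as below. The key travels with each record, results come back as a key-to-prediction mapping regardless of processing order, and the join supplies a fallback for anything that failed. The model function and fallback value are illustrative.

```python
def predict_keyed(records, model_fn):
    """Return {key: prediction}; the key passes through untouched."""
    out = {}
    for key, features in records:
        try:
            out[key] = model_fn(features)
        except ValueError:
            continue  # missing prediction, handled at join time
    return out


def join_predictions(entities, predictions, fallback):
    """Attach a score to every entity, using the fallback when none exists."""
    return [
        {**entity, "score": predictions.get(entity["id"], fallback)}
        for entity in entities
    ]


def model_fn(features):
    """Toy model: mean of the features; fails on empty input."""
    if not features:
        raise ValueError("empty features")
    return sum(features) / len(features)


entities = [
    {"id": "a", "name": "Ann"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Cy"},
]
records = [("c", [1.0, 3.0]), ("a", [0.5]), ("b", [])]  # arrives out of order
predictions = predict_keyed(records, model_fn)
joined = join_predictions(entities, predictions, fallback=0.0)
```

Because matching is by key rather than position, shuffled or partially failed outputs still join back to the right entities.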

7. Monitoring & Alerting

Track batch job health and prediction quality

Job completion metrics (duration, throughput, failure rate)
Data quality checks (null rate, distribution shifts, schema violations)
Model performance monitoring (prediction distribution, confidence intervals)
Alerting on batch failures or stale predictions (SLA breaches)
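
A minimal sketch of post-batch checks covering staleness, null rate, and a crude distribution-shift signal (difference in mean score against a baseline batch). The thresholds and alert names are assumptions chosen for illustration, not recommended values.

```python
from datetime import datetime, timedelta


def null_rate(values):
    """Fraction of values that are None; 0.0 for an empty list."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def mean_shift(current, baseline):
    """Absolute difference in mean score between this batch and a baseline."""
    return abs(sum(current) / len(current) - sum(baseline) / len(baseline))


def check_batch(scores, baseline, last_success, now, max_age,
                max_null_rate=0.05, max_shift=0.1):
    """Return a list of alert names; an empty list means the batch looks healthy."""
    alerts = []
    if now - last_success > max_age:
        alerts.append("stale_predictions")
    if null_rate(scores) > max_null_rate:
        alerts.append("high_null_rate")
    present = [s for s in scores if s is not None]
    if present and mean_shift(present, baseline) > max_shift:
        alerts.append("distribution_shift")
    return alerts


now = datetime(2024, 1, 3, 9, 0)
baseline = [0.5, 0.4, 0.6]

alerts = check_batch([0.9, None, 0.8, 0.85], baseline,
                     datetime(2024, 1, 1, 3, 0), now, timedelta(hours=36))
healthy = check_batch([0.5, 0.45, 0.55], baseline,
                      datetime(2024, 1, 3, 3, 0), now, timedelta(hours=36))
```

A production setup would likely use a proper drift statistic in place of the mean difference; the structure of the check stays the same.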

Practical Application

E-commerce Recommendation System

Problem: Generate personalized product recommendations for 10M users

Batch Solution:

Nightly job extracts user behavior (purchases, views, clicks) from data warehouse
Spark cluster processes 10M users in parallel (10K users/partition × 1K partitions)
Recommendation model generates top-100 products per user (batch size: 256 users)
Predictions stored in Redis with 36-hour TTL (user_id → [product_ids + scores])
Web app reads pre-computed recommendations in <5ms (vs. 200ms real-time inference)
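
The sizing figures in this example can be checked with a few lines of arithmetic. All inputs come from the list above; the daily interval behind the TTL is inferred from the nightly schedule.

```python
import math

total_users = 10_000_000
users_per_partition = 10_000
model_batch_size = 256
batch_interval_hours = 24  # nightly job

partitions = total_users // users_per_partition
batches_per_partition = math.ceil(users_per_partition / model_batch_size)
last_batch_size = users_per_partition - (batches_per_partition - 1) * model_batch_size
ttl_hours = 1.5 * batch_interval_hours
```

Each partition therefore runs 40 model batches, the last of which is only partially filled, and the 36-hour TTL matches the 1.5x rule for a daily interval.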

Credit Card Fraud Detection (Batch Component)

Problem: Update fraud risk scores for all accounts daily

Batch Solution:

Daily batch (3am) processes all 50M accounts using last 30 days transactions
Feature engineering pipeline computes aggregates (transaction velocity, geography patterns)
XGBoost model scores all accounts (1M accounts/minute on 100-node cluster)
Risk scores stored in Aurora DB (account_id, risk_score, score_date)
Real-time transactions query batch scores + apply real-time rules for final decision
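
The final step, combining a stored batch score with real-time rules, might look like the sketch below. The rule, the thresholds, and the default score for unknown accounts are all hypothetical. The last line checks the throughput claim: 50M accounts at 1M accounts per minute.

```python
def decide(transaction, batch_scores, default_score=0.5, block_threshold=0.9):
    """Combine the precomputed batch risk score with a real-time rule."""
    score = batch_scores.get(transaction["account_id"], default_score)
    # Hypothetical real-time rule: a large amount in a new country adds risk.
    if transaction["amount"] > 5000 and transaction["new_country"]:
        score = min(1.0, score + 0.3)
    return "block" if score >= block_threshold else "allow"


batch_scores = {"acct_1": 0.65, "acct_2": 0.2}  # written by the 3am batch

risky = decide({"account_id": "acct_1", "amount": 9000, "new_country": True},
               batch_scores)
routine = decide({"account_id": "acct_1", "amount": 40, "new_country": False},
                 batch_scores)
low_risk = decide({"account_id": "acct_2", "amount": 9000, "new_country": True},
                  batch_scores)

scoring_minutes = 50_000_000 / 1_000_000
```

The batch score carries the slow-moving account history while the rule reacts to the transaction in hand, so the blocking decision itself never waits on the batch job.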

Edge Cases & Nuances

Cold Start Problem: New users/items without predictions

Fallback to popularity-based or demographic-based defaults
Trigger on-demand prediction for high-value new entities
Include new entities in next batch cycle with minimal features
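
The cold-start options above form a fallback chain at read time. A sketch, assuming a dict-like prediction store and a hypothetical on_demand callable for high-value entities:

```python
def recommend(user_id, store, popular_items, on_demand=None,
              high_value=frozenset()):
    """Return (recommendations, source) using batch, on-demand, then defaults."""
    if user_id in store:
        return store[user_id], "batch"
    if on_demand is not None and user_id in high_value:
        return on_demand(user_id), "on_demand"
    return popular_items, "popularity"


store = {"u1": [11, 12, 13]}
popular = [1, 2, 3]


def on_demand(user_id):
    """Stand-in for a real-time model call."""
    return [7, 8, 9]


known = recommend("u1", store, popular)
vip = recommend("u_new_vip", store, popular, on_demand, high_value={"u_new_vip"})
newcomer = recommend("u_new", store, popular, on_demand, high_value={"u_new_vip"})
```

Returning the source alongside the result makes it straightforward to monitor how often each fallback is hit.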

Prediction Staleness: Batch predictions lag reality

Hybrid approach: batch for stable predictions + real-time updates for high-velocity features
Monitor staleness impact on business metrics (click-through rate decay over time)
Decrease batch interval if staleness hurts performance (daily → hourly)

Batch Job Failures: Incomplete or failed batch runs

Implement idempotent batch jobs (can safely re-run without duplicates)
Use transactional writes to prediction store (all-or-nothing semantics)
Maintain previous batch predictions as fallback until new batch succeeds
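
One way to get the three properties above is to write each batch under its own version and publish it with a single pointer flip. The sketch below uses an in-memory class as a stand-in for the prediction store; VersionedStore and the completeness check on expected_count are assumptions.

```python
class VersionedStore:
    """Each batch writes to its own version; readers follow one active pointer."""

    def __init__(self):
        self.versions = {}
        self.active = None

    def write_batch(self, batch_id, rows, expected_count):
        # Keyed writes: re-running the same batch overwrites, never duplicates.
        staged = dict(rows)
        if len(staged) != expected_count:
            return False  # incomplete run: keep serving the previous version
        self.versions[batch_id] = staged
        self.active = batch_id  # single pointer flip publishes the whole batch
        return True

    def get(self, key):
        if self.active is None:
            return None
        return self.versions[self.active].get(key)


store = VersionedStore()
store.write_batch("2024-01-01", [("u1", 0.2), ("u2", 0.4)], expected_count=2)
failed = store.write_batch("2024-01-02", [("u1", 0.9)], expected_count=2)
store.write_batch("2024-01-01", [("u1", 0.2), ("u2", 0.4)], expected_count=2)
```

The partial second batch is rejected, so readers keep seeing the first batch, and repeating the first batch leaves the store unchanged.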

Cost vs. Freshness Tradeoff: More frequent batches = higher cost

Profile actual prediction change rate (how often do top-10 recommendations shift?)
A/B test batch frequencies to measure impact on engagement metrics
Use event-triggered batches for critical updates (product catalog changes)
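
Profiling the prediction change rate can start with the overlap between consecutive batches. The sketch below uses Jaccard overlap of each user's top-k list, averaged over users present in both batches; the metric choice is an assumption, and rank-aware measures are an alternative.

```python
def top_k_overlap(previous, current, k=10):
    """Jaccard overlap of two ranked lists, truncated to the top k items."""
    prev, curr = set(previous[:k]), set(current[:k])
    if not prev and not curr:
        return 1.0
    return len(prev & curr) / len(prev | curr)


def mean_overlap(previous_batch, current_batch, k=10):
    """Average top-k overlap across users that appear in both batches."""
    shared = previous_batch.keys() & current_batch.keys()
    if not shared:
        return None
    total = sum(
        top_k_overlap(previous_batch[user], current_batch[user], k)
        for user in shared
    )
    return total / len(shared)


yesterday = {"u1": [1, 2, 3, 4], "u2": [5, 6, 7, 8]}
today = {"u1": [1, 2, 3, 4], "u2": [5, 6, 9, 10]}
stability = mean_overlap(yesterday, today, k=4)
```

A value near 1.0 suggests the batch could run less often at little cost to freshness; a low value argues for a shorter interval.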

Anti-Patterns

Batch for Latency-Critical Applications: Using batch for fraud detection that must block transactions in real-time
Over-Engineering Batch Infrastructure: Building distributed system for 10K records processable on single machine
Ignoring Data Freshness Requirements: Daily batches for inventory predictions when stock changes hourly
No Fallback Strategy: System breaks when batch job fails with no stale predictions

Trade-offs

Batch vs. Online Inference:

Batch: Lower cost (bulk processing), fast lookups at serving time but stale predictions (as old as the batch interval), simpler ops (scheduled jobs)
Online: Higher cost (per-request compute), fresh predictions at the price of per-request inference latency, complex ops (SLA-driven)

Batch Frequency:

More frequent (hourly): Fresher predictions, higher compute cost, more operational complexity
Less frequent (daily): Staler predictions, lower cost, simpler ops, longer retention (TTL) for stored predictions

Distributed vs. Single-Node:

Distributed: Scales to billions of records, complex infrastructure, slower for small datasets
Single-node: Simple, fast for <10M records, memory/compute constraints, no fault tolerance

Related Frameworks

Streaming ML Pattern: Continuous prediction updates from streaming data (complements batch)
Online Learning Pattern: Incremental model updates as new data arrives (batch retraining alternative)
Lambda Architecture: Batch layer + speed layer for hybrid batch/streaming systems
Feature Store Pattern: Centralized feature computation for batch and online consistency

Practitioner Sources

Google ML Design Patterns (Lakshmanan et al.): Batch Serving pattern (#17), Keyed Predictions pattern
ML System Design: Batch vs. online prediction serving tradeoffs, architecture patterns
Apache Spark MLlib: Distributed batch prediction at scale, best practices
AWS SageMaker Batch Transform: Managed batch inference service, cost optimization