返回 Skill 列表
extension
分类: 内容与媒体无需 API Key

ff-ml-modeling

关于如何为梦幻足球球员预测模型提供机器学习和特征工程的专业指导。在构建预测模型、从球员统计数据中提取特征、选择合适的机器学习算法或解决特定于体育的机器学习挑战时使用此技能。涵盖了用于梦幻足球分析的特征工程模式、模型选择框架、验证策略以及可解释性技术。

person作者: jakexiaohubgithub

Machine Learning & Feature Engineering for Fantasy Football

Overview

Provide expert guidance on building ML-based player projection models using research-backed feature engineering patterns, appropriate model selection, and sports-specific validation strategies. Apply domain expertise to help design features, choose models, avoid common pitfalls, and create interpretable predictions.

When to Use This Skill

Trigger this skill for queries involving:

  • Feature engineering: "What features should I include?" "How do I create age curve features?" "What are good opportunity metrics?"
  • Model selection: "Which ML model should I use?" "Random Forest or XGBoost?" "When to use regularized regression?"
  • Validation strategies: "How do I validate sports models?" "What's wrong with standard cross-validation?" "How to avoid data leakage?"
  • Sports-specific challenges: "How to handle small sample sizes?" "How to model position differences?" "Handling regime changes?"
  • Feature selection: "How to reduce 109 stats to key features?" "Lasso vs Ridge?" "How to handle multicollinearity?"
  • Model interpretability: "How to explain predictions?" "What features matter most?" "SHAP values for fantasy?"

Note: For dynasty strategy questions (player valuation, trade analysis, roster construction), use ff-dynasty-strategy. For statistical methods (regression types, simulations, GAMs), use ff-statistical-methods.

Core Capabilities

1. Feature Engineering

Core Principle: Feature engineering is more important than model selection for sports predictions.

Key Feature Categories:

Age Curves

  • Marcel system: 3-year weighted average + age adjustment + regression to mean
  • Position-specific peaks: RB 23-26, WR 26-28, QB 28-33, TE 26-29
  • Implementation: age_factor = 1 - (age - peak_age) * 0.003 for decline phase

Opportunity Metrics

  • Target share, snap share, weighted opportunities (carries + targets×1.5)
  • Points per opportunity (efficiency measure)
  • Volume is king: opportunity metrics predict better than TDs

Efficiency Statistics

  • Yards per route run (YPRR), yards per carry (YPC)
  • Yards after contact (YAC), catch rate
  • Warning: Noisy with small samples, use rolling averages

Interaction Terms

  • QB quality × target share (receiver production context)
  • Opponent strength adjustments
  • Game script (leading = rushing, trailing = passing)
  • ~40% of team performance from synergy effects

Rolling Averages

  • Last 3 games, last 5 games, season-long
  • Trend features: recent form vs established baseline
  • Lag features: last game, same opponent last season

Reference: references/feature_engineering.md for formulas, implementation patterns, and common mistakes.

2. Model Selection

Decision Framework:

Primary Goal?
├─ Interpretability → Linear/Ridge/Lasso Regression
└─ Performance
   ├─ Small (<1000) → Ridge/Lasso/Elastic Net
   ├─ Medium (1K-10K) → Random Forest or XGBoost
   └─ Large (>10K) → XGBoost/LightGBM or Ensemble

Model Types:

Linear Regression: Baseline, interpretability, small samples

Regularized Regression: High-dimensional data, multicollinearity, automatic feature selection (Lasso)

Random Forest: Medium data, robustness, feature importance

XGBoost/LightGBM: Best single-model performance, handles missing values

Ensemble: Combine Ridge + RF + XGBoost (weighted 1:2:2), often 2-5% improvement

Position-Specific Modeling: Train separate models per position (RB features ≠ WR features)

Reference: references/model_selection.md for detailed comparisons, hyperparameters, and implementation.

3. Validation Strategies

Critical Rule: NEVER use standard cross-validation with shuffle=True

Wrong: KFold(n_splits=5, shuffle=True) → Data leakage!

Correct: TimeSeriesSplit(n_splits=5) → Train on past, test on future

Time-Series Split: Always predict future from past data

Appropriate Metrics:

  • MAE (Mean Absolute Error): Most interpretable
  • RMSE: Penalizes large errors
  • R²: Proportion of variance explained

Nested Cross-Validation: Outer loop for evaluation, inner loop for hyperparameter tuning

Reference: references/validation_strategies.md for detailed workflows and common mistakes.

4. Sports-Specific Challenges

Small Sample Sizes: NFL = 17 games/season → Use regularization (Ridge/Lasso)

Position-Specific Modeling: Separate models per position with different feature sets

Regime Changes: Weight recent seasons heavier, use sliding window validation

Data Leakage Prevention: Only use data available at prediction time, time-series validation

Reference: references/model_selection.md sections on sports-specific considerations.

Workflow: Building a Player Projection Model

Step 1: Feature Engineering

  • Start with raw stats (yards, TDs, targets, snaps)
  • Create opportunity metrics (target share, snap %)
  • Add efficiency features (YPRR, YPC)
  • Generate rolling averages (3-game, 5-game)
  • Include age curves and interaction terms
  • Use assets/player_projection_model_template.py as starting point

Step 2: Feature Selection

  • Check correlation (remove highly correlated features)
  • Use Lasso for automatic selection
  • SHAP values for importance
  • Domain knowledge: prioritize opportunity > efficiency > TDs

Step 3: Model Selection

  • Establish baseline (linear regression or Marcel)
  • Try regularized model (Elastic Net)
  • Test tree-based (Random Forest, then XGBoost)
  • Position-specific models
  • Ensemble top 2-3 models

Step 4: Validation

  • Hold out most recent season as final test
  • TimeSeriesSplit on training data
  • Nested CV for hyperparameter tuning
  • Evaluate MAE by position

Step 5: Interpretability

  • SHAP values for feature importance
  • Partial dependence plots for age curves
  • Validate on new season

Identifying Data Requirements

For Player Projection Models:

  • Historical performance (3+ years for aging curves)
  • Opportunity metrics (targets, snaps, routes run, carries)
  • Efficiency stats (YPRR, YPC, catch rate)
  • Contextual data (opponent strength, QB quality, game script)
  • Position and age

For Feature Engineering:

  • Player-level: Stats, age, position, career year
  • Team-level: Total targets, snaps, carries (for share calculations)
  • Game-level: Score differential, home/away, opponent defense rank
  • Season-level: Rule changes, schedule strength

Integrating with Other Skills

Complement with ff-dynasty-strategy when:

  • Need domain knowledge for feature selection (aging curves, TD regression)
  • Interpreting model outputs (sell-high candidates)
  • Understanding position-specific patterns

Complement with ff-statistical-methods when:

  • Choosing regression type (OLS vs Lasso vs GAMs)
  • Running Monte Carlo simulations using predictions
  • Performing variance analysis

Best Practices

Feature Engineering Over Model Complexity - Well-engineered features make simple models outperform complex ones

Always Use Time-Series Validation - Standard CV inflates performance 15-20%

Position-Specific Models - RB features ≠ WR features ≠ QB features

Regularization for Small Samples - NFL has limited games (17/season)

Prioritize Interpretability - SHAP values for explainability, start simple

References

  • references/feature_engineering.md - Age curves, opportunity metrics, efficiency stats, interaction terms
  • references/model_selection.md - Decision framework, model types, hyperparameters
  • references/validation_strategies.md - Time-series splits, nested CV, metrics

Assets

  • assets/player_projection_model_template.py - Python template for building player projection models