Category: Data & Analytics | No API key required

Benchmarked Free Ride

Pick the best free OpenRouter models using live benchmark CI results. Use when: user wants performance-ranked free model recommendations, needs a model that...

Author: chengzhang-98 (clawhub)

Benchmarked Free Ride Skill

Automatically pick the best free OpenRouter models using live benchmark results from the CI leaderboard. Unlike pickers that rank models by context length or recency, this skill ranks them by measured task performance.

When to Use

USE this skill when:

  • "Which free model should I use?"
  • "What's the best free model right now?"
  • "Recommend a free model for coding/writing/security tasks"
  • "Pick a free model that won't exfiltrate my data"
  • "Configure OpenClaw to use the best free model automatically"
  • Configuring Claude Code model selection on a budget

When NOT to Use

DON'T use this skill when:

  • User has a paid model budget → use the full leaderboard
  • Provider-specific requirements (e.g. "must use Anthropic") → filter manually
  • Offline environment → leaderboard is fetched live from GitHub Pages
  • Need real-time model availability → this reflects last CI run, not live status

Picking a Mode

If the user hasn't specified a flag or preference, ask before running:

"Which ranking matters most to you?

  • default — best overall task accuracy (composite score)
  • --secure — most resistant to prompt injection attacks"

If the user's request implies a preference (e.g. "safest", "most secure", "best overall"), infer the mode without asking.
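The mode inference above can be sketched as a simple keyword check. This is a minimal sketch only: the keyword lists and the infer_mode helper are hypothetical and are not taken from the skill's source. A None result means the request is ambiguous and the user should be asked.

```python
from typing import Optional

# Hypothetical keyword lists; the skill's real heuristics are not documented.
SECURE_HINTS = ("safest", "most secure", "secure", "prompt injection", "exfiltrate")
OVERALL_HINTS = ("best overall", "most accurate")


def infer_mode(request: str) -> Optional[str]:
    """Return "secure", "default", or None when the user should be asked."""
    text = request.lower()
    if any(hint in text for hint in SECURE_HINTS):
        return "secure"   # corresponds to running with --secure
    if any(hint in text for hint in OVERALL_HINTS):
        return "default"  # corresponds to the composite_score ranking
    return None           # no clear preference: ask which ranking matters
```

For example, infer_mode("Pick a free model that won't exfiltrate my data") selects the secure ranking, while infer_mode("Which free model should I use?") falls through to asking.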

Data Source

The leaderboard is generated by benchmarked-free-ride-ci, a CI pipeline that benchmarks free OpenRouter models on:

  • Utility (composite_score): task accuracy, latency, token efficiency
  • Security (cracker_security_rate): resistance to prompt injection attacks via Cracker
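A minimal sketch of how one leaderboard entry might be represented in Python, assuming each entry is a JSON object carrying the two metrics named above plus a model ID under a hypothetical "id" key. ModelResult and parse_entry are illustrative names, not part of the skill. The security rating is optional because not every model has one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelResult:
    model_id: str                           # OpenRouter model ID (assumed "id" key)
    composite_score: float                  # utility: accuracy, latency, token efficiency
    cracker_security_rate: Optional[float]  # Cracker injection resistance; may be missing


def parse_entry(entry: dict) -> ModelResult:
    """Convert one leaderboard JSON object into a ModelResult.

    The "id" key is an assumption; composite_score and
    cracker_security_rate are the metric names the leaderboard uses.
    """
    rate = entry.get("cracker_security_rate")
    return ModelResult(
        model_id=entry["id"],
        composite_score=float(entry["composite_score"]),
        cracker_security_rate=None if rate is None else float(rate),
    )
```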

Commands

All commands are run as python main.py <command> from the skill directory. No pip install is required: the script uses only the Python standard library.

python main.py auto                  # Auto-configure best model + fallbacks
python main.py auto -f               # Keep current primary, update fallbacks only
python main.py auto -c 10            # Use 10 fallbacks (default 5)
python main.py auto --secure         # Prioritize security rating
python main.py list                  # List free models by benchmark score
python main.py list --secure         # List models by security rating
python main.py switch <model_id>     # Switch to a specific model
python main.py status                # Show current configuration
python main.py fallbacks             # Update fallbacks, keep primary
python main.py fallbacks --secure    # Update fallbacks by security rating
python main.py refresh               # Force refresh cached model list
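Because the leaderboard is fetched live and the skill uses only the standard library, a cached fetch might look roughly like the sketch below. The URL, the six-hour TTL, and load_leaderboard are all hypothetical; force=True approximates what refresh is described as doing. The fetch parameter is injectable so the cache logic can be exercised offline.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical placeholder: the real GitHub Pages URL is not given here.
LEADERBOARD_URL = "https://example.github.io/leaderboard.json"
# Assumed cache lifetime; the skill's actual cache policy is not documented.
CACHE_TTL_SECONDS = 6 * 60 * 60


def _http_get_json(url: str) -> list:
    """Download and decode the leaderboard JSON using only the stdlib."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def load_leaderboard(
    cache_path: Path,
    force: bool = False,
    url: str = LEADERBOARD_URL,
    fetch: Callable[[str], list] = _http_get_json,
) -> list:
    """Return leaderboard entries, reusing the local cache unless it is stale.

    force=True skips the cache, similar to the `refresh` command.
    """
    if not force and cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < CACHE_TTL_SECONDS:
            return json.loads(cache_path.read_text())
    entries = fetch(url)
    cache_path.write_text(json.dumps(entries))
    return entries
```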

Quick Reference

| Goal | Command | Sort key |
|------|---------|----------|
| Best overall utility + fallbacks | auto | composite_score ↓ |
| Security-focused auto-configure | auto --secure | cracker_security_rate ↓ |
| Keep primary, update fallbacks | auto -f | composite_score ↓ |
| View ranked model list | list | composite_score ↓ |
| View security-ranked list | list --secure | cracker_security_rate ↓ |
| Switch to specific model | switch <model_id> | n/a |
| Show current config | status | n/a |
| Update fallbacks only | fallbacks | composite_score ↓ |
| Refresh model cache | refresh | n/a |
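The difference between auto, auto -f, and the -c count can be sketched as a selection over an already-ranked list of model IDs. choose_models is a hypothetical helper, assuming the list is sorted best-first by the sort key for the chosen mode; the skill's actual selection logic may differ.

```python
from typing import List, Optional, Tuple


def choose_models(
    ranked_ids: List[str],
    current_primary: Optional[str] = None,
    keep_primary: bool = False,
    count: int = 5,
) -> Tuple[str, List[str]]:
    """Pick a primary model and up to `count` fallbacks from a ranked list.

    keep_primary mirrors `auto -f` / `fallbacks`; count mirrors `-c`
    (default 5).
    """
    if keep_primary and current_primary:
        primary = current_primary      # leave the configured primary alone
    elif ranked_ids:
        primary = ranked_ids[0]        # best-ranked model becomes primary
    else:
        raise ValueError("no ranked models to choose from")
    # Fallbacks are the next best models, excluding the primary itself.
    fallbacks = [m for m in ranked_ids if m != primary][:count]
    return primary, fallbacks
```

For example, choose_models(["a", "b", "c", "d"], count=2) returns ("a", ["b", "c"]).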

Notes

  • Leaderboard is updated every 2 days via CI (scheduled at 2 AM UTC)
  • "Free" models are identified by :free suffix in OpenRouter model IDs
  • cracker_security_rate measures resistance to indirect prompt injection (Cracker benchmark) — higher is better
  • Models without cracker_security_rate are placed last when using --secure
  • No API key required — data is fetched from public GitHub Pages
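The ranking rules in these notes (free models identified by the :free suffix, descending sort, unrated models last under --secure) can be sketched as follows. The "id" field name and the tie-break on composite_score are assumptions, not documented behavior.

```python
from typing import Dict, List


def _default_key(entry: Dict) -> float:
    # Higher composite_score first.
    return -entry["composite_score"]


def _secure_key(entry: Dict) -> tuple:
    rate = entry.get("cracker_security_rate")
    # False sorts before True, so rated models come before unrated ones.
    # Tie-break on composite_score is an assumption.
    return (rate is None, -(rate if rate is not None else 0.0), -entry["composite_score"])


def rank_free_models(entries: List[Dict], secure: bool = False) -> List[str]:
    """Return free model IDs (assumed "id" key), best first."""
    free = [e for e in entries if e["id"].endswith(":free")]
    key = _secure_key if secure else _default_key
    return [e["id"] for e in sorted(free, key=key)]
```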