返回 Skill 列表
extension
分类: 内容与媒体无需 API Key

mteb-leaderboard

关于查询机器学习模型排行榜和基准测试(如MTEB、HuggingFace、嵌入式基准)的指导。当任务涉及在特定基准上查找表现最佳的模型、跨排行榜比较模型性能或回答有关当前基准排名的问题时,这项技能非常有用。涵盖了访问实时排行榜数据的策略、处理时间要求以及避免使用过时来源时常见的陷阱。

person作者: jakexiaohubgithub

MTEB Leaderboard Query Skill

This skill provides guidance for accurately querying machine learning model leaderboards and benchmarks, particularly the Massive Text Embedding Benchmark (MTEB) and related embedding leaderboards.

When to Use This Skill

  • Finding top-performing models on specific benchmarks (MTEB, Scandinavian Embedding Benchmark, etc.)
  • Answering questions about current leaderboard standings
  • Comparing model performance across different benchmarks
  • Tasks with specific temporal requirements (e.g., "as of August 2025")

Core Approach

Step 1: Identify Authoritative Data Sources

Before searching for results, establish which sources contain authoritative, current data:

  1. Primary Sources (prefer these):

    • Official leaderboard websites (e.g., mteb-leaderboard on HuggingFace Spaces)
    • GitHub repositories with raw benchmark data
    • API endpoints or JSON data files from leaderboard maintainers
  2. Secondary Sources (use with caution):

    • Academic papers (often outdated by publication time)
    • Blog posts and articles (may reference outdated results)
    • News articles about benchmark results

Step 2: Verify Temporal Alignment

When a task specifies a time constraint (e.g., "as of August 2025"):

  1. Check source publication/update dates - Academic papers are typically 6-18 months behind current leaderboard state
  2. Look for "last updated" timestamps on leaderboard pages
  3. Never assume paper results reflect current standings without verification
  4. Be explicit about temporal gaps - If using data from June 2024 to answer about August 2025, this is a 14+ month gap that likely invalidates the data

Step 3: Access Live Leaderboard Data

When web pages don't render properly (interactive charts, JavaScript-heavy pages):

  1. Look for raw data endpoints:

    • Check for /api/ or /data/ endpoints
    • Search for JSON files in the page source
    • Look for GitHub repositories backing the leaderboard
  2. Try alternative access methods:

    • HuggingFace Spaces often have Gradio APIs
    • Many leaderboards publish CSV/JSON exports
    • Check GitHub issues/discussions for data access tips
  3. Search for data repositories:

    • site:github.com [leaderboard name] results json
    • site:huggingface.co [benchmark name] leaderboard

Step 4: Validate Model Eligibility

Do not make assumptions about which models "count" on a leaderboard:

  1. Check official leaderboard criteria - Some include API models, some don't
  2. Verify the answer format requirements against actual leaderboard entries
  3. Do not exclude models based on assumptions about what can be represented in a given format
  4. Consider all model types: open-source, API-based, fine-tuned variants

Verification Strategies

Cross-Reference Multiple Sources

  • Compare results from at least 2-3 independent sources
  • If sources disagree, prioritize the most recent authoritative source
  • Document discrepancies and their potential causes

Sanity Check Results

  • Verify the model actually appears on the leaderboard
  • Confirm the model name/organization format matches the source
  • Check if the model was released before the specified date

Test Alternative Access Methods

When primary access fails:

  1. Try the Wayback Machine for historical snapshots
  2. Search for leaderboard maintainer announcements
  3. Look for community discussions about recent changes
  4. Check if there's a programmatic API

Common Pitfalls to Avoid

1. Relying on Outdated Academic Papers

Academic papers have publication delays of 3-12 months. A paper published in June 2024 contains data from early 2024 at best. Never use paper results for questions about current standings.

2. Giving Up When Web Scraping Fails

Interactive leaderboards often don't render in simple web fetches. Always try:

  • Looking for underlying data files
  • Checking GitHub repositories
  • Finding API endpoints
  • Searching for data exports

3. Making Assumptions About Model Format

Do not assume API models (OpenAI, Cohere, etc.) cannot be valid answers. Check the actual task requirements and leaderboard contents.

4. Premature Conclusion Without Verification

Before writing a final answer:

  • Verify the model appears on the actual leaderboard
  • Confirm the ranking is current
  • Check that the model meets all task requirements

5. Ignoring Temporal Requirements

If a task asks about a specific date, ensure data sources reflect that timeframe. A 14-month gap between data and required date is unacceptable.

Systematic Search Strategy

When searching for leaderboard information:

  1. Start broad, then narrow:

    • [benchmark name] leaderboard 2025
    • [benchmark name] top models current
    • site:huggingface.co [benchmark name]
  2. Search for raw data:

    • [benchmark name] results github
    • [benchmark name] json data
    • [benchmark name] api
  3. Search for recent updates:

    • [benchmark name] new top model [current year]
    • [benchmark name] leaderboard update
  4. Avoid repetitive similar queries - If a query pattern isn't working after 2-3 attempts, change the approach rather than making minor variations

Output Checklist

Before submitting an answer, verify:

  • [ ] Data source is current (not outdated paper)
  • [ ] Model appears on the actual leaderboard
  • [ ] Temporal requirements are met
  • [ ] Model format matches requirements
  • [ ] No unvalidated assumptions were made
  • [ ] Answer was cross-referenced where possible