kernelgen-flagos — Unified GPU Operator Generation Skill

This skill is a unified entry point that bundles the generation, optimization, platform specialization, MCP configuration, and feedback sub-skills listed below:

| Sub-skill file | Purpose |
|---|---|
| Generation | |
| kernelgen-generate.md | Generate GPU kernels for any Python/Triton repository |
| kernelgen-generate-for-flaggems.md | Specialized generation for FlagGems repositories |
| kernelgen-generate-for-vllm.md | Specialized generation for vLLM repositories |
| Optimization | |
| kernelgen-optimize.md | Optimize existing Triton kernels via MCP iterative optimization (general purpose) |
| kernelgen-optimize-for-flaggems.md | Optimize Triton operators and integrate into FlagGems (3 modes: built-in/external/experimental) |
| kernelgen-optimize-for-vllm.md | Optimize Triton operators and integrate into vLLM (with CustomOp registration) |
| Platform Specialization | |
| kernelgen-specialize.md | Specialize Triton operators to target platforms (e.g., GPU → Ascend NPU) via MCP specialize_kernel |
| kernelgen-specialize-for-flaggems.md | Platform specialization + FlagGems integration (4 modes: vendor-ops/vendor-fused/override-builtin/experimental) |
| MCP Configuration | |
| kernelgen-mcp-setup.md | Check and auto-configure the kernelgen-server MCP service (URL built-in, user only provides Token) |
| Feedback | |
| kernelgen-submit-feedback.md | Submit bug reports and feedback via GitHub or email |

All sub-skill files are located in the same directory as this SKILL.md file.

Routing Protocol — Follow This BEFORE Doing Anything Else

Phase 0: MCP Configuration Check

Before anything else, ensure the kernelgen-server MCP service is configured and ready.

Use the Glob tool to find kernelgen-mcp-setup.md in this skill's directory:

Glob: **/skills/kernelgen-flagos/kernelgen-mcp-setup.md

Then use the Read tool to read the matched file and follow its instructions exactly.

If MCP is already configured → proceed to Phase 1.
If MCP is not configured → the setup skill will guide the user through configuration. Once configuration is written and the user is prompted to restart, stop here — do not continue to Phase 1.

Phase 1: Detect Repository Type

Use the Glob tool to check for project identity files in the current working directory:

Glob: pyproject.toml
Glob: setup.py
Glob: setup.cfg

Then use the Read tool to read whichever file exists. Determine the project name from the file contents (e.g., name = "flag_gems" in pyproject.toml, or name='vllm' in setup.py).

Also use the Glob tool to check for characteristic directory structures:

FlagGems indicators (match ANY):

src/flag_gems/ directory exists
Project name is flag_gems or flag-gems or FlagGems
import flag_gems appears in test files

vLLM indicators (match ANY):

vllm/ directory exists at the repo root (with vllm/__init__.py)
Project name is vllm
csrc/ directory exists alongside vllm/
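
The detection rules above can be sketched in plain Python. This is only an illustration under stated assumptions: the skill itself performs these checks with the Glob and Read tools, and the helper names (`project_name`, `imports_flag_gems`, `detect_repo_type`), the `test_*.py` naming convention, the regex-based name lookup, and the choice to check FlagGems before vLLM when both match are all hypothetical and not part of the skill.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helpers for illustration; the skill performs these checks with Glob and Read.
# Naive lookup: takes the first `name = ...` assignment found, quoted or unquoted.
_NAME_RE = re.compile(r"""\bname\s*=\s*["']?([\w.\-]+)""")


def project_name(root: Path) -> Optional[str]:
    """Return the first `name = ...` value found in a project identity file."""
    for filename in ("pyproject.toml", "setup.py", "setup.cfg"):
        path = root / filename
        if path.is_file():
            match = _NAME_RE.search(path.read_text(encoding="utf-8", errors="ignore"))
            if match:
                return match.group(1)
    return None


def imports_flag_gems(root: Path) -> bool:
    """Assumption: test files are named test_*.py somewhere under the repo root."""
    for path in root.rglob("test_*.py"):
        if path.is_file() and "import flag_gems" in path.read_text(
            encoding="utf-8", errors="ignore"
        ):
            return True
    return False


def detect_repo_type(root: Path) -> str:
    """Return "flaggems", "vllm", or "generic" using the match-ANY indicators."""
    name = (project_name(root) or "").lower().replace("-", "_")

    # FlagGems indicators (checked first; this precedence is an assumption).
    if (
        (root / "src" / "flag_gems").is_dir()
        or name in {"flag_gems", "flaggems"}
        or imports_flag_gems(root)
    ):
        return "flaggems"

    # vLLM indicators.
    has_vllm_dir = (root / "vllm").is_dir()
    if (
        (root / "vllm" / "__init__.py").is_file()
        or name == "vllm"
        or (has_vllm_dir and (root / "csrc").is_dir())
    ):
        return "vllm"

    return "generic"
```

Calling `detect_repo_type(Path.cwd())` returns one of three labels, which correspond to the "FlagGems repository detected", "vLLM repository detected", and "Neither detected (or unknown)" rows used in Phase 2.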

Phase 2: Dispatch to Sub-skill

Based on the detection result, use the Read tool to read the appropriate sub-skill file from this skill's directory, then follow the instructions in that file exactly.

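To locate the sub-skill files: they are in the same directory as this SKILL.md. Use the Glob tool to find the path, substituting the file name selected from the decision table below. For example, for the general-purpose generation sub-skill: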

Glob: **/skills/kernelgen-flagos/kernelgen-generate.md

Then use the Read tool to read the matched path.

Decision Table

Generation requests (user wants to create/generate a new operator):

| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-generate-for-flaggems.md and follow it |
| vLLM repository detected | Read kernelgen-generate-for-vllm.md and follow it |
| Neither detected (or unknown) | Read kernelgen-generate.md and follow it |

Optimization requests (user wants to optimize an existing operator, or mentions "optimize", "speedup", or "improve performance"):

| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-optimize-for-flaggems.md and follow it |
| vLLM repository detected | Read kernelgen-optimize-for-vllm.md and follow it |
| Neither detected (or unknown) | Read kernelgen-optimize.md and follow it |

Specialization requests (user wants to migrate/specialize an operator to a different platform, or mentions "specialize", "migrate to Ascend/NPU", or "platform migration"):

| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-specialize-for-flaggems.md and follow it |
| Neither detected (or unknown) | Read kernelgen-specialize.md and follow it |

Feedback requests:

| Detection Result | Action |
|---|---|
| User reports a bug or requests feedback submission | Read kernelgen-submit-feedback.md and follow it |
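
As a rough sketch, the decision tables above amount to a lookup keyed by request type and detected repository type. The function names (`classify_request`, `select_sub_skill`), the repository labels (`"flaggems"`, `"vllm"`, `"generic"`), and the naive substring matching on keywords are hypothetical and only for illustration. The sketch also assumes that a specialization request in a vLLM repository falls under "Neither detected (or unknown)", since no vLLM-specific specialization file is listed.

```python
from typing import Dict, Tuple

# (request type, repository type) -> sub-skill file, copied from the decision tables.
SUB_SKILLS: Dict[Tuple[str, str], str] = {
    ("generate", "flaggems"): "kernelgen-generate-for-flaggems.md",
    ("generate", "vllm"): "kernelgen-generate-for-vllm.md",
    ("generate", "generic"): "kernelgen-generate.md",
    ("optimize", "flaggems"): "kernelgen-optimize-for-flaggems.md",
    ("optimize", "vllm"): "kernelgen-optimize-for-vllm.md",
    ("optimize", "generic"): "kernelgen-optimize.md",
    ("specialize", "flaggems"): "kernelgen-specialize-for-flaggems.md",
    ("specialize", "generic"): "kernelgen-specialize.md",
}
FEEDBACK_SKILL = "kernelgen-submit-feedback.md"

# Assumed keyword lists, loosely based on the phrases quoted in this document.
_KEYWORDS = (
    ("feedback", ("feedback", "bug", "broken")),
    ("specialize", ("specialize", "migrate", "platform migration")),
    ("optimize", ("optimize", "speedup", "performance")),
)


def classify_request(text: str) -> str:
    """Naive substring classification; anything unmatched is treated as generation."""
    lowered = text.lower()
    for request_type, keywords in _KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return request_type
    return "generate"


def select_sub_skill(request_type: str, repo_type: str) -> str:
    """Return the sub-skill file name for a request type and repository type."""
    if request_type == "feedback":
        return FEEDBACK_SKILL
    if (request_type, "generic") not in SUB_SKILLS:
        raise ValueError(f"unknown request type: {request_type!r}")
    # Fall back to the general-purpose file when no repo-specific one is listed.
    return SUB_SKILLS.get((request_type, repo_type), SUB_SKILLS[(request_type, "generic")])
```

For example, `select_sub_skill(classify_request("optimize the relu kernel"), "vllm")` selects `kernelgen-optimize-for-vllm.md`. An explicit user request for a particular sub-skill would bypass this lookup, as the rules that follow state.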

Important rules:

Always detect first, dispatch second. Never skip detection.
Read the entire sub-skill file before starting execution — do not partially read it.
Follow the sub-skill instructions exactly as if they were the main SKILL.md. All steps, rules, and protocols in the sub-skill apply fully.
Do not mix sub-skills. Once you dispatch to a sub-skill, follow it to completion.
If the user explicitly requests a specific sub-skill (e.g., "use the FlagGems version"), honor that request regardless of auto-detection results.
CRITICAL — MCP is mandatory: ALL operator code generation MUST go through the mcp__kernelgen-mcp__generate_kernel MCP tool. Optimization uses mcp__kernelgen-mcp__optimize_kernel, and platform specialization uses mcp__kernelgen-mcp__specialize_kernel. NEVER generate Triton kernels, PyTorch wrappers, or operator implementations yourself. If MCP is not configured, not reachable, or fails after all retries, STOP and report the issue — do NOT fall back to writing code manually.

Phase 3: Feedback Handling

At any point during the workflow, if the user reports a bug, says something is broken, or asks to submit feedback about the skill:

Use the Read tool to read kernelgen-submit-feedback.md from this skill's directory.
Follow the feedback submission workflow described in that file.
After feedback is submitted, ask the user whether they want to resume the workflow that was in progress (generation, optimization, or specialization) or stop.

Quick Reference for Users

# === Generation ===
# Generate a kernel operator (auto-detects repo type)
/kernelgen-flagos relu

# Generate with explicit function type
/kernelgen-flagos rms_norm --func-type normalization

# === Optimization ===
# Optimize an existing Triton kernel (auto-detects repo type)
# Just say "optimize the relu kernel" or "improve kernel performance"
# The skill will automatically dispatch to the right optimization sub-skill

# The skill will automatically:
# - Detect if you're in a FlagGems repo → use FlagGems-specific workflow
# - Detect if you're in a vLLM repo → use vLLM-specific workflow
# - Otherwise → use the general-purpose workflow

If you encounter any issues while using the skill, just say "submit feedback" or "report a bug" and the skill will guide you through the feedback submission process.