Security Agent Efficiency — Full Audit Orchestration
Orchestrate a deterministic 10-phase security audit of an arbitrary source code repository using specialized subagents and skills.
When to Apply
- Full security audit of a repository
- Advisory regression check combined with static analysis
- Deep vulnerability research on a specific codebase
- Large or unusual architectures where default SAST modeling is likely incomplete
- "Run the security agents" / "audit for vulnerabilities" / "is this secure?"
Pre-Audit Setup
Before starting any phase:
- Check out the audit branch: git checkout -b audit (or git checkout audit if it exists)
- Create the output directory: mkdir -p <repo-root>/security/
- Reuse reports in security/ when the codebase and threat-model inputs have not changed
- Update <repo-root>/.gitignore so audit-generated files under security/ are ignored by default, while keeping reports and PoCs trackable
- When adding ignore rules, ignore audit artifacts such as transient databases, caches, scratch files, vendored queries/rules, and intermediate exports, but do not ignore reports or PoC files
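The ignore-by-default rule above can be sketched as a small idempotent helper. This is a minimal sketch, not a prescribed rule set: the function name ensure_gitignore_rules and the exact patterns are assumptions, chosen so that everything directly under security/ is ignored while top-level Markdown reports and the findings/ tree stay trackable.

```python
# Hypothetical rule block: ignore everything directly under security/,
# then re-include Markdown reports and the per-finding PoC folders.
# Order matters in .gitignore: negations must follow the broad ignore rule.
AUDIT_IGNORE_BLOCK = [
    "# security audit artifacts (hypothetical rule set)",
    "security/*",
    "!security/*.md",
    "!security/findings/",
]


def ensure_gitignore_rules(existing: str) -> str:
    """Return .gitignore content that contains the audit block.

    If every rule is already present the input is returned unchanged;
    otherwise the whole block is appended as a unit so rule order is kept.
    """
    present = {line.strip() for line in existing.splitlines()}
    if all(rule in present for rule in AUDIT_IGNORE_BLOCK):
        return existing
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    return existing + separator + "\n".join(AUDIT_IGNORE_BLOCK) + "\n"
```

A caller would read <repo-root>/.gitignore, pass its text through the helper, and write the result back only if it changed. Appending the block as a unit is a deliberate choice: a partially present block could otherwise leave a negation above the rule it is meant to override.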
10-Phase Workflow
flowchart TD
Start["Audit Request"] --> Setup["Setup + reuse check"]
Setup --> P1["1. Intelligence Gathering"]
P1 --> P2["2. Patch Bypass Analysis"]
P2 --> P3["3. Knowledge Base"]
P3 --> P4["4. Static Analysis"]
P4 --> P5["5. Enrichment + Security Relevance Filter"]
P5 --> P6["6. Spec Gap Analysis"]
P6 --> P7["7. Deep Bug Hunting"]
P7 --> P8["8. FP Elimination"]
P8 --> P9["9. Variant Analysis"]
P9 --> P10["10. Exploitation & Final Reporting"]
Phase 1 — Intelligence Gathering
Use the advisory-hunter workflow to collect:
- advisories, CVEs, GHSAs, and patch commits
- coarse architecture inventory: components, transports, execution contexts, trust boundaries
- security-relevant dependencies, with runtime context noted for each one
Treat dependency findings as hypotheses until the audit proves the affected runtime path is reachable.
Produce security/advisory-hunter-report.md.
Phase 2 — Patch Bypass Analysis
For each advisory patch:
- fetch the full diff and surrounding callers
- test bypass hypotheses: alternate entry points, config-gated checks, default-state gaps, compatibility branches, parser differentials, missing normalization
- check whether a sibling or related path remains vulnerable even if the patched path is sound
- cluster duplicate advisories by the same upstream commit or PR so one fix is not re-audited as multiple distinct bugs
Produce security/bypass-analysis-report.md.
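The clustering step can be sketched as a grouping by fix reference. This is an illustrative sketch: the Advisory type, its field names, and the normalization (trim and lowercase) are assumptions, and the fix reference stands for whatever upstream commit SHA or PR identifier Phase 1 recorded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # CVE or GHSA identifier as recorded in Phase 1
    fix_ref: str      # upstream fix commit SHA or PR reference (hypothetical field)


def cluster_by_fix(advisories):
    """Group advisory IDs that point at the same upstream fix.

    Each cluster is audited once, so one patch is not re-analyzed
    as several distinct bugs.
    """
    clusters = defaultdict(list)
    for adv in advisories:
        key = adv.fix_ref.strip().lower()  # tolerate case and whitespace noise
        clusters[key].append(adv.advisory_id)
    return {ref: sorted(ids) for ref, ids in clusters.items()}
```

Each key in the result is one unit of bypass analysis; the list of IDs under it is kept for cross-referencing in the report.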
Phase 3 — Knowledge Base
Build the project model from source:
- classify project type: web app, API, CLI, desktop, library, plugin, protocol, worker, CI action
- map attacker-controlled inputs, trust boundaries, and security-critical decisions
- build compact DFD/CFD slices only for the highest-risk flows
- record implemented specs and RFCs
Produce:
- security/threat-model-report.md
- security/attack-surface-report.md
- security/knowledge-base-report.md
The Phase 3 threat model is mandatory input for all later phases.
Phase 4 — Static Analysis
CRITICAL ENFORCEMENT: You MUST physically execute the SAST tools. Do not hallucinate results or skip execution. You must ensure codeql successfully runs and that semgrep is run using the Pro engine (--pro) if available.
Baseline requirements:
- Build CodeQL databases for all supported languages when resources allow.
- Run built-in CodeQL security suites appropriate to the repo languages.
- Run built-in Semgrep baseline, language, and framework rulesets. Always attempt to run Semgrep Pro (semgrep --pro) for deeper cross-file taint analysis.
- Explicitly output the list of CodeQL queries and Semgrep rules that you actually ran.
- For Java applications, run SpotBugs with the FindSecBugs plugin as a required baseline pass; treat this as additive to CodeQL and Semgrep.
- Run GitHub Actions review with agentic-actions-auditor when .github/workflows/ exists.
Custom Architecture Generalization (Dynamic Rules):
- Do not solely rely on generic or pre-baked rules. You MUST dynamically generate custom CodeQL queries and Semgrep rules specifically tailored to the ad-hoc architecture, framework, and threat model identified in Phase 3 (e.g., custom MCP protocols, specific custom RPC boundaries).
- Store all dynamically generated custom rules in security/codeql-queries/ and security/semgrep-rules/.
- Document exactly what custom rules were created, why they match the Phase 3 architecture, and their execution results in security/static-analysis-report.md.
Operational rules:
- Keep SAST concurrency low enough to avoid exhausting CPU/RAM.
- Merge SARIF outputs with sarif-parsing if needed.
- Delete transient CodeQL databases and Semgrep cache after reports are written.
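As a rough illustration of merging SARIF output and listing the rules that appear in it, the sketch below works on already-parsed SARIF 2.1.0 documents. It is a stand-in for illustration only, not the sarif-parsing skill itself; the function names are assumptions. Note that which rules a tool lists under tool.driver.rules varies by tool, so the result is a starting inventory rather than proof of everything that executed.

```python
def merge_sarif(docs):
    """Concatenate the runs of several parsed SARIF documents into one."""
    merged = {"version": "2.1.0", "runs": []}
    for doc in docs:
        merged["runs"].extend(doc.get("runs", []))
    return merged


def reported_rules(sarif):
    """Map each tool name to the sorted rule IDs found in its runs.

    Rule IDs are collected from both the driver's rule metadata and the
    ruleId of each result, since tools differ in which they populate.
    """
    by_tool = {}
    for run in sarif.get("runs", []):
        driver = run.get("tool", {}).get("driver", {})
        name = driver.get("name", "unknown")
        ids = {rule["id"] for rule in driver.get("rules", []) if "id" in rule}
        ids |= {res["ruleId"] for res in run.get("results", []) if "ruleId" in res}
        by_tool.setdefault(name, set()).update(ids)
    return {tool: sorted(ids) for tool, ids in by_tool.items()}
```

The per-tool lists can be pasted into security/static-analysis-report.md alongside the exact commands that were run.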
Phase 5 — Enrichment and Security Relevance Filter
Before deep bug hunting, classify each candidate finding as one of:
- likely security
- likely correctness/robustness
- likely environment/tooling/admin-only
For every candidate, answer:
- What attacker controls the input?
- Which runtime executes the vulnerable path?
- What trust boundary is crossed?
- Is the effect cross-user, cross-tenant, cross-privilege, or only same-user?
- Is the vulnerable dependency/code path actually used in that runtime?
Downgrade or exclude by default when the issue is only:
- build-time, source-controlled, CI-only, test-only, or dev-only
- browser-only usage of a server-side CVE, or server-only usage of a browser-side CVE
- same-user state/cache/UI correctness without a broader data boundary break
- admin safety, migration robustness, retry/deadlock hardening, data-loss prevention, or workflow correctness
- local tooling behavior where the attacker already has equivalent code execution
Update security/knowledge-base-report.md with the enriched conclusions.
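The triage questions above can be captured as a small record plus a classifier. This is a sketch under stated assumptions: the Candidate fields, the context labels, and the decision order are hypothetical simplifications of the questions and downgrade rules, not a complete policy.

```python
from dataclasses import dataclass

# Contexts that are downgraded by default (labels are hypothetical).
NON_PRODUCTION_CONTEXTS = {
    "build-time", "ci-only", "test-only", "dev-only",
    "local-tooling", "admin-only",
}


@dataclass
class Candidate:
    title: str
    attacker_controlled: bool     # does an attacker control the input?
    crosses_trust_boundary: bool  # is a trust boundary crossed?
    effect_scope: str             # "cross-user", "cross-tenant", "cross-privilege", "same-user"
    path_used_in_runtime: bool    # is the vulnerable path actually used in that runtime?
    context: str                  # "production" or one of NON_PRODUCTION_CONTEXTS


def classify(c: Candidate) -> str:
    """Assign one of the three Phase 5 buckets to a candidate finding."""
    if c.context in NON_PRODUCTION_CONTEXTS or not c.path_used_in_runtime:
        return "likely environment/tooling/admin-only"
    if (c.attacker_controlled and c.crosses_trust_boundary
            and c.effect_scope != "same-user"):
        return "likely security"
    return "likely correctness/robustness"
```

Recording the answers as explicit fields keeps the rationale for each bucket reviewable when the conclusions are written back to the knowledge base report.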
Phase 6 — Spec Gap Analysis
If the repo implements specs or RFCs:
- Fetch the relevant documents using built-in web search or fetch tools (do not restrict yourself to MCP tools).
- Research the RFC for historical attacks, known edge cases, and common implementation failures.
- Use spec-to-code-compliance.
- Focus on parsing, normalization, sanitization, canonicalization, and state-machine compliance.
- Identify gaps between the RFC spec and the codebase implementation clearly.
- Keep only medium-to-critical findings with a credible exploit path.
Produce security/spec-gaps-report.md.
Phase 7 — Deep Bug Hunting
Use the threat model and high-risk DFD/CFD slices to drive manual review.
Focus on:
- missing guards on sibling paths
- incorrect field/identity/tenant binding
- incomplete policy coverage
- parser or state-machine inconsistencies
- default-state bypasses and config-gated protections
- attack patterns specific to the project type
Do not start from grep noise alone.
Hint: If static analysis and manual deep bug hunting yield zero actionable results, consider whether serious dynamic testing or fuzzing is necessary. If dynamic analysis MCPs or external fuzzing servers (such as FuzzForge or similar dynamic toolchains) are configured and available in your environment, you may optionally formulate a plan to fuzz the critical components to uncover edge-case panics or memory corruption.
Append candidate findings to security/final-findings-report.md.
Phase 8 — FP Elimination
Apply fp-check to all candidate findings.
Retain only findings that are exploitable within the project's actual threat model.
- Relax strict requirements (e.g., that an attack vector MUST be network-based) and instead judge the vector contextually against the specific project's threat model and attack surface.
- Check the project's SECURITY.md or equivalent documentation to understand what the maintainers explicitly consider a vulnerability versus an accepted risk.
CRITICAL: Verify Intended Behavior vs. Bug
You MUST cross-reference online framework documentation, user guides, and inline codebase comments to definitively prove a finding is an unintended flaw rather than an intended, documented feature (e.g., intended arbitrary file read for a system backup tool).
CRITICAL: Drop Theoretical/Unexploitable Bugs
Exclude or downgrade findings that cannot realistically be exploited in bug-bounty or real-world scenarios. Specifically drop:
- Theoretical crypto vulnerabilities (e.g., static IVs where the attacker does not have the private key or ciphertext access).
- Timing vulnerabilities (unless they result in an explicit, easily reproducible real-world exploit).
- By-design behavior (referencing SECURITY.md and documentation).
- Informational findings.
- Defense-in-depth only changes with no exploit path.
- Correctness/robustness issues with no crossed trust boundary.
- Dependency alerts with no reachable vulnerable runtime path.
Use verdicts:
- VALID
- FALSE POSITIVE
- BY DESIGN
- OUT OF SCOPE
Update security/final-findings-report.md with the verdicts and rationale.
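The four verdicts can be modeled as an enumeration with a simple decision helper. This is a minimal sketch: the three boolean inputs and the order in which they are checked are assumptions made for illustration, and a real fp-check pass weighs far more evidence than three flags.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "VALID"
    FALSE_POSITIVE = "FALSE POSITIVE"
    BY_DESIGN = "BY DESIGN"
    OUT_OF_SCOPE = "OUT OF SCOPE"


def decide(in_scope: bool, documented_as_intended: bool, exploitable: bool) -> Verdict:
    """Pick a verdict for one candidate finding.

    Check order (an assumption): scope first, then documented intent,
    then exploitability within the project's threat model.
    """
    if not in_scope:
        return Verdict.OUT_OF_SCOPE
    if documented_as_intended:
        return Verdict.BY_DESIGN
    if not exploitable:
        return Verdict.FALSE_POSITIVE
    return Verdict.VALID


def retained(findings):
    """Keep only the titles of findings whose verdict is VALID."""
    return [title for title, verdict in findings if verdict is Verdict.VALID]
```

Using the enum values verbatim in security/final-findings-report.md keeps the verdict column consistent across findings.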
Phase 9 — Variant Analysis
For each confirmed finding, search for variants using the same flow shape, not just the same syntax.
Use:
- variant-analysis
- DFD/CFD slices
- custom CodeQL queries and Semgrep rules when they help scale the variant hunt
Append confirmed variants to security/final-findings-report.md.
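One way to picture "same flow shape, not just the same syntax" is to reduce each data flow to an abstract signature and compare signatures instead of code text. The sketch below is purely illustrative: the Flow type, its fields, and the choice of signature (source kind, guards applied, sink kind) are assumptions, and in practice the flows would come from DFD/CFD slices or query results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    location: str         # file and function where the flow lives
    source_kind: str      # e.g. "http-param", "rpc-field"
    guards: frozenset     # checks applied on the path, e.g. {"authn"}
    sink_kind: str        # e.g. "file-open", "sql-exec"


def shape(flow: Flow):
    """Abstract signature of a flow, independent of names and syntax."""
    return (flow.source_kind, flow.guards, flow.sink_kind)


def find_variants(confirmed: Flow, candidates):
    """Return locations of other flows sharing the confirmed flow's shape."""
    target = shape(confirmed)
    return [
        f.location
        for f in candidates
        if f.location != confirmed.location and shape(f) == target
    ]
```

Two handlers with entirely different function names match here as long as they take the same kind of input through the same (missing) guards to the same kind of sink.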
Phase 10 — Exploitation & Final Reporting
For each confirmed critical, high, or medium bug:
- Construct a realistic PoC on a real host or in a VM. You may spin up environments using the Azure CLI if already configured.
- Ensure PoCs are valid and do not rely on bypassing a security guard in a way that is unrepresentative of the real environment (e.g., executing a command directly on the host rather than through the intended sandbox).
- The PoC script must be minimized, clean, and highly effective, styled like a CTF exploit without excessive or unnecessary logging.
- Make sure that the generated report contains granular, step-by-step details required to reproduce the exact bug.
- Invoke the vuln-report skill to generate the final advisory.
- Output all technical details and the PoC script for each single bug in its own dedicated subfolder under security/findings/<bug-name>/.
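The per-finding folder layout can be derived from the bug title with a small helper. This is a sketch only: the function name finding_dir and the slug rules (lowercase, non-alphanumeric runs collapsed to a hyphen) are assumptions; the document only fixes the security/findings/<bug-name>/ shape.

```python
import re
from pathlib import Path


def finding_dir(repo_root: str, bug_name: str) -> Path:
    """Return the dedicated folder path for one confirmed bug.

    The slug keeps lowercase letters and digits and collapses every other
    run of characters into a single hyphen, so titles map to safe names.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", bug_name.lower()).strip("-")
    if not slug:
        raise ValueError("bug name must contain at least one letter or digit")
    return Path(repo_root) / "security" / "findings" / slug
```

A caller would create the folder with mkdir(parents=True, exist_ok=True) and place the report and PoC script for that single bug inside it.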
Output Directory
All reports live in <repo-root>/security/ on the audit branch:
| File | Phase |
|------|-------|
| security/advisory-hunter-report.md | 1 |
| security/bypass-analysis-report.md | 2 |
| security/threat-model-report.md | 3 |
| security/attack-surface-report.md | 3 |
| security/knowledge-base-report.md | 3, 5 |
| security/static-analysis-report.md | 4 |
| security/actions-audit-report.md | 4 |
| security/codeql-queries/ | 4, 9 |
| security/semgrep-rules/ | 4, 9 |
| security/spec-gaps-report.md | 6 |
| security/final-findings-report.md | 7-9 |
| security/findings/<bug-name>/ | 10 |
Shared Rules
- Evidence over volume: every retained finding needs attacker control, a reachable path, and a crossed trust boundary.
- Threat-model first: browser, server, CLI, desktop, library, CI, and admin control planes have different security boundaries.
- Do not escalate correctness, robustness, operational safety, or data-loss-prevention fixes into security findings without a demonstrated trust-boundary break.
- Dependency advisories are not enough on their own; prove the vulnerable runtime path is used.
- Custom CodeQL or Semgrep coverage augments built-ins and should be architecture-driven.
- Deduplicate by upstream commit, PR, advisory, and sink so the same underlying bug is reported once.
- Keep repo .gitignore entries tight during audits: ignore generated security/ audit artifacts by default, but preserve reports and PoCs as the only audit outputs meant to remain visible to git.
- No fix recommendations by default unless the user asks.
Post-Audit Skill Improvement
After the audit, use:
- prompt-optimizer to tighten weak prompts
- prompt-builder to refine targeted audit prompts
- skill-creator to update recurring audit workflows when new patterns emerge