Security Agent Efficiency — Full Audit Orchestration

Orchestrate a deterministic 10-phase security audit of an arbitrary source code repository using specialized subagents and skills.

When to Apply

Full security audit of a repository
Advisory regression check combined with static analysis
Deep vulnerability research on a specific codebase
Large or unusual architectures where default SAST modeling is likely incomplete
"Run the security agents" / "audit for vulnerabilities" / "is this secure?"

Pre-Audit Setup

Before starting any phase:

Check out the audit branch: git checkout -b audit (or git checkout audit if it already exists)
Create output directory: mkdir -p <repo-root>/security/
Reuse reports in security/ when the codebase and threat-model inputs have not changed
Update <repo-root>/.gitignore so audit-generated files under security/ are ignored by default, while keeping reports and PoCs trackable
When adding ignore rules, exclude audit artifacts such as transient databases, caches, scratch files, vendored queries/rules, and intermediate exports, but do not ignore reports or PoC files
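
The ignore rules above can be sketched as a small idempotent helper. This is a minimal sketch, not a prescribed implementation: the marker comment, the helper name, and the exact patterns are assumptions, and it assumes reports are the Markdown files directly under security/ while PoCs live under security/findings/.

```python
from pathlib import Path

# Hypothetical marker and patterns; adjust them to the artifacts your tools emit.
MARKER = "# security audit artifacts"
RULES = [
    MARKER,
    "security/*",           # ignore everything directly under security/ by default
    "!security/*.md",       # keep reports trackable
    "!security/findings/",  # keep per-bug PoC folders trackable
]


def ensure_audit_ignores(repo_root: Path) -> bool:
    """Append the audit ignore block to .gitignore once; return True if it was added."""
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text() if gitignore.exists() else ""
    if MARKER in existing.splitlines():
        return False  # block already present, so repeated calls change nothing
    # Make sure the block starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + prefix + "\n".join(RULES) + "\n")
    return True
```

Ignoring `security/*` and then re-including specific paths keeps the rule set short; anything new a tool writes under security/ stays ignored unless it is explicitly re-included.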

10-Phase Workflow

flowchart TD
    Start["Audit Request"] --> Setup["Setup + reuse check"]
    Setup --> P1["1. Intelligence Gathering"]
    P1 --> P2["2. Patch Bypass Analysis"]
    P2 --> P3["3. Knowledge Base"]
    P3 --> P4["4. Static Analysis"]
    P4 --> P5["5. Enrichment + Security Relevance Filter"]
    P5 --> P6["6. Spec Gap Analysis"]
    P6 --> P7["7. Deep Bug Hunting"]
    P7 --> P8["8. FP Elimination"]
    P8 --> P9["9. Variant Analysis"]
    P9 --> P10["10. Exploitation & Final Reporting"]

Phase 1 — Intelligence Gathering

Use the advisory-hunter workflow to collect:

advisories, CVEs, GHSAs, and patch commits
coarse architecture inventory: components, transports, execution contexts, trust boundaries
security-relevant dependencies, with runtime context noted for each one

Treat dependency findings as hypotheses until the audit proves the affected runtime path is reachable.

Produce security/advisory-hunter-report.md.

Phase 2 — Patch Bypass Analysis

For each advisory patch:

fetch the full diff and surrounding callers
test bypass hypotheses: alternate entry points, config-gated checks, default-state gaps, compatibility branches, parser differentials, missing normalization
check whether a sibling or related path remains vulnerable even if the patched path is sound
cluster duplicate advisories by the same upstream commit or PR so one fix is not re-audited as multiple distinct bugs
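
The clustering step could look roughly like the following sketch. The `Advisory` type, its field names, and the normalization of the fix reference are illustrative assumptions; a real audit may need to resolve PR references to commits first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # e.g. a CVE or GHSA identifier
    fix_ref: str      # upstream commit SHA or PR reference that fixed it


def cluster_by_fix(advisories: list[Advisory]) -> dict[str, list[str]]:
    """Group advisory IDs by the upstream fix they share."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for adv in advisories:
        # Normalize case and whitespace so the same SHA is not split into two clusters.
        clusters[adv.fix_ref.strip().lower()].append(adv.advisory_id)
    return {ref: sorted(ids) for ref, ids in clusters.items()}
```

Each resulting cluster is then audited once, with all of its advisory IDs listed together in the bypass analysis.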

Produce security/bypass-analysis-report.md.

Phase 3 — Knowledge Base

Build the project model from source:

classify project type: web app, API, CLI, desktop, library, plugin, protocol, worker, CI action
map attacker-controlled inputs, trust boundaries, and security-critical decisions
build compact DFD/CFD slices only for the highest-risk flows
record implemented specs and RFCs

Produce:

security/threat-model-report.md
security/attack-surface-report.md
security/knowledge-base-report.md

The Phase 3 threat model is mandatory input for all later phases.

Phase 4 — Static Analysis

CRITICAL ENFORCEMENT: You MUST physically execute the SAST tools. Do not hallucinate results or skip execution. You must ensure codeql successfully runs and that semgrep is run using the Pro engine (--pro) if available.

Baseline requirements:

Build CodeQL databases for all supported languages when resources allow.
Run built-in CodeQL security suites appropriate to the repo languages.
Run built-in Semgrep baseline, language, and framework rulesets. Always attempt to run Semgrep Pro (semgrep --pro) for deeper cross-file taint analysis.
Explicitly output the list of CodeQL queries and Semgrep rules that you actually ran.
For Java applications, run SpotBugs with the FindSecBugs plugin as a required baseline pass; treat this as additive to CodeQL and Semgrep.
Run GitHub Actions review with agentic-actions-auditor when .github/workflows/ exists.
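
One way to keep the executed commands explicit is to build them as argument lists before running them. The sketch below is an assumption-laden illustration: the helper names, default thread counts, and the idea of passing the query suite and Semgrep configs as parameters are not part of this workflow's definition, and the values you pass (suite name, ruleset names) depend on the repo.

```python
from pathlib import Path


def codeql_commands(db: Path, language: str, source_root: Path,
                    queries: str, sarif_out: Path, threads: int = 2) -> list[list[str]]:
    """Return argv lists that create a CodeQL database and analyze it to SARIF."""
    return [
        ["codeql", "database", "create", str(db),
         f"--language={language}", f"--source-root={source_root}"],
        ["codeql", "database", "analyze", str(db), queries,
         "--format=sarif-latest", f"--output={sarif_out}", f"--threads={threads}"],
    ]


def semgrep_command(configs: list[str], sarif_out: Path,
                    pro: bool = True, jobs: int = 2) -> list[str]:
    """Return the argv for a Semgrep scan; set pro=False if the Pro engine is unavailable."""
    argv = ["semgrep", "scan"]
    for config in configs:
        argv += ["--config", config]
    if pro:
        argv.append("--pro")
    argv += ["--sarif", "--output", str(sarif_out), "--jobs", str(jobs)]
    return argv
```

Each list can be passed to `subprocess.run(argv, check=True)` and also written verbatim into security/static-analysis-report.md, so the report shows exactly what was executed.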

Custom Architecture Generalization (Dynamic Rules):

Do not rely solely on generic or pre-baked rules. You MUST dynamically generate custom CodeQL queries and Semgrep rules tailored to the ad-hoc architecture, framework, and threat model identified in Phase 3 (e.g., custom MCP protocols, specific custom RPC boundaries).
Store all dynamically generated custom rules in security/codeql-queries/ and security/semgrep-rules/.
Document exactly what custom rules were created, why they match the Phase 3 architecture, and their execution results in security/static-analysis-report.md.

Operational rules:

Keep SAST concurrency low enough to avoid exhausting CPU/RAM.
Merge SARIF outputs with sarif-parsing if needed.
Delete transient CodeQL databases and Semgrep cache after reports are written.
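
Independently of the sarif-parsing skill, merging SARIF logs and listing the rules they declare can be sketched as follows. The function names are hypothetical. Whether a tool lists every executed rule in its SARIF output or only rules with results varies by tool, so treat the listing as a lower bound and cross-check it against the commands you ran.

```python
import json
from pathlib import Path


def merge_sarif(paths: list[Path]) -> dict:
    """Concatenate the `runs` arrays of several SARIF 2.1.0 logs into one log."""
    merged = {"version": "2.1.0", "runs": []}
    for path in paths:
        log = json.loads(path.read_text())
        merged["runs"].extend(log.get("runs", []))
    return merged


def listed_rules(sarif: dict) -> dict[str, list[str]]:
    """Map each tool name to the sorted rule IDs declared by its driver and extensions."""
    out: dict[str, set[str]] = {}
    for run in sarif.get("runs", []):
        tool = run.get("tool", {})
        driver = tool.get("driver", {})
        name = driver.get("name", "unknown")
        ids = out.setdefault(name, set())
        # Rule metadata may sit on the driver or on extension components.
        for component in [driver, *tool.get("extensions", [])]:
            ids.update(r["id"] for r in component.get("rules", []) if "id" in r)
    return {name: sorted(ids) for name, ids in out.items()}
```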

Phase 5 — Enrichment and Security Relevance Filter

Before deep bug hunting, classify each candidate finding as one of:

likely security
likely correctness/robustness
likely environment/tooling/admin-only

For every candidate, answer:

What attacker controls the input?
Which runtime executes the vulnerable path?
What trust boundary is crossed?
Is the effect cross-user, cross-tenant, cross-privilege, or only same-user?
Is the vulnerable dependency/code path actually used in that runtime?

Downgrade or exclude by default when the issue is only:

build-time, source-controlled, CI-only, test-only, or dev-only
browser-only usage of a server-side CVE, or server-only usage of a browser-side CVE
same-user state/cache/UI correctness without a broader data boundary break
admin safety, migration robustness, retry/deadlock hardening, data-loss prevention, or workflow correctness
local tooling behavior where the attacker already has equivalent code execution
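
The five questions and the downgrade rules above can be captured in a small record and a classifier. This is one possible interpretation rather than a fixed policy: the field names, the set of non-production runtimes, and the order of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Relevance(Enum):
    SECURITY = "likely security"
    CORRECTNESS = "likely correctness/robustness"
    ENVIRONMENT = "likely environment/tooling/admin-only"


@dataclass
class Candidate:
    attacker_controls_input: bool
    runtime: str                  # e.g. "server", "browser", "ci", "test", "dev"
    crosses_trust_boundary: bool
    same_user_only: bool
    path_used_in_runtime: bool


# Assumed labels for runtimes that are downgraded by default.
NON_PRODUCTION_RUNTIMES = {"build", "ci", "test", "dev"}


def classify(c: Candidate) -> Relevance:
    """Apply the downgrade rules first; only what survives is likely security."""
    if c.runtime in NON_PRODUCTION_RUNTIMES or not c.path_used_in_runtime:
        return Relevance.ENVIRONMENT
    if not c.attacker_controls_input or not c.crosses_trust_boundary or c.same_user_only:
        return Relevance.CORRECTNESS
    return Relevance.SECURITY
```

Recording the answers as data makes each downgrade traceable when the enriched conclusions are written back to the knowledge base.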

Update security/knowledge-base-report.md with the enriched conclusions.

Phase 6 — Spec Gap Analysis

If the repo implements specs or RFCs:

Fetch the relevant documents using built-in web search or fetch tools (do not restrict yourself to MCP tools).
Research the RFC for historical attacks, known edge cases, and common implementation failures.
Use spec-to-code-compliance.
Focus on parsing, normalization, sanitization, canonicalization, and state-machine compliance.
Clearly identify each gap between the spec and the codebase implementation.
Keep only medium-to-critical findings with a credible exploit path.

Produce security/spec-gaps-report.md.

Phase 7 — Deep Bug Hunting

Use the threat model and high-risk DFD/CFD slices to drive manual review.

Focus on:

missing guards on sibling paths
incorrect field/identity/tenant binding
incomplete policy coverage
parser or state-machine inconsistencies
default-state bypasses and config-gated protections
attack patterns specific to the project type

Do not start from grep noise alone.

Hint: If static analysis and manual deep bug hunting yield zero actionable results, consider whether serious dynamic testing or fuzzing is necessary. If dynamic analysis MCPs or external fuzzing servers (such as FuzzForge or similar dynamic toolchains) are configured and available in your environment, you may optionally formulate a plan to fuzz the critical components to uncover edge-case panics or memory corruption.

Append candidate findings to security/final-findings-report.md.

Phase 8 — FP Elimination

Apply fp-check to all candidate findings.

Retain only findings that are exploitable within the project's actual threat model.

Relax strict requirements (e.g., that an attack vector MUST be network-based) and instead judge the vector contextually against the specific project's threat model and attack surface.
Check the project's SECURITY.md or equivalent documentation to understand what the maintainers explicitly consider a vulnerability versus an accepted risk.

CRITICAL: Verify Intended Behavior vs. Bug. You MUST cross-reference online framework documentation, user guides, and inline codebase comments to definitively prove a finding is an unintended flaw rather than an intended, documented feature (e.g., intended arbitrary file read for a system backup tool).

CRITICAL: Drop Theoretical/Unexploitable Bugs. Exclude or downgrade findings that cannot realistically be exploited in bug-bounty or real-world scenarios. Specifically drop:

Theoretical crypto vulnerabilities (e.g., static IVs where the attacker does not have the private key or ciphertext access).
Timing vulnerabilities (unless they result in an explicit, easily reproducible real-world exploit).
By-design behavior (referencing SECURITY.md and documentation).
Informational findings.
Defense-in-depth-only changes with no exploit path.
Correctness/robustness issues with no crossed trust boundary.
Dependency alerts with no reachable vulnerable runtime path.

Use verdicts:

VALID
FALSE POSITIVE
BY DESIGN
OUT OF SCOPE
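
A minimal sketch of recording these verdicts consistently is shown below. The line format and the helper names are assumptions; the only parts taken from this workflow are the four verdict labels and the requirement that each verdict carries a rationale.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "VALID"
    FALSE_POSITIVE = "FALSE POSITIVE"
    BY_DESIGN = "BY DESIGN"
    OUT_OF_SCOPE = "OUT OF SCOPE"


def verdict_line(title: str, verdict: Verdict, rationale: str) -> str:
    """Format one report line; refuse verdicts that come without a rationale."""
    if not rationale.strip():
        raise ValueError("every verdict needs a rationale")
    return f"- {title}: {verdict.value}. Rationale: {rationale.strip()}"


def retained(findings: list[tuple[str, Verdict, str]]) -> list[str]:
    """Return the titles of findings that carry forward to later phases."""
    return [title for title, verdict, _ in findings if verdict is Verdict.VALID]
```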

Update security/final-findings-report.md with the verdicts and rationale.

Phase 9 — Variant Analysis

For each confirmed finding, search for variants using the same flow shape, not just the same syntax.

Use:

variant-analysis
DFD/CFD slices
custom CodeQL queries and Semgrep rules when they help scale the variant hunt

Append confirmed variants to security/final-findings-report.md.

Phase 10 — Exploitation & Final Reporting

For each confirmed critical-, high-, or medium-severity bug:

Construct a realistic PoC on a real host or in a VM. You may spin up environments using the Azure CLI if already configured.
Ensure each PoC is valid and exercises the real security boundary. It must not sidestep a guard in a way that is unrepresentative of the real environment (e.g., executing a command directly on the host rather than through the intended sandbox).
The PoC script must be minimized, clean, and effective, styled like a CTF exploit without unnecessary logging.
Ensure the generated report contains the granular, step-by-step details required to reproduce the exact bug.
Invoke the vuln-report skill to generate the final advisory.
Output all technical details and the PoC script for each bug in its own dedicated subfolder under security/findings/<bug-name>/.
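
Creating the per-bug subfolder can be sketched as below. The slug rule (lowercase, hyphen-separated) and the helper names are assumptions; the workflow only fixes the parent path security/findings/.

```python
import re
from pathlib import Path


def slugify(bug_name: str) -> str:
    """Turn a free-form bug title into a filesystem-safe folder name."""
    slug = re.sub(r"[^a-z0-9]+", "-", bug_name.lower()).strip("-")
    if not slug:
        raise ValueError("bug name has no usable characters")
    return slug


def create_finding_dir(repo_root: Path, bug_name: str) -> Path:
    """Create security/findings/<slug>/ under the repo root and return its path."""
    finding_dir = repo_root / "security" / "findings" / slugify(bug_name)
    finding_dir.mkdir(parents=True, exist_ok=True)
    return finding_dir
```

Deriving the folder name from the bug title keeps one folder per bug, which fits the deduplication rule that the same underlying bug is reported once.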

Output Directory

All reports live in <repo-root>/security/ on the audit branch:

| File | Phase |
|------|-------|
| security/advisory-hunter-report.md | 1 |
| security/bypass-analysis-report.md | 2 |
| security/threat-model-report.md | 3 |
| security/attack-surface-report.md | 3 |
| security/knowledge-base-report.md | 3, 5 |
| security/static-analysis-report.md | 4 |
| security/actions-audit-report.md | 4 |
| security/codeql-queries/ | 4, 9 |
| security/semgrep-rules/ | 4, 9 |
| security/spec-gaps-report.md | 6 |
| security/final-findings-report.md | 7-9 |
| security/findings/<bug-name>/ | 10 |

Shared Rules

Evidence over volume: every retained finding needs attacker control, a reachable path, and a crossed trust boundary.
Threat-model first: browser, server, CLI, desktop, library, CI, and admin control planes have different security boundaries.
Do not escalate correctness, robustness, operational safety, or data-loss-prevention fixes into security findings without a demonstrated trust-boundary break.
Dependency advisories are not enough on their own; prove the vulnerable runtime path is used.
Custom CodeQL or Semgrep coverage augments built-ins and should be architecture-driven.
Deduplicate by upstream commit, PR, advisory, and sink so the same underlying bug is reported once.
Keep repo .gitignore entries tight during audits: ignore generated security/ audit artifacts by default, but preserve reports and PoCs as the only audit outputs meant to remain visible to git.
No fix recommendations unless the user asks.

Post-Audit Skill Improvement

After the audit, use:

prompt-optimizer to tighten weak prompts
prompt-builder to refine targeted audit prompts
skill-creator to update recurring audit workflows when new patterns emerge