Skill Auditor v3.1.0

The definitive security scanner for OpenClaw/ClawHub skills. Best-in-class detection across 18 security checks including prompt injection detection — the first scanner to catch agent manipulation attacks in skill documentation. 5-dimension trust scoring, trend tracking, diff analysis, and benchmarking. Zero false positives on legitimate skills.

When to Activate

Installing a new skill from ClawHub - run inspect.sh for full pre-install validation
Auditing existing skills - use audit.sh to scan any skill directory
Generating trust scores - use trust_score.py for 0-100 rating across 5 dimensions
Comparing skills - use trust_score.py --compare for side-by-side analysis
Tracking improvements - use trust_score.py --save-trend to monitor score over time
Reviewing updates - use diff-audit.sh to compare before/after versions
Batch scanning - use audit-all.sh or benchmark.sh for fleet-wide analysis

Quick Start

# Audit a single skill
bash audit.sh /path/to/skill

# Trust score (0-100 across 5 dimensions)
python3 trust_score.py /path/to/skill

# Compare two skills side by side
python3 trust_score.py /path/to/skill1 --compare /path/to/skill2

# Track score over time
python3 trust_score.py /path/to/skill --save-trend
python3 trust_score.py /path/to/skill --trend

# Diff audit (before/after update)
bash diff-audit.sh /path/to/old-version /path/to/new-version

# Benchmark against a corpus
bash benchmark.sh /path/to/skills-dir

# Inspect a ClawHub skill before installing
bash inspect.sh skill-slug

# Audit all installed skills
bash audit-all.sh

# Generate a markdown report
bash report.sh

# Run test suite (28 assertions)
bash test.sh

Guardrails / Anti-Patterns

DO:

✓ Always audit skills before installing from untrusted sources
✓ Review trust scores - reject skills scoring below 60 (D grade)
✓ Use diff-audit.sh when updating skills to catch regressions
✓ Use --json output for CI/CD pipeline integration
✓ Run --save-trend periodically to track skill health

DON'T:

✗ Install skills scoring below 40 (F grade) without extensive manual review
✗ Ignore CRITICAL findings - they indicate potential security threats
✗ Blindly add skills to allowlist without understanding why they access credentials
✗ Skip audit because a skill is "popular" or "official"

Security Checks (18 total)

| # | Check | Severity | Description | |---|-------|----------|-------------| | 1 | credential-harvest | CRITICAL | Scripts reading API keys/tokens AND making network calls | | 2 | exfiltration-url | CRITICAL | webhook.site, requestbin, ngrok URLs in scripts | | 3 | obfuscated-payload | CRITICAL | Base64-encoded URLs or shell commands | | 4 | sensitive-fs | CRITICAL | /etc/passwd, ~/.ssh, ~/.aws/credentials access | | 5 | crypto-wallet | CRITICAL | Hardcoded ETH/BTC wallet addresses (drain attacks) | | 6 | dependency-confusion | CRITICAL | Internal/private-scoped packages in public deps | | 7 | typosquatting | CRITICAL | Misspelled package names (lodahs, requets, etc.) | | 8 | symlink-attack | CRITICAL | Symlinks targeting sensitive system paths | | 9 | code-execution | WARNING | eval(), exec(), subprocess patterns | | 10 | time-bomb | WARNING | Date/time comparisons that could trigger delayed payloads | | 11 | telemetry-detected | WARNING | Analytics SDKs, tracking pixels, phone-home behavior | | 12 | excessive-permissions | WARNING | >15 bins/env/config items requested | | 13 | unusual-ports | WARNING | Network calls to non-standard ports | | 14 | prompt-injection | CRITICAL | Agent manipulation in docs: "ignore instructions", role hijacking, hidden HTML directives | | 15 | download-execute | CRITICAL | curl|bash, wget|sh, eval $(curl), unsafe pip/npm installs | | 16 | hidden-file | WARNING | Suspicious dotfiles that may hide malicious content | | 17 | env-exfiltration | CRITICAL | Reading sensitive env vars + outbound network calls | | 18 | privilege-escalation | CRITICAL | sudo, chmod 777/setuid, writes to system paths |

Context-aware: credential mentions in documentation are INFO, not CRITICAL.

Trust Score (5 Dimensions)

| Dimension | Max | What's Measured | |-----------|-----|-----------------| | Security | 35 | Audit findings (criticals = -18, warnings = -4) | | Quality | 22 | Description, version, usage docs, examples, metadata, changelog | | Structure | 18 | File organization, tests, README, reasonable scope | | Transparency | 15 | License, no minified code, code comments | | Behavioral | 10 | Rate limiting, error handling, input validation |

Grades: A (90+), B (75+), C (60+), D (40+), F (<40)

Comparative Scoring

python3 trust_score.py /path/to/skill-a --compare /path/to/skill-b

Shows per-dimension deltas and overall score difference.

Trend Tracking

python3 trust_score.py /path/to/skill --save-trend   # Record score
python3 trust_score.py /path/to/skill --trend         # View history

Stores up to 50 entries per skill in trust_trends.json.

Tools

| File | Purpose | |------|---------| | audit.sh | Single skill security audit (18 checks) | | audit-all.sh | Batch scan all installed skills | | trust_score.py | Trust score calculator (5-dimension, 0-100) | | diff-audit.sh | Compare skill versions for security regressions | | benchmark.sh | Corpus-wide audit with aggregate statistics | | inspect.sh | ClawHub pre-install workflow | | report.sh | Markdown report generator | | test.sh | Automated test suite (28 assertions, 12 test skills) | | allowlist.json | Known-good credential skills |

Test Suite

12 test skills (8 malicious, 4 clean) with 28 automated assertions:

bash test.sh

Malicious fixtures: credential harvest, obfuscated payload, sensitive fs reads, crypto wallets, time bombs, symlink attacks, prompt injection, download-execute, privilege escalation. Clean fixtures: basic skill, credential docs (false positive check), network skill, dotfiles skill.

Exit Codes

0: PASS / safe to install
1: REVIEW / warnings found
2: FAIL / critical issues
3: Error / bad input

Changelog

See CHANGELOG.md for full version history.