Claude Agent Auditor
Your agent has no guardrails.
Most Claude Code setups run with no deny rules, no session logging, and rules that cover only 2 of 4 LLM failure modes. One scan surfaces every gap. Free, zero dependencies, read-only.
pip install claude-agent-auditor
Then run claude-agent-auditor --open in your project directory.
See a live sample report
Generated from a real Claude Code workspace. Three pages, each with a different view of your agent architecture.
Current State
Architecture score, autonomy risk level, observability hook coverage, rule architecture analysis, and all detected issues.
View page 1 →
Recommendations
Prioritized P0/P1/P2 action plan. Every missing hook includes a copy-paste JSON snippet. Rule gaps include starter templates.
View page 2 →
Projected Results
Estimated architecture score, autonomy risk, and observability coverage after implementing all recommendations.
View page 3 →
What the auditor finds
Unconstrained Autonomy
bypassPermissions or dontAsk mode with zero deny or ask rules. Claude can delete files, push code, and send messages without any confirmation. This is the most common P0 issue.
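As a rough illustration of this check, the sketch below flags a settings dictionary that sets a permissive mode with no deny or ask rules. The key names (`permissions`, `defaultMode`, `deny`, `ask`) are assumptions about the settings layout, and `is_unconstrained` is a hypothetical helper, not the auditor's actual code.

```python
import json

# Modes the text above treats as unconstrained. The key names used below
# are assumptions about the settings layout, not a documented contract.
RISKY_MODES = {"bypassPermissions", "dontAsk"}


def is_unconstrained(settings: dict) -> bool:
    """Return True when a risky mode is set and no deny or ask rules exist."""
    permissions = settings.get("permissions", {})
    mode = permissions.get("defaultMode")
    has_guardrails = bool(permissions.get("deny")) or bool(permissions.get("ask"))
    return mode in RISKY_MODES and not has_guardrails


# Hypothetical settings content: permissive mode, allow rules only.
example = json.loads(
    '{"permissions": {"defaultMode": "bypassPermissions", "allow": ["Bash"]}}'
)
print(is_unconstrained(example))  # True
```

Adding even one deny or ask rule makes this particular check pass, which is why it works as a floor rather than a full policy review.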
No Agent Tracing
Multi-step Task tool runs with no PostToolUse hook. When an agent sub-task fails, there's no trace. You can't debug what you can't see.
Missing Session Logging
No Stop hook configured. Every architectural decision, plan, and conclusion from the session is lost when it ends. Critical context vanishes between conversations.
Rule Coverage Gaps
Rules covering hallucination prevention but nothing on context window limits, or domain knowledge rules with no control constraints. Uncovered failure modes fail silently.
Overlapping Rules
Two rules with 70%+ keyword overlap loaded every session. They dilute each other and waste context budget. The auditor identifies every redundant pair.
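The page does not say how keyword overlap is measured, so the following is only one plausible sketch: it treats words longer than three letters as keywords and compares the shared set against the smaller rule. The function names, file names, and the tokenization are all assumptions made for illustration.

```python
import re
from itertools import combinations


def keywords(text: str) -> set[str]:
    """Lowercase words longer than three letters (a crude keyword filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def overlap(a: str, b: str) -> float:
    """Share of the smaller rule's keywords that also appear in the other."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def redundant_pairs(rules: dict[str, str], threshold: float = 0.7):
    """Return rule-name pairs whose keyword overlap meets the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(rules), 2)
        if overlap(rules[x], rules[y]) >= threshold
    ]


# Hypothetical rule files and contents.
rules = {
    "verify.md": "Always verify claims against source files before asserting facts",
    "check.md": "Verify claims against source files before asserting anything",
    "style.md": "Prefer short functions and descriptive names",
}
print(redundant_pairs(rules))  # [('check.md', 'verify.md')]
```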
No Memory Preservation
PreCompact hook absent. Claude compacts long sessions without extracting key decisions first. Context that took hours to build gets summarized away.
MCP Servers Without Deny Rules
MCP tools expand Claude's action surface. When servers are configured with no deny rules, every MCP tool runs unchecked. The auditor flags this combination.
Secrets Exposure
API keys, tokens, or passwords accidentally dropped into rules files or settings JSON that could end up in version control. The auditor scans for common patterns.
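A pattern scan of this kind can be sketched with a few regular expressions. The patterns below are illustrative assumptions, not the auditor's actual list: a generic `sk-` prefixed key, the well-known `AKIA` prefix of AWS access key IDs, and inline `password = ...` style assignments.

```python
import re

# Illustrative patterns only; a real scanner would carry a longer list.
SECRET_PATTERNS = {
    "sk_prefixed_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "inline_password": re.compile(
        r"(?i)\b(password|passwd|secret|token)\b\s*[:=]\s*\S{8,}"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Hypothetical rules-file content with two planted fake secrets.
sample = (
    "Use the staging API.\n"
    "api key: sk-" + "x" * 24 + "\n"
    "password = hunter2hunter2\n"
)
print(find_secrets(sample))  # [(2, 'sk_prefixed_key'), (3, 'inline_password')]
```

Reporting only the line number and pattern name, rather than the matched text, keeps the secret itself out of the report.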
Requirements
- Python 3.10+ (check with python --version)
- Zero dependencies: pure Python stdlib
- Does NOT need to be installed inside your Claude Code workspace
- Read-only analysis — never modifies your files
How it works
Install (from anywhere)
pip install claude-agent-auditor
Point it at your project
claude-agent-auditor /path/to/your/project --open
Review the three-page report
Page 1: Current State — score, issues, and what's misconfigured. Page 2: Recommendations — prioritized fixes with copy-paste hook JSON. Page 3: Projected Results — your score after all fixes applied.
Ask Claude to fix it
Feed the report to your Claude Code instance: "Read agent-audit/recommendations.html and implement the P0 recommendations." Review the changes before accepting.
The four LLM failure modes
Based on the Stanford CS230 study guide on building with LLMs. Every technique — RAG, fine-tuning, agentic workflows — exists to solve one of these four problems. Your rules should cover all four.
Domain Knowledge Gaps
Base models lack proprietary data, recent events, and internal docs. Solved by RAG, memory systems, and domain-specific rules.
Fix: Memory system, RAG, domain rules
Context Window Limits
Can't hold arbitrarily long history. Requires explicit architectural choices: handoffs, compaction, summarization.
Fix: PreCompact hook, handoff rules, MEMORY.md
Hallucinations
Generates plausible-sounding but incorrect output with confidence. Needs explicit verification rules before asserting facts.
Fix: Verification rules, claim-checking, grounding
Difficulty of Control
Hard to get consistent, scoped behavior. Requires deny rules, ask rules, and explicit scope constraints.
Fix: Deny rules, ask rules, scope limits
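One way to picture a coverage check across the four modes is a keyword lookup over the rule text. This is a deliberately naive sketch: the keyword lists and the `covered_modes` helper are assumptions for illustration, and the auditor's real heuristics are not described on this page.

```python
import re

# Hypothetical keyword lists, one per failure mode named above.
FAILURE_MODE_KEYWORDS = {
    "domain_knowledge": {"rag", "glossary", "domain"},
    "context_window": {"compaction", "handoff", "summarize", "precompact"},
    "hallucination": {"verify", "grounding", "cite", "claims"},
    "control": {"deny", "scope", "confirm", "never"},
}


def covered_modes(rule_texts: list[str]) -> set[str]:
    """Return the failure modes mentioned by at least one whole-word keyword."""
    words = set(re.findall(r"[a-z]+", " ".join(rule_texts).lower()))
    return {
        mode
        for mode, mode_keywords in FAILURE_MODE_KEYWORDS.items()
        if words & mode_keywords
    }


# Hypothetical rule contents covering two of the four modes.
rule_texts = [
    "Verify every claim against the repo before stating it.",
    "Never push to main; deny rm -rf.",
]
covered = covered_modes(rule_texts)
missing = sorted(set(FAILURE_MODE_KEYWORDS) - covered)
print(missing)  # ['context_window', 'domain_knowledge']
```

Matching whole words rather than substrings avoids false hits such as "rag" inside "storage".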
Six observability hooks, three priority tiers
If you don't have traces, you can't debug your agent system. The auditor checks for all six hooks and flags missing ones with copy-paste implementation snippets.
| Priority | Hook | What it captures |
|---|---|---|
| CRITICAL | PostToolUse: Task | Every agent sub-task — the backbone of multi-agent tracing |
| CRITICAL | Stop | Session decisions before they're lost on conversation end |
| IMPORTANT | PreCompact | Critical context preserved before compaction runs |
| IMPORTANT | SessionStart | Session initialization and context restore |
| USEFUL | PostToolUse: Write\|Edit | Every file change with path and timestamp |
| USEFUL | PostToolUse: Bash | Every command executed — audit trail for automation |
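The table can be read as a checklist against the `hooks` section of a settings file. The sketch below assumes each event maps to a list of entries with an optional `matcher` string, and it simplifies matchers to pipe-separated tool names, with an empty matcher or `*` matching every tool. `missing_hooks` is a hypothetical helper, not the auditor's implementation.

```python
# (priority, event, tools) rows taken from the table above.
REQUIRED_HOOKS = [
    ("CRITICAL", "PostToolUse", "Task"),
    ("CRITICAL", "Stop", None),
    ("IMPORTANT", "PreCompact", None),
    ("IMPORTANT", "SessionStart", None),
    ("USEFUL", "PostToolUse", "Write|Edit"),
    ("USEFUL", "PostToolUse", "Bash"),
]


def covers(matcher: str, tools: str) -> bool:
    """Simplified matcher check: pipe-separated names, '' or '*' match all."""
    if matcher in ("", "*"):
        return True
    return set(tools.split("|")) <= set(matcher.split("|"))


def missing_hooks(settings: dict) -> list[tuple]:
    """Return the required rows that no configured hook entry satisfies."""
    hooks = settings.get("hooks", {})
    missing = []
    for priority, event, tools in REQUIRED_HOOKS:
        entries = hooks.get(event, [])
        if tools is None:
            found = bool(entries)
        else:
            found = any(covers(e.get("matcher", ""), tools) for e in entries)
        if not found:
            missing.append((priority, event, tools))
    return missing


# Hypothetical settings: only a Stop hook and a Bash logger are configured.
command = [{"type": "command", "command": "echo logged"}]
settings = {
    "hooks": {
        "Stop": [{"hooks": command}],
        "PostToolUse": [{"matcher": "Bash", "hooks": command}],
    }
}
for row in missing_hooks(settings):
    print(row)
```

Treating matchers as plain name lists is a simplification; a matcher written as a broader pattern would need real pattern matching to evaluate.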
Also run the Claude Workspace Optimizer
The optimizer checks context efficiency: MEMORY.md visibility, rule bloat, token budget. The agent auditor checks safety and architecture. They cover different ground — run both.
pip install claude-workspace-optimizer
Frequently asked questions
Does this send my data anywhere?
Does it work with any Claude Code project?
What's the architecture score based on?
Can it automatically fix the issues?
How is this different from the Claude Workspace Optimizer?
Share with your team
Know someone running Claude Code agents without deny rules or session logging? This tool is free. Share it.
pip install claude-agent-auditor
MIT license. Open source. View on GitHub
Disclaimer: This tool is provided as-is with no warranty. Oaken AI and its contributors accept zero responsibility for any changes made to your workspace based on this tool's output. The report contains recommendations, not instructions. Always review changes before applying them. Back up your workspace before making modifications.
BUILT BY OAKEN AI
Need more than an architecture scan?
Oaken AI builds production multi-agent systems for businesses. Architecture, hooks, rules, RAG memory, and infrastructure — everything in this report, built for your stack.