What does the Claude agent auditor check?

It checks eight areas: autonomy risk (bypassPermissions or dontAsk mode with no deny rules), observability hooks (agent tracing, session logging, memory preservation, command logging), rule architecture (coverage of the four LLM failure modes, overlapping rules), agent setup (memory systems, orchestration patterns, specialized agents), MCP server detection (flags unconstrained MCP tools), secrets exposure (API keys or tokens in workspace files), permission configuration, and an overall architecture score.

Is the Claude agent auditor free?

Yes, completely free and open source under the MIT license. No account required, no data collection, runs entirely on your machine.

What is autonomy risk in Claude Code?

Autonomy risk is HIGH when Claude Code is configured with bypassPermissions or dontAsk mode and no deny or ask rules. In this state, Claude can delete files, push to remote, send Slack messages, and make API calls without any confirmation. The auditor detects this and flags it as a P0 issue.
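
As a rough illustration of the condition described above, the check can be sketched in a few lines of Python. This is not the auditor's actual code. It assumes the settings file stores the mode under `permissions.defaultMode` and the rules under `permissions.deny` and `permissions.ask`, and the two-level HIGH/LOW result is a simplification made for the example.

```python
import json

# Modes named in the text above. The key names ("permissions", "defaultMode",
# "deny", "ask") are assumptions about the settings layout.
RISKY_MODES = {"bypassPermissions", "dontAsk"}


def autonomy_risk(settings: dict) -> str:
    """Return "HIGH" when a risky mode is set with no deny or ask rules."""
    permissions = settings.get("permissions", {})
    mode = permissions.get("defaultMode", "default")
    has_guardrails = bool(permissions.get("deny")) or bool(permissions.get("ask"))
    if mode in RISKY_MODES and not has_guardrails:
        return "HIGH"
    return "LOW"


if __name__ == "__main__":
    unguarded = json.loads('{"permissions": {"defaultMode": "bypassPermissions"}}')
    guarded = json.loads(
        '{"permissions": {"defaultMode": "bypassPermissions",'
        ' "deny": ["Bash(git push:*)"]}}'
    )
    print(autonomy_risk(unguarded))
    print(autonomy_risk(guarded))
```

A single deny or ask rule is enough to clear the flag in this sketch; whether that rule actually covers the dangerous actions is a separate question.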

What are observability hooks?

Observability hooks are Claude Code hook configurations that log what your agent does: PostToolUse:Task traces every agent sub-task, the Stop hook logs session decisions before they're lost, PreCompact preserves critical context before compaction, and PostToolUse:Bash logs every command executed. Without these, you can't debug multi-agent failures.

What are the four LLM failure modes?

The four fundamental LLM problems are: Domain Knowledge Gaps (the model lacks project-specific context), Context Window Limits (it can't hold long histories), Hallucinations (it generates confident but wrong output), and Difficulty of Control (getting consistent structured behavior). Your rules should address all four. The auditor checks whether they do.

FREE & OPEN SOURCE

Claude Agent Auditor
Your agent has no guardrails.

Name: Claude Agent Auditor by Oaken AI
Author: Oaken AI

Most Claude Code setups run with no deny rules, no session logging, and rules that cover only 2 of 4 LLM failure modes. One scan surfaces every gap. Free, zero dependencies, read-only.

pip install claude-agent-auditor

Then run claude-agent-auditor --open in your project directory

See a live sample report

Generated from a real Claude Code workspace. Three pages, each with a different view of your agent architecture.

Current State

Architecture score, autonomy risk level, observability hook coverage, rule architecture analysis, and all detected issues.

Recommendations

Prioritized P0/P1/P2 action plan. Every missing hook includes a copy-paste JSON snippet. Rule gaps include starter templates.

Projected Results

Estimated architecture score, autonomy risk, and observability coverage after implementing all recommendations.

What the auditor finds

CRITICAL

Unconstrained Autonomy

bypassPermissions or dontAsk mode with zero deny or ask rules. Claude can delete files, push code, and send messages without any confirmation. This is the most common P0 issue.

CRITICAL

No Agent Tracing

Multi-step Task tool runs with no PostToolUse hook. When an agent sub-task fails, there's no trace. You can't debug what you can't see.
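
One way to close this gap is a small script registered as a PostToolUse hook for the Task tool. The sketch below assumes the hook receives a JSON payload on stdin with `session_id`, `tool_name`, and `tool_input` fields. The log path is a hypothetical choice, and this is not the snippet the auditor generates.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log location; any writable path would do.
LOG_PATH = Path(".claude/logs/task-trace.jsonl")


def build_record(event: dict) -> dict:
    """Keep a small, stable subset of the hook payload for the trace."""
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "session_id": event.get("session_id"),
        "tool_name": event.get("tool_name"),
        "tool_input": event.get("tool_input"),
    }


def append_record(record: dict, log_path: Path = LOG_PATH) -> None:
    """Append one JSON object per line so the log stays easy to grep."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__" and not sys.stdin.isatty():
    raw = sys.stdin.read()
    if raw.strip():
        try:
            append_record(build_record(json.loads(raw)))
        except json.JSONDecodeError:
            pass  # never fail the agent run because of a malformed payload
```

Writing JSON Lines keeps each sub-task on its own line, so a failed run can be reconstructed by filtering on `session_id`.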

WARNING

Missing Session Logging

No Stop hook configured. Every architectural decision, plan, and conclusion from the session is lost when it ends. Critical context vanishes between conversations.

WARNING

Rule Coverage Gaps

Rules covering hallucination prevention but nothing on context window limits, or domain knowledge rules with no control constraints. Uncovered failure modes fail silently.

INFO

Overlapping Rules

Two rules with 70%+ keyword overlap loaded every session. They dilute each other and waste context budget. The auditor identifies every redundant pair.
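
The page does not say how keyword overlap is computed. A minimal sketch, assuming Jaccard similarity over lowercase word sets with a small hypothetical stopword list, shows how redundant pairs could be found at the 70% threshold:

```python
import re
from itertools import combinations

# Hypothetical stopword list; a real tool would likely use a longer one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "for", "is", "with"}


def keywords(text: str) -> set[str]:
    """Lowercase words of two or more characters, minus stopwords."""
    return {w for w in re.findall(r"[a-z][a-z0-9_-]+", text.lower())
            if w not in STOPWORDS}


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared keywords divided by all keywords."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def redundant_pairs(rules: dict[str, str], threshold: float = 0.7):
    """Return (rule, rule, score) for every pair at or above the threshold."""
    kw = {name: keywords(text) for name, text in rules.items()}
    pairs = []
    for x, y in combinations(sorted(kw), 2):
        score = overlap(kw[x], kw[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 2)))
    return pairs


if __name__ == "__main__":
    rules = {
        "edit.md": "Always verify file paths before editing",
        "write.md": "Always verify file paths before writing",
        "git.md": "Never push to remote",
    }
    print(redundant_pairs(rules))
```

In the sample data the two path-checking rules share five of seven distinct keywords, which puts them just over the threshold, while the git rule overlaps with neither.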

INFO

No Memory Preservation

PreCompact hook absent. Claude compacts long sessions without extracting key decisions first. Context that took hours to build gets summarized away.
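
Registering the hook is a small change to the settings file. The sketch below builds the entry in Python and prints it as JSON. It assumes hooks live under a top-level `hooks` key, grouped by event name, with each entry holding a list of `command` hooks. The script path is hypothetical, and the output is not the auditor's own snippet.

```python
import json


def add_command_hook(settings: dict, event: str, command: str) -> dict:
    """Return a copy of settings with one command hook appended under event."""
    updated = json.loads(json.dumps(settings))  # deep copy via JSON round trip
    entries = updated.setdefault("hooks", {}).setdefault(event, [])
    entries.append({"hooks": [{"type": "command", "command": command}]})
    return updated


if __name__ == "__main__":
    # Hypothetical script that writes key decisions to a notes file.
    result = add_command_hook(
        {}, "PreCompact", "python3 .claude/hooks/save_decisions.py"
    )
    print(json.dumps(result, indent=2))
```

The function returns a new dictionary instead of editing in place, so the change can be reviewed as a diff before anything is written to disk.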

WARNING

MCP Servers Without Deny Rules

MCP tools expand Claude's action surface. When servers are configured with no deny rules, every MCP tool runs unchecked. The auditor flags this combination.
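
A minimal sketch of this combination check follows. It assumes MCP servers are declared under an `mcpServers` key (as in a project-level `.mcp.json`) and deny rules under `permissions.deny`. The message wording and the server name in the demo are invented for the example.

```python
def mcp_findings(mcp_config: dict, settings: dict) -> list[str]:
    """Flag every configured MCP server when no deny rules exist at all."""
    servers = sorted(mcp_config.get("mcpServers", {}))
    deny_rules = settings.get("permissions", {}).get("deny", [])
    if servers and not deny_rules:
        return [
            f"MCP server '{name}' is configured but no deny rules exist"
            for name in servers
        ]
    return []


if __name__ == "__main__":
    # "slack" is a hypothetical server entry used only for the demo.
    mcp_config = {"mcpServers": {"slack": {"command": "npx"}}}
    for finding in mcp_findings(mcp_config, {"permissions": {}}):
        print(finding)
```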

CRITICAL

Secrets Exposure

API keys, tokens, or passwords accidentally dropped into rules files or settings JSON that could end up in version control. The auditor scans for common patterns.
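
The exact patterns the auditor uses are not listed here. The sketch below shows the general technique with a few illustrative regular expressions for well-known key prefixes plus a generic assignment pattern. Treat them as assumptions; they will produce both false positives and misses.

```python
import re

# Illustrative patterns only, not the auditor's own list.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "anthropic_api_key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "generic_assignment": re.compile(
        r"\b(api[_-]?key|token|password|secret)\b['\"]?\s*[:=]\s*"
        r"['\"][^'\"\s]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    sample = 'model: claude\napi_key = "abcd1234efgh"\n'
    for lineno, name in scan_text(sample):
        print(f"line {lineno}: {name}")
```

Reporting only the line number and pattern name, never the matched value, keeps the secret itself out of the report.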

Requirements

Python 3.10+ (check with python --version)
Zero dependencies — pure Python stdlib
Does NOT need to be installed inside your Claude Code workspace
Read-only analysis — never modifies your files

How it works

Install (from anywhere)

pip install claude-agent-auditor

Point it at your project

claude-agent-auditor /path/to/your/project --open

Review the three-page report

Page 1, Current State: score, issues, and what's misconfigured.
Page 2, Recommendations: prioritized fixes with copy-paste hook JSON.
Page 3, Projected Results: your score after all fixes are applied.

Ask Claude to fix it

Feed the report to your Claude Code instance: "Read agent-audit/recommendations.html and implement the P0 recommendations." Review the changes before accepting.

The four LLM failure modes

Based on the Stanford CS230 study guide on building with LLMs. Every technique — RAG, fine-tuning, agentic workflows — exists to solve one of these four problems. Your rules should cover all four.

Domain Knowledge Gaps

Base models lack proprietary data, recent events, and internal docs. Solved by RAG, memory systems, and domain-specific rules.

Fix: Memory system, RAG, domain rules

Context Window Limits

Can't hold arbitrarily long history. Requires explicit architectural choices: handoffs, compaction, summarization.

Fix: PreCompact hook, handoff rules, MEMORY.md

Hallucinations

Generates plausible-sounding but incorrect output with confidence. Needs explicit verification rules before asserting facts.

Fix: Verification rules, claim-checking, grounding

Difficulty of Control

Hard to get consistent, scoped behavior. Requires deny rules, ask rules, and explicit scope constraints.

Fix: Deny rules, ask rules, scope limits
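
How the auditor maps rules to failure modes is not described on this page. One simple approach, shown here purely as a sketch, is keyword matching: each failure mode gets a hypothetical keyword set, and a mode counts as covered when any of its keywords appears in the rules text.

```python
import re

# Hypothetical keyword sets, loosely based on the fixes listed above.
FAILURE_MODE_KEYWORDS = {
    "domain_knowledge": {"domain", "glossary", "rag", "docs", "memory"},
    "context_window": {"compact", "compaction", "handoff", "summarize"},
    "hallucination": {"verify", "cite", "grounding", "source"},
    "control": {"deny", "ask", "scope", "never", "must"},
}


def coverage(rules_text: str) -> dict[str, bool]:
    """Map each failure mode to True when at least one keyword is present."""
    words = set(re.findall(r"[a-z][a-z0-9_]*", rules_text.lower()))
    return {mode: bool(words & kws)
            for mode, kws in FAILURE_MODE_KEYWORDS.items()}


def uncovered(rules_text: str) -> list[str]:
    """List the failure modes that no rule text addresses."""
    return [mode for mode, ok in coverage(rules_text).items() if not ok]


if __name__ == "__main__":
    rules = "Verify every claim against a source. Never push to main."
    print(uncovered(rules))
```

The sample rules address hallucinations and control but say nothing about domain knowledge or context limits, which is the two-of-four pattern described earlier on this page.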

Six observability hooks, three priority tiers

If you don't have traces, you can't debug your agent system. The auditor checks for all six hooks and flags missing ones with copy-paste implementation snippets.

Priority	Hook	What it captures
CRITICAL	`PostToolUse: Task`	Every agent sub-task — the backbone of multi-agent tracing
CRITICAL	`Stop`	Session decisions before they're lost on conversation end
IMPORTANT	`PreCompact`	Critical context preserved before compaction runs
IMPORTANT	`SessionStart`	Session initialization and context restore
USEFUL	`PostToolUse: Write|Edit`	Every file change with path and timestamp
USEFUL	`PostToolUse: Bash`	Every command executed — audit trail for automation

Also run the Claude Workspace Optimizer

The optimizer checks context efficiency: MEMORY.md visibility, rule bloat, token budget. The agent auditor checks safety and architecture. They cover different ground — run both.

pip install claude-workspace-optimizer

Frequently asked questions

Does this send my data anywhere?

No. The tool runs entirely on your machine. No API calls, no data collection, no telemetry. Your code never leaves your computer.

Does it work with any Claude Code project?

Yes. It scans .claude/settings.json, rules/, hooks, and skills. It also reads ~/.claude/ for global settings. Works with any project that uses Claude Code.

What's the architecture score based on?

The score (0-100) weighs autonomy risk, observability hook coverage, rule architecture coverage of the four failure modes, and agent setup quality. A score under 60 means there are P0 issues to fix before running agents autonomously.
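
The actual weights are not published on this page. Purely to illustrate the shape of such a score, the sketch below combines the four components with equal weights. Each input is a fraction between 0 and 1; the function name and the equal weighting are assumptions made for the example.

```python
def architecture_score(autonomy: float, hooks: float,
                       rules: float, agent_setup: float) -> int:
    """Combine four component fractions (0 to 1) into a 0 to 100 score.

    Equal weights are an assumption; the real weighting is not stated.
    """
    components = (autonomy, hooks, rules, agent_setup)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each component must be between 0 and 1")
    return round(100 * sum(components) / len(components))


if __name__ == "__main__":
    # Example: no autonomy guardrails, 2 of 6 hooks, 2 of 4 failure modes.
    print(architecture_score(autonomy=0.0, hooks=2 / 6, rules=0.5, agent_setup=1.0))
```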

Can it automatically fix the issues?

The auditor generates a three-page report — it doesn't modify your files. The Recommendations page includes copy-paste JSON for every missing hook. You can also feed the report to your Claude Code instance and ask it to implement the fixes.

How is this different from the Claude Workspace Optimizer?

Different tools, different concerns. The Workspace Optimizer checks context efficiency (MEMORY.md visibility, context bloat, rule tiering). The Agent Auditor checks safety and architecture (autonomy risk, observability hooks, rule coverage, agent patterns). Run both.

Share with your team

Know someone running Claude Code agents without deny rules or session logging? This tool is free. Share it.

MIT license. Open source.

Disclaimer: This tool is provided as-is with no warranty. Oaken AI and its contributors accept zero responsibility for any changes made to your workspace based on this tool's output. The report contains recommendations, not instructions. Always review changes before applying them. Back up your workspace before making modifications.

BUILT BY OAKEN AI

Need more than an architecture scan?

Oaken AI builds production multi-agent systems for businesses. Architecture, hooks, rules, RAG memory, and infrastructure — everything in this report, built for your stack.
