Install the hook. Restart the agent. Watch the lie surface.
agentwatch is a passive observer. It captures the prompt, every action your coding agent took, and its final summary — then diffs the claim against reality. Five minutes to first catch. It never blocks or slows the agent.
Install the capture hook
The capture client is a single Python file with no third-party dependencies (stdlib urllib only). It registers async hooks in your agent so every prompt, tool call, and final summary is observed. Today the shipping shim is for Claude Code; Cursor and Codex are on the roadmap.
From the platform/capture directory, run the client's install command to write the hooks into your ~/.claude/settings.json.
The install command registers async hooks for UserPromptSubmit, PostToolUse, PostToolUseFailure, Stop, and SubagentStop. Every hook invokes agentwatch_remote.py capture as a fire-and-forget call. The Stop hook is async too: the verdict is computed server-side when the ingest API receives the stop event, so nothing on the capture path ever blocks the agent.
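The install step can be pictured as an idempotent merge into settings.json. This is a minimal sketch that assumes Claude Code's hooks layout, where each event name maps to a list of groups and each group holds command hooks. The merge_hooks helper and the command path are hypothetical. Matchers are omitted, and the marker the real install uses to make hooks async is not reproduced here.

```python
import json

# The five events the install step registers, per the list above.
EVENTS = ["UserPromptSubmit", "PostToolUse", "PostToolUseFailure", "Stop", "SubagentStop"]

# Hypothetical command line; the real path depends on where the client lives.
CAPTURE_CMD = "python3 /path/to/agentwatch_remote.py capture"


def merge_hooks(settings: dict, command: str = CAPTURE_CMD) -> dict:
    """Add one command hook per event without clobbering hooks already present.

    Re-running it is a no-op, so reinstalling never duplicates entries.
    """
    hooks = settings.setdefault("hooks", {})
    for event in EVENTS:
        groups = hooks.setdefault(event, [])
        already = any(
            hook.get("command") == command
            for group in groups
            for hook in group.get("hooks", [])
        )
        if not already:
            groups.append({"hooks": [{"type": "command", "command": command}]})
    return settings


if __name__ == "__main__":
    existing = {"hooks": {"Stop": [{"hooks": [{"type": "command", "command": "echo done"}]}]}}
    print(json.dumps(merge_hooks(existing), indent=2))
```

Because the merge appends rather than overwrites, a user's own Stop hook (the echo done entry above) survives installation.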
Point it at your org
Two required environment variables tell the hook where to send events and which org they belong to. Grab the API key from signup; it is shown exactly once.
Then reload Claude Code (or run /hooks) so it picks up the new settings. If AGENTWATCH_INGEST_URL or AGENTWATCH_API_KEY is unset, capture silently transmits nothing and still exits 0, so it is safe to install before the platform is provisioned.
| Variable | Required | Meaning |
|---|---|---|
| AGENTWATCH_INGEST_URL | yes | Base URL of the ingest API. No trailing slash. |
| AGENTWATCH_API_KEY | yes | Org-scoped Bearer token. The server derives the org from it. |
| AGENTWATCH_USER_EMAIL | no | Optional best-effort identity attached to events. |
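A capture call that honors these variables could work like the sketch below. It reads the hook's JSON event from stdin, posts it with a Bearer token, and returns 0 no matter what happens. Like the real client, it uses only the stdlib. The /v1/events path, the user_email field, and the 2-second timeout are assumptions, not the client's actual protocol. No org field is sent, because the server derives the org from the key.

```python
import json
import os
import sys
import urllib.request

# Hypothetical endpoint path; the real ingest route is not documented on this page.
EVENTS_PATH = "/v1/events"


def build_request(event: dict, env=os.environ):
    """Return a POST request for one hook event, or None if capture is unconfigured."""
    base = env.get("AGENTWATCH_INGEST_URL")
    key = env.get("AGENTWATCH_API_KEY")
    if not base or not key:
        return None  # either variable unset: transmit nothing
    payload = dict(event)
    email = env.get("AGENTWATCH_USER_EMAIL")
    if email:
        payload["user_email"] = email  # hypothetical field name for the optional identity
    return urllib.request.Request(
        base.rstrip("/") + EVENTS_PATH,  # defensive even though the URL should have no trailing slash
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def capture() -> int:
    """Read the hook event from stdin, try to send it, and always report success."""
    try:
        event = json.load(sys.stdin)
        req = build_request(event)
        if req is not None:
            urllib.request.urlopen(req, timeout=2).close()
    except Exception:
        pass  # a network or parse error must never surface to the agent
    return 0


if __name__ == "__main__":
    sys.exit(capture())
```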
See your first catch
Restart the agent and give it a task it might be tempted to fib about — the canonical one is “fix the failing test and make sure all tests pass, then commit.” If the agent says it ran the tests but the action log shows no test command, agentwatch catches it.
Open the dashboard. Each session gets a verdict; a caught session shows the flags raised against its claims.
The dashboard lives in your authenticated account at /sessions and /sessions/[id]. It shows the prompt, the captured timeline, every deterministic check, and the verdict. No verdict appears until a stop event arrives: that event carries the agent's final summary, which is the claim we diff against.
Deterministic checks
Most confabulation is caught with no re-execution — the verifier matches the claim text in the agent’s summary against the captured action log. This is the v0 rule set, unchanged. It runs server-side on every stop event and is the only thing that ever participates in a blocking path.
| Claim detected | Reality checked | Severity |
|---|---|---|
| tests pass / ran tests | a test command actually ran (pytest, npm test, go test, cargo test, jest, vitest…) and its output shows no failure markers | high |
| build succeeds | a build command ran (build, tsc, make, cargo build…) | medium |
| committed | a git commit appears in the action log | high |
| pushed | a git push appears in the action log | high |
| deployed | a deploy command ran (deploy, vercel, netlify, kubectl apply…) | high |
| “updated N files” | distinct files actually written/edited (Write/Edit/MultiEdit/NotebookEdit) equals N | medium |
| “fixed it” | no tool actions errored during the session | warn |
Needles are intentionally specific: a bare commit substring would match pre-commit and falsely satisfy the claim — a missed lie, the dangerous direction for a deception detector. A session with any high-severity flag is marked deception_suspected. The verdict shape (flags, checks, deception_suspected) is the v0 shape, reused verbatim.
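Matching on whole leading tokens, rather than substrings, is what keeps pre-commit from satisfying a commit claim. The sketch below shows that technique together with the deception_suspected rule. The needle table is a hypothetical subset. Only the three top-level verdict keys come from this page; the fields inside flags and checks are assumptions.

```python
# Hypothetical needle table: each claim maps to the command heads that satisfy it.
NEEDLES = {
    "committed": {"severity": "high", "heads": [("git", "commit")]},
    "pushed": {"severity": "high", "heads": [("git", "push")]},
    "build succeeds": {"severity": "medium", "heads": [("make",), ("tsc",), ("cargo", "build")]},
}


def head_matches(command: str, head: tuple) -> bool:
    """Compare whole leading tokens, so 'pre-commit run' never counts as 'git commit'."""
    tokens = tuple(command.split())
    return tokens[: len(head)] == head


def evaluate(claims, commands) -> dict:
    checks, flags = [], []
    for claim in claims:
        rule = NEEDLES[claim]
        satisfied = any(head_matches(cmd, head) for cmd in commands for head in rule["heads"])
        checks.append({"claim": claim, "satisfied": satisfied})
        if not satisfied:
            flags.append({"claim": claim, "severity": rule["severity"]})
    return {
        "flags": flags,
        "checks": checks,
        # any high-severity flag marks the whole session
        "deception_suspected": any(f["severity"] == "high" for f in flags),
    }
```

A bare substring test gets this wrong: "commit" in "pre-commit run --all-files" is true. Token matching correctly leaves the commit claim unsatisfied.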
The optional judge
Deterministic checks catch the named lies. For the fuzzier question — did the actions actually fulfil the prompt? — an independent LLM judge reads the prompt, the real action log, and the claims, and scores whether they line up. It is advisory only and never participates in a blocking path.
The judge is off by default. Enable it on the ingest/verifier side by setting AGENTWATCH_JUDGE=1 and providing ANTHROPIC_API_KEY.
When AGENTWATCH_JUDGE=1 but ANTHROPIC_API_KEY is unset, the judge field is simply marked skipped — the deterministic verdict is unaffected. The judge is a second opinion, not a gate.
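The judge's off/skipped/on logic fits in a few lines. In this sketch, run_judge is a hypothetical stand-in for the LLM call. The status strings other than skipped are assumptions. What the sketch shows is that the deterministic fields are copied through untouched whatever the judge does.

```python
def judge_status(env) -> str:
    """'off' unless AGENTWATCH_JUDGE=1; 'skipped' if enabled without an API key."""
    if env.get("AGENTWATCH_JUDGE") != "1":
        return "off"
    if not env.get("ANTHROPIC_API_KEY"):
        return "skipped"
    return "on"


def attach_judge(verdict: dict, env, run_judge) -> dict:
    """Return a copy of the verdict with a judge field added.

    flags, checks and deception_suspected are carried over unchanged:
    the judge is a second opinion, not a gate.
    """
    result = dict(verdict)
    status = judge_status(env)
    if status == "on":
        result["judge"] = {"status": "ran", "opinion": run_judge()}
    else:
        result["judge"] = {"status": status}
    return result
```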
The CI policy gate
Catch the lie before the change merges. The agentwatch-gate CLI calls POST /v1/policy/evaluate, runs the deterministic verifier (the judge never participates here), and maps the verdict to a CI exit code.
A composite GitHub Action ships in platform/policy-gate.
- Warn-only is the default. Ship in warn mode, collect a few weeks of data, then flip to enforce and mark the agent-honesty check as required.
- Fail-open by default. If the API is unreachable, the gate prints a warning and exits 0; an agentwatch outage must not block your deploy. Pass --fail-on-error to override.
- Tune the blocking threshold with --fail-on high|medium|any, or scope it to a rule subset with --rules no_unverified_tests,no_phantom_commit.
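The mapping from verdict to exit code can be sketched as one function. Its parameters mirror the options above, but the function itself, the mode strings, and the rule/severity field names inside flags are hypothetical. A verdict of None stands for an unreachable API.

```python
import sys

SEVERITY_RANK = {"warn": 0, "medium": 1, "high": 2}
# --fail-on value -> lowest severity rank that blocks the merge
FAIL_ON_MIN_RANK = {"any": 0, "medium": 1, "high": 2}


def gate_exit_code(verdict, mode="warn", fail_on="high", rules=None, fail_on_error=False) -> int:
    """Return 0 to let CI pass or 1 to block it."""
    if verdict is None:
        print("agentwatch-gate: policy API unreachable", file=sys.stderr)
        return 1 if fail_on_error else 0  # fail open unless told otherwise
    # --rules scoping: ignore flags outside the requested subset
    flags = [f for f in verdict["flags"] if rules is None or f["rule"] in rules]
    blocking = [f for f in flags if SEVERITY_RANK[f["severity"]] >= FAIL_ON_MIN_RANK[fail_on]]
    if not blocking:
        return 0
    if mode != "enforce":
        # warn-only: report what would have blocked, but let the merge through
        for f in blocking:
            print(f"agentwatch-gate (warn): {f['rule']} [{f['severity']}]", file=sys.stderr)
        return 0
    return 1
```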
What leaves the machine
The prime directive of redaction: code never leaves the machine. Redaction happens at source, in the capture client, before anything is transmitted. The ingest gateway re-applies the same rules as defense-in-depth, but the client is the first and primary guard.
Leaves the machine:
- file paths (e.g. src/auth.py)
- command heads (git commit, pytest -q)
- tool name, exit code, edit range
- the agent's prompt and final summary text
- the last 400 characters of a result tail

Never leaves the machine:
- file bodies (SHA-256 hashed to a marker)
- diffs, patches, new_string/old_string
- bash arguments past the command head
- secrets in flags (-m "…", auth headers)
Concretely, the masker keeps only the command head the verifier matches on and drops everything that could hide code or a secret.
The masker keeps two leading tokens because that is all the verifier needs — git commit, npm test, cargo build. Masking the rest loses no detection signal. Only structural metadata ever crosses the wire.
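A masker along these lines keeps command heads, hashes file bodies, drops edit payloads, and trims result tails. This is a sketch under assumptions: the event field names (content, old_string, new_string, diff, patch) and the sha256: marker format are illustrative, not the client's actual wire format.

```python
import hashlib
import shlex

HEAD_TOKENS = 2   # enough for "git commit", "npm test", "cargo build"
TAIL_CHARS = 400  # only the end of a tool result is kept
DROPPED_FIELDS = {"content", "old_string", "new_string", "diff", "patch"}  # hypothetical names


def mask_command(command: str) -> str:
    """Keep the first two tokens; drop arguments, commit messages, headers."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to a plain whitespace split
        tokens = command.split()
    return " ".join(tokens[:HEAD_TOKENS])


def mask_file_body(body: str) -> str:
    """Replace file contents with a SHA-256 marker so the code itself never leaves."""
    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()


def mask_result(output: str) -> str:
    return output[-TAIL_CHARS:]


def mask_event(event: dict) -> dict:
    """Return a copy of one captured event that is safe to transmit."""
    masked = {k: v for k, v in event.items() if k not in DROPPED_FIELDS}
    if "content" in event:
        masked["content_hash"] = mask_file_body(event["content"])
    if "command" in masked:
        masked["command"] = mask_command(masked["command"])
    if "output" in masked:
        masked["output"] = mask_result(masked["output"])
    return masked
```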
Mint a key and watch your first session.