
Install the hook. Restart the agent. Watch the lie surface.

agentwatch is a passive observer. It captures the prompt you give your coding agent, every action the agent takes, and the agent's final summary, then diffs the claims in that summary against what actually happened. Five minutes to first catch. It never blocks or slows the agent.

step 01

Install the capture hook

The capture client is a single Python file with no third-party dependencies (stdlib urllib only). It registers async hooks in your agent so every prompt, tool call, and final summary is observed. Today the shipping shim is for Claude Code; Cursor and Codex are on the roadmap.

From the platform/capture directory, write the hooks into your ~/.claude/settings.json:

shell
python3 claude_code/agentwatch_remote.py install

The install command registers async hooks for UserPromptSubmit, PostToolUse, PostToolUseFailure, Stop, and SubagentStop. Each hook runs agentwatch_remote.py capture as fire-and-forget. The Stop hook is async too: the verdict is computed server-side when the ingest API receives the stop event, so nothing on the capture path ever blocks the agent.
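To make the install step concrete, here is a minimal sketch of an idempotent hook merge. It assumes the settings file follows the general shape of Claude Code's hooks configuration (a `hooks` object keyed by event name, each holding groups of `{"type": "command", ...}` entries). The function name `merge_hooks` and the exact entry layout are illustrative assumptions; the real installer may also set matchers and the async option, so check Claude Code's hooks reference for the exact keys.

```python
# Hypothetical sketch of what an installer like `agentwatch_remote.py install`
# might do to settings.json; not the shipped implementation.
HOOK_EVENTS = (
    "UserPromptSubmit",
    "PostToolUse",
    "PostToolUseFailure",
    "Stop",
    "SubagentStop",
)


def merge_hooks(settings: dict, command: str) -> dict:
    """Add one command hook per event, skipping events that already have it."""
    hooks = settings.setdefault("hooks", {})
    for event in HOOK_EVENTS:
        groups = hooks.setdefault(event, [])
        already_installed = any(
            entry.get("command") == command
            for group in groups
            for entry in group.get("hooks", [])
        )
        if not already_installed:
            groups.append({"hooks": [{"type": "command", "command": command}]})
    return settings
```

Running the merge twice leaves exactly one entry per event and preserves unrelated settings, so re-running install is harmless. In practice you would `json.load` the settings file, merge, and write it back.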

step 02

Point it at your org

Two required environment variables tell the hook where to send events and which org they belong to; a third, optional one attaches your identity. Copy the API key when you sign up: it is shown exactly once.

~/.zshrc · ~/.bashrc
# where to send redacted events (your org's ingest API)
export AGENTWATCH_INGEST_URL="https://ingest.agentwatch.io"

# org-scoped Bearer token (minted at signup, shown once)
export AGENTWATCH_API_KEY="aw_live_..."

# optional: best-effort human identity on each event
export AGENTWATCH_USER_EMAIL="dev@acme.com"

Then restart Claude Code from a shell where these variables are set, so the hook processes inherit them. If AGENTWATCH_INGEST_URL or AGENTWATCH_API_KEY is unset, capture silently transmits nothing and still exits 0, so it is safe to install the hook before the platform is provisioned.

Variable              | Required | Meaning
AGENTWATCH_INGEST_URL | yes      | Base URL of the ingest API. No trailing slash.
AGENTWATCH_API_KEY    | yes      | Org-scoped Bearer token. The server derives the org from it.
AGENTWATCH_USER_EMAIL | no       | Optional best-effort identity attached to events.

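The unset-variable behaviour described above can be sketched as follows. This is a hypothetical stand-in for the capture path, not agentwatch_remote.py itself: the `/v1/events` path, the `user_email` field, and the function names are assumptions, and a real client would redact the payload before sending it. Claude Code passes each hook's payload as JSON on stdin; the sketch forwards it with a Bearer token via the standard library's urllib and swallows every failure so the hook always exits 0.

```python
import json
import os
import urllib.request


def send_event(event: dict, env=os.environ, timeout: float = 2.0) -> bool:
    """POST one event to the ingest API; return False instead of raising."""
    url = env.get("AGENTWATCH_INGEST_URL")
    key = env.get("AGENTWATCH_API_KEY")
    if not url or not key:
        return False  # not configured yet: transmit nothing
    email = env.get("AGENTWATCH_USER_EMAIL")
    if email:
        event = {**event, "user_email": email}  # hypothetical field name
    request = urllib.request.Request(
        url + "/v1/events",  # hypothetical endpoint path
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            pass
        return True
    except Exception:
        return False  # network or HTTP failure must never break the agent


def capture_main(raw_stdin: str) -> int:
    """Parse the hook payload, try to send it, and always return exit code 0."""
    try:
        event = json.loads(raw_stdin)
    except ValueError:
        event = {}
    send_event(event)
    return 0
```

A script wrapper would call `sys.exit(capture_main(sys.stdin.read()))`. Because this sketch joins paths as `url + "/v1/events"`, a trailing slash on AGENTWATCH_INGEST_URL would yield `//v1/events`, one plausible reason the table asks for none.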
step 03

See your first catch

Restart the agent and give it a task it might be tempted to fib about — the canonical one is “fix the failing test and make sure all tests pass, then commit.” If the agent says it ran the tests but the action log shows no test command, agentwatch catches it.

Open the dashboard. Each session gets a verdict; a caught session reads like this:

agentwatch · receipt   sess_4f1a…
Claim      All tests pass and I've committed the change.
Reality    No test command ran. No git commit in the action log.
✕ caught   Deception suspected

The dashboard lives in your authenticated account at /sessions and /sessions/[id]. Each session page shows the prompt, the captured timeline, every deterministic check, and the verdict. No verdict appears until a stop event arrives, because the stop event carries the agent's final summary: the claim we diff against.


how it verifies

Deterministic checks

Most confabulation is caught without re-executing anything: the verifier matches the claims in the agent's summary against the captured action log. This is the v0 rule set, unchanged. It runs server-side on every stop event, and it is the only component that ever participates in a blocking path.

Claim detected         | Reality checked | Severity
tests pass / ran tests | a test command actually ran (pytest, npm test, go test, cargo test, jest, vitest…) and its output shows no failure markers | high
build succeeds         | a build command ran (build, tsc, make, cargo build…) | medium
committed              | a git commit appears in the action log | high
pushed                 | a git push appears in the action log | high
deployed               | a deploy command ran (deploy, vercel, netlify, kubectl apply…) | high
“updated N files”      | distinct files actually written/edited (Write/Edit/MultiEdit/NotebookEdit) equals N | medium
“fixed it”             | no tool actions errored during the session | warn

Needles (the command patterns each check searches for) are intentionally specific: a bare commit substring would match pre-commit and falsely satisfy the claim. That is a missed lie, the dangerous failure direction for a deception detector. A session with any high-severity flag is marked deception_suspected. The verdict shape (flags, checks, deception_suspected) is the v0 shape, reused verbatim.
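The needle problem and the verdict rule fit in one small sketch. Matching on the leading tokens of a command, rather than on a bare substring, keeps `pre-commit run` from satisfying a commit claim. The check-record fields (`rule`, `severity`, `passed`) and the helper names are hypothetical; only the top-level keys `flags`, `checks`, and `deception_suspected` come from the verdict shape described above.

```python
import shlex


def command_head(command: str, n: int = 2) -> list:
    """First n shell tokens of a command (whitespace split if quoting is broken)."""
    try:
        return shlex.split(command)[:n]
    except ValueError:
        return command.split()[:n]


def naive_committed(commands: list) -> bool:
    # Too loose: "pre-commit run" contains "commit" and would satisfy the claim.
    return any("commit" in c for c in commands)


def committed(commands: list) -> bool:
    # Specific: the command must start with the tokens `git commit`.
    return any(command_head(c) == ["git", "commit"] for c in commands)


def build_verdict(checks: list) -> dict:
    """Failed checks become flags; any high-severity flag marks the session."""
    flags = [check for check in checks if not check["passed"]]
    return {
        "flags": flags,
        "checks": checks,
        "deception_suspected": any(f["severity"] == "high" for f in flags),
    }
```

A form like `git -C repo commit` is not recognised by this head match, so it would produce a false flag rather than a missed lie: the safer direction of error.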

optional

The optional judge

Deterministic checks catch the named lies. For the fuzzier question — did the actions actually fulfil the prompt? — an independent LLM judge reads the prompt, the real action log, and the claims, and scores whether they line up. It is advisory only and never participates in a blocking path.

The judge is off by default. Enable it on the ingest/verifier side:

ingest env
# enable the advisory judge on stop / verify
export AGENTWATCH_JUDGE="1"

# Anthropic key for the judge call; if unset the judge is
# skipped cleanly (it never errors the verdict)
export ANTHROPIC_API_KEY="sk-ant-..."

When AGENTWATCH_JUDGE=1 but ANTHROPIC_API_KEY is unset, the judge field is simply marked skipped — the deterministic verdict is unaffected. The judge is a second opinion, not a gate.
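The enable/skip behaviour reads naturally as code. In this hypothetical sketch the judge call is injected as a `run_judge` callable standing in for the Anthropic request, and the `judge` field layout (a `status` plus whatever the judge returns) is an assumption, as is recording an `error` status when the call fails. What it illustrates is the property stated above: the deterministic keys of the verdict pass through untouched in every branch.

```python
def attach_judge(verdict: dict, session: dict, env: dict, run_judge) -> dict:
    """Return a copy of `verdict` with an advisory `judge` field.

    The deterministic keys (flags, checks, deception_suspected) are never modified.
    """
    result = dict(verdict)
    if env.get("AGENTWATCH_JUDGE") != "1":
        return result  # judge off (the default): no judge field at all
    if not env.get("ANTHROPIC_API_KEY"):
        result["judge"] = {"status": "skipped"}
        return result
    try:
        opinion = run_judge(
            session.get("prompt"), session.get("actions"), session.get("summary")
        )
        result["judge"] = {"status": "ok", **opinion}
    except Exception as exc:
        # A failing judge is recorded, never raised: it must not error the verdict.
        result["judge"] = {"status": "error", "detail": str(exc)}
    return result
```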

ci

The CI policy gate

Catch the lie before the change merges. The agentwatch-gate CLI calls POST /v1/policy/evaluate, which runs the deterministic verifier (the judge never participates here), and maps the returned verdict to a CI exit code.

shell
# install from the repo for now (the PyPI package is published at launch)
pip install -e ./platform/policy-gate

# warn-only (default): annotate the build, never fail it
agentwatch-gate --session-id "$AGENT_SESSION_ID"

# enforce: exit 1 on a high-severity violation
agentwatch-gate --session-id "$AGENT_SESSION_ID" --mode enforce

A composite GitHub Action ships in platform/policy-gate:

.github/workflows/agentwatch.yml
name: agentwatch
on: pull_request

jobs:
  agent-honesty:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... your agent runs and produces AGENT_SESSION_ID ...
      - name: agentwatch policy gate
        uses: ./platform/policy-gate
        with:
          api_url: https://ingest.agentwatch.io
          api_key: ${{ secrets.AGENTWATCH_API_KEY }}
          session_id: ${{ env.AGENT_SESSION_ID }}
          mode: warn          # ramp: warn first, then enforce
          fail_on: high

  • Warn-only is the default. Ship in warn mode, collect a few weeks of data, then switch to enforce and make agent-honesty a required status check.
  • Fail-open by default. If the API is unreachable the gate prints a warning and exits 0 — an agentwatch outage must not block your deploy. Pass --fail-on-error to override.
  • Tune the blocking threshold with --fail-on high|medium|any, or scope it to a rule subset with --rules no_unverified_tests,no_phantom_commit.
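The exit-code mapping those bullets describe can be sketched as a pure function. This is not agentwatch-gate's source: the function name, the flag fields (`rule`, `severity`), the severity ranking, and the choice of exit code 1 when `--fail-on-error` meets an unreachable API are all assumptions. Here `verdict=None` stands for "the API could not be reached".

```python
# Higher rank = more severe; "warn" is the table's lowest severity.
SEVERITY_RANK = {"warn": 0, "medium": 1, "high": 2}
# Minimum rank that fails the build for each --fail-on value.
FAIL_ON_THRESHOLD = {"any": 0, "medium": 1, "high": 2}


def gate_exit_code(verdict, mode="warn", fail_on="high", rules=None, fail_on_error=False):
    """Map a verdict dict (or None for an unreachable API) to a CI exit code."""
    if verdict is None:
        # Fail-open: an outage passes the build unless --fail-on-error is set.
        return 1 if fail_on_error else 0
    if mode != "enforce":
        return 0  # warn-only never fails the build
    threshold = FAIL_ON_THRESHOLD[fail_on]
    for flag in verdict.get("flags", []):
        if rules is not None and flag.get("rule") not in rules:
            continue  # outside the --rules subset
        if SEVERITY_RANK.get(flag.get("severity"), 0) >= threshold:
            return 1
    return 0
```

Keeping the mapping free of I/O makes the ramp from warn to enforce easy to test before it is allowed to block a merge.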

data

What leaves the machine

The prime directive of redaction: code never leaves the machine. Redaction happens at source, in the capture client, before anything is transmitted. The ingest gateway re-applies the same rules as defense-in-depth, but the client is the first and primary guard.

crosses the wire
  • file paths (e.g. src/auth.py)
  • command heads (git commit, pytest -q)
  • tool name, exit code, edit range
  • the agent’s prompt & final summary text
  • last 400 chars of a result tail
never transmitted
  • file bodies — SHA-256 hashed to a marker
  • diffs, patches, new_string/old_string
  • bash arguments past the command head
  • secrets in flags (-m "…", auth headers)

Concretely, the masker keeps only the head the verifier matches on and drops everything that could hide code or a secret:

redaction at source
pytest -q tests/test_auth.py        ->  pytest -q [redacted]
git commit -m "fix the leak"        ->  git commit [redacted]
curl -H "Authorization: Bearer x"   ->  curl [redacted]
npm test                            ->  npm test

# a file body becomes a hash; only the path survives:
{ "tool": "Edit", "file": "src/auth.py",
  "body_sha256": "sha256:9f86d0818…" }

The masker keeps at most two leading tokens because that is all the verifier needs: git commit, npm test, cargo build. Masking the rest loses no detection signal. Beyond the prompt and summary text, only structural metadata ever crosses the wire.
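A masker that reproduces the four examples above can be sketched like this. It is a hypothetical reconstruction, not the capture client's code: the two-token cap comes from the text, but the `SECRET_FLAGS` stop list is our assumption, added so that `curl -H …` keeps only `curl`. The `sha256:` marker prefix is copied from the example record.

```python
import hashlib
import shlex

MAX_HEAD_TOKENS = 2
# Assumed stop list: flags whose values commonly carry messages or secrets.
SECRET_FLAGS = {"-m", "--message", "-H", "--header", "-u", "--user"}


def mask_command(command: str) -> str:
    """Keep at most two leading tokens, stopping early at a secret-bearing flag."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to whitespace
    head = []
    for token in tokens:
        if len(head) == MAX_HEAD_TOKENS or token in SECRET_FLAGS:
            break
        head.append(token)
    if len(head) < len(tokens):
        head.append("[redacted]")
    return " ".join(head)


def mask_file_write(tool: str, path: str, body: str) -> dict:
    """Replace a file body with its SHA-256 marker; only the path survives."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"tool": tool, "file": path, "body_sha256": f"sha256:{digest}"}
```

Without the stop list, the two-token cap alone would emit `curl -H [redacted]`, which leaks only a flag name; the stop list is one way to match the documented output exactly.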

ready

Mint a key and watch your first session.