THREAT MODEL

Trust nothing on the wire.

AI coding agents execute tools with real-world consequences, and every input they consume — model output, relay responses, fetched web pages, repository content — can carry injected instructions. Recent research such as “Your Agent Is Mine” (arXiv:2604.08407) shows how practical these hijacks are. Sieve's answer is a checkpoint on the one wire you control: the localhost hop between your agent and the LLM API.

What Sieve assumes

Sieve assumes either end of the conversation can be wrong: the prompt leaving your machine may carry secrets it should not, and the response coming back may carry a tool call the model was talked into. It also assumes the path in the middle — a relay or proxy you route through — can rewrite a tool_call before it reaches you. Every external input is treated as hostile until checked against local policy and your intent.

Threats defended

Threat | What Sieve does
Secret exfiltration (outbound) | API keys, high-entropy secrets, BIP39 seed phrases (checksum-verified), Bitcoin WIF and BIP-32 extended private keys are redacted in place before the request leaves 127.0.0.1: automatic, no popup, with a status-bar notification.
Prompt-injected / malicious tool calls (inbound) | Critical tool calls (signing, transfers, sensitive-path access) are intercepted at the tool-call boundary and held for explicit human confirmation before anything runs.
Address swaps | EVM addresses seen in a session are tracked; a near-identical substitute coming back in a response is flagged before you sign.
Multi-step exfiltration chains | Sequence detection recognizes staged secret-exfiltration patterns across turns (notify-only, conservative by default).
Canary decoys | Decoy credential files planted in sensitive directories raise an inbound alert on any tool call that reads them.
Untrusted relays | Responses from relays and proxies get the same inbound inspection as official upstreams; a relay that rewrites tool calls is caught at the same gate.
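
To make the address-swap row concrete, here is a minimal sketch of one way such a check could work. It is not Sieve's implementation: the AddressTracker class, the flag_lookalikes method, and the "same first and last four hex characters" heuristic are all assumptions chosen for illustration.

```python
import re

# Matches a 20-byte EVM address written as 0x followed by 40 hex characters.
EVM_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


class AddressTracker:
    """Hypothetical tracker: remembers addresses and flags lookalikes."""

    def __init__(self, edge: int = 4):
        self.edge = edge          # hex characters compared at each end (assumption)
        self.seen: set[str] = set()

    def observe(self, text: str) -> None:
        """Record every address that appears in trusted session text."""
        for addr in EVM_ADDRESS.findall(text):
            self.seen.add(addr.lower())

    def _lookalike(self, known: str, candidate: str) -> bool:
        # Different address, but identical prefix and suffix after "0x".
        k, c = known[2:], candidate[2:]
        return (
            known != candidate
            and k[: self.edge] == c[: self.edge]
            and k[-self.edge:] == c[-self.edge:]
        )

    def flag_lookalikes(self, text: str) -> list[tuple[str, str]]:
        """Return (known, substitute) pairs found in an inbound response."""
        flagged = []
        for addr in EVM_ADDRESS.findall(text):
            candidate = addr.lower()
            if candidate in self.seen:
                continue  # exact match with a known address is fine
            for known in sorted(self.seen):
                if self._lookalike(known, candidate):
                    flagged.append((known, candidate))
        return flagged
```

An address the session has never seen and that resembles nothing known is not flagged by this sketch; only a substitute that imitates the visible ends of a known address is.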

Detection on both directions

Outbound — every request body passes the outbound filter pipeline before forwarding. Most findings are auto-redacted in place; the highest-certainty private-key and seed-phrase matches hold the request for confirmation instead, and a refusal returns an explicit sieve_blocked error to the agent.
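
As an illustration of in-place redaction, the sketch below flags long tokens by Shannon entropy and replaces them before the body would be forwarded. The token pattern, the 4.0-bit threshold, and the [REDACTED] placeholder are assumptions made for this example, not Sieve's actual rules.

```python
import math
import re
from collections import Counter

# Candidate secrets: runs of 24+ characters from a typical key alphabet (assumption).
TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{24,}")


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def redact_high_entropy(text: str, threshold: float = 4.0) -> tuple[str, int]:
    """Replace high-entropy tokens in place; return the new text and a finding count."""
    findings = 0

    def replace(match: re.Match) -> str:
        nonlocal findings
        if shannon_entropy(match.group(0)) >= threshold:
            findings += 1
            return "[REDACTED]"
        return match.group(0)  # long but low-entropy: leave untouched

    return TOKEN.sub(replace, text), findings
```

An entropy filter alone produces false positives on hashes and identifiers, which is one reason a real pipeline would combine it with format-specific checks such as the checksum verification mentioned earlier.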

Inbound — response streams are inspected at tool-call boundaries. Some rules enforce through the agent's PreToolUse hook; the rest hold the stream itself (with keep-alive) until you decide via the GUI or sieve decisions. Every inbound detection covers all four content routes with parity: Anthropic SSE, Anthropic JSON, OpenAI SSE, and OpenAI JSON — a rule that only watched streaming responses would be a bypass, not a defense.
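
Route parity is easier to reason about when every route is normalized into one shape before rules run. The sketch below covers only the two non-streaming JSON routes, using the publicly documented response shapes (Anthropic content blocks of type tool_use; OpenAI choices[].message.tool_calls[].function). The extract_tool_calls function is a hypothetical helper, and the SSE routes would need their own incremental parsers.

```python
import json


def extract_tool_calls(body: dict) -> list[tuple[str, object]]:
    """Return (tool_name, arguments) pairs from an Anthropic or OpenAI JSON body."""
    calls = []

    # Anthropic Messages shape: top-level "content" list of typed blocks.
    content = body.get("content")
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append((block.get("name", ""), block.get("input", {})))

    # OpenAI Chat Completions shape: arguments arrive as a JSON-encoded string.
    for choice in body.get("choices") or []:
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            fn = call.get("function") or {}
            raw = fn.get("arguments") or "{}"
            try:
                args = json.loads(raw)
            except json.JSONDecodeError:
                # Keep unparseable arguments visible to rules instead of dropping them.
                args = {"_unparsed": raw}
            calls.append((fn.get("name", ""), args))

    return calls
```

With a single normalized list, one rule set can be evaluated regardless of which provider format carried the call.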

Fail-closed semantics

  • Critical cannot be disabled — in any mode, preset, or configuration. There is deliberately no --yolo flag.
  • If no decision arrives (the GUI is gone, the CLI is silent, or a timeout expires), the held action is blocked, not released. Signing-related holds time out after 120 seconds and fail closed.
  • While a stream is held, upstream chunks buffer in memory up to a hard cap (256 KB by default); if the buffer overflows the cap, the stream is terminated (fail closed) rather than letting content slip through.
  • Headless approval has a floor: signing, transfers and sensitive-path decisions cannot be approved from the CLI at all — they require the GUI.
  • On non-streaming JSON responses, where a stream cannot be held open, the hold degrades to an outright block — never to a pass.
  • Degraded mode keeps blocking Critical: losing the GUI or the audit store never relaxes enforcement.
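
The timeout and buffer-cap rules above can be sketched as a small state machine in which every path other than an explicit approval resolves to a block. The Hold class, its method names, and the "allow"/"block" strings are hypothetical; only the 256 KB cap and the 120-second timeout come from the text.

```python
import queue

ALLOW, BLOCK = "allow", "block"


class Hold:
    """Hypothetical held stream: buffers chunks and resolves fail-closed."""

    def __init__(self, cap_bytes: int = 256 * 1024):
        self.cap_bytes = cap_bytes
        self.buffered = 0
        self.chunks: list[bytes] = []
        self.overflowed = False

    def buffer(self, chunk: bytes) -> bool:
        """Store an upstream chunk; return False once the cap is exceeded."""
        if self.overflowed or self.buffered + len(chunk) > self.cap_bytes:
            self.overflowed = True
            self.chunks.clear()  # nothing buffered may be released after overflow
            return False
        self.chunks.append(chunk)
        self.buffered += len(chunk)
        return True

    def resolve(self, decisions: queue.Queue, timeout_s: float = 120.0) -> str:
        """Wait for a human decision; anything but an explicit allow blocks."""
        if self.overflowed:
            return BLOCK
        try:
            decision = decisions.get(timeout=timeout_s)
        except queue.Empty:
            return BLOCK  # no decision arrived in time
        return ALLOW if decision == ALLOW else BLOCK
```

The design choice worth noting is that the default return value is the block: approval is the only state that has to be reached deliberately.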

Architecture facts you can check

  • One binary, loopback only. The daemon refuses to start if bind_addr is anything but 127.0.0.1 — binding 0.0.0.0 is a fatal config error, not a warning.
  • All detection runs locally. There is no cloud API. Sieve never uploads prompts, responses, or keys; the only outbound call it makes for itself is fetching signed rule updates.
  • No MITM. Sieve does not install a local CA or intercept TLS; forwarding to the upstream is end-to-end TLS.
  • Verifiable. Releases are Sigstore-signed and reproducible; the engine and rules format are open. See "Verify signed builds".
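
The loopback-only rule is a startup-time check rather than a runtime defense. A minimal sketch of that idea follows; bind_addr is the setting named above, while ConfigError and validate_bind_addr are names invented for this example.

```python
class ConfigError(Exception):
    """Raised for configuration values the daemon must refuse to start with."""


def validate_bind_addr(bind_addr: str) -> str:
    """Accept only the IPv4 loopback address; everything else is fatal."""
    if bind_addr.strip() != "127.0.0.1":
        # Fatal by design: there is no warning-and-continue path.
        raise ConfigError(
            f"bind_addr must be 127.0.0.1, got {bind_addr!r}; refusing to start"
        )
    return "127.0.0.1"
```

A caller would run this before opening any socket, so a rejected value never results in a listening port.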

Explicitly out of scope

  • User misconfiguration: the config layer rejects dangerous values (like a public bind address) up front rather than trying to defend against them at runtime.
  • Vulnerabilities in relays or upstream APIs themselves — hostile upstream output is exactly what Sieve inspects, but patching the upstream is not its job.
  • Wallet or browser-extension phishing — Sieve is a cognitive-friction layer on agent traffic, not a wallet security product.
  • Individual detection misses or false positives, unless they violate the Critical false-positive budget.
  • Running stale or unsigned binaries.

Honest limitations

The audit log's hash chain guarantees history cannot be silently rewritten, but cannot stop forged entries appended after a compromise. Behavioral sequence detection is notify-only and off by default. OS-level network interception, local ML models, and non-macOS platforms are on the roadmap, not in the current build. Sieve reduces the blast radius of a hijacked agent; it does not make a hijacked agent safe.
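
The hash-chain limitation can be shown with a small generic sketch. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration and do not describe Sieve's audit-log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor for the first entry (assumption)


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append an entry linked to the current tail of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edit to earlier history breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Editing an earlier payload makes verify_chain return False, but anyone who can write to the log can still call append_entry with a forged payload and the chain verifies. That is the gap described above: the chain protects existing history, not future entries.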