Trust nothing on the wire.
AI coding agents execute tools with real-world consequences, and every input they consume — model output, relay responses, fetched web pages, repository content — can carry injected instructions. Recent research such as “Your Agent Is Mine” (arXiv:2604.08407) shows how practical these hijacks are. Sieve's answer is a checkpoint on the one wire you control: the localhost hop between your agent and the LLM API.
What Sieve assumes
Sieve assumes either end of the conversation can be wrong: the prompt leaving
your machine may carry secrets it should not, and the response coming back may
carry a tool call the model was talked into. It also assumes the path in the
middle — a relay or proxy you route through — can rewrite a
tool_call before it reaches you. Every external input is treated
as hostile until checked against local policy and your intent.
Threats defended
| Threat | What Sieve does |
|---|---|
| Secret exfiltration (outbound) | API keys, high-entropy secrets, BIP39 seed phrases (checksum-verified), Bitcoin WIF and BIP-32 extended private keys are redacted in place before the request leaves 127.0.0.1: automatic, no popup, with a status-bar notification. |
| Prompt-injected / malicious tool calls (inbound) | Critical tool calls — signing, transfers, sensitive-path access — are intercepted at the tool-call boundary and held for explicit human confirmation before anything runs. |
| Address swaps | EVM addresses seen in a session are tracked; a near-identical substitute coming back in a response is flagged before you sign. |
| Multi-step exfiltration chains | Sequence detection recognizes staged secret-exfiltration patterns across turns (notify-only, conservative by default). |
| Canary decoys | Decoy credential files planted in sensitive directories raise an inbound alert on any tool call that reads them. |
| Untrusted relays | Responses from relays and proxies get the same inbound inspection as official upstreams — a relay that rewrites tool calls is caught at the same gate. |
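The address-swap row can be illustrated with a minimal sketch. It assumes "near-identical" means a new address that matches a previously seen one on its first and last few hex characters, which is the usual address-poisoning pattern. The `AddressTracker` class and the `edge` parameter are hypothetical names for illustration, not Sieve's actual rule.

```python
import re

# 20-byte EVM address in hex, e.g. 0x5290...9EE7
EVM_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


class AddressTracker:
    """Hypothetical session tracker; the prefix/suffix heuristic is an assumption."""

    def __init__(self, edge: int = 4):
        self.edge = edge            # hex characters compared at each end
        self.seen: set[str] = set()

    def observe(self, text: str) -> None:
        """Record every address that appears in trusted session text."""
        for match in EVM_ADDRESS.findall(text):
            self.seen.add(match.lower())

    def suspicious(self, text: str) -> list[tuple[str, str]]:
        """Return (candidate, lookalike) pairs for unseen addresses that
        share both their prefix and suffix with a known address."""
        findings = []
        for match in EVM_ADDRESS.findall(text):
            candidate = match.lower()
            if candidate in self.seen:
                continue  # an exact match is the address the user already saw
            body = candidate[2:]
            for known in sorted(self.seen):
                known_body = known[2:]
                if (body[:self.edge] == known_body[:self.edge]
                        and body[-self.edge:] == known_body[-self.edge:]):
                    findings.append((candidate, known))
        return findings
```

Comparing only the ends mirrors how people actually check addresses by eye. A real detector would likely add edit-distance checks as well.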
Detection in both directions
Outbound — every request body passes the outbound filter
pipeline before forwarding. Most findings are auto-redacted in place; the
highest-certainty private-key and seed-phrase matches hold the request for
confirmation instead, and a refusal returns an explicit
sieve_blocked error to the agent.
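As a rough illustration of the auto-redact branch, the sketch below replaces long, high-entropy tokens in a request body. The token pattern, the 4.0 bits-per-character threshold, and the `[REDACTED]` placeholder are all assumptions chosen for the example. It does not cover the hold-for-confirmation branch or any of Sieve's real rules.

```python
import math
import re
from collections import Counter

# Candidate tokens: 20 or more key-like characters (an assumed pattern).
TOKEN = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def redact_high_entropy(body: str, threshold: float = 4.0) -> tuple[str, int]:
    """Replace high-entropy tokens in place; return the new body and hit count."""
    hits = 0

    def replace(match: re.Match) -> str:
        nonlocal hits
        if shannon_entropy(match.group(0)) >= threshold:
            hits += 1
            return "[REDACTED]"
        return match.group(0)  # low-entropy token: leave untouched

    return TOKEN.sub(replace, body), hits
```

Entropy alone produces false positives on hashes and identifiers, so a practical filter would combine it with format-specific checks such as known key prefixes or the BIP39 checksum.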
Inbound — response streams are inspected at tool-call
boundaries. Some rules enforce through the agent's PreToolUse hook;
the rest hold the stream itself (with keep-alive) until you decide via the GUI
or sieve decisions. Every inbound detection covers all four content
routes with parity: Anthropic SSE, Anthropic JSON, OpenAI SSE, and OpenAI JSON —
a rule that only watched streaming responses would be a bypass, not a defense.
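One way to get parity is to normalize every route into a single tool-call shape before any rule runs. The sketch below does this for the two non-streaming routes, using the publicly documented Anthropic Messages and OpenAI Chat Completions response layouts. The `ToolCall` type and function names are hypothetical. The SSE routes would need delta accumulation, which is omitted here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Route-independent view of a tool call (hypothetical type)."""
    name: str
    arguments: dict


def from_anthropic_json(body: dict) -> list[ToolCall]:
    """Anthropic Messages: tool calls are `tool_use` blocks in `content`."""
    return [
        ToolCall(block["name"], block.get("input", {}))
        for block in body.get("content", [])
        if block.get("type") == "tool_use"
    ]


def from_openai_json(body: dict) -> list[ToolCall]:
    """OpenAI Chat Completions: tool calls sit under each choice's message,
    with arguments encoded as a JSON string."""
    calls = []
    for choice in body.get("choices", []):
        for tc in choice.get("message", {}).get("tool_calls") or []:
            fn = tc["function"]
            # Malformed JSON raises here; a fail-closed caller would block.
            calls.append(ToolCall(fn["name"], json.loads(fn["arguments"])))
    return calls
```

With both routes reduced to `ToolCall`, a detection rule is written once and cannot silently miss a format.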
Fail-closed semantics
- Critical cannot be disabled in any mode, preset, or configuration. There is deliberately no --yolo flag.
- If no decision arrives (the GUI is gone, the CLI is silent, a timeout expires), the held action is blocked, not released. Signing-related holds time out at 120 seconds, closed.
- While a stream is held, upstream chunks buffer in memory up to a hard cap (256 KB by default); overflowing the cap terminates the stream closed rather than letting content slip through.
- Headless approval has a floor: signing, transfers and sensitive-path decisions cannot be approved from the CLI at all — they require the GUI.
- On non-streaming JSON responses, where a stream cannot be held open, the hold degrades to an outright block — never to a pass.
- Degraded mode keeps blocking Critical: losing the GUI or the audit store never relaxes enforcement.
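Two of the rules above, the decision timeout and the buffer cap, can be sketched in a few lines. This is an illustrative model rather than Sieve's code: `resolve_hold`, `HeldStream`, the `"approve"` token, and the reading of 256 KB as 256 * 1024 bytes are all assumptions.

```python
import queue


def resolve_hold(decisions: "queue.Queue[str]", timeout_s: float = 120.0) -> str:
    """Wait for a human decision; anything but an explicit approval blocks."""
    try:
        decision = decisions.get(timeout=timeout_s)
    except queue.Empty:
        return "block"  # no decision arrived: fail closed
    return "allow" if decision == "approve" else "block"


class HeldStream:
    """Buffers upstream chunks while a hold is pending (hypothetical class)."""

    def __init__(self, cap_bytes: int = 256 * 1024):
        self.cap_bytes = cap_bytes
        self.buffered = bytearray()
        self.terminated = False

    def push(self, chunk: bytes) -> bool:
        """Return True if the chunk was buffered, False if the stream is closed."""
        if self.terminated or len(self.buffered) + len(chunk) > self.cap_bytes:
            self.terminated = True
            self.buffered.clear()  # nothing held is released after overflow
            return False
        self.buffered += chunk
        return True
```

The key property is that the default branch of every conditional is the blocking one: allowing requires a positive signal, and every failure path lands on "block".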
Architecture facts you can check
- One binary, loopback only. The daemon refuses to start if bind_addr is anything but 127.0.0.1; binding 0.0.0.0 is a fatal config error, not a warning.
- All detection runs locally. There is no cloud API. Sieve never uploads prompts, responses, or keys; the only outbound call it makes for itself is fetching signed rule updates.
- No MITM. Sieve does not install a local CA or intercept TLS; forwarding to the upstream is end-to-end TLS.
- Verifiable. Releases are Sigstore-signed and reproducible; the engine and rules format are open. See verify signed builds.
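The loopback-only rule amounts to a startup check that raises instead of warning. A minimal sketch follows, assuming a hypothetical `validate_bind_addr` helper and a strict string comparison against 127.0.0.1, as the text describes.

```python
ALLOWED_BIND_ADDR = "127.0.0.1"


def validate_bind_addr(bind_addr: str) -> str:
    """Reject any bind address other than IPv4 loopback (hypothetical helper)."""
    if bind_addr != ALLOWED_BIND_ADDR:
        # Fatal by design: the caller should abort startup, not log and continue.
        raise ValueError(
            f"bind_addr must be {ALLOWED_BIND_ADDR}, got {bind_addr!r}"
        )
    return bind_addr
```

An exact comparison is deliberately stricter than "is this a loopback address": it also rejects hostnames such as `localhost`, whose resolution depends on the machine's configuration.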
Explicitly out of scope
- User misconfiguration — the config layer rejects dangerous values (like a public bind address) rather than defending them at runtime.
- Vulnerabilities in relays or upstream APIs themselves — hostile upstream output is exactly what Sieve inspects, but patching the upstream is not its job.
- Wallet or browser-extension phishing — Sieve is a cognitive-friction layer on agent traffic, not a wallet security product.
- Individual detection misses or false positives, unless they violate the Critical false-positive budget.
- Running stale or unsigned binaries.
Honest limitations
The audit log's hash chain guarantees history cannot be silently rewritten, but cannot stop forged entries appended after a compromise. Behavioral sequence detection is notify-only and off by default. OS-level network interception, local ML models, and non-macOS platforms are on the roadmap, not in the current build. Sieve reduces the blast radius of a hijacked agent; it does not make a hijacked agent safe.
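The hash-chain guarantee and its limit can both be seen in a small model. The sketch below is a generic SHA-256 hash chain, not Sieve's audit format. The entry layout and the `append` and `verify` helpers are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    """Add an entry whose hash commits to everything before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edit to an earlier entry breaks a link."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Editing an existing payload makes `verify` return False, because every later hash depends on it. An attacker who controls the machine can still call `append` with a forged payload and the chain will verify, which is exactly the limitation stated above.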