docs/superpowers/specs/2026-03-29-exfil-detection-design.md
# Exfiltration Detection for nono-proxy
## Problem
The nono proxy currently does host-based allowlisting only. A sandboxed process can still exfiltrate sensitive data (SSH keys, passwords, API tokens) to an approved host. We need content inspection to detect and block this.
## Approach
Inline MITM in the existing proxy. Extend the single `nono-proxy` binary to do TLS interception on CONNECT requests, enabling full request body scanning for both HTTP and HTTPS traffic.
## CA Certificate Management
On startup, `nono-proxy` checks for `~/.local/share/nono/ca.key` and `~/.local/share/nono/ca.pem`. If they are missing, it generates:
- An ECDSA P-256 CA private key
- A self-signed CA certificate ("Nono Proxy CA", 10-year validity)
Both files are saved to disk and reused on subsequent runs. Per-host leaf certificates are generated on the fly at CONNECT time, signed by this CA, and cached in memory (keyed by hostname) for the lifetime of the process.
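A minimal sketch of the CA and leaf generation described above, using only the standard library. The names `authority`, `sign`, `newAuthority`, `leafFor`, and `verifyLeaf` are hypothetical and are not the `ca` package API; the 7-day leaf lifetime is an assumption, since the spec only fixes the CA validity. Persisting `ca.key` and `ca.pem` to disk is omitted.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"log"
	"math/big"
	"time"
)

// authority pairs the CA certificate with its private key (hypothetical type).
type authority struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// sign creates a fresh P-256 key pair and a certificate for it from tmpl.
// A nil parent means self-signed, which is how the CA itself is created.
func sign(tmpl *x509.Certificate, parent *authority) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 127))
	if err != nil {
		return nil, nil, err
	}
	tmpl.SerialNumber = serial
	tmpl.NotBefore = time.Now().Add(-time.Hour) // tolerate small clock skew
	issuer, signer := tmpl, key
	if parent != nil {
		issuer, signer = parent.cert, parent.key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuer, &key.PublicKey, signer)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newAuthority builds the self-signed "Nono Proxy CA" with 10-year validity.
func newAuthority() (*authority, error) {
	cert, key, err := sign(&x509.Certificate{
		Subject:               pkix.Name{CommonName: "Nono Proxy CA"},
		NotAfter:              time.Now().AddDate(10, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}, nil)
	if err != nil {
		return nil, err
	}
	return &authority{cert: cert, key: key}, nil
}

// leafFor issues a server certificate for a DNS hostname, signed by the CA.
// IP-address targets would need IPAddresses instead of DNSNames.
func (a *authority) leafFor(host string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	return sign(&x509.Certificate{
		Subject:     pkix.Name{CommonName: host},
		DNSNames:    []string{host},
		NotAfter:    time.Now().AddDate(0, 0, 7), // assumed leaf lifetime
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, a)
}

// verifyLeaf checks the leaf the way a sandboxed client trusting only the CA would.
func verifyLeaf(a *authority, leaf *x509.Certificate, host string) error {
	roots := x509.NewCertPool()
	roots.AddCert(a.cert)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: roots, DNSName: host})
	return err
}

func main() {
	ca, err := newAuthority()
	if err != nil {
		log.Fatal(err)
	}
	leaf, _, err := ca.leafFor("example.com")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("issuer:", leaf.Issuer.CommonName)
	fmt.Println("verify error:", verifyLeaf(ca, leaf, "example.com"))
}
```

The `verifyLeaf` helper mirrors what a client configured through `SSL_CERT_FILE` would do: it trusts only the CA and checks that the leaf chains to it for the requested hostname.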
The `nono` wrapper script is updated to:
- Bind-mount `ca.pem` into the sandbox (read-only)
- Set `SSL_CERT_FILE` and `NODE_EXTRA_CA_CERTS` so tools inside the sandbox trust it
## MITM CONNECT Handling
The current `handleConnect` does a blind TCP tunnel. The new flow:
1. Hijack the client connection, send `200 Connection Established`
2. Generate (or fetch from cache) a leaf cert for the requested hostname, signed by the nono CA
3. Wrap the client connection in a `tls.Server` using that leaf cert
4. Establish a real `tls.Client` connection to the target host
5. Read HTTP requests from the client-side TLS connection, run them through the scanner, and if clean, forward to the target
6. Relay the response back to the client
For connections that switch to a non-HTTP protocol over CONNECT (e.g. a WebSocket upgrade following the initial HTTP request), forward the upgraded connection as a raw tunnel once the initial request has passed scanning.
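Step 2 of the flow above (generate or fetch a cached leaf cert) can be sketched as follows. Because each CONNECT is handled on its own goroutine and Go maps are not safe for concurrent writes, this sketch guards the map with a mutex. The names `certCache`, `newCertCache`, `get`, `serverConfig`, and `countingMint` are hypothetical, and the `mint` callback stands in for the real leaf generator. The `host` argument is assumed to be the CONNECT target with the port already stripped.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

// certCache memoizes one leaf certificate per hostname for the process lifetime.
type certCache struct {
	mu    sync.Mutex
	certs map[string]*tls.Certificate
	mint  func(host string) (*tls.Certificate, error) // stand-in for the leaf generator
}

func newCertCache(mint func(string) (*tls.Certificate, error)) *certCache {
	return &certCache{certs: make(map[string]*tls.Certificate), mint: mint}
}

// get returns the cached certificate for host, minting it on first use.
// Holding the lock while minting keeps the sketch simple but serializes
// certificate generation across connections.
func (c *certCache) get(host string) (*tls.Certificate, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cert, ok := c.certs[host]; ok {
		return cert, nil
	}
	cert, err := c.mint(host)
	if err != nil {
		return nil, err
	}
	c.certs[host] = cert
	return cert, nil
}

// serverConfig builds the config for the client-facing TLS side. It selects
// the certificate by the CONNECT hostname rather than the client's SNI.
func (c *certCache) serverConfig(host string) *tls.Config {
	return &tls.Config{
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			return c.get(host)
		},
	}
}

// countingMint returns a mint function that hands out empty placeholder
// certificates and counts how often it is called (demo only).
func countingMint(calls *int) func(string) (*tls.Certificate, error) {
	return func(string) (*tls.Certificate, error) {
		*calls++
		return &tls.Certificate{}, nil
	}
}

func main() {
	calls := 0
	cache := newCertCache(countingMint(&calls))
	for _, host := range []string{"example.com", "example.com", "other.test"} {
		if _, err := cache.get(host); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("mint calls:", calls)
}
```

In the MITM handler, the hijacked connection would then be wrapped with something like `tls.Server(clientConn, cache.serverConfig(host))`.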
## Request Body Scanner
A `scanner` package with a `Scan(body []byte) []Finding` function. Each `Finding` has a `Rule` name and a `Match` snippet (truncated for logging, not the full secret).
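A rough sketch of what `Scan` and `Finding` might look like, using three of the default rules listed below. The hard-coded `demoRules` slice, the `snippet` helper, and the 12-byte `maxSnippet` limit are assumptions for illustration; the regex shown for `env-file` is one possible encoding of "3+ consecutive lines", not a pattern taken from the spec.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule is a named, compiled pattern.
type Rule struct {
	Name string
	Re   *regexp.Regexp
}

// Finding records which rule fired and a truncated piece of the match.
type Finding struct {
	Rule  string
	Match string
}

// maxSnippet caps how many bytes of a match are kept (assumed value).
const maxSnippet = 12

// demoRules holds a subset of the default rules, hard-coded for this sketch.
var demoRules = []Rule{
	{"ssh-private-key", regexp.MustCompile(`-----BEGIN (OPENSSH|RSA|DSA|EC|ED25519) PRIVATE KEY-----`)},
	{"aws-access-key", regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
	// Assumed encoding of "3+ consecutive lines matching [A-Z_]+=.+".
	{"env-file", regexp.MustCompile(`(?m)(^[A-Z_]+=.+\n?){3,}`)},
}

// Scan reports at most one finding per rule, in rule order.
func Scan(body []byte) []Finding {
	var findings []Finding
	for _, r := range demoRules {
		if m := r.Re.Find(body); m != nil {
			findings = append(findings, Finding{Rule: r.Name, Match: snippet(m)})
		}
	}
	return findings
}

// snippet truncates a match so logs never contain the full secret.
// It cuts on a byte boundary, so a multi-byte rune may be split.
func snippet(m []byte) string {
	if len(m) > maxSnippet {
		return string(m[:maxSnippet]) + "..."
	}
	return string(m)
}

func main() {
	for _, f := range Scan([]byte("key=AKIAABCDEFGHIJKLMNOP")) {
		fmt.Printf("%s: %s\n", f.Rule, f.Match)
	}
}
```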
### Default Rules
| Rule | Pattern |
|------|---------|
| `ssh-private-key` | `-----BEGIN (OPENSSH\|RSA\|DSA\|EC\|ED25519) PRIVATE KEY-----` |
| `pgp-private-key` | `-----BEGIN PGP PRIVATE KEY BLOCK-----` |
| `basic-auth` | `Authorization: Basic` header |
| `bearer-token` | `Authorization: Bearer` header |
| `aws-access-key` | `AKIA[0-9A-Z]{16}` |
| `github-token` | `gh[ps]_[A-Za-z0-9_]{36,}` |
| `openai-key` | `sk-[A-Za-z0-9]{32,}` |
| `password-field` | `password=` or `"password":` in body |
| `env-file` | 3+ consecutive lines matching `[A-Z_]+=.+` |
### Configurable Rules
Rules are loaded from `~/.local/share/nono/rules.yaml`. On first run, `nono-proxy` writes a default file with the built-in rules if one doesn't exist. Users can add, remove, or modify rules.
Format:
```yaml
rules:
  - name: ssh-private-key
    pattern: "-----BEGIN (OPENSSH|RSA|DSA|EC|ED25519) PRIVATE KEY-----"
  - name: github-token
    pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
```
Each rule is a name + regex pattern. The scanner compiles them at startup and returns an error if any pattern is invalid.
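The compile-at-startup step might look roughly like this. `RuleConfig` and `compileRules` are hypothetical names, and YAML decoding is left out so the sketch needs only the standard library. Rejecting empty names and empty patterns is an added assumption: an empty regex compiles successfully and matches every body. Go's `regexp` package uses RE2 syntax, so user-supplied patterns that rely on lookaheads or backreferences fail at this step.

```go
package main

import (
	"fmt"
	"regexp"
)

// RuleConfig mirrors one entry of rules.yaml before compilation.
type RuleConfig struct {
	Name    string
	Pattern string
}

// Rule is the compiled form used by the scanner.
type Rule struct {
	Name string
	Re   *regexp.Regexp
}

// compileRules compiles every pattern and fails on the first invalid entry,
// naming the offending rule so the user can fix the config file.
func compileRules(cfgs []RuleConfig) ([]Rule, error) {
	rules := make([]Rule, 0, len(cfgs))
	for i, c := range cfgs {
		if c.Name == "" {
			return nil, fmt.Errorf("rule %d: missing name", i)
		}
		if c.Pattern == "" {
			// An empty regex would match (and block) every request.
			return nil, fmt.Errorf("rule %q: empty pattern", c.Name)
		}
		re, err := regexp.Compile(c.Pattern)
		if err != nil {
			return nil, fmt.Errorf("rule %q: %w", c.Name, err)
		}
		rules = append(rules, Rule{Name: c.Name, Re: re})
	}
	return rules, nil
}

func main() {
	_, err := compileRules([]RuleConfig{
		{Name: "github-token", Pattern: `gh[ps]_[A-Za-z0-9_]{36,}`},
		{Name: "lookahead", Pattern: `(?=secret)`}, // not supported by RE2
	})
	fmt.Println("startup error:", err)
}
```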
### Behavior
- Scans outbound request bodies only (not responses)
- Reads the full request body via `io.ReadAll`, scans, and if clean replays via `bytes.Reader`
- On match: logs `BLOCKED <method> <host> [rule1, rule2]` and returns 403 with a message such as `"request blocked: contains sensitive data (ssh-private-key)"`
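The read, scan, and replay behavior above could be sketched as an `http.Handler` wrapper. The names `guard`, `scan`, and `demo` are hypothetical; `scan` is a one-rule stand-in for the real scanner, and `demo` drives the wrapper with `net/http/httptest` in place of a real upstream.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
)

var awsKey = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// scan is a one-rule stand-in for the scanner; it returns matching rule names.
func scan(body []byte) []string {
	if awsKey.Match(body) {
		return []string{"aws-access-key"}
	}
	return nil
}

// guard buffers the request body, scans it, and either blocks with 403 or
// restores the body so the next handler can read it again.
func guard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		r.Body.Close()
		if err != nil {
			http.Error(w, "cannot read request body", http.StatusBadRequest)
			return
		}
		if rules := scan(body); len(rules) > 0 {
			names := strings.Join(rules, ", ")
			log.Printf("BLOCKED %s %s [%s]", r.Method, r.Host, names)
			msg := fmt.Sprintf("request blocked: contains sensitive data (%s)", names)
			http.Error(w, msg, http.StatusForbidden)
			return
		}
		// Replay the buffered bytes for the forwarding step.
		r.Body = io.NopCloser(bytes.NewReader(body))
		r.ContentLength = int64(len(body))
		next.ServeHTTP(w, r)
	})
}

// demo sends one POST through guard and reports the status code and the
// body the fake upstream received (empty if the request was blocked).
func demo(body string) (status int, forwarded string) {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		forwarded = string(b)
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "http://example.com/upload", strings.NewReader(body))
	rec := httptest.NewRecorder()
	guard(upstream).ServeHTTP(rec, req)
	return rec.Code, forwarded
}

func main() {
	for _, body := range []string{"hello", "token=AKIAABCDEFGHIJKLMNOP"} {
		status, forwarded := demo(body)
		fmt.Printf("status=%d forwarded=%q\n", status, forwarded)
	}
}
```

Restoring `r.Body` from the buffered bytes is what lets the forwarding code stay unaware that the body was already consumed.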
## Request Flow
```
Client in sandbox
  -> plain HTTP or CONNECT to nono-proxy (port 9854)
  -> host allowlist check (existing logic, unchanged)
  -> if CONNECT: MITM TLS termination, read inner HTTP request
  -> read request body, run scanner rules
     -> if findings: log BLOCKED, return 403
     -> if clean: forward to target, relay response
```
## Code Changes
- `proxy/proxy.go` — `Proxy` struct gains `caKey`/`caCert` fields and `certCache map[string]*tls.Certificate`. `New()` takes a CA path in addition to the hosts file. `handleConnect` replaced with MITM flow. `handleHTTP` gets scanner check before forwarding.
- `scanner/` — new package with `Rule`, `Finding`, `Scanner` (loads rules from YAML, compiles regexes, exposes `Scan([]byte) []Finding`)
- `ca/` — new package with `LoadOrCreateCA(dir string)` and `GenerateLeafCert(host string, ca)` functions
- `cmd/nono-proxy/main.go` — loads CA at startup, passes to `proxy.New()`, writes default `rules.yaml` if missing
- `nono` script — adds `--ro-bind` for `ca.pem`, sets `SSL_CERT_FILE` and `NODE_EXTRA_CA_CERTS`
## New Dependencies
- `gopkg.in/yaml.v3` for rules config
- Everything else is stdlib (`crypto/x509`, `crypto/tls`, `crypto/ecdsa`)
## Scan Direction
- Request bodies only (outbound exfiltration detection)
- No size cap — large uploads are themselves suspicious