8ac98377

docs: add exfiltration detection design spec
a73x 2026-03-29 16:00
diff --git a/docs/superpowers/specs/2026-03-29-exfil-detection-design.md b/docs/superpowers/specs/2026-03-29-exfil-detection-design.md
new file mode 100644
index 0000000..bc8c27e
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-29-exfil-detection-design.md
@@ -0,0 +1,106 @@
# Exfiltration Detection for nono-proxy

## Problem

The nono proxy currently does host-based allowlisting only. A sandboxed process can still exfiltrate sensitive data (SSH keys, passwords, API tokens) to an approved host. We need content inspection to detect and block this.

## Approach

Inline MITM in the existing proxy. Extend the single `nono-proxy` binary to do TLS interception on CONNECT requests, enabling full request body scanning for both HTTP and HTTPS traffic.

## CA Certificate Management

On startup, `nono-proxy` checks for `~/.local/share/nono/ca.key` and `~/.local/share/nono/ca.pem`. If either is missing, it generates:

- An ECDSA P-256 CA private key
- A self-signed CA certificate ("Nono Proxy CA", 10-year validity)

Both are saved to disk and reused on subsequent runs. Per-host leaf certificates are generated on-the-fly at CONNECT time, signed by this CA, and cached in-memory (keyed by hostname) for the process lifetime.

The `nono` wrapper script is updated to:

- Bind-mount `ca.pem` into the sandbox (read-only)
- Set `SSL_CERT_FILE` and `NODE_EXTRA_CA_CERTS` so tools inside the sandbox trust it

## MITM CONNECT Handling

The current `handleConnect` does a blind TCP tunnel. The new flow:

1. Hijack the client connection, send `200 Connection Established`
2. Generate (or fetch from cache) a leaf cert for the requested hostname, signed by the nono CA
3. Wrap the client connection in a `tls.Server` using that leaf cert
4. Establish a real `tls.Client` connection to the target host
5. Read HTTP requests from the client-side TLS connection, run them through the scanner, and if clean, forward to the target
6. Relay the response back to the client

For connections that switch to a non-HTTP protocol over CONNECT (e.g. a WebSocket upgrade after the initial HTTP request), forward the upgraded connection as a raw tunnel once the initial request passes scanning.

## Request Body Scanner

A `scanner` package with a `Scan(body []byte) []Finding` function. Each `Finding` has a `Rule` name and a `Match` snippet (truncated for logging, not the full secret).

### Default Rules

| Rule | Pattern |
|------|---------|
| `ssh-private-key` | `-----BEGIN (OPENSSH\|RSA\|DSA\|EC\|ED25519) PRIVATE KEY-----` |
| `pgp-private-key` | `-----BEGIN PGP PRIVATE KEY BLOCK-----` |
| `basic-auth` | `Authorization: Basic` header |
| `bearer-token` | `Authorization: Bearer` header |
| `aws-access-key` | `AKIA[0-9A-Z]{16}` |
| `github-token` | `gh[ps]_[A-Za-z0-9_]{36,}` |
| `openai-key` | `sk-[A-Za-z0-9]{32,}` |
| `password-field` | `password=` or `"password":` in body |
| `env-file` | 3+ consecutive lines matching `[A-Z_]+=.+` |
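
The `env-file` rule is the only multi-line one. A sketch of how it might be written as a single expression, assuming the rules are compiled with Go's `regexp` package: `(?m)` makes `^` match at every line start, and `.` never matches a newline, so each repetition consumes exactly one line. The exact expression is an assumption, not taken from the implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// envFile matches three or more consecutive KEY=value lines.
var envFile = regexp.MustCompile(`(?m)(?:^[A-Z_]+=.+\n?){3,}`)

func main() {
	dotenv := "DB_HOST=localhost\nDB_USER=app\nDB_PASS=s3cret\n"
	form := "q=search&PAGE=2"
	fmt.Println(envFile.MatchString(dotenv)) // true
	fmt.Println(envFile.MatchString(form))   // false
}
```

Because the character class is `[A-Z_]`, keys containing digits (for example `S3_BUCKET`) do not count toward the three lines.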

### Configurable Rules

Rules are loaded from `~/.local/share/nono/rules.yaml`. On first run, `nono-proxy` writes a default file with the built-in rules if one doesn't exist. Users can add, remove, or modify rules.

Format:

```yaml
rules:
  - name: ssh-private-key
    pattern: "-----BEGIN (OPENSSH|RSA|DSA|EC|ED25519) PRIVATE KEY-----"
  - name: github-token
    pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
```

Each rule is a name + regex pattern. The scanner compiles them at startup and returns an error if any pattern is invalid.
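
The compile-at-startup step might be sketched as follows. It assumes the patterns are compiled with Go's `regexp` package, which uses RE2 syntax, so lookaheads and backreferences are rejected as invalid. The `compileRules` and `compiledRule` names are hypothetical, and YAML decoding via `gopkg.in/yaml.v3` is left out so the example stays stdlib-only; the struct tags show where it would plug in.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors one entry under `rules:` in rules.yaml.
type Rule struct {
	Name    string `yaml:"name"`
	Pattern string `yaml:"pattern"`
}

// compiledRule pairs a rule name with its compiled pattern.
type compiledRule struct {
	Name string
	Re   *regexp.Regexp
}

// compileRules compiles every pattern and fails on the first invalid one,
// naming the offending rule so the user can find it in rules.yaml.
func compileRules(rules []Rule) ([]compiledRule, error) {
	out := make([]compiledRule, 0, len(rules))
	for _, r := range rules {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return nil, fmt.Errorf("rule %q: %w", r.Name, err)
		}
		out = append(out, compiledRule{Name: r.Name, Re: re})
	}
	return out, nil
}

func main() {
	_, err := compileRules([]Rule{
		{Name: "github-token", Pattern: `gh[ps]_[A-Za-z0-9_]{36,}`},
		{Name: "broken", Pattern: `([a-z`}, // unterminated character class
	})
	fmt.Println("startup error:", err)
}
```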

### Behavior

- Scans outbound request bodies only (not responses)
- Reads the full request body via `io.ReadAll`, scans, and if clean replays via `bytes.Reader`
- On match: logs `BLOCKED <method> <host> [rule1, rule2]`, returns 403 with message like `"request blocked: contains sensitive data (ssh-private-key)"`

## Request Flow

```
Client in sandbox
  -> plain HTTP or CONNECT to nono-proxy (port 9854)
  -> host allowlist check (existing logic, unchanged)
  -> if CONNECT: MITM TLS termination, read inner HTTP request
  -> read request body, run scanner rules
  -> if findings: log BLOCKED, return 403
  -> if clean: forward to target, relay response
```

## Code Changes

- `proxy/proxy.go` — `Proxy` struct gains `caKey`/`caCert` fields and `certCache map[string]*tls.Certificate`. `New()` takes a CA path in addition to the hosts file. `handleConnect` replaced with MITM flow. `handleHTTP` gets scanner check before forwarding.
- `scanner/` — new package with `Rule`, `Finding`, `Scanner` (loads rules from YAML, compiles regexes, exposes `Scan([]byte) []Finding`)
- `ca/` — new package with `LoadOrCreateCA(dir string)` and `GenerateLeafCert(host string, ca)` functions
- `cmd/nono-proxy/main.go` — loads CA at startup, passes to `proxy.New()`, writes default `rules.yaml` if missing
- `nono` script — adds `--ro-bind` for `ca.pem`, sets `SSL_CERT_FILE` and `NODE_EXTRA_CA_CERTS`

## New Dependencies

- `gopkg.in/yaml.v3` for rules config
- Everything else is stdlib (`crypto/x509`, `crypto/tls`, `crypto/ecdsa`)

## Scan Direction

- Request bodies only (outbound exfiltration detection)
- No size cap: bodies of any size are read and scanned in full, since large uploads are themselves suspicious