# Input-Latency Bench Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Implement a closed-loop keystroke-to-display latency benchmark in waystty, measuring cold (idle) and hot (contended-PTY) latency via in-process `KeyEvent` injection, PUA-codepoint sentinels, and `wp_presentation_time` feedback for the display endpoint.

**Architecture:** A new `WAYSTTY_INPUT_BENCH` mode drives a `BenchDriver` that injects sentinels, scans rendered frames for them, and pairs grid-observations with compositor presentation-feedback to compute latency. A shared grid-lock infrastructure (also applied to the existing output bench) forces a known grid size for reproducibility.

**Tech Stack:** Zig 0.15+, zig-wayland (ifreund), Vulkan WSI (vulkan-zig), Wayland `wp_presentation_time` protocol, existing `bench_stats` module.

**Reference spec:** `docs/superpowers/specs/2026-04-18-input-latency-bench-design.md`

---

## Phase 1 — Shared Bench Infrastructure (grid-lock retrofit)

These tasks apply to *both* existing `WAYSTTY_BENCH` and the new `WAYSTTY_INPUT_BENCH`. They harden reproducibility of what's already there before adding new bench modes.

### Task 1.1: Bench-mode grid-size env vars

**Files:**
- Modify: `src/main.zig:196` (initial_grid constant)

- [ ] **Step 1: Add env parsing for bench grid size**

Replace the `initial_grid` constant and add a helper above `main()`:

```zig
fn benchGridSize() GridSize {
    const cols_str = std.posix.getenv("WAYSTTY_BENCH_COLS") orelse "80";
    const rows_str = std.posix.getenv("WAYSTTY_BENCH_ROWS") orelse "24";
    const cols = std.fmt.parseInt(u16, cols_str, 10) catch 80;
    const rows = std.fmt.parseInt(u16, rows_str, 10) catch 24;
    return .{ .cols = cols, .rows = rows };
}

fn benchModeActive() bool {
    return std.posix.getenv("WAYSTTY_BENCH") != null
        or std.posix.getenv("WAYSTTY_INPUT_BENCH") != null;
}
```

Change the `initial_grid` site in `main`:

```zig
    // === grid size ===
    const initial_grid: GridSize = if (benchModeActive())
        benchGridSize()
    else
        .{ .cols = 80, .rows = 24 };
    var cols: u16 = initial_grid.cols;
    var rows: u16 = initial_grid.rows;
```
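The fallback rules in `benchGridSize` can be pulled into a pure function so they are unit-testable without touching the process environment. The following is a minimal sketch, assuming a hypothetical helper named `parseGridDim` that is not part of this plan's file list. Unlike `benchGridSize`, it also treats `0` as invalid, which is an added assumption rather than something the spec requires.

```zig
const std = @import("std");

/// Hypothetical helper (not in the plan): parse a grid dimension, falling
/// back to `default` when the value is unset, unparsable, or zero.
fn parseGridDim(raw: ?[]const u8, default: u16) u16 {
    const s = raw orelse return default;
    const v = std.fmt.parseInt(u16, s, 10) catch return default;
    return if (v == 0) default else v;
}

test "parseGridDim falls back on unset, garbage, zero, and overflow" {
    try std.testing.expectEqual(@as(u16, 80), parseGridDim(null, 80));
    try std.testing.expectEqual(@as(u16, 132), parseGridDim("132", 80));
    try std.testing.expectEqual(@as(u16, 24), parseGridDim("abc", 24));
    try std.testing.expectEqual(@as(u16, 24), parseGridDim("0", 24));
    try std.testing.expectEqual(@as(u16, 24), parseGridDim("70000", 24)); // does not fit in u16
}
```

With such a helper, `benchGridSize` would reduce to two calls of the form `parseGridDim(std.posix.getenv("WAYSTTY_BENCH_COLS"), 80)`.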

- [ ] **Step 2: Verify build**

Run: `zig build`
Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/main.zig
git commit -m "bench: parse WAYSTTY_BENCH_COLS/ROWS for configurable grid"
```

### Task 1.2: `xdg_toplevel` min/max size hints

**Files:**
- Modify: `src/wayland.zig` (add `Window.setSizeHints`)
- Modify: `src/main.zig` around the `xdg_toplevel` setup (after window creation, before the first roundtrip at `src/main.zig:213`)

- [ ] **Step 1: Expose size hints on Window in `src/wayland.zig`**

Add a method on `Window` near the existing `setTitle`:

```zig
pub fn setSizeHints(self: *Window, w: u32, h: u32) void {
    const iw = @as(i32, @intCast(w));
    const ih = @as(i32, @intCast(h));
    self.xdg_toplevel.setMinSize(iw, ih);
    self.xdg_toplevel.setMaxSize(iw, ih);
}
```

- [ ] **Step 2: Call from main when bench is active**

After `window.height = initial_h;` (around `src/main.zig:211`) and before the roundtrip:

```zig
    if (benchModeActive()) {
        window.setSizeHints(initial_w, initial_h);
    }
```

- [ ] **Step 3: Build and smoke-test**

Run: `zig build && WAYSTTY_BENCH=1 WAYSTTY_BENCH_ROWS=24 WAYSTTY_BENCH_COLS=80 ./zig-out/bin/waystty 2>/tmp/smoke.log; head -40 /tmp/smoke.log`
Expected: launches and exits cleanly; in a floating window on sway, geometry is respected.

- [ ] **Step 4: Commit**

```bash
git add src/main.zig src/wayland.zig
git commit -m "bench: advertise xdg_toplevel min/max size hints in bench mode"
```

### Task 1.3: Abort on compositor resize in bench mode

**Files:**
- Modify: `src/main.zig:409` (resize observer)

- [ ] **Step 1: Add abort on size mismatch in the resize observer**

Replace the block at `src/main.zig:409`:

```zig
        if (window.width != last_window_w or window.height != last_window_h) {
            if (benchModeActive()) {
                std.debug.print(
                    "\nwaystty bench: compositor sized window to {}x{}, expected {}x{} ({}x{} grid). " ++
                    "Run in a floating window or non-tiling compositor for reproducible benchmarks.\n",
                    .{ window.width, window.height, initial_w, initial_h, cols, rows },
                );
                std.process.exit(2);
            }
            resize_pending = true;
            render_pending = true;
        }
```

- [ ] **Step 2: Build**

Run: `zig build`
Expected: PASS

- [ ] **Step 3: Manual sanity — tiling compositor abort**

On sway (tiling mode), run:
`WAYSTTY_BENCH=1 ./zig-out/bin/waystty 2>/tmp/bench-abort.log`
Expected: exits with code 2 and diagnostic message. (In floating mode, no abort.)

- [ ] **Step 4: Commit**

```bash
git add src/main.zig
git commit -m "bench: abort with diagnostic if compositor resizes during bench"
```

### Task 1.4: Print grid size in bench stats output

**Files:**
- Modify: `src/bench_stats.zig:152-162` (`printFrameStats`)

- [ ] **Step 1: Extend signature to take grid dims**

Replace `printFrameStats`:

```zig
pub fn printFrameStats(stats: FrameTimingStats, cols: u16, rows: u16) void {
    const row_fmt = "{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}\n";
    std.debug.print("\n=== waystty frame timing ({d} frames, {d}x{d} grid) ===\n", .{ stats.frame_count, cols, rows });
    std.debug.print("{s:<20}{s:>6}{s:>6}{s:>6}{s:>6}  (us)\n", .{ "section", "min", "avg", "p99", "max" });
    std.debug.print(row_fmt, .{ "snapshot",        stats.snapshot.min,        stats.snapshot.avg,        stats.snapshot.p99,        stats.snapshot.max });
    std.debug.print(row_fmt, .{ "row_rebuild",     stats.row_rebuild.min,     stats.row_rebuild.avg,     stats.row_rebuild.p99,     stats.row_rebuild.max });
    std.debug.print(row_fmt, .{ "atlas_upload",    stats.atlas_upload.min,    stats.atlas_upload.avg,    stats.atlas_upload.p99,    stats.atlas_upload.max });
    std.debug.print(row_fmt, .{ "instance_upload", stats.instance_upload.min, stats.instance_upload.avg, stats.instance_upload.p99, stats.instance_upload.max });
    std.debug.print(row_fmt, .{ "gpu_submit",      stats.gpu_submit.min,      stats.gpu_submit.avg,      stats.gpu_submit.p99,      stats.gpu_submit.max });
    std.debug.print("----------------------------------------------------\n", .{});
    std.debug.print(row_fmt, .{ "total",           stats.total.min,           stats.total.avg,           stats.total.p99,           stats.total.max });
}
```

- [ ] **Step 2: Update all call sites in `src/main.zig`**

Run: `grep -n "printFrameStats" src/main.zig`
For each call, pass `cols, rows` as additional args. E.g. `printFrameStats(computeFrameStats(&frame_ring), cols, rows);`

- [ ] **Step 3: Build and test**

Run: `zig build && zig build test`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/main.zig src/bench_stats.zig
git commit -m "bench: include grid size in stats header"
```

---

## Phase 2 — Frame-counter plumbing

### Task 2.1: Add `frame_counter` field to `FrameTiming`

**Files:**
- Modify: `src/bench_stats.zig:3-24` (`FrameTiming` struct)

- [ ] **Step 1: Extend the struct**

Add `frame_counter: u64 = 0,` as a field on `FrameTiming` (keep `.total()` unchanged — counter is metadata, not timing):

```zig
pub const FrameTiming = struct {
    frame_counter: u64 = 0,
    snapshot_us: u32 = 0,
    row_rebuild_us: u32 = 0,
    atlas_upload_us: u32 = 0,
    instance_upload_us: u32 = 0,
    gpu_submit_us: u32 = 0,
    wait_fences_us: u32 = 0,
    acquire_us: u32 = 0,
    record_us: u32 = 0,
    submit_us: u32 = 0,
    present_us: u32 = 0,

    pub fn total(self: FrameTiming) u32 {
        return self.snapshot_us +
            self.row_rebuild_us +
            self.atlas_upload_us +
            self.instance_upload_us +
            self.gpu_submit_us;
    }
};
```

- [ ] **Step 2: Update CSV writer to include frame_counter column**

In `writeFrameCsv` (around `src/bench_stats.zig:124`), change the header and row:

```zig
    _ = try file.write("frame_counter,frame_idx,snapshot_us,row_rebuild_us,atlas_upload_us,instance_upload_us,gpu_submit_us,wait_fences_us,acquire_us,record_us,submit_us,present_us,total_us\n");
    for (entries, 0..) |e, i| {
        const line = try std.fmt.bufPrint(&buf, "{d},{d},{d},{d},{d},{d},{d},{d},{d},{d},{d},{d},{d}\n", .{
            e.frame_counter,
            i,
            e.snapshot_us,
            e.row_rebuild_us,
            e.atlas_upload_us,
            e.instance_upload_us,
            e.gpu_submit_us,
            e.wait_fences_us,
            e.acquire_us,
            e.record_us,
            e.submit_us,
            e.present_us,
            e.total(),
        });
        _ = try file.write(line);
    }
```

- [ ] **Step 3: Increment in main loop**

In `src/main.zig`, add near the other `var` declarations before the main loop (around `src/main.zig:339`):

```zig
    var frame_counter: u64 = 0;
```

At the end of each rendered frame (where the ring push happens — grep `frame_ring.push` to find it), set the counter on the timing struct before pushing, then increment:

```zig
    timing.frame_counter = frame_counter;
    frame_ring.push(timing);
    frame_counter +%= 1;
```

(Use the exact local name of the timing variable at that site; adjust if named differently.)

- [ ] **Step 4: Build and test**

Run: `zig build && zig build test`
Expected: PASS. Existing tests continue to pass (they don't touch `frame_counter`).

- [ ] **Step 5: Commit**

```bash
git add src/bench_stats.zig src/main.zig
git commit -m "bench: add frame_counter to FrameTiming for sample correlation"
```

### Task 2.2: Test that `frame_counter` round-trips through the ring

**Files:**
- Modify: `src/bench_stats.zig` (add new test)

- [ ] **Step 1: Add test**

Append to the test block:

```zig
test "FrameTimingRing preserves frame_counter through wrap" {
    var ring = FrameTimingRing{};
    for (0..FrameTimingRing.capacity + 5) |i| {
        ring.push(.{ .frame_counter = i, .snapshot_us = @intCast(i) });
    }
    var buf: [FrameTimingRing.capacity]FrameTiming = undefined;
    const ordered = ring.orderedSlice(&buf);
    try std.testing.expectEqual(@as(u64, 5), ordered[0].frame_counter);
    try std.testing.expectEqual(@as(u64, FrameTimingRing.capacity + 4), ordered[ordered.len - 1].frame_counter);
}
```
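The test above relies on `orderedSlice` returning entries oldest-first after the ring has wrapped. As an illustration of that property only, here is a sketch with a hypothetical `MiniRing` of capacity 4. It is a stand-in for reasoning about the wrap, not the project's `FrameTimingRing`, whose internals may differ.

```zig
const std = @import("std");

/// Hypothetical stand-in for FrameTimingRing, small enough to trace by hand.
const MiniRing = struct {
    const capacity = 4;
    items: [capacity]u64 = undefined,
    head: usize = 0, // next write position
    len: usize = 0, // number of valid entries, at most `capacity`

    fn push(self: *MiniRing, v: u64) void {
        self.items[self.head] = v;
        self.head = (self.head + 1) % capacity;
        if (self.len < capacity) self.len += 1;
    }

    /// Copy entries oldest-first into `out` and return the filled prefix.
    fn orderedSlice(self: *const MiniRing, out: *[capacity]u64) []u64 {
        const start = (self.head + capacity - self.len) % capacity;
        for (0..self.len) |i| out[i] = self.items[(start + i) % capacity];
        return out[0..self.len];
    }
};

test "MiniRing drops the oldest entries after wrap" {
    var ring = MiniRing{};
    for (0..MiniRing.capacity + 2) |i| ring.push(@intCast(i));
    var buf: [MiniRing.capacity]u64 = undefined;
    const ordered = ring.orderedSlice(&buf);
    try std.testing.expectEqualSlices(u64, &.{ 2, 3, 4, 5 }, ordered);
}
```

Pushing `capacity + 2` values overwrites the two oldest, which mirrors why the real test expects `ordered[0].frame_counter` to be 5 after `capacity + 5` pushes.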

- [ ] **Step 2: Run test**

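Run: `zig build test --summary all 2>&1 | grep -E "passed|failed|error"`
Expected: the summary reports the tests as passed, with no `failed` or `error` lines. (A plain `zig build test` prints nothing on success, so grepping it for `PASS` matches nothing.)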

- [ ] **Step 3: Commit**

```bash
git add src/bench_stats.zig
git commit -m "bench: test frame_counter preservation across ring wrap"
```

---

## Phase 3 — `wp_presentation_time` protocol binding

### Task 3.1: Add protocol XML to the Wayland scanner

**Files:**
- Modify: `build.zig:35-40`

- [ ] **Step 1: Register the protocol**

After `scanner.addSystemProtocol("stable/xdg-shell/xdg-shell.xml");`:

```zig
    scanner.addSystemProtocol("stable/presentation-time/presentation-time.xml");
```

And after `scanner.generate("xdg_wm_base", 6);`:

```zig
    scanner.generate("wp_presentation", 1);
```

- [ ] **Step 2: Build**

Run: `zig build`
Expected: PASS. (Requires `wayland-protocols` system package — if missing, install it via the distro's wayland-protocols dev package.)

- [ ] **Step 3: Commit**

```bash
git add build.zig
git commit -m "bench: register wp_presentation_time protocol in build.zig"
```

### Task 3.2: Bind `wp_presentation` global in the Wayland layer

**Files:**
- Modify: `src/wayland.zig` — find the `Globals` struct and the registry listener

- [ ] **Step 1: Find the Globals struct**

Run: `grep -n "struct.*Globals\|pub const Globals\|globals:\|seat: ?\|compositor: ?" src/wayland.zig | head -20`

Locate where other globals like `seat`, `compositor`, `data_device_manager` are declared. Add a new field:

```zig
    wp_presentation: ?*wp.Presentation = null,
```

(Adjust the Wayland protocol import — the generated module exposes `wp` as a namespace; follow the pattern already used for `xdg`/`wl`.)

- [ ] **Step 2: Handle the global in the registry listener**

Find the `registryListener` (or similarly named) that dispatches `registry.global` events. In the switch on interface name, add a branch for `wp_presentation`:

```zig
} else if (std.mem.eql(u8, interface, "wp_presentation")) {
    globals.wp_presentation = registry.bind(name, wp.Presentation, 1) catch null;
}
```

(Follow the exact pattern of neighboring `std.mem.eql(u8, interface, "wl_seat")` branches.)

- [ ] **Step 3: Import the generated namespace at the top of `src/wayland.zig`**

Find the existing `const wl = @import("wayland").client.wl;` line and add a parallel:

```zig
const wp = @import("wayland").client.wp;
```

(If the generated module uses a different namespace, e.g. `wp_presentation` rather than `wp`, use whatever the scanner emits. Check the generated `wayland.zig` under `.zig-cache`.)

- [ ] **Step 4: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add src/wayland.zig
git commit -m "bench: bind wp_presentation global"
```

### Task 3.3: Wrap `wp_presentation_feedback` with a Zig-friendly callback

**Files:**
- Modify: `src/wayland.zig`

- [ ] **Step 1: Add a `PresentationFeedback` wrapper type**

Near the existing Window / Keyboard types, add:

```zig
pub const PresentationFeedback = struct {
    pub const Event = union(enum) {
        presented: struct { tv_sec: u64, tv_nsec: u32, refresh: u32, flags: u32 },
        discarded: void,
    };

    feedback: *wp.PresentationFeedback,
    user_data: ?*anyopaque = null,
    callback: ?*const fn (user_data: ?*anyopaque, ev: Event) void = null,

    pub fn init(
        presentation: *wp.Presentation,
        surface: *wl.Surface,
        user_data: ?*anyopaque,
        callback: *const fn (?*anyopaque, Event) void,
    ) !*PresentationFeedback {
        const alloc = std.heap.c_allocator; // arena-free, lives until presented/discarded
        const self = try alloc.create(PresentationFeedback);
        errdefer alloc.destroy(self); // don't leak the wrapper if the feedback request fails
        self.* = .{
            .feedback = try presentation.feedback(surface),
            .user_data = user_data,
            .callback = callback,
        };
        self.feedback.setListener(*PresentationFeedback, feedbackListener, self);
        return self;
    }

    fn feedbackListener(
        _: *wp.PresentationFeedback,
        event: wp.PresentationFeedback.Event,
        self: *PresentationFeedback,
    ) void {
        switch (event) {
            .presented => |p| {
                const tv_sec = (@as(u64, p.tv_sec_hi) << 32) | p.tv_sec_lo;
                if (self.callback) |cb| {
                    cb(self.user_data, .{ .presented = .{
                        .tv_sec = tv_sec,
                        .tv_nsec = p.tv_nsec,
                        .refresh = p.refresh,
                        .flags = @bitCast(p.flags),
                    } });
                }
                self.destroy();
            },
            .discarded => {
                if (self.callback) |cb| cb(self.user_data, .discarded);
                self.destroy();
            },
            else => {}, // sync_output events — ignore
        }
    }

    fn destroy(self: *PresentationFeedback) void {
        self.feedback.destroy();
        std.heap.c_allocator.destroy(self);
    }
};
```

(Allocator choice: `c_allocator` because lifetime is tied to async Wayland events, not main-loop ownership. If the project already has a conventional allocator for this, use it.)
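The wrapper hands `tv_sec` and `tv_nsec` to the callback separately, but latency arithmetic is easier on a single nanosecond value. Below is a small sketch of that conversion, assuming a hypothetical helper `presentedNs` (the name and its placement are not from the plan). Note that these timestamps are in the clock domain the compositor announces through the `wp_presentation.clock_id` event, so the injection timestamp has to be read from the same clock for the difference to be meaningful.

```zig
const std = @import("std");

/// Hypothetical helper: fold the split seconds and the nanosecond part of a
/// `presented` event into one u64 nanosecond timestamp.
fn presentedNs(tv_sec_hi: u32, tv_sec_lo: u32, tv_nsec: u32) u64 {
    const sec = (@as(u64, tv_sec_hi) << 32) | tv_sec_lo;
    return sec * std.time.ns_per_s + tv_nsec;
}

test "presentedNs combines hi/lo seconds and nanoseconds" {
    try std.testing.expectEqual(@as(u64, 1_500_000_000), presentedNs(0, 1, 500_000_000));
    try std.testing.expectEqual(@as(u64, (1 << 32) * std.time.ns_per_s), presentedNs(1, 0, 0));
}
```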

- [ ] **Step 2: Build**

Run: `zig build`
Expected: PASS. Fix any compilation errors (event and field names can differ slightly in the generated bindings; check the generated `wayland.zig` under `.zig-cache` for the exact names).

- [ ] **Step 3: Commit**

```bash
git add src/wayland.zig
git commit -m "bench: add PresentationFeedback wrapper with typed callback"
```

### Task 3.4: Smoke-test `wp_presentation` under sway

**Files:**
- Create: `src/tools/presentation_smoke.zig` (new, small standalone program)

- [ ] **Step 1: Write a minimal smoke test**

```zig
const std = @import("std");
const wayland_client = @import("wayland-client");

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    const conn = try wayland_client.Connection.init(alloc);
    defer conn.deinit();

    if (conn.globals.wp_presentation == null) {
        std.debug.print("FAIL: wp_presentation global not advertised by compositor\n", .{});
        std.process.exit(1);
    }
    std.debug.print("OK: wp_presentation bound\n", .{});
}
```

- [ ] **Step 2: Add build step in `build.zig`**

After other tools (grep for `bench_baseline` to find the pattern), add:

```zig
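    const presentation_smoke_exe = b.addExecutable(.{
        .name = "presentation-smoke",
        // Zig 0.15 takes the source file via a root module, not `root_source_file`.
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/tools/presentation_smoke.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });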
    presentation_smoke_exe.root_module.addImport("wayland-client", wayland_mod);
    const run_presentation_smoke = b.addRunArtifact(presentation_smoke_exe);
    const smoke_step = b.step("presentation-smoke", "Verify wp_presentation binding");
    smoke_step.dependOn(&run_presentation_smoke.step);
```

- [ ] **Step 3: Run**

Run: `zig build presentation-smoke`
Expected: prints `OK: wp_presentation bound`. On compositors without the protocol, FAILs with a clear message.

- [ ] **Step 4: Commit**

```bash
git add src/tools/presentation_smoke.zig build.zig
git commit -m "bench: add presentation-smoke tool for wp_presentation binding check"
```

### Task 3.5: Request feedback per-frame in the renderer

**Files:**
- Modify: `src/renderer.zig` (around the swapchain `queuePresentKHR` call) and `src/main.zig`

- [ ] **Step 1: Find the present call**

Run: `grep -n "queuePresentKHR\|present_info\|queue_present" src/renderer.zig`
Expected: one location where `vkQueuePresentKHR` is invoked.

- [ ] **Step 2: Expose a pre-present hook**

Add a function pointer field on the renderer context (or equivalent) that's called just before `queuePresentKHR`:

```zig
// In the Context struct definition
pre_present_hook: ?*const fn (ctx: ?*anyopaque) void = null,
pre_present_ctx: ?*anyopaque = null,
```

Immediately before the `queuePresentKHR` call site:

```zig
if (ctx.pre_present_hook) |h| h(ctx.pre_present_ctx);
```

- [ ] **Step 3: Wire the hook from main**

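In `src/main.zig`, install the hook once the renderer context exists. The `bench_driver` variable is only introduced by later bench-driver tasks, so pass `null` for now and switch to `&bench_driver` when it lands:

```zig
ctx.pre_present_hook = &benchPrePresentHook;
ctx.pre_present_ctx = null; // becomes &bench_driver once the driver exists
```

For now, leave the hook body as a placeholder fn that does nothing; the actual feedback-request logic lands in Phase 6. Define: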

```zig
fn benchPrePresentHook(opaque_ctx: ?*anyopaque) void {
    _ = opaque_ctx;
    // Populated in Task 6.3
}
```

- [ ] **Step 4: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add src/renderer.zig src/main.zig
git commit -m "bench: add pre-present hook in renderer for wp_presentation_feedback"
```

---

## Phase 4 — BenchDriver skeleton + PTY plumbing

### Task 4.1: Env parsing + `Scenario` enum

**Files:**
- Create: `src/bench_input.zig`
- Modify: `build.zig` (register module)
- Modify: `src/main.zig`

- [ ] **Step 1: Create `src/bench_input.zig` scaffold**

```zig
const std = @import("std");

pub const Scenario = enum {
    cold,
    hot,
    both,

    pub fn parse(s: []const u8) ?Scenario {
        if (std.mem.eql(u8, s, "cold")) return .cold;
        if (std.mem.eql(u8, s, "hot")) return .hot;
        if (std.mem.eql(u8, s, "both")) return .both;
        if (std.mem.eql(u8, s, "1")) return .both; // "1" is the only shorthand; it selects `both`
        return null;
    }
};

pub const Config = struct {
    scenario: Scenario,
    samples_per_scenario: u32 = 500,
    max_frames_per_sample: u32 = 60,
    cols: u16 = 80,
    rows: u16 = 24,
};

pub fn readConfigFromEnv() ?Config {
    const val = std.posix.getenv("WAYSTTY_INPUT_BENCH") orelse return null;
    const sc = Scenario.parse(val) orelse {
        std.debug.print("WAYSTTY_INPUT_BENCH: invalid scenario '{s}', expected cold|hot|both\n", .{val});
        std.process.exit(2);
    };
    return .{
        .scenario = sc,
        .cols = if (std.posix.getenv("WAYSTTY_BENCH_COLS")) |s|
            (std.fmt.parseInt(u16, s, 10) catch 80)
        else 80,
        .rows = if (std.posix.getenv("WAYSTTY_BENCH_ROWS")) |s|
            (std.fmt.parseInt(u16, s, 10) catch 24)
        else 24,
    };
}

test "Scenario.parse" {
    try std.testing.expectEqual(@as(?Scenario, .cold), Scenario.parse("cold"));
    try std.testing.expectEqual(@as(?Scenario, .hot), Scenario.parse("hot"));
    try std.testing.expectEqual(@as(?Scenario, .both), Scenario.parse("both"));
    try std.testing.expectEqual(@as(?Scenario, null), Scenario.parse("nope"));
}
```

- [ ] **Step 2: Register module in `build.zig`**

After other module declarations (grep `bench_stats_mod` pattern), add:

```zig
    const bench_input_mod = b.createModule(.{
        .root_source_file = b.path("src/bench_input.zig"),
        .target = target,
        .optimize = optimize,
    });
```

And at the waystty executable's `addImport` block, add:

```zig
    exe.root_module.addImport("bench_input", bench_input_mod);
```

Also add a test step for it:

```zig
    const bench_input_tests = b.addTest(.{ .root_module = bench_input_mod });
    const run_bench_input_tests = b.addRunArtifact(bench_input_tests);
    const test_step = b.step("test", "Run tests"); // if a test step already exists, just add the dep
    test_step.dependOn(&run_bench_input_tests.step);
```

(If there's already a `test` step — check with `grep "b.step(\"test\"" build.zig` — add `test_step.dependOn(&run_bench_input_tests.step);` to the existing one.)

- [ ] **Step 3: Run tests**

Run: `zig build test 2>&1 | tail -20`
Expected: `Scenario.parse` passes.

- [ ] **Step 4: Commit**

```bash
git add src/bench_input.zig build.zig
git commit -m "bench: create bench_input module with Scenario enum + Config"
```

### Task 4.2: PTY termios ECHO verification helper

**Files:**
- Modify: `src/pty.zig`

- [ ] **Step 1: Find PTY spawn**

Run: `grep -n "pub fn spawn\|tcsetattr\|termios\|ECHO" src/pty.zig | head`

- [ ] **Step 2: Add a helper**

Near the existing `spawn` method:

```zig
pub fn ensureEcho(slave_fd: std.posix.fd_t) !void {
    // std.posix.tcgetattr returns the termios by value; lflag is a packed
    // struct of bools in current Zig, so ECHO is a field, not a bitmask.
    var tio = try std.posix.tcgetattr(slave_fd);
    if (!tio.lflag.ECHO) {
        tio.lflag.ECHO = true;
        try std.posix.tcsetattr(slave_fd, .NOW, tio);
    }
}
```

(Adjust namespaces if Zig stdlib differs slightly — find by grepping `tcgetattr` in the stdlib.)

- [ ] **Step 3: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/pty.zig
git commit -m "bench: add Pty.ensureEcho helper"
```

### Task 4.3: Cold PTY spawn (`cat > /dev/null`)

**Files:**
- Modify: `src/main.zig` (spawn block around lines 276-304)

- [ ] **Step 1: Extend the shell-selection logic**

Replace the block from `const is_bench = ...` through `defer p.deinit();` (roughly `src/main.zig:276-305`):

```zig
    const bench_input_cfg = @import("bench_input").readConfigFromEnv();
    const is_output_bench = std.posix.getenv("WAYSTTY_BENCH") != null;
    const bench_unthrottled = is_output_bench and std.posix.getenv("WAYSTTY_BENCH_UNTHROTTLED") != null;

    // Shell + args choice
    const ShellPlan = struct {
        shell: [:0]const u8,
        args: ?[]const [:0]const u8,
    };
    const shell_plan: ShellPlan = if (bench_input_cfg) |cfg| blk: {
        const sh_args: []const [:0]const u8 = switch (cfg.scenario) {
            .cold, .both => &.{ "-c", "cat > /dev/null" },
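            // POSIX sh has no {1..500} brace expansion (dash would emit a single "x"),
            // so build the 500-char line with printf padding and tr instead.
            .hot => &.{ "-c", "yes \"$(printf '%500s' '' | tr ' ' x)\" | pv -qL 24K" },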
        };
        break :blk .{ .shell = try alloc.dupeZ(u8, "/bin/sh"), .args = sh_args };
    } else if (is_output_bench) blk: {
        break :blk .{ .shell = try alloc.dupeZ(u8, "/bin/sh"), .args = null };
    } else blk: {
        const shell_env = std.posix.getenv("SHELL") orelse "/bin/sh";
        break :blk .{ .shell = try alloc.dupeZ(u8, shell_env), .args = null };
    };
    defer alloc.free(shell_plan.shell);

    const bench_script: ?[:0]const u8 = if (is_output_bench)
        @embedFile("bench_workload")
    else
        null;

    if (is_output_bench) {
        if (bench_unthrottled) {
            std.debug.print("[bench] mode: UNTHROTTLED (not freeze-safe)\n", .{});
        } else {
            std.debug.print("[bench] mode: THROTTLED (vsync-paced)\n", .{});
        }
    }
    if (bench_input_cfg) |cfg| {
        std.debug.print("[input-bench] scenario: {s}, grid: {d}x{d}\n", .{ @tagName(cfg.scenario), cfg.cols, cfg.rows });
    }

    const pty_args = if (shell_plan.args) |a| a else if (bench_script) |script| &[_][:0]const u8{ "-c", script } else null;

    var p = try pty.Pty.spawn(.{
        .cols = cols,
        .rows = rows,
        .shell = shell_plan.shell,
        .shell_args = pty_args,
    });
    defer p.deinit();
    try pty.Pty.ensureEcho(p.slave_fd); // if slave_fd isn't public, expose it; otherwise do inside Pty.spawn
    term.setWritePtyCallback(&p, &writePtyFromTerminal);
```

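(If `p.slave_fd` is not exposed, modify `src/pty.zig` to expose it, or call `ensureEcho` from inside `Pty.spawn`. The block above replaces `is_bench` with `is_output_bench`; rename any later uses of `is_bench` in `main` accordingly.)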

- [ ] **Step 2: Build and smoke-test cold**

Run: `zig build && WAYSTTY_INPUT_BENCH=cold WAYSTTY_BENCH_COLS=80 WAYSTTY_BENCH_ROWS=24 ./zig-out/bin/waystty 2>/tmp/bench-cold.log &`

Let it run ~2s, then kill. Inspect `/tmp/bench-cold.log` — should show the `[input-bench] scenario: cold` line.

- [ ] **Step 3: Commit**

```bash
git add src/main.zig src/pty.zig
git commit -m "bench: spawn bench-specific PTY children for cold/hot scenarios"
```

### Task 4.4: `pv` availability check for hot mode

**Files:**
- Modify: `src/main.zig` (helper above `main()`, called from the hot-scenario branch)

- [ ] **Step 1: Add a pre-spawn check**

Add a helper above `main()` (Zig does not allow a named function declaration directly inside a function body):

```zig
fn assertPvAvailable(alloc: std.mem.Allocator) void {
    const res = std.process.Child.run(.{
        .allocator = alloc,
        .argv = &.{ "sh", "-c", "command -v pv" },
    }) catch {
        std.debug.print("waystty input-bench hot: `pv` not found. Install with your package manager (e.g. `pacman -S pv`).\n", .{});
        std.process.exit(2);
    };
    alloc.free(res.stdout);
    alloc.free(res.stderr);
    if (res.term != .Exited or res.term.Exited != 0) {
        std.debug.print("waystty input-bench hot: `pv` not found. Install with your package manager (e.g. `pacman -S pv`).\n", .{});
        std.process.exit(2);
    }
}
```

Call it in the hot arm:

```zig
.hot => blk: {
    assertPvAvailable(alloc);
    // POSIX-portable: avoids the bash-only {1..500} brace expansion.
    break :blk &.{ "-c", "yes \"$(printf '%500s' '' | tr ' ' x)\" | pv -qL 24K" };
},
```

- [ ] **Step 2: Manual test without pv**

Run: `PATH=/usr/bin:/bin WAYSTTY_INPUT_BENCH=hot WAYSTTY_BENCH_COLS=80 ./zig-out/bin/waystty 2>/tmp/bench-hot.log || echo "exit $?"`
Expected: If `pv` is present, runs; otherwise exits with 2 and the diagnostic.

- [ ] **Step 3: Commit**

```bash
git add src/main.zig
git commit -m "bench: fail loudly if pv is missing for hot scenario"
```

### Task 4.5: Child teardown (SIGTERM → 100ms → SIGKILL → waitpid)

**Files:**
- Modify: `src/pty.zig`

- [ ] **Step 1: Add a `gracefulTeardown` method on Pty**

```zig
pub fn gracefulTeardown(self: *Pty) void {
    if (self.pid <= 0) return;
    // Mark the child as reaped on every exit path so a second call
    // (explicit teardown followed by deinit) is a no-op.
    defer self.pid = 0;
    std.posix.kill(self.pid, std.posix.SIG.TERM) catch {};
    // Poll waitpid for up to 100ms.
    var elapsed_ms: u32 = 0;
    while (elapsed_ms < 100) : (elapsed_ms += 10) {
        const res = std.posix.waitpid(self.pid, std.posix.W.NOHANG);
        if (res.pid != 0) return;
        std.Thread.sleep(10 * std.time.ns_per_ms);
    }
    // Grace period expired: force-kill and block until the child is reaped.
    std.posix.kill(self.pid, std.posix.SIG.KILL) catch {};
    _ = std.posix.waitpid(self.pid, 0);
}
```

(Cross-check exact `waitpid` / `WNOHANG` / `SIG.TERM` spellings against Zig stdlib.)

- [ ] **Step 2: Call from deinit or bench scenario switch**

Make `Pty.deinit` call `gracefulTeardown` before closing fds if the child is still running. For `both` scenario switching, expose a public method to call explicitly.

- [ ] **Step 3: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/pty.zig
git commit -m "bench: gracefulTeardown for PTY children (SIGTERM→grace→SIGKILL)"
```

### Task 4.6: Suppress `.key` events in bench mode

**Files:**
- Modify: `src/main.zig:372-399` (keyboard event loop)

- [ ] **Step 1: Gate the `.key` processing**

Replace the loop at `src/main.zig:374-398`:

```zig
        for (keyboard.event_queue.items) |ev| {
            if (ev.action == .release) continue;
            if (bench_input_cfg != null) {
                // Bench mode: drop real keyboard .key events so ambient typing
                // can't perturb measurements. Modifiers/enter/leave/repeat state
                // on the Keyboard struct still update via the listener callbacks.
                continue;
            }
            // ... existing clipboard/paste/encode path
        }
```

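(Leave the remaining body unchanged after the `continue`. Note that `injectSentinel` in Task 5.2 appends to this same `keyboard.event_queue`, so as written this gate would drop injected sentinels as well; the bench path needs to either consume injected events before this check or tag them so they bypass it.)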

- [ ] **Step 2: Build and smoke-test**

Run: `zig build && WAYSTTY_INPUT_BENCH=cold ./zig-out/bin/waystty 2>/tmp/sm.log &`

Type in the window (focus must be on it); verify no characters appear. Kill.

- [ ] **Step 3: Commit**

```bash
git add src/main.zig
git commit -m "bench: drop real keyboard .key events in input-bench mode"
```

---

## Phase 5 — Sentinel allocator + injection

### Task 5.1: PUA sentinel allocator

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Add SentinelAlloc**

Append to `src/bench_input.zig`:

```zig
pub const SentinelAlloc = struct {
    const PUA_START: u21 = 0xE000;
    const PUA_COUNT: u32 = 4096;

    next: u32 = 0,

    pub fn take(self: *SentinelAlloc) u21 {
        const idx = self.next % PUA_COUNT;
        self.next +%= 1;
        return PUA_START + @as(u21, @intCast(idx));
    }
};

/// Encode a codepoint as UTF-8 into `buf`. Returns the length written.
pub fn encodeCodepoint(cp: u21, buf: *[4]u8) u3 {
    const n = std.unicode.utf8Encode(cp, buf) catch unreachable;
    return @intCast(n);
}

test "SentinelAlloc rotates through 4096 PUA codepoints" {
    var a: SentinelAlloc = .{};
    const first = a.take();
    try std.testing.expectEqual(@as(u21, 0xE000), first);
    for (1..4096) |_| _ = a.take();
    // Next should wrap to 0xE000 again
    try std.testing.expectEqual(@as(u21, 0xE000), a.take());
}

test "encodeCodepoint produces valid 3-byte UTF-8 for PUA" {
    var buf: [4]u8 = undefined;
    const n = encodeCodepoint(0xE000, &buf);
    try std.testing.expectEqual(@as(u3, 3), n);
    try std.testing.expectEqual(@as(u8, 0xEE), buf[0]);
    try std.testing.expectEqual(@as(u8, 0x80), buf[1]);
    try std.testing.expectEqual(@as(u8, 0x80), buf[2]);
}
```
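When rendered frames are scanned for sentinels, the scanner needs the inverse of `take`: given a cell's codepoint, decide whether it is a sentinel and which slot it came from. The sketch below assumes a hypothetical helper `sentinelIndex` and repeats the two constants so that it stands alone; in `src/bench_input.zig` it would presumably reuse the ones on `SentinelAlloc`.

```zig
const std = @import("std");

const PUA_START: u21 = 0xE000;
const PUA_COUNT: u32 = 4096;

/// Hypothetical inverse of `SentinelAlloc.take`: map a grid cell's codepoint
/// back to its slot in the rotation, or null if it is not a sentinel.
fn sentinelIndex(cp: u21) ?u32 {
    if (cp < PUA_START) return null;
    const off: u32 = cp - PUA_START;
    return if (off < PUA_COUNT) off else null;
}

test "sentinelIndex accepts only the 4096-codepoint window" {
    try std.testing.expectEqual(@as(?u32, 0), sentinelIndex(0xE000));
    try std.testing.expectEqual(@as(?u32, 4095), sentinelIndex(0xEFFF));
    try std.testing.expectEqual(@as(?u32, null), sentinelIndex(0xF000));
    try std.testing.expectEqual(@as(?u32, null), sentinelIndex('x'));
}
```

Because the rotation reuses a codepoint every 4096 samples, an index alone identifies a sample only while fewer than 4096 sentinels are in flight.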

- [ ] **Step 2: Run tests**

Run: `zig build test 2>&1 | tail`
Expected: both tests PASS.

- [ ] **Step 3: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: SentinelAlloc and encodeCodepoint helpers"
```

### Task 5.2: Fabricate `KeyEvent` injector

**Files:**
- Modify: `src/bench_input.zig`
- Review: `src/wayland.zig` (Keyboard.KeyEvent type)

- [ ] **Step 1: Inspect KeyEvent type**

Run: `grep -n "pub const KeyEvent\|KeyEvent = struct\|action:.*\\.\\(press\\|release\\)\|utf8:" src/wayland.zig | head -10`

Note the exact field set — likely something like:
```zig
pub const KeyEvent = struct {
    action: enum { press, release },
    keysym: u32,
    serial: u32,
    utf8: [16]u8,
    utf8_len: u8,
    // ... possibly mods
};
```

- [ ] **Step 2: Add `injectSentinel` in bench_input.zig**

```zig
const wayland_client = @import("wayland-client");

pub fn injectSentinel(
    keyboard: *wayland_client.Keyboard,
    sentinel_cp: u21,
) !void {
    var utf8: [16]u8 = @splat(0);
    var enc: [4]u8 = undefined;
    const n = encodeCodepoint(sentinel_cp, &enc);
    @memcpy(utf8[0..n], enc[0..n]);

    const ev = wayland_client.Keyboard.KeyEvent{
        .action = .press,
        .keysym = 0,
        .serial = 0,
        .utf8 = utf8,
        .utf8_len = n,
    };
    try keyboard.event_queue.append(ev);
}
```

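(Adjust field names to the exact struct, and zero-fill any required fields not shown above. If `event_queue` is an unmanaged `std.ArrayList`, which is the default in Zig 0.15, `append` also takes the allocator as its first argument.)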
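
To make the payload layout concrete, here is a minimal sketch of how a consumer might read the sentinel bytes back out of a fabricated event. `FakeKeyEvent`, `text`, and `fabricatePress` are hypothetical stand-ins; the real `Keyboard.KeyEvent` may have a different field set.

```zig
const std = @import("std");

/// Hypothetical stand-in for Keyboard.KeyEvent; the real struct may differ.
const FakeKeyEvent = struct {
    action: enum { press, release },
    keysym: u32 = 0,
    serial: u32 = 0,
    utf8: [16]u8 = [_]u8{0} ** 16,
    utf8_len: u8 = 0,

    /// The valid prefix of `utf8`, as a consumer would forward it to the PTY.
    fn text(self: *const FakeKeyEvent) []const u8 {
        return self.utf8[0..self.utf8_len];
    }
};

/// Build a press event whose payload is the UTF-8 encoding of `cp`.
fn fabricatePress(cp: u21) !FakeKeyEvent {
    var ev: FakeKeyEvent = .{ .action = .press };
    ev.utf8_len = try std.unicode.utf8Encode(cp, ev.utf8[0..4]);
    return ev;
}

test "fabricated event carries the sentinel bytes" {
    const ev = try fabricatePress(0xE001);
    try std.testing.expectEqualSlices(u8, "\xEE\x80\x81", ev.text());
}
```

Only the first `utf8_len` bytes are meaningful; the rest of the buffer stays zeroed.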

- [ ] **Step 3: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: injectSentinel pushes fabricated KeyEvent onto queue"
```

---

## Phase 6 — Pair-on-arrival matching

### Task 6.1: `Sample` and `BenchDriver` skeletons

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Add Sample + state**

```zig
pub const Sample = struct {
    sentinel: u21,
    t_inject_ns: u64,
    injected_frame: u64,
    grid_seen_frame: ?u64 = null,
    presented_ns: ?u64 = null,
    timed_out: bool = false,

    pub fn complete(self: Sample) bool {
        return self.timed_out or (self.grid_seen_frame != null and self.presented_ns != null);
    }

    pub fn latencyNs(self: Sample) ?u64 {
        const p = self.presented_ns orelse return null;
        return p - self.t_inject_ns;
    }
};

pub const SampleBuffer = struct {
    const cap = 2000; // 2 scenarios × 500 samples + headroom
    items: [cap]Sample = undefined,
    count: usize = 0,

    pub fn push(self: *SampleBuffer, s: Sample) void {
        if (self.count < cap) {
            self.items[self.count] = s;
            self.count += 1;
        }
    }
};
```

- [ ] **Step 2: Build + test**

Run: `zig build test 2>&1 | tail -5`
Expected: no new test failures.

- [ ] **Step 3: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: Sample and SampleBuffer data types"
```

### Task 6.2: BenchDriver struct + tick entry points

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Add driver**

```zig
pub const Phase = enum { idle, running, done };

pub const BenchDriver = struct {
    cfg: Config,
    alloc: std.mem.Allocator,
    sentinels: SentinelAlloc = .{},
    in_flight: ?Sample = null,
    samples: SampleBuffer = .{},
    pending_feedback: std.AutoArrayHashMapUnmanaged(u64, u64) = .{}, // frame_counter -> presented_ns
    current_phase: Phase = .idle,
    current_scenario: Scenario = .cold,
    scenario_sample_count: u32 = 0,
    early_timeouts: u32 = 0,
    early_samples: u32 = 0,

    pub fn init(alloc: std.mem.Allocator, cfg: Config) BenchDriver {
        return .{
            .cfg = cfg,
            .alloc = alloc,
            .current_scenario = switch (cfg.scenario) {
                .cold, .both => .cold,
                .hot => .hot,
            },
        };
    }

    pub fn deinit(self: *BenchDriver) void {
        self.pending_feedback.deinit(self.alloc);
    }

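    /// Nanoseconds on CLOCK_MONOTONIC. Assumes that is the clock announced by
    /// wp_presentation's `clock_id` event, so inject and present timestamps are
    /// comparable. `std.time.Instant.now()` does not fit here: on Linux it reads
    /// CLOCK_BOOTTIME and its `timestamp` is a timespec, not an integer.
    fn nowNs() !u64 {
        const ts = try std.posix.clock_gettime(.MONOTONIC);
        return @as(u64, @intCast(ts.sec)) * std.time.ns_per_s + @as(u64, @intCast(ts.nsec));
    }

    /// Called before processing keyboard events; decides whether to inject.
    pub fn preTick(
        self: *BenchDriver,
        keyboard: *wayland_client.Keyboard,
        frame_counter: u64,
    ) !void {
        if (self.current_phase != .running) return;
        if (self.in_flight != null) return;

        const sentinel = self.sentinels.take();
        const t_inject_ns = try nowNs();
        try injectSentinel(keyboard, sentinel);
        self.in_flight = .{
            .sentinel = sentinel,
            .t_inject_ns = t_inject_ns,
            .injected_frame = frame_counter,
        };
    }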

    /// Called after term.snapshot with the result of scanning the grid for
    /// the active sentinel.
    pub fn postFrameGridScan(
        self: *BenchDriver,
        frame_counter: u64,
        grid_contains_sentinel: bool,
    ) void {
        if (self.in_flight == null) return;
        const s = &self.in_flight.?;
        if (s.grid_seen_frame != null) return;
        if (grid_contains_sentinel) {
            s.grid_seen_frame = frame_counter;
            // Feedback for this frame may already be stored, so try to pair now.
            self.tryFinalize();
        } else if (frame_counter -| s.injected_frame >= self.cfg.max_frames_per_sample) {
            s.timed_out = true;
            self.finalizeSample();
        }
    }

    /// Called from the presentation feedback callback.
    pub fn recordPresented(self: *BenchDriver, frame_counter: u64, presented_ns: u64) void {
        _ = self.pending_feedback.put(self.alloc, frame_counter, presented_ns) catch return;
        self.tryFinalize();
    }

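    /// Called on a presentation-feedback `discarded` event. No-op: the sample
    /// stays in flight. Caveat: if the discarded frame is the one recorded in
    /// `grid_seen_frame`, nothing later completes the sample, because the
    /// grid-scan timeout stops once the sentinel has been seen.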
    pub fn recordDiscarded(self: *BenchDriver, frame_counter: u64) void {
        _ = self;
        _ = frame_counter;
    }

    fn tryFinalize(self: *BenchDriver) void {
        if (self.in_flight == null) return;
        const s = self.in_flight.?;
        const gsf = s.grid_seen_frame orelse return;
        const p = self.pending_feedback.get(gsf) orelse return;
        self.in_flight.?.presented_ns = p;
        self.finalizeSample();
    }

    fn finalizeSample(self: *BenchDriver) void {
        const sample = self.in_flight.?;
        self.in_flight = null;
        self.samples.push(sample);
        self.scenario_sample_count += 1;

        // WSI fallback detection: if >10% of first 50 time out, abort.
        if (self.early_samples < 50) {
            self.early_samples += 1;
            if (sample.timed_out) self.early_timeouts += 1;
            if (self.early_samples == 50 and self.early_timeouts > 5) {
                std.debug.print(
                    "waystty input-bench: {d}/50 early samples timed out. " ++
                    "Likely wp_presentation.feedback commit race with Mesa WSI. " ++
                    "Investigate VK_KHR_present_wait as an alternative.\n",
                    .{self.early_timeouts},
                );
                std.process.exit(3);
            }
        }

        if (self.scenario_sample_count >= self.cfg.samples_per_scenario) {
            self.advanceScenario();
        }
    }

    fn advanceScenario(self: *BenchDriver) void {
        if (self.cfg.scenario == .both and self.current_scenario == .cold) {
            self.current_scenario = .hot;
            self.scenario_sample_count = 0;
            // main.zig's scenario sequencer will respawn the child
            self.current_phase = .idle; // pauses until sequencer re-arms
        } else {
            self.current_phase = .done;
        }
    }

    pub fn start(self: *BenchDriver) void {
        self.current_phase = .running;
    }

    pub fn finished(self: *const BenchDriver) bool {
        return self.current_phase == .done;
    }
};
```

- [ ] **Step 2: Build**

Run: `zig build`
Expected: PASS. Fix minor syntax issues (e.g., `std.time.Instant` API shape).

- [ ] **Step 3: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: BenchDriver skeleton with pre/post-tick entry points"
```
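
The pair-on-arrival idea can be shown in isolation: the grid observation and the presentation feedback for a frame may arrive in either order, so both entry points store their half and then attempt the match. The sketch below is a reduced, hypothetical model (`Pairing`, `Feedback`, and the fixed eight-entry table are illustrative only), not the driver itself.

```zig
const std = @import("std");

const Feedback = struct { frame: u64, presented_ns: u64 };

/// Hypothetical reduced model: one in-flight sample plus a small feedback table.
const Pairing = struct {
    t_inject_ns: u64,
    grid_seen_frame: ?u64 = null,
    feedback: [8]Feedback = undefined,
    feedback_len: usize = 0,

    /// Grid side: remember the first frame that showed the sentinel, then try to pair.
    fn gridSeen(self: *Pairing, frame: u64) ?u64 {
        if (self.grid_seen_frame == null) self.grid_seen_frame = frame;
        return self.latencyNs();
    }

    /// Feedback side: store the timestamp (dropped if the table is full), then try to pair.
    fn presented(self: *Pairing, frame: u64, ns: u64) ?u64 {
        if (self.feedback_len < self.feedback.len) {
            self.feedback[self.feedback_len] = .{ .frame = frame, .presented_ns = ns };
            self.feedback_len += 1;
        }
        return self.latencyNs();
    }

    /// Latency once both halves for the same frame are known, else null.
    fn latencyNs(self: *const Pairing) ?u64 {
        const frame = self.grid_seen_frame orelse return null;
        for (self.feedback[0..self.feedback_len]) |fb| {
            if (fb.frame == frame) return fb.presented_ns -| self.t_inject_ns;
        }
        return null;
    }
};

test "either arrival order yields the same latency" {
    var a: Pairing = .{ .t_inject_ns = 1000 };
    try std.testing.expectEqual(@as(?u64, null), a.gridSeen(10));
    try std.testing.expectEqual(@as(?u64, 4000), a.presented(10, 5000));

    var b: Pairing = .{ .t_inject_ns = 1000 };
    try std.testing.expectEqual(@as(?u64, null), b.presented(10, 5000));
    try std.testing.expectEqual(@as(?u64, 4000), b.gridSeen(10));
}
```

Feedback for frames other than `grid_seen_frame` is stored but never matched, which is why a sample only completes when the compositor reports the exact frame that first showed the sentinel.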

### Task 6.3: Populate pre-present hook to request feedback

**Files:**
- Modify: `src/main.zig`

- [ ] **Step 1: Replace the placeholder `benchPrePresentHook`**

```zig
fn benchPrePresentHook(opaque_ctx: ?*anyopaque) void {
    const driver: *bench_input.BenchDriver = @ptrCast(@alignCast(opaque_ctx orelse return));
    if (driver.current_phase != .running) return;

    // Request feedback on the upcoming commit, associate with the *next* frame_counter
    // (the one about to be rendered).
    const fc = driver.next_expected_frame orelse return;
    const feedback_ctx = blk: {
        const ctx = alloc_g.create(FeedbackCtx) catch return;
        ctx.* = .{ .driver = driver, .frame_counter = fc };
        break :blk ctx;
    };
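    _ = wayland_client.PresentationFeedback.init(
        globals_g.wp_presentation.?,
        surface_g,
        feedback_ctx,
        &onPresentationFeedback,
    ) catch {
        // No callback will ever fire for this context, so free it here.
        alloc_g.destroy(feedback_ctx);
        return;
    };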
}

const FeedbackCtx = struct { driver: *bench_input.BenchDriver, frame_counter: u64 };

fn onPresentationFeedback(
    opaque_ctx: ?*anyopaque,
    ev: wayland_client.PresentationFeedback.Event,
) void {
    const ctx: *FeedbackCtx = @ptrCast(@alignCast(opaque_ctx orelse return));
    defer alloc_g.destroy(ctx);
    switch (ev) {
        .presented => |p| {
            const ns: u64 = p.tv_sec * std.time.ns_per_s + p.tv_nsec;
            ctx.driver.recordPresented(ctx.frame_counter, ns);
        },
        .discarded => ctx.driver.recordDiscarded(ctx.frame_counter),
    }
}
```

(`alloc_g`, `globals_g`, `surface_g` are file-scoped globals initialized in `main()`; declare them at the top of `main.zig` and assign during init. If the codebase prefers to avoid globals, thread the context through the hook's opaque pointer as a `struct { driver, alloc, globals, surface }` instead.)
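
For reference, the `presented` event of `wp_presentation_feedback` carries its timestamp as three 32-bit arguments: `tv_sec_hi`, `tv_sec_lo`, and `tv_nsec`. The callback above reads a single `p.tv_sec`, which suggests the Phase 3 wrapper already merges the two halves. If it does not, the conversion might look like the sketch below; the function name `presentedNs` is hypothetical.

```zig
const std = @import("std");

/// Combine the split seconds and the nanoseconds into one u64 nanosecond value.
/// Realistic clock values are far below the point where the multiply overflows.
fn presentedNs(tv_sec_hi: u32, tv_sec_lo: u32, tv_nsec: u32) u64 {
    const sec = (@as(u64, tv_sec_hi) << 32) | tv_sec_lo;
    return sec * std.time.ns_per_s + tv_nsec;
}

test "presentedNs merges the three protocol fields" {
    try std.testing.expectEqual(@as(u64, 2_000_000_500), presentedNs(0, 2, 500));
}
```

The result is only comparable with `t_inject_ns` if both are read from the clock that the compositor announces through the `clock_id` event.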

- [ ] **Step 2: Add `next_expected_frame` field and increment logic to `BenchDriver`**

In `src/bench_input.zig`:

```zig
next_expected_frame: ?u64 = null,
```

In `preTick`, after injecting, set:

```zig
self.next_expected_frame = frame_counter; // this frame is the candidate
```

And after each frame renders (called from the main loop), bump:

```zig
pub fn notifyFramePresented(self: *BenchDriver, frame_counter: u64) void {
    self.next_expected_frame = frame_counter + 1;
}
```

- [ ] **Step 3: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/main.zig src/bench_input.zig
git commit -m "bench: wire pre-present hook to wp_presentation_feedback"
```

### Task 6.4: Grid scan after each frame

**Files:**
- Modify: `src/main.zig` (around the `term.snapshot` call site)

- [ ] **Step 1: Find snapshot site**

Run: `grep -n "term.snapshot" src/main.zig`

- [ ] **Step 2: After snapshot, if in bench mode, scan for sentinel**

Within the render branch, immediately after the snapshot is taken:

```zig
    if (bench_driver_ptr) |drv| {
        if (drv.in_flight) |s| {
            const found = gridContainsCodepoint(&snapshot_view, s.sentinel);
            drv.postFrameGridScan(frame_counter, found);
        }
    }
```

Add a helper (place near other snapshot utilities):

```zig
fn gridContainsCodepoint(snap: *const vt.Snapshot, cp: u21) bool {
    // Iterate every visible cell and compare codepoint. Implementation depends
    // on the Snapshot API — use the existing row-iteration pattern.
    for (snap.rows) |row| {
        for (row.cells) |cell| {
            if (cell.codepoint == cp) return true;
        }
    }
    return false;
}
```

(If `Snapshot.rows[*].cells[*].codepoint` has a different shape, adapt to match. Grep for `.snapshot()` in vt.zig and follow the existing walking pattern.)

- [ ] **Step 3: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 4: Commit**

```bash
git add src/main.zig
git commit -m "bench: scan rendered frame grid for sentinel codepoint"
```

### Task 6.5: Pre-tick injection wired from main loop

**Files:**
- Modify: `src/main.zig` (keyboard events block)

- [ ] **Step 1: Call `driver.preTick` at the top of each main-loop iteration**

Immediately before the existing `keyboard.tickRepeat()` call (around `src/main.zig:373`):

```zig
        if (bench_driver_ptr) |drv| {
            try drv.preTick(&keyboard, frame_counter);
        }
```

- [ ] **Step 2: Declare & initialize `bench_driver_ptr`**

Near the other `var` declarations before the main loop:

```zig
    var bench_driver_storage: ?bench_input.BenchDriver = if (bench_input_cfg) |cfg|
        bench_input.BenchDriver.init(alloc, cfg)
    else
        null;
    defer if (bench_driver_storage) |*d| d.deinit();
    const bench_driver_ptr: ?*bench_input.BenchDriver = if (bench_driver_storage) |*d| d else null;
    if (bench_driver_ptr) |d| d.start();
```

- [ ] **Step 3: Build and run cold smoke**

Run: `zig build && WAYSTTY_INPUT_BENCH=cold ./zig-out/bin/waystty 2>/tmp/bench-in.log`

Run it in a floating window: it should inject sentinels and eventually print "done" and exit. If it hangs or exits with status 3, check `/tmp/bench-in.log` for the "early samples timed out" diagnostic.

- [ ] **Step 4: Commit**

```bash
git add src/main.zig
git commit -m "bench: inject sentinels per iteration from BenchDriver.preTick"
```

### Task 6.6: Unit test the pair-on-arrival state machine

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Add tests**

```zig
test "BenchDriver completes sample: grid first, then feedback" {
    const cfg: Config = .{ .scenario = .cold, .samples_per_scenario = 1 };
    var d: BenchDriver = .init(std.testing.allocator, cfg);
    defer d.deinit();
    d.start();
    d.in_flight = .{ .sentinel = 0xE000, .t_inject_ns = 1000, .injected_frame = 10 };
    d.postFrameGridScan(10, true);
    try std.testing.expect(d.in_flight != null);
    d.recordPresented(10, 5000);
    try std.testing.expectEqual(@as(usize, 1), d.samples.count);
    try std.testing.expectEqual(@as(u64, 4000), d.samples.items[0].latencyNs().?);
}

test "BenchDriver completes sample: feedback first, then grid" {
    const cfg: Config = .{ .scenario = .cold, .samples_per_scenario = 1 };
    var d: BenchDriver = .init(std.testing.allocator, cfg);
    defer d.deinit();
    d.start();
    d.in_flight = .{ .sentinel = 0xE000, .t_inject_ns = 1000, .injected_frame = 10 };
    d.recordPresented(11, 5500);
    d.postFrameGridScan(11, true);
    try std.testing.expectEqual(@as(usize, 1), d.samples.count);
    try std.testing.expectEqual(@as(u64, 4500), d.samples.items[0].latencyNs().?);
}

test "BenchDriver times out after max_frames_per_sample" {
    const cfg: Config = .{ .scenario = .cold, .samples_per_scenario = 1, .max_frames_per_sample = 5 };
    var d: BenchDriver = .init(std.testing.allocator, cfg);
    defer d.deinit();
    d.start();
    d.in_flight = .{ .sentinel = 0xE000, .t_inject_ns = 1000, .injected_frame = 10 };
    var f: u64 = 10;
    while (f <= 15) : (f += 1) d.postFrameGridScan(f, false);
    try std.testing.expectEqual(@as(usize, 1), d.samples.count);
    try std.testing.expect(d.samples.items[0].timed_out);
}
```

- [ ] **Step 2: Run**

Run: `zig build test 2>&1 | tail -10`
Expected: all three new tests PASS.

- [ ] **Step 3: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: test pair-on-arrival state machine and timeout"
```

---

## Phase 7 — Scenario sequencer + Makefile target

### Task 7.1: Restart PTY child between cold and hot

**Files:**
- Modify: `src/main.zig`

- [ ] **Step 1: Detect scenario completion + respawn**

Inside the main loop, after calling `drv.postFrameGridScan` or near the end of the loop body:

```zig
        if (bench_driver_ptr) |drv| {
            if (drv.current_phase == .idle and drv.cfg.scenario == .both and drv.current_scenario == .hot) {
                // Transition cold -> hot: teardown existing child, spawn the hot one.
                p.gracefulTeardown();
                p.deinit();
                assertPvAvailable(alloc);
                p = try pty.Pty.spawn(.{
                    .cols = cols,
                    .rows = rows,
                    .shell = shell_plan.shell,
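                    // 500 x's per line, built with printf + tr so it works in any POSIX sh
                    // (`{1..500}` brace expansion is a bash/zsh feature).
                    .shell_args = &.{ "-c", "yes \"$(printf '%500s' '' | tr ' ' x)\" | pv -qL 24K" },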
                });
                try pty.Pty.ensureEcho(p.slave_fd);
                term.setWritePtyCallback(&p, &writePtyFromTerminal);
                drv.start();
            }
            if (drv.finished()) {
                // Print stats and exit — see Task 8.1
                bench_input.printStats(drv, cols, rows);
                return;
            }
        }
```

- [ ] **Step 2: Build**

Run: `zig build`
Expected: PASS.

- [ ] **Step 3: Commit**

```bash
git add src/main.zig
git commit -m "bench: restart PTY child between cold and hot scenarios"
```

### Task 7.2: `bench-input` Makefile target

**Files:**
- Modify: `Makefile`

- [ ] **Step 1: Add target**

After the `bench` target, add:

```makefile
# Expected runtime: ~15s cold + ~25s hot = ~40s total
# Requires: pv (for hot-mode rate limiting)
bench-input:
	$(ZIG) build -Doptimize=$(OPT)
	WAYSTTY_INPUT_BENCH=both ./zig-out/bin/waystty 2>bench-input.log || true
	@echo "--- input latency ---"
	@grep -A 20 "waystty input latency" bench-input.log || echo "(no timing data found)"
```

Also append `bench-input` to the `.PHONY` line.

- [ ] **Step 2: Commit**

```bash
git add Makefile
git commit -m "bench: add bench-input Makefile target"
```

---

## Phase 8 — Output

### Task 8.1: Print stats with grid header + per-scenario rows

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Add `printStats`**

```zig
pub fn printStats(drv: *const BenchDriver, cols: u16, rows: u16) void {
    // Split samples by scenario. With current design, cold samples are pushed
    // first, then hot — track via scenario transition.
    // For simplicity in v1, we tag samples with their scenario at push time:
    // TODO: add `scenario` field to Sample — see Task 8.1 step 2.
    _ = drv;
    _ = cols;
    _ = rows;
}
```

The `Sample` struct defined in Task 6.1 does not record which scenario produced it, so `printStats` cannot split the rows yet. Add that field before implementing the real function:

- [ ] **Step 2: Add `scenario: Scenario` to Sample**

Edit Task 6.1's Sample struct:

```zig
pub const Sample = struct {
    scenario: Scenario = .cold, // default keeps the Task 6.6 test literals compiling
    sentinel: u21,
    t_inject_ns: u64,
    injected_frame: u64,
    grid_seen_frame: ?u64 = null,
    presented_ns: ?u64 = null,
    timed_out: bool = false,
    // ... rest unchanged
};
```

In `preTick`, when constructing the in-flight sample:

```zig
self.in_flight = .{
    .scenario = self.current_scenario,
    // ... existing
};
```

- [ ] **Step 3: Implement printStats**

```zig
pub fn printStats(drv: *const BenchDriver, cols: u16, rows: u16) void {
    var cold_buf: [2000]u64 = undefined;
    var hot_buf: [2000]u64 = undefined;
    var cold_to: u32 = 0;
    var hot_to: u32 = 0;
    var cold_n: usize = 0;
    var hot_n: usize = 0;
    for (drv.samples.items[0..drv.samples.count]) |s| {
        if (s.timed_out) {
            switch (s.scenario) {
                .cold => cold_to += 1,
                .hot => hot_to += 1,
                .both => unreachable,
            }
            continue;
        }
        const lat = s.latencyNs() orelse continue;
        switch (s.scenario) {
            .cold => { cold_buf[cold_n] = lat; cold_n += 1; },
            .hot  => { hot_buf[hot_n]  = lat; hot_n  += 1; },
            .both => unreachable,
        }
    }
    std.debug.print(
        "\n=== waystty input latency ({d} cold, {d} hot, {d}x{d} grid) ===\n",
        .{ cold_n, hot_n, cols, rows },
    );
    std.debug.print("{s:<10}{s:>8}{s:>8}{s:>8}{s:>8}{s:>8}   (us)  timeouts\n",
        .{ "scenario", "min", "avg", "p50", "p99", "max" });
    printRow("cold", cold_buf[0..cold_n], cold_to);
    printRow("hot",  hot_buf[0..hot_n],  hot_to);
}

fn printRow(label: []const u8, vals: []u64, timeouts: u32) void {
    if (vals.len == 0) {
        std.debug.print("{s:<10}  (no samples)  timeouts {d}\n", .{ label, timeouts });
        return;
    }
    std.mem.sort(u64, vals, {}, std.sort.asc(u64));
    var sum: u128 = 0;
    for (vals) |v| sum += v;
    const avg = @as(u64, @intCast(sum / vals.len));
    const p50_idx = vals.len / 2;
    const p99_idx = (vals.len * 99) / 100;
    std.debug.print(
        "{s:<10}{d:>8}{d:>8}{d:>8}{d:>8}{d:>8}           {d}\n",
        .{ label, vals[0] / 1000, avg / 1000, vals[p50_idx] / 1000, vals[p99_idx] / 1000, vals[vals.len - 1] / 1000, timeouts },
    );
}
```
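
The percentile rule in `printRow` is a plain floor index into the sorted slice. A minimal sketch of that arithmetic, with a hypothetical helper name `percentileIndex`, makes the chosen ranks easy to check by hand:

```zig
const std = @import("std");

/// Index of the p-th percentile in a sorted slice of length `len`, using the
/// same floor(len * p / 100) rule as printRow. Requires len > 0 and p < 100.
fn percentileIndex(len: usize, p: usize) usize {
    std.debug.assert(len > 0 and p < 100);
    return (len * p) / 100;
}

test "percentile indices for 500 samples" {
    try std.testing.expectEqual(@as(usize, 250), percentileIndex(500, 50));
    try std.testing.expectEqual(@as(usize, 495), percentileIndex(500, 99));
}

test "percentile picks values from a sorted slice" {
    var vals = [_]u64{ 900, 100, 500, 300, 700 };
    std.mem.sort(u64, &vals, {}, std.sort.asc(u64));
    try std.testing.expectEqual(@as(u64, 500), vals[percentileIndex(vals.len, 50)]);
    try std.testing.expectEqual(@as(u64, 900), vals[percentileIndex(vals.len, 99)]);
}
```

With 500 completed samples the p99 row reports the sample at index 495, so the five slowest samples sit at or above it.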

- [ ] **Step 4: Build and run full cycle**

Run: `zig build && WAYSTTY_INPUT_BENCH=both ./zig-out/bin/waystty 2>/tmp/in-full.log; tail -20 /tmp/in-full.log`
Expected: output matches the spec's sample stats block.

- [ ] **Step 5: Commit**

```bash
git add src/bench_input.zig
git commit -m "bench: printStats for input-latency with per-scenario rows"
```

### Task 8.2 (post-headline): Per-stage breakdown for p99 samples

**Files:**
- Modify: `src/bench_input.zig`

- [ ] **Step 1: Join samples with `FrameTiming`**

Add to `bench_input.zig`:

```zig
pub fn printP99Breakdown(
    drv: *const BenchDriver,
    ring: *const bench_stats.FrameTimingRing,
    scenario: Scenario,
) void {
    // Rank this scenario's completed samples by latency and pick the same
    // p99 index that printRow uses: sorted[(n * 99) / 100]. (Taking the
    // maximum instead would report the worst sample, not p99.)
    const Ranked = struct {
        lat: u64,
        idx: usize,
        fn lessThan(_: void, a: @This(), b: @This()) bool {
            return a.lat < b.lat;
        }
    };
    var ranked: [2000]Ranked = undefined; // same capacity as SampleBuffer
    var n: usize = 0;
    for (drv.samples.items[0..drv.samples.count], 0..) |s, i| {
        if (s.scenario != scenario) continue;
        const lat = s.latencyNs() orelse continue; // skips timed-out samples
        ranked[n] = .{ .lat = lat, .idx = i };
        n += 1;
    }
    if (n == 0) return;
    std.mem.sort(Ranked, ranked[0..n], {}, Ranked.lessThan);
    const p99 = ranked[(n * 99) / 100];
    const sample = drv.samples.items[p99.idx];
    const frame = sample.grid_seen_frame orelse return;

    // Find the timing entry with that frame_counter
    var ordered: [bench_stats.FrameTimingRing.capacity]bench_stats.FrameTiming = undefined;
    const entries = ring.orderedSlice(&ordered);
    for (entries) |ft| {
        if (ft.frame_counter == frame) {
            std.debug.print(
                "\np99 {s} breakdown (latency {d}us, frame {d}):\n" ++
                "  snapshot {d}, row_rebuild {d}, atlas_upload {d}, instance_upload {d}, gpu_submit {d}\n",
                .{ @tagName(scenario), p99.lat / 1000, frame,
                   ft.snapshot_us, ft.row_rebuild_us, ft.atlas_upload_us,
                   ft.instance_upload_us, ft.gpu_submit_us },
            );
            return;
        }
    }
    std.debug.print("(p99 frame {d} already evicted from timing ring)\n", .{ frame });
}
```

- [ ] **Step 2: Call from main after `printStats`**

```zig
bench_input.printStats(drv, cols, rows);
bench_input.printP99Breakdown(drv, &frame_ring, .cold);
bench_input.printP99Breakdown(drv, &frame_ring, .hot);
```

- [ ] **Step 3: Build and run**

Run: `zig build && WAYSTTY_INPUT_BENCH=both ./zig-out/bin/waystty 2>/tmp/in.log; tail -25 /tmp/in.log`
Expected: breakdown lines appear after the main table.

- [ ] **Step 4: Commit**

```bash
git add src/bench_input.zig src/main.zig
git commit -m "bench: p99 per-stage breakdown joined on frame_counter"
```

---

## Phase 9 — Final smoke test + docs

### Task 9.1: End-to-end smoke on floating window

**Files:** none modified.

- [ ] **Step 1: Launch a floating waystty on sway**

```bash
# Ensure sway config has rule: `for_window [app_id="waystty"] floating enable`
zig build -Doptimize=ReleaseFast
WAYSTTY_INPUT_BENCH=both ./zig-out/bin/waystty 2>bench-input.log
```

Expected: runs for ~40s, prints stats with grid=80×24, cold < hot, both with low timeouts (< 5%).

- [ ] **Step 2: Record baseline numbers**

Save the output in a freeform note, or at least sanity-check it: cold p50 should be on the order of one refresh interval (about 16.7 ms at 60 Hz), and hot p99 should exceed cold p99.

### Task 9.2: Run full test suite

- [ ] **Step 1: All tests pass**

Run: `zig build test 2>&1 | tail -30`
Expected: every test passes.

- [ ] **Step 2: Existing bench still works**

Run: `make bench`
Expected: output includes the new grid-size line; no regression in numbers.

- [ ] **Step 3: Commit any final adjustments**

```bash
git add -u
git commit -m "bench: final polish"
```

---

## Self-review notes

**Spec coverage:**

- Goal / cold + hot metrics → Tasks 4.3, 7.1, 8.1.
- `wp_presentation_time` endpoint → Phase 3 + Task 6.3.
- In-process KeyEvent injection → Task 5.2.
- Echo-gated closed loop → Task 6.2 (in_flight guard).
- MAILBOX preserved → no change made (default stays).
- PUA sentinels → Task 5.1.
- Fixed grid (shared) → Phase 1.
- Termios ECHO → Task 4.2.
- Child teardown → Task 4.5.
- WSI fallback → Task 6.2 (finalizeSample's early_timeouts check).
- Frame-counter correlation → Phase 2 + Task 8.2.
- Pair-on-arrival + discarded handling → Tasks 6.1, 6.2, 6.3 (driver doesn't advance on discarded; keeps listening).

**Dependencies between tasks:** Task 6.3 depends on Phase 3. Phase 4 depends on Phase 1 (for the grid-size env vars). All other tasks proceed in linear order.

**Compositor compatibility:** The whole bench assumes a compositor that (a) honors `xdg_toplevel` size hints for floating surfaces, and (b) implements `wp_presentation_time`. sway does both. Compositors that don't will fail at Task 1.3 (size mismatch) or Task 3.4 (global missing) — both with clear diagnostics.