docs/superpowers/plans/2026-04-17-gpu-render-testing.md

# GPU Render Testing Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add an automated pipeline that renders VT scripts through the real Vulkan pipeline, captures the output via an offscreen image readback, diffs it against golden PNGs, and flags both visual and performance regressions. All local-only (no CI).

**Architecture:** A `--capture <script> <output.png>` mode in waystty renders a scripted terminal session into a dedicated offscreen `VkImage`, reads that image back to host memory, and writes it as PNG. An `imgdiff` tool does RMSE + per-pixel-max comparison. A `test-render` orchestrator iterates golden scripts. A `bench-baseline` / `bench-check` pair compares p99 frame timings against a stored baseline.

**Tech Stack:** Zig 0.15+, Vulkan 1.2 (via vulkan-zig bindings), zig-wayland, vendored minimal PNG codec. No new external deps.

**Spec:** `docs/superpowers/specs/2026-04-17-gpu-render-testing-design.md`

---

## File Structure

New files:
- `src/png.zig` — vendored minimal RGBA8 PNG encoder/decoder (used by renderer + imgdiff)
- `src/capture.zig` — capture mode flow (visibility wait, drain, settle, render, readback, PNG write)
- `src/tools/imgdiff.zig` — standalone executable: compare two PNGs
- `src/tools/test_render.zig` — standalone executable: orchestrate capture + diff across all scripts
- `src/tools/bench_baseline.zig` — standalone executable: run bench workload, write/compare baseline.json
- `src/bench_stats.zig` — shared JSON serialization for FrameTimingStats (used by runTerminal and bench_baseline)
- `tests/golden/scripts/basic_ascii.vt`
- `tests/golden/scripts/bold_colors.vt`
- `tests/golden/scripts/box_drawing.vt`
- `tests/golden/reference/*.png` (committed after Task 8)
- `tests/bench/baseline.json` (committed after Task 10)

Modified files:
- `src/renderer.zig` — add `OffscreenTarget` struct + `renderToOffscreen()` + `readbackOffscreen()`
- `src/main.zig` — dispatch `--capture` arg to `capture.run`; extract `computeFrameStats` + `printFrameStats` to share with bench tools (expose via `bench_stats.zig`)
- `build.zig` — new modules (png, capture, bench_stats), new executables (imgdiff, test_render, bench_baseline), new build steps
- `Makefile` — new targets (`test-render`, `golden-update`, `bench-baseline`, `bench-check`)
- `.gitignore` — ignore stray `test_io*`/`test_sig`/`test_timer` binaries and `tests/golden/output/`

---

## Task 1: Vendored PNG codec (`src/png.zig`)

Minimal RGBA8 PNG encoder + decoder. Used by renderer (write capture output) and imgdiff (read reference + actual). Writing zlib/DEFLATE from scratch is tedious, but Zig ships `std.compress.flate` which handles DEFLATE, and the PNG IDAT chunk is a zlib stream (DEFLATE + adler32 + zlib header). We use `std.compress.flate` for the DEFLATE core and wrap it with the zlib header + adler32 ourselves, plus CRC32 for chunk framing.

**Files:**
- Create: `src/png.zig`
- Create: `src/png_test.zig` (only if a separate test file helps; otherwise tests inline)
- Modify: `build.zig` (add png module)

- [ ] **Step 1: Write the failing test for encode→decode round-trip**

Create `src/png.zig`:

```zig
const std = @import("std");

pub const Image = struct {
    width: u32,
    height: u32,
    pixels: []u8, // RGBA8, row-major, width*height*4 bytes

    pub fn deinit(self: *Image, alloc: std.mem.Allocator) void {
        alloc.free(self.pixels);
        self.* = undefined;
    }
};

pub const EncodeError = error{ OutOfMemory, WriteFailed };
pub const DecodeError = error{
    OutOfMemory,
    InvalidPng,
    UnsupportedPng, // only RGBA8 non-interlaced is supported
    CorruptChunk,
};

pub fn encode(alloc: std.mem.Allocator, img: Image, writer: anytype) EncodeError!void {
    _ = alloc;
    _ = img;
    _ = writer;
    return error.WriteFailed; // placeholder — Step 3 replaces this
}

pub fn decode(alloc: std.mem.Allocator, bytes: []const u8) DecodeError!Image {
    _ = alloc;
    _ = bytes;
    return error.InvalidPng; // placeholder — Step 3 replaces this
}

test "encode then decode roundtrip recovers pixels" {
    const alloc = std.testing.allocator;
    var src_pixels = [_]u8{
        0xff, 0x00, 0x00, 0xff,  0x00, 0xff, 0x00, 0xff,
        0x00, 0x00, 0xff, 0xff,  0xff, 0xff, 0xff, 0xff,
    };
    const src = Image{ .width = 2, .height = 2, .pixels = &src_pixels };

    var buf = std.ArrayList(u8).init(alloc);
    defer buf.deinit();
    try encode(alloc, src, buf.writer());

    var decoded = try decode(alloc, buf.items);
    defer decoded.deinit(alloc);

    try std.testing.expectEqual(@as(u32, 2), decoded.width);
    try std.testing.expectEqual(@as(u32, 2), decoded.height);
    try std.testing.expectEqualSlices(u8, &src_pixels, decoded.pixels);
}
```

Add to `build.zig` (after the `renderer_test_mod` block near the end of the file):

```zig
    // png module — vendored minimal RGBA8 PNG codec
    const png_mod = b.createModule(.{
        .root_source_file = b.path("src/png.zig"),
        .target = target,
        .optimize = optimize,
    });
    exe_mod.addImport("png", png_mod);

    const png_test_mod = b.createModule(.{
        .root_source_file = b.path("src/png.zig"),
        .target = target,
        .optimize = optimize,
    });
    const png_tests = b.addTest(.{ .root_module = png_test_mod });
    test_step.dependOn(&b.addRunArtifact(png_tests).step);
```

- [ ] **Step 2: Run the test to confirm it fails**

Run: `zig build test 2>&1 | grep -E "png|roundtrip"`
Expected: failure mentioning `error.WriteFailed` or `error.InvalidPng`.

- [ ] **Step 3: Implement PNG encoder**

Replace the `encode` function in `src/png.zig` with a real implementation. The PNG structure is:

```
signature:   8 bytes  89 50 4E 47 0D 0A 1A 0A
IHDR chunk:  13 bytes payload: width(4) height(4) bitdepth=8 colortype=6(RGBA) compression=0 filter=0 interlace=0
IDAT chunk:  zlib-wrapped DEFLATE of (filter_byte=0 + row_pixels) per row
IEND chunk:  empty payload
```

Each chunk is: length(4) type(4) data(length) crc32(4).
The zlib wrapping is: header(2) deflated_data adler32(4).
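
As a sanity check on the framing rules above, the sketch below verifies three constants that a conforming writer should reproduce: the zlib header check, the CRC of an empty `IEND` chunk, and a well-known Adler-32 test vector. It assumes `std.hash.Crc32` is the same CRC-32 that PNG specifies; `adler32Of` and `zlibHeaderValid` are illustrative names and are not part of `src/png.zig`.

```zig
const std = @import("std");

/// Adler-32 as used by the zlib trailer (same algorithm as the encoder's helper).
fn adler32Of(data: []const u8) u32 {
    var a: u32 = 1;
    var b: u32 = 0;
    for (data) |byte| {
        a = (a + byte) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/// A zlib header is valid when the low nibble of CMF selects DEFLATE (8)
/// and CMF*256 + FLG is a multiple of 31.
fn zlibHeaderValid(cmf: u8, flg: u8) bool {
    const check = @as(u16, cmf) * 256 + flg;
    return (cmf & 0x0F) == 8 and check % 31 == 0;
}

test "PNG framing constants" {
    // The two header bytes the encoder emits.
    try std.testing.expect(zlibHeaderValid(0x78, 0x01));
    // The chunk CRC covers type + payload; IEND has an empty payload.
    try std.testing.expectEqual(@as(u32, 0xAE426082), std.hash.Crc32.hash("IEND"));
    // Well-known Adler-32 test vector.
    try std.testing.expectEqual(@as(u32, 0x11E60398), adler32Of("Wikipedia"));
}
```

If any of these assertions fails on your toolchain, fix the helper before debugging the encoder itself, since external PNG readers will reject files with a bad checksum.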

Put the full implementation in place:

```zig
const signature = [_]u8{ 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

fn crc32(data: []const u8) u32 {
    var crc = std.hash.Crc32.init();
    crc.update(data);
    return crc.final();
}

fn adler32(data: []const u8) u32 {
    var a: u32 = 1;
    var b: u32 = 0;
    for (data) |byte| {
        a = (a + byte) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

fn writeChunk(writer: anytype, chunk_type: *const [4]u8, payload: []const u8) EncodeError!void {
    writer.writeInt(u32, @intCast(payload.len), .big) catch return error.WriteFailed;
    writer.writeAll(chunk_type) catch return error.WriteFailed;
    writer.writeAll(payload) catch return error.WriteFailed;
    // CRC is over chunk_type + payload
    var crc = std.hash.Crc32.init();
    crc.update(chunk_type);
    crc.update(payload);
    writer.writeInt(u32, crc.final(), .big) catch return error.WriteFailed;
}

pub fn encode(alloc: std.mem.Allocator, img: Image, writer: anytype) EncodeError!void {
    std.debug.assert(img.pixels.len == @as(usize, img.width) * img.height * 4);

    writer.writeAll(&signature) catch return error.WriteFailed;

    // IHDR
    var ihdr: [13]u8 = undefined;
    std.mem.writeInt(u32, ihdr[0..4], img.width, .big);
    std.mem.writeInt(u32, ihdr[4..8], img.height, .big);
    ihdr[8] = 8;  // bit depth
    ihdr[9] = 6;  // color type = RGBA
    ihdr[10] = 0; // compression
    ihdr[11] = 0; // filter
    ihdr[12] = 0; // interlace
    try writeChunk(writer, "IHDR", &ihdr);

    // Build filtered rows: one filter byte (0 = none) followed by each row
    const row_bytes = @as(usize, img.width) * 4;
    const filtered_len = (row_bytes + 1) * img.height;
    const filtered = alloc.alloc(u8, filtered_len) catch return error.OutOfMemory;
    defer alloc.free(filtered);

    var y: u32 = 0;
    while (y < img.height) : (y += 1) {
        const src_off = @as(usize, y) * row_bytes;
        const dst_off = @as(usize, y) * (row_bytes + 1);
        filtered[dst_off] = 0;
        @memcpy(filtered[dst_off + 1 .. dst_off + 1 + row_bytes], img.pixels[src_off .. src_off + row_bytes]);
    }

    // Compress with DEFLATE, wrap with zlib header + adler32
    var compressed = std.ArrayList(u8).init(alloc);
    defer compressed.deinit();
    // zlib header: 0x78 0x01 (deflate, no preset dict, fastest)
    compressed.appendSlice(&.{ 0x78, 0x01 }) catch return error.OutOfMemory;

    var fbs = std.io.fixedBufferStream(filtered);
    std.compress.flate.deflate.compress(.raw, fbs.reader(), compressed.writer(), .{}) catch return error.WriteFailed;

    var adler_bytes: [4]u8 = undefined;
    std.mem.writeInt(u32, &adler_bytes, adler32(filtered), .big);
    compressed.appendSlice(&adler_bytes) catch return error.OutOfMemory;

    try writeChunk(writer, "IDAT", compressed.items);
    try writeChunk(writer, "IEND", &.{});
}
```

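Note: these snippets use the `std` API as it existed before Zig 0.15 (`std.ArrayList(u8).init`, `std.io.fixedBufferStream`, `std.compress.flate.deflate.compress`). Zig 0.15 reworked the reader/writer interfaces and `std.compress.flate`, and made `std.ArrayList` unmanaged by default, so expect to substitute the equivalent buffer, stream, and raw-DEFLATE calls for the toolchain you build with. The rest of the PNG framing is version-independent.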

- [ ] **Step 4: Implement PNG decoder**

Replace `decode` in `src/png.zig`:

```zig
pub fn decode(alloc: std.mem.Allocator, bytes: []const u8) DecodeError!Image {
    if (bytes.len < signature.len + 8) return error.InvalidPng;
    if (!std.mem.eql(u8, bytes[0..signature.len], &signature)) return error.InvalidPng;

    var cursor: usize = signature.len;
    var width: u32 = 0;
    var height: u32 = 0;
    var idat_accum = std.ArrayList(u8).init(alloc);
    defer idat_accum.deinit();
    var seen_ihdr = false;
    var seen_iend = false;

    while (cursor + 8 <= bytes.len and !seen_iend) {
        const len = std.mem.readInt(u32, bytes[cursor..][0..4], .big);
        cursor += 4;
        const ctype = bytes[cursor..][0..4];
        cursor += 4;
        if (cursor + len + 4 > bytes.len) return error.CorruptChunk;
        const payload = bytes[cursor..][0..len];
        cursor += len;
        // skip CRC (4 bytes); we trust the file
        cursor += 4;

        if (std.mem.eql(u8, ctype, "IHDR")) {
            if (payload.len != 13) return error.InvalidPng;
            width = std.mem.readInt(u32, payload[0..4], .big);
            height = std.mem.readInt(u32, payload[4..8], .big);
            if (payload[8] != 8 or payload[9] != 6 or payload[12] != 0)
                return error.UnsupportedPng;
            seen_ihdr = true;
        } else if (std.mem.eql(u8, ctype, "IDAT")) {
            if (!seen_ihdr) return error.InvalidPng;
            idat_accum.appendSlice(payload) catch return error.OutOfMemory;
        } else if (std.mem.eql(u8, ctype, "IEND")) {
            seen_iend = true;
        }
        // ignore other chunks
    }

    if (!seen_ihdr or !seen_iend) return error.InvalidPng;

    // Strip zlib header (2 bytes) + adler32 (last 4 bytes)
    if (idat_accum.items.len < 6) return error.InvalidPng;
    const deflate_data = idat_accum.items[2 .. idat_accum.items.len - 4];

    const row_bytes = @as(usize, width) * 4;
    const filtered_len = (row_bytes + 1) * height;
    const filtered = alloc.alloc(u8, filtered_len) catch return error.OutOfMemory;
    defer alloc.free(filtered);

    var src_fbs = std.io.fixedBufferStream(deflate_data);
    var dst_fbs = std.io.fixedBufferStream(filtered);
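    // Decompression lives in the `inflate` namespace; `deflate` only compresses (pre-0.15 std layout).
    std.compress.flate.inflate.decompress(.raw, src_fbs.reader(), dst_fbs.writer()) catch return error.CorruptChunk;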
    if (dst_fbs.pos != filtered_len) return error.CorruptChunk;

    const pixels = alloc.alloc(u8, @as(usize, width) * height * 4) catch return error.OutOfMemory;
    errdefer alloc.free(pixels);

    // Only filter type 0 (None) supported; fail on anything else
    var y: u32 = 0;
    while (y < height) : (y += 1) {
        const dst_off = @as(usize, y) * row_bytes;
        const src_off = @as(usize, y) * (row_bytes + 1);
        if (filtered[src_off] != 0) return error.UnsupportedPng;
        @memcpy(pixels[dst_off .. dst_off + row_bytes], filtered[src_off + 1 .. src_off + 1 + row_bytes]);
    }

    return .{ .width = width, .height = height, .pixels = pixels };
}
```

- [ ] **Step 5: Run the roundtrip test**

Run: `zig build test 2>&1 | grep -E "png|roundtrip"`
Expected: PASS. If failure references the DEFLATE API, swap in the correct `std.compress.flate` call for the current Zig version and re-run.

- [ ] **Step 6: Add a decode-rejects-non-rgba8 test**

Append to `src/png.zig`:

```zig
test "decode rejects RGB (non-alpha) PNGs with UnsupportedPng" {
    const alloc = std.testing.allocator;
    // Craft a minimal valid PNG with color type = 2 (RGB)
    var bytes = std.ArrayList(u8).init(alloc);
    defer bytes.deinit();
    try bytes.appendSlice(&signature);
    // IHDR
    var ihdr: [13]u8 = undefined;
    std.mem.writeInt(u32, ihdr[0..4], 1, .big);
    std.mem.writeInt(u32, ihdr[4..8], 1, .big);
    ihdr[8] = 8; // bit depth
    ihdr[9] = 2; // color type = RGB (unsupported)
    ihdr[10] = 0; // compression
    ihdr[11] = 0; // filter
    ihdr[12] = 0; // interlace
    try writeChunk(bytes.writer(), "IHDR", &ihdr);
    try writeChunk(bytes.writer(), "IEND", &.{});

    try std.testing.expectError(error.UnsupportedPng, decode(alloc, bytes.items));
}
```

Run: `zig build test`
Expected: both tests PASS.

- [ ] **Step 7: Commit**

```bash
git add src/png.zig build.zig
git commit -m "$(cat <<'EOF'
Add vendored minimal RGBA8 PNG codec

Supports only the narrow slice we need (RGBA8 non-interlaced with
filter type 0). Used by --capture for output and by imgdiff for
reading goldens.
EOF
)"
```

---

## Task 2: Offscreen render target in `renderer.zig`

Add a dedicated `VkImage` the renderer can draw into (instead of the swapchain) and a readback path that copies it to a host-visible staging buffer. The existing renderer pipeline already writes to whatever framebuffer we bind during the render pass — we just need to construct a framebuffer over this new image and have a way to kick the draw and read the pixels.

We keep the swapchain path untouched — the offscreen target is created on demand by capture mode.

**Files:**
- Modify: `src/renderer.zig` — add `OffscreenTarget` struct, `createOffscreen()`, `destroyOffscreen()`, `renderToOffscreen()`, `readbackOffscreen()`

- [ ] **Step 1: Find the right insertion point in renderer.zig**

Read `src/renderer.zig` to find (a) the `Context` struct definition and (b) the `drawCells` function. The new offscreen helpers will be a set of methods on `Context` (or free functions taking the Vulkan wrappers) that mirror what `drawCells` does but render to an owned image.

Run: `grep -n "pub const Context\|pub fn drawCells\|pub fn uploadAtlas\|pub fn deinit" src/renderer.zig`
Expected: locations of each. Use these when editing.

- [ ] **Step 2: Add the `OffscreenTarget` type and constructor**

In `src/renderer.zig`, after the `SwapchainResult` struct, add:

```zig
pub const OffscreenTarget = struct {
    width: u32,
    height: u32,
    format: vk.Format,
    image: vk.Image,
    memory: vk.DeviceMemory,
    view: vk.ImageView,
    framebuffer: vk.Framebuffer,

    /// Host-visible buffer for readback. Large enough for width*height*4 BGRA bytes.
    readback_buffer: vk.Buffer,
    readback_memory: vk.DeviceMemory,
    readback_size: u64,
};

pub fn createOffscreen(
    vki: vk.InstanceWrapper,
    vkd: vk.DeviceWrapper,
    physical: vk.PhysicalDevice,
    device: vk.Device,
    render_pass: vk.RenderPass,
    format: vk.Format,
    width: u32,
    height: u32,
) !OffscreenTarget {
    // 1. Color attachment image (TRANSFER_SRC for readback)
    const image = try vkd.createImage(device, &vk.ImageCreateInfo{
        .image_type = .@"2d",
        .format = format,
        .extent = .{ .width = width, .height = height, .depth = 1 },
        .mip_levels = 1,
        .array_layers = 1,
        .samples = .{ .@"1_bit" = true },
        .tiling = .optimal,
        .usage = .{ .color_attachment_bit = true, .transfer_src_bit = true },
        .sharing_mode = .exclusive,
        .initial_layout = .undefined,
    }, null);

    const img_mem_req = vkd.getImageMemoryRequirements(device, image);
    const mem_props = vki.getPhysicalDeviceMemoryProperties(physical);
    const img_mem_type = findMemoryType(mem_props, img_mem_req.memory_type_bits, .{ .device_local_bit = true }) orelse return error.NoMemoryType;

    const memory = try vkd.allocateMemory(device, &vk.MemoryAllocateInfo{
        .allocation_size = img_mem_req.size,
        .memory_type_index = img_mem_type,
    }, null);
    try vkd.bindImageMemory(device, image, memory, 0);

    const view = try vkd.createImageView(device, &vk.ImageViewCreateInfo{
        .image = image,
        .view_type = .@"2d",
        .format = format,
        .components = .{ .r = .identity, .g = .identity, .b = .identity, .a = .identity },
        .subresource_range = .{
            .aspect_mask = .{ .color_bit = true },
            .base_mip_level = 0,
            .level_count = 1,
            .base_array_layer = 0,
            .layer_count = 1,
        },
    }, null);

    const framebuffer = try vkd.createFramebuffer(device, &vk.FramebufferCreateInfo{
        .render_pass = render_pass,
        .attachment_count = 1,
        .p_attachments = @ptrCast(&view),
        .width = width,
        .height = height,
        .layers = 1,
    }, null);

    // 2. Host-visible readback buffer
    const readback_size: u64 = @as(u64, width) * height * 4;
    const readback_buffer = try vkd.createBuffer(device, &vk.BufferCreateInfo{
        .size = readback_size,
        .usage = .{ .transfer_dst_bit = true },
        .sharing_mode = .exclusive,
    }, null);
    const buf_mem_req = vkd.getBufferMemoryRequirements(device, readback_buffer);
    const buf_mem_type = findMemoryType(mem_props, buf_mem_req.memory_type_bits, .{
        .host_visible_bit = true,
        .host_coherent_bit = true,
    }) orelse return error.NoMemoryType;
    const readback_memory = try vkd.allocateMemory(device, &vk.MemoryAllocateInfo{
        .allocation_size = buf_mem_req.size,
        .memory_type_index = buf_mem_type,
    }, null);
    try vkd.bindBufferMemory(device, readback_buffer, readback_memory, 0);

    return .{
        .width = width,
        .height = height,
        .format = format,
        .image = image,
        .memory = memory,
        .view = view,
        .framebuffer = framebuffer,
        .readback_buffer = readback_buffer,
        .readback_memory = readback_memory,
        .readback_size = readback_size,
    };
}

pub fn destroyOffscreen(vkd: vk.DeviceWrapper, device: vk.Device, t: OffscreenTarget) void {
    vkd.destroyFramebuffer(device, t.framebuffer, null);
    vkd.destroyImageView(device, t.view, null);
    vkd.destroyImage(device, t.image, null);
    vkd.freeMemory(device, t.memory, null);
    vkd.destroyBuffer(device, t.readback_buffer, null);
    vkd.freeMemory(device, t.readback_memory, null);
}

fn findMemoryType(
    props: vk.PhysicalDeviceMemoryProperties,
    type_filter: u32,
    required: vk.MemoryPropertyFlags,
) ?u32 {
    var i: u32 = 0;
    while (i < props.memory_type_count) : (i += 1) {
        const bit = @as(u32, 1) << @intCast(i);
        if (type_filter & bit == 0) continue;
        const flags = props.memory_types[i].property_flags;
        if (flags.contains(required)) return i;
    }
    return null;
}
```

Note: if `Context` already has a `findMemoryType` helper, reuse it and delete the duplicate above.
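
The selection rule in `findMemoryType` can be checked without a Vulkan device. The following is a minimal sketch that uses plain `u32` bitmasks in place of `vk.MemoryPropertyFlags`; `pickMemoryType` and the three constants are hypothetical stand-ins (the values match the `VK_MEMORY_PROPERTY_*` bits), not renderer code.

```zig
const std = @import("std");

// Stand-ins for the Vulkan memory property bits.
const DEVICE_LOCAL: u32 = 0x1;
const HOST_VISIBLE: u32 = 0x2;
const HOST_COHERENT: u32 = 0x4;

/// Returns the first memory type index that `type_filter` allows and whose
/// flags include every bit in `required`. Expects at most 32 entries, which
/// is the Vulkan limit on memory types.
fn pickMemoryType(type_flags: []const u32, type_filter: u32, required: u32) ?u32 {
    for (type_flags, 0..) |flags, i| {
        const bit = @as(u32, 1) << @intCast(i);
        if (type_filter & bit == 0) continue; // resource cannot live in this type
        if (flags & required == required) {
            const index: u32 = @intCast(i);
            return index;
        }
    }
    return null;
}

test "pickMemoryType honors both the filter and the required flags" {
    // Index 0: device-local only. Index 1: host-visible and coherent.
    const types = [_]u32{ DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT };

    try std.testing.expectEqual(@as(?u32, 0), pickMemoryType(&types, 0b11, DEVICE_LOCAL));
    try std.testing.expectEqual(@as(?u32, 1), pickMemoryType(&types, 0b11, HOST_VISIBLE | HOST_COHERENT));
    // Type 1 has the flags but is masked out by the filter.
    try std.testing.expectEqual(@as(?u32, null), pickMemoryType(&types, 0b01, HOST_VISIBLE));
}
```

The last case is the one that matters for the readback buffer: a type with the right flags is still unusable when `memory_type_bits` excludes it, which is why `createOffscreen` returns `error.NoMemoryType` instead of falling back.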

- [ ] **Step 3: Add `renderToOffscreen` and `readbackOffscreen` on Context**

The existing `drawCells` in `Context` submits draws against the current swapchain image. We need an analogous method that draws into the offscreen framebuffer, transitions the image to `TRANSFER_SRC_OPTIMAL`, and copies into the readback buffer.

Strategy: refactor `drawCells` minimally so its render-pass body (begin pass → bind pipeline → bind descriptor sets → draw → end pass) is in a helper taking an explicit framebuffer + extent, and have both the swapchain path and the offscreen path call it.

In `src/renderer.zig`, add these methods on `Context` (put them next to `drawCells`):

```zig
pub fn renderToOffscreen(
    self: *Context,
    target: *const OffscreenTarget,
    instance_data: []const CellInstance, // same struct drawCells uses
    push: PushConstants,                 // same struct drawCells uses
) !void {
    // 1. Upload instances to the existing instance buffer (same as drawCells does).
    try self.uploadInstances(instance_data);

    // 2. Begin command buffer, transition offscreen image UNDEFINED → COLOR_ATTACHMENT_OPTIMAL
    const cmd = self.capture_cmd; // new: a dedicated command buffer allocated alongside the swapchain cmd buffer
    try self.vkd.resetCommandBuffer(cmd, .{});
    try self.vkd.beginCommandBuffer(cmd, &.{ .flags = .{ .one_time_submit_bit = true } });

    self.transitionImage(cmd, target.image, .undefined, .color_attachment_optimal,
        .{}, .{ .color_attachment_write_bit = true },
        .{ .top_of_pipe_bit = true }, .{ .color_attachment_output_bit = true });

    // 3. Begin the same render pass as swapchain but with the offscreen framebuffer + extent
    const clear: vk.ClearValue = .{ .color = .{ .float_32 = .{ 0, 0, 0, 1 } } };
    self.vkd.cmdBeginRenderPass(cmd, &vk.RenderPassBeginInfo{
        .render_pass = self.render_pass,
        .framebuffer = target.framebuffer,
        .render_area = .{ .offset = .{ .x = 0, .y = 0 }, .extent = .{ .width = target.width, .height = target.height } },
        .clear_value_count = 1,
        .p_clear_values = @ptrCast(&clear),
    }, .@"inline");

    self.recordDrawCommands(cmd, .{ .width = target.width, .height = target.height }, @intCast(instance_data.len), push);

    self.vkd.cmdEndRenderPass(cmd);

    // 4. Transition image COLOR_ATTACHMENT_OPTIMAL → TRANSFER_SRC_OPTIMAL
    self.transitionImage(cmd, target.image, .color_attachment_optimal, .transfer_src_optimal,
        .{ .color_attachment_write_bit = true }, .{ .transfer_read_bit = true },
        .{ .color_attachment_output_bit = true }, .{ .transfer_bit = true });

    // 5. Copy image → readback buffer
    const region = vk.BufferImageCopy{
        .buffer_offset = 0,
        .buffer_row_length = 0,
        .buffer_image_height = 0,
        .image_subresource = .{
            .aspect_mask = .{ .color_bit = true },
            .mip_level = 0,
            .base_array_layer = 0,
            .layer_count = 1,
        },
        .image_offset = .{ .x = 0, .y = 0, .z = 0 },
        .image_extent = .{ .width = target.width, .height = target.height, .depth = 1 },
    };
    self.vkd.cmdCopyImageToBuffer(cmd, target.image, .transfer_src_optimal, target.readback_buffer, 1, @ptrCast(&region));

    try self.vkd.endCommandBuffer(cmd);

    // 6. Submit and fence-wait
    try self.vkd.resetFences(self.device, 1, @ptrCast(&self.capture_fence));
    try self.vkd.queueSubmit(self.graphics_queue, 1, @ptrCast(&vk.SubmitInfo{
        .command_buffer_count = 1,
        .p_command_buffers = @ptrCast(&cmd),
    }), self.capture_fence);
    _ = try self.vkd.waitForFences(self.device, 1, @ptrCast(&self.capture_fence), .true, std.math.maxInt(u64));
}

pub fn readbackOffscreen(
    self: *Context,
    target: *const OffscreenTarget,
    out_rgba: []u8,
) !void {
    std.debug.assert(out_rgba.len == @as(usize, target.width) * target.height * 4);

    const mapped = try self.vkd.mapMemory(self.device, target.readback_memory, 0, target.readback_size, .{});
    defer self.vkd.unmapMemory(self.device, target.readback_memory);
    const src = @as([*]const u8, @ptrCast(mapped.?))[0..target.readback_size];

    // Convert BGRA → RGBA, force alpha to 0xFF (surface is opaque composite).
    // This assumes target.format is a B8G8R8A8 variant; skip the swap for R8G8B8A8.
    var i: usize = 0;
    while (i < target.readback_size) : (i += 4) {
        out_rgba[i + 0] = src[i + 2]; // R (byte 2 of a BGRA pixel)
        out_rgba[i + 1] = src[i + 1]; // G
        out_rgba[i + 2] = src[i + 0]; // B (byte 0 of a BGRA pixel)
        out_rgba[i + 3] = 0xFF;       // A
    }
}
```

You will need to:
- Add `capture_cmd: vk.CommandBuffer` and `capture_fence: vk.Fence` to `Context`, allocated in `Context.init` alongside the swapchain command buffer + fence.
- Extract a `recordDrawCommands(cmd, extent, instance_count, push)` helper from `drawCells` (bind pipeline, bind descriptor sets, bind vertex/instance buffers, set viewport/scissor to `extent`, cmdDrawInstanced). Call this helper from both `drawCells` (with swapchain extent) and `renderToOffscreen` (with offscreen extent).
- Add a `transitionImage(cmd, image, old_layout, new_layout, src_access, dst_access, src_stage, dst_stage)` helper if one doesn't already exist.
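- Add an `uploadInstances(instances)` method if `drawCells` currently does this inline — otherwise call whatever existing upload function drawCells uses.
- Check the `final_layout` that `self.render_pass` declares for its color attachment. When a render pass ends, the attachment is in that layout, so the `old_layout` passed to the second `transitionImage` call must match it. If the swapchain render pass ends in `present_src_khr`, which is only valid for presentable images, create a compatible render pass for the offscreen target whose final layout is `color_attachment_optimal`.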

If the exact internal types differ (e.g. `CellInstance` is named `Instance`, `PushConstants` has a different name), use the real names.
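
The channel swap inside `readbackOffscreen` can be unit-tested without a GPU by pulling it into a pure function over slices. This is a sketch only: `bgraToRgbaOpaque` is a hypothetical helper name, and it assumes tightly packed B8G8R8A8 pixels, which matches the readback buffer only when the offscreen format is a BGRA variant.

```zig
const std = @import("std");

/// Converts tightly packed BGRA8 pixels to RGBA8 and forces alpha to opaque.
/// `src` and `dst` must have the same length, a multiple of 4.
fn bgraToRgbaOpaque(src: []const u8, dst: []u8) void {
    std.debug.assert(src.len == dst.len and src.len % 4 == 0);
    var i: usize = 0;
    while (i < src.len) : (i += 4) {
        dst[i + 0] = src[i + 2]; // red is byte 2 of a BGRA pixel
        dst[i + 1] = src[i + 1]; // green stays in place
        dst[i + 2] = src[i + 0]; // blue is byte 0 of a BGRA pixel
        dst[i + 3] = 0xFF; // source alpha is ignored
    }
}

test "bgraToRgbaOpaque swaps red and blue and forces alpha" {
    const bgra = [_]u8{ 0x10, 0x20, 0x30, 0x40, 0xAA, 0xBB, 0xCC, 0x00 };
    var rgba: [8]u8 = undefined;
    bgraToRgbaOpaque(&bgra, &rgba);
    try std.testing.expectEqualSlices(
        u8,
        &[_]u8{ 0x30, 0x20, 0x10, 0xFF, 0xCC, 0xBB, 0xAA, 0xFF },
        &rgba,
    );
}
```

If the swapchain (and therefore the offscreen image) turns out to use an R8G8B8A8 format, the swap should be skipped; branching on `target.format` in `readbackOffscreen` is one way to handle both.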

- [ ] **Step 4: Verify it still compiles**

Run: `zig build 2>&1 | tail -40`
Expected: clean build. If there are errors, they will almost all be naming mismatches (fix them in `renderer.zig`).

- [ ] **Step 5: Commit**

```bash
git add src/renderer.zig
git commit -m "$(cat <<'EOF'
Add offscreen render target + readback to renderer

New OffscreenTarget, createOffscreen/destroyOffscreen, renderToOffscreen,
and readbackOffscreen. Draws into a dedicated TRANSFER_SRC image,
copies to a host-visible buffer, returns RGBA pixels.
EOF
)"
```

---

## Task 3: Extract frame-stats helpers to `src/bench_stats.zig`

`computeFrameStats` and `printFrameStats` currently live in `main.zig` and are private. The bench baseline tool needs them. Move them (and `FrameTimingRing`, `FrameTiming`, `SectionStats`, `FrameTimingStats`, `computeSectionStats`) into a new module so both `main.zig` and `src/tools/bench_baseline.zig` can import them.

**Files:**
- Create: `src/bench_stats.zig`
- Modify: `src/main.zig` — delete the moved code, import from the new module
- Modify: `build.zig` — add the bench_stats module

- [ ] **Step 1: Create the new module**

Create `src/bench_stats.zig` from the definitions currently at roughly `main.zig:960-1070`: `FrameTimingRing`, `FrameTiming`, `SectionStats`, `FrameTimingStats`, `computeSectionStats`, `computeFrameStats`, `printFrameStats`. Make all of them `pub`. `FrameTiming` sits slightly earlier, near the ring buffer; grep for `const FrameTiming = struct` in main.zig to find it. Treat the line numbers as approximate and rely on the grep below.

Run: `grep -n "^const FrameTiming \|^const FrameTimingRing\|^const SectionStats\|^const FrameTimingStats\|^fn computeSectionStats\|^fn computeFrameStats\|^fn printFrameStats" src/main.zig`
Expected: lines for each. Copy those definitions verbatim into `src/bench_stats.zig` and prefix with `pub `.

- [ ] **Step 2: Add bench_stats to build.zig**

In `build.zig`, add after the png module block (or near the other small modules):

```zig
    const bench_stats_mod = b.createModule(.{
        .root_source_file = b.path("src/bench_stats.zig"),
        .target = target,
        .optimize = optimize,
    });
    exe_mod.addImport("bench_stats", bench_stats_mod);
    main_test_mod.addImport("bench_stats", bench_stats_mod);

    const bench_stats_test_mod = b.createModule(.{
        .root_source_file = b.path("src/bench_stats.zig"),
        .target = target,
        .optimize = optimize,
    });
    const bench_stats_tests = b.addTest(.{ .root_module = bench_stats_test_mod });
    test_step.dependOn(&b.addRunArtifact(bench_stats_tests).step);
```

- [ ] **Step 3: Update main.zig to import from the new module**

Replace the now-moved local definitions in `src/main.zig` with:

```zig
const bench_stats = @import("bench_stats");
const FrameTiming = bench_stats.FrameTiming;
const FrameTimingRing = bench_stats.FrameTimingRing;
const FrameTimingStats = bench_stats.FrameTimingStats;
const computeFrameStats = bench_stats.computeFrameStats;
const printFrameStats = bench_stats.printFrameStats;
```

Delete the original definitions (the entire block from `const FrameTiming = struct` through `fn printFrameStats` — around main.zig:900-1070). Keep the existing tests in main.zig that reference these types; they will now reference the imported aliases.

- [ ] **Step 4: Run the tests to confirm nothing regressed**

Run: `zig build test 2>&1 | tail -40`
Expected: PASS. Tests that exercise only the moved types (for example the `FrameTimingRing` tests) can stay in `main.zig` and use the imported aliases, or move to `bench_stats.zig` so they run under `bench_stats_tests`. Either way, confirm they still run and pass.

- [ ] **Step 5: Commit**

```bash
git add src/bench_stats.zig src/main.zig build.zig
git commit -m "$(cat <<'EOF'
Extract frame timing stats into bench_stats module

Enables the bench baseline tool to reuse stat computation without
linking the full waystty binary. No behavior change.
EOF
)"
```

---

## Task 4: Add JSON ser/de for `FrameTimingStats` in `bench_stats.zig`

Needed by `bench-baseline` and `bench-check`.

**Files:**
- Modify: `src/bench_stats.zig`

- [ ] **Step 1: Write the failing roundtrip test**

Append to `src/bench_stats.zig`:

```zig
pub const BaselineRecord = struct {
    workload_sha: []const u8,
    zig_version: []const u8,
    waystty_sha: []const u8,
    frame_count: usize,
    sections: struct {
        snapshot: SectionStats,
        row_rebuild: SectionStats,
        atlas_upload: SectionStats,
        instance_upload: SectionStats,
        gpu_submit: SectionStats,
    },
};

pub fn writeBaselineJson(alloc: std.mem.Allocator, rec: BaselineRecord, writer: anytype) !void {
    _ = alloc;
    _ = rec;
    _ = writer;
    return error.WriteFailed; // placeholder
}

pub fn readBaselineJson(alloc: std.mem.Allocator, bytes: []const u8) !BaselineRecord {
    _ = alloc;
    _ = bytes;
    return error.InvalidJson; // placeholder
}

test "baseline JSON round-trip" {
    const alloc = std.testing.allocator;
    const rec = BaselineRecord{
        .workload_sha = "abcdef",
        .zig_version = "0.15.0",
        .waystty_sha = "123abc",
        .frame_count = 256,
        .sections = .{
            .snapshot = .{ .min = 1, .avg = 2, .p99 = 3, .max = 4 },
            .row_rebuild = .{ .min = 10, .avg = 20, .p99 = 30, .max = 40 },
            .atlas_upload = .{ .min = 0, .avg = 0, .p99 = 0, .max = 0 },
            .instance_upload = .{ .min = 5, .avg = 6, .p99 = 7, .max = 8 },
            .gpu_submit = .{ .min = 9, .avg = 9, .p99 = 9, .max = 9 },
        },
    };

    var buf = std.ArrayList(u8).init(alloc);
    defer buf.deinit();
    try writeBaselineJson(alloc, rec, buf.writer());

    const parsed = try readBaselineJson(alloc, buf.items);
    defer {
        alloc.free(parsed.workload_sha);
        alloc.free(parsed.zig_version);
        alloc.free(parsed.waystty_sha);
    }

    try std.testing.expectEqual(@as(usize, 256), parsed.frame_count);
    try std.testing.expectEqual(@as(u32, 30), parsed.sections.row_rebuild.p99);
    try std.testing.expectEqualStrings("abcdef", parsed.workload_sha);
}
```

- [ ] **Step 2: Run to see the failure**

Run: `zig build test 2>&1 | grep -E "baseline|WriteFailed"`
Expected: failure due to placeholder.

- [ ] **Step 3: Implement using `std.json`**

Replace the two placeholders in `src/bench_stats.zig`:

```zig
pub fn writeBaselineJson(alloc: std.mem.Allocator, rec: BaselineRecord, writer: anytype) !void {
    _ = alloc;
    try std.json.stringify(rec, .{ .whitespace = .indent_2 }, writer);
}

pub fn readBaselineJson(alloc: std.mem.Allocator, bytes: []const u8) !BaselineRecord {
    // std.json.parseFromSlice returns a Parsed wrapper backed by an arena; we copy
    // the string fields so the returned record owns its memory after deinit.
    const parsed = try std.json.parseFromSlice(BaselineRecord, alloc, bytes, .{});
    defer parsed.deinit();

    // errdefer frees earlier copies if a later allocation fails.
    const workload_sha = try alloc.dupe(u8, parsed.value.workload_sha);
    errdefer alloc.free(workload_sha);
    const zig_version = try alloc.dupe(u8, parsed.value.zig_version);
    errdefer alloc.free(zig_version);
    const waystty_sha = try alloc.dupe(u8, parsed.value.waystty_sha);

    return .{
        .workload_sha = workload_sha,
        .zig_version = zig_version,
        .waystty_sha = waystty_sha,
        .frame_count = parsed.value.frame_count,
        .sections = parsed.value.sections,
    };
}
```

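The `std.json` serialization entry point has moved between Zig releases: `std.json.stringify` in 0.14 and earlier, `std.json.Stringify.value` (writing to a `*std.Io.Writer`) in 0.15. If the call in the snippet above doesn't compile on your toolchain, substitute the equivalent for that version.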

- [ ] **Step 4: Run the test again**

Run: `zig build test 2>&1 | grep -E "baseline"`
Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add src/bench_stats.zig
git commit -m "$(cat <<'EOF'
Add BaselineRecord + JSON round-trip for frame stats

Lays the foundation for bench-baseline / bench-check tooling.
EOF
)"
```

---

## Task 5: Capture mode flow (`src/capture.zig`)

The module owns the `--capture` entry point: parse args, start the same subsystems as `runTerminal`, force fixed 80×24 at scale=1, wait for window visibility, play the script through the PTY, drain + settle, render one frame to an offscreen target, read it back, write a PNG.

Because this overlaps significantly with `runTerminal`, factor out any shared helpers you encounter (don't duplicate). Put anything that only capture mode cares about in the new module.

**Files:**
- Create: `src/capture.zig`
- Modify: `src/main.zig` — dispatch `--capture` to `capture.run`
- Modify: `build.zig` — add capture module

- [ ] **Step 1: Scaffold the module and wire up `--capture` dispatch**

Create `src/capture.zig`:

```zig
const std = @import("std");
const vt = @import("vt");
const pty = @import("pty");
const wayland_client = @import("wayland-client");
const frame_loop_mod = @import("frame_loop");
const renderer = @import("renderer");
const font = @import("font");
const config = @import("config");
const png = @import("png");
const vk = @import("vulkan");

pub const CaptureError = error{
    MissingArgs,
    ScriptNotFound,
    OutputPathUnwritable,
    WindowNotVisible,
    WindowSizeMismatch,
    ReadbackTimeout,
    PngEncodeFailed,
};

pub fn run(alloc: std.mem.Allocator, argv: []const [:0]const u8) !void {
    if (argv.len < 3) {
        std.debug.print("usage: waystty --capture <script.vt> <output.png>\n", .{});
        return CaptureError.MissingArgs;
    }
    const script_path = argv[1];
    const output_path = argv[2];

    const script_bytes = std.fs.cwd().readFileAlloc(alloc, script_path, 16 * 1024 * 1024) catch |err| {
        std.debug.print("capture: script not found: {s} ({s})\n", .{ script_path, @errorName(err) });
        return CaptureError.ScriptNotFound;
    };
    defer alloc.free(script_bytes);

    _ = output_path;
    // Steps 2-6 fill in the rest.
    return error.NotImplementedYet;
}
```

In `src/main.zig`, add the dispatch near the other smoke-test branches (around line 94):

```zig
if (args.len >= 2 and std.mem.eql(u8, args[1], "--capture")) {
    const capture = @import("capture");
    return capture.run(alloc, args[1..]);
}
```

In `build.zig`, add (near the other modules):

```zig
    const capture_mod = b.createModule(.{
        .root_source_file = b.path("src/capture.zig"),
        .target = target,
        .optimize = optimize,
        .link_libc = true,
    });
    capture_mod.addImport("vt", vt_mod);
    capture_mod.addImport("pty", pty_mod);
    capture_mod.addImport("wayland-client", wayland_mod);
    capture_mod.addImport("frame_loop", frame_loop_mod);
    capture_mod.addImport("renderer", renderer_mod);
    capture_mod.addImport("font", font_mod);
    capture_mod.addImport("config", config_mod);
    capture_mod.addImport("png", png_mod);
    capture_mod.addImport("vulkan", vulkan_module);
    exe_mod.addImport("capture", capture_mod);
```

Confirm build still compiles:

Run: `zig build 2>&1 | tail -20`
Expected: clean build.

- [ ] **Step 2: Stand up the waystty subsystems (fixed 80×24, scale=1)**

Replace the body of `capture.run` in `src/capture.zig` with the subsystem wiring that matches `runTerminal` but with forced dimensions:

```zig
pub fn run(alloc: std.mem.Allocator, argv: []const [:0]const u8) !void {
    if (argv.len < 3) {
        std.debug.print("usage: waystty --capture <script.vt> <output.png>\n", .{});
        return CaptureError.MissingArgs;
    }
    const script_path = argv[1];
    const output_path = argv[2];

    const script_bytes = std.fs.cwd().readFileAlloc(alloc, script_path, 16 * 1024 * 1024) catch |err| {
        std.debug.print("capture: script not found: {s} ({s})\n", .{ script_path, @errorName(err) });
        return CaptureError.ScriptNotFound;
    };
    defer alloc.free(script_bytes);

    // Font (identical to runTerminal — scale forced to 1)
    var font_lookup = try font.lookupConfiguredFont(alloc);
    defer font_lookup.deinit(alloc);
    var face = try font.Face.init(alloc, font_lookup.path, font_lookup.index, config.font_size_px);
    defer face.deinit();
    const cell_w = face.cellWidth();
    const cell_h = face.cellHeight();

    // Fixed grid
    const cols: u16 = 80;
    const rows: u16 = 24;
    const px_w: u32 = @as(u32, cols) * cell_w;
    const px_h: u32 = @as(u32, rows) * cell_h;

    // Wayland
    const conn = try wayland_client.Connection.init(alloc);
    defer conn.deinit();
    const window = try conn.createWindow(alloc, "waystty-capture");
    defer window.deinit();
    window.width = px_w;
    window.height = px_h;
    _ = conn.display.roundtrip();

    // Renderer
    var ctx = try renderer.Context.init(
        alloc,
        @ptrCast(conn.display),
        @ptrCast(window.surface),
        px_w,
        px_h,
    );
    defer ctx.deinit();

    // Offscreen target
    var offscreen = try renderer.createOffscreen(
        ctx.vki, ctx.vkd, ctx.physical, ctx.device,
        ctx.render_pass, ctx.surface_format, px_w, px_h,
    );
    defer renderer.destroyOffscreen(ctx.vkd, ctx.device, offscreen);

    // Glyph atlas + ASCII warm
    var atlas = try font.Atlas.init(alloc, 1024, 1024);
    defer atlas.deinit();
    for (32..127) |cp| {
        _ = atlas.getOrInsert(&face, @intCast(cp)) catch |err| switch (err) {
            error.AtlasFull => break,
            else => return err,
        };
    }
    try ctx.uploadAtlas(atlas.pixels);
    atlas.last_uploaded_y = atlas.cursor_y;
    atlas.needs_full_upload = false;
    atlas.dirty = false;

    // Terminal
    var term = try vt.Terminal.init(alloc, .{
        .cols = cols, .rows = rows, .max_scrollback = 100,
    });
    defer term.deinit();
    term.setReportedSize(.{
        .rows = rows, .columns = cols,
        .cell_width = cell_w, .cell_height = cell_h,
    });

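    // Temporary: keep this intermediate step compiling. Remove each line once
    // a later step uses the value. `script_bytes` needs no discard because the
    // `defer` above already uses it (a second discard is a compile error).
    _ = output_path;
    _ = &offscreen;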
    // Steps 3-6 fill in the rest.
}
```

Match the exact field names in `renderer.Context` — grep for `pub const Context` and note which fields are `pub` (you may need to make a few more public: `vki`, `vkd`, `physical`, `device`, `render_pass`, `surface_format`).

Run: `zig build 2>&1 | tail -30`
Expected: clean build (unresolved field access will surface if any Context member isn't `pub`).

- [ ] **Step 3: Visibility wait with 3s timeout**

Add after the terminal init block in `capture.run`:

```zig
    // Wait up to 3s for the compositor to configure + map the window.
    const deadline_ns = std.time.nanoTimestamp() + 3 * std.time.ns_per_s;
    while (std.time.nanoTimestamp() < deadline_ns) {
        _ = conn.display.roundtrip();
        if (window.isConfigured() and window.isVisible()) break;
        std.Thread.sleep(10 * std.time.ns_per_ms); // std.time.sleep was removed in Zig 0.15
    }
    if (!window.isConfigured() or !window.isVisible()) {
        std.debug.print("capture: window not visible after 3s (compositor hidden window?)\n", .{});
        return CaptureError.WindowNotVisible;
    }

    // Verify the compositor didn't resize us.
    if (window.width != px_w or window.height != px_h) {
        std.debug.print("capture: window size mismatch; expected {}x{} px, got {}x{}\n",
            .{ px_w, px_h, window.width, window.height });
        return CaptureError.WindowSizeMismatch;
    }
```

If `window.isConfigured()` and `window.isVisible()` don't exist, either add them to `wayland_client.Window` in `src/wayland.zig` (look for the existing configured/suspended flags on `SurfaceState`) or inline the check by poking at `window.surface_state.configured`.
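
The accessors this step relies on can be very small. A minimal sketch, assuming `SurfaceState` carries `configured` and `suspended` booleans as the paragraph above suggests; the struct layouts here are hypothetical stand-ins, not the real definitions in `src/wayland.zig`:

```zig
const std = @import("std");

// Hypothetical stand-ins for the real types in src/wayland.zig; the actual
// field names there may differ.
const SurfaceState = struct {
    configured: bool = false, // set once the first configure event has been handled
    suspended: bool = false, // set while the compositor reports the surface as suspended
};

const Window = struct {
    surface_state: SurfaceState = .{},

    pub fn isConfigured(self: *const Window) bool {
        return self.surface_state.configured;
    }

    /// Assumption of this sketch: "visible" means configured and not suspended.
    pub fn isVisible(self: *const Window) bool {
        return self.surface_state.configured and !self.surface_state.suspended;
    }
};

test "window is visible only when configured and not suspended" {
    var w = Window{};
    try std.testing.expect(!w.isVisible());
    w.surface_state.configured = true;
    try std.testing.expect(w.isConfigured() and w.isVisible());
    w.surface_state.suspended = true;
    try std.testing.expect(!w.isVisible());
}
```

If the compositor never reports a suspended state, `isVisible` in this sketch degrades to the same answer as `isConfigured`.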

Run: `zig build 2>&1 | tail -30`
Expected: clean build.

- [ ] **Step 4: PTY playback + drain**

Append to `capture.run`:

```zig
    // Spawn /bin/cat as the PTY child with no arguments, so it reads from its
    // stdin (the PTY slave). Writing the script to the PTY master therefore
    // feeds cat, and we control exactly when the bytes are written (unlike
    // running `cat <file>`).
    var p = try pty.Pty.spawn(.{
        .cols = cols, .rows = rows,
        .shell = "/bin/cat",
        .shell_args = null,
    });
    defer p.deinit();

    _ = try p.write(script_bytes);
    // Signal end of input. Pty doesn't expose a closeWrite, so write ^D (0x04)
    // instead. In canonical (cooked) mode ^D only reads as EOF at the start of
    // a line; after a partial line it just flushes the pending bytes to cat.
    // Scripts end with `\x1b[H` and no newline, so cat may keep running here.
    // That is fine: the drain loop below stops on quiescence, not on cat
    // exiting. Check pty.zig if you need to confirm the default termios mode.
    _ = try p.write(&.{0x04});

    // Drain: poll PTY until two consecutive 20ms polls return no data.
    var read_buf: [8192]u8 = undefined;
    var empty_ticks: u8 = 0;
    while (empty_ticks < 2) {
        var pfd = [_]std.posix.pollfd{
            .{ .fd = p.master_fd, .events = std.posix.POLL.IN, .revents = 0 },
        };
        _ = std.posix.poll(&pfd, 20) catch 0;
        if (pfd[0].revents & std.posix.POLL.IN != 0) {
            const n = p.read(&read_buf) catch |err| switch (err) {
                error.WouldBlock, error.InputOutput => 0,
                else => return err,
            };
            if (n > 0) {
                term.write(read_buf[0..n]);
                empty_ticks = 0;
                continue;
            }
        }
        empty_ticks += 1;
    }

    // Settle: one extra 50ms for the VT parser to finish.
    std.Thread.sleep(50 * std.time.ns_per_ms); // std.time.sleep was removed in Zig 0.15
```

Run: `zig build 2>&1 | tail -30`
Expected: clean build.
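
The quiescence rule in the drain loop (stop after two consecutive empty 20 ms polls, reset the count on any data) can be isolated and unit-tested without a PTY. A minimal sketch; `DrainTracker` is a hypothetical helper, not something the plan requires:

```zig
const std = @import("std");

/// Tracks the "N consecutive empty polls" rule used by the drain loop.
const DrainTracker = struct {
    required_empty: u8 = 2,
    empty_ticks: u8 = 0,

    /// Record one poll result; returns true once the PTY counts as drained.
    pub fn observe(self: *DrainTracker, bytes_read: usize) bool {
        if (bytes_read > 0) {
            self.empty_ticks = 0; // any data restarts the quiet period
        } else {
            self.empty_ticks +|= 1; // saturating add, cannot overflow
        }
        return self.empty_ticks >= self.required_empty;
    }
};

test "drain finishes only after two consecutive empty polls" {
    var t = DrainTracker{};
    try std.testing.expect(!t.observe(100)); // data
    try std.testing.expect(!t.observe(0)); // first empty poll
    try std.testing.expect(!t.observe(7)); // data resets the counter
    try std.testing.expect(!t.observe(0));
    try std.testing.expect(t.observe(0)); // second consecutive empty poll
}
```

With a 20 ms poll timeout this means the capture waits for at least 40 ms of silence before rendering, plus the 50 ms settle.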

- [ ] **Step 5: Render to offscreen + readback + write PNG**

Append to `capture.run`:

```zig
    // Build the instance list from the terminal snapshot. This mirrors
    // whatever runTerminal does just before drawCells. Extract a shared
    // helper if convenient; otherwise duplicate the small amount of code.
    var render_cache = @import("render_frame").RenderCache.empty;
    defer render_cache.deinit(alloc);
    try render_cache.resizeRows(alloc, rows);

    var snap = try term.snapshot(alloc);
    defer snap.deinit(alloc);

    // NOTE: the exact name/location of the instance-building function depends
    // on how main.zig is structured. If it's a free function in main.zig,
    // move it into a shared module (e.g. `src/render_frame.zig`) in this
    // step — capture.zig and main.zig both need it. Name it `buildInstances`.
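    // Zig 0.15: std.ArrayList is unmanaged, so the allocator is passed to every
    // call that may allocate (including buildInstances, which appends).
    var instances: std.ArrayList(renderer.CellInstance) = .empty;
    defer instances.deinit(alloc);
    try @import("render_frame").buildInstances(alloc, &instances, &snap, &atlas, &face, &render_cache);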

    const push = renderer.PushConstants{
        .viewport_size = .{ @floatFromInt(px_w), @floatFromInt(px_h) },
        .cell_size = .{ @floatFromInt(cell_w), @floatFromInt(cell_h) },
        .coverage_params = .{ 1.0, 1.0 }, // inherit default
    };

    // If the atlas was dirtied by getOrInsert calls during snapshot, upload it.
    if (atlas.dirty) {
        try ctx.uploadAtlas(atlas.pixels);
        atlas.dirty = false;
    }

    try ctx.renderToOffscreen(&offscreen, instances.items, push);

    // Readback
    const rgba = try alloc.alloc(u8, px_w * px_h * 4);
    defer alloc.free(rgba);
    try ctx.readbackOffscreen(&offscreen, rgba);

    // PNG write
    const out_file = std.fs.cwd().createFile(output_path, .{}) catch |err| {
        std.debug.print("capture: cannot write output: {s}: {s}\n", .{ output_path, @errorName(err) });
        return CaptureError.OutputPathUnwritable;
    };
    defer out_file.close();
    png.encode(alloc, .{ .width = px_w, .height = px_h, .pixels = rgba }, out_file.writer()) catch |err| {
        std.debug.print("capture: png encode failed: {s}\n", .{@errorName(err)});
        return CaptureError.PngEncodeFailed;
    };

    std.debug.print("capture: wrote {s} ({}x{})\n", .{ output_path, px_w, px_h });
```

Note on the `render_frame` import: if `RenderCache` and the instance-building logic live inline in `main.zig` rather than in a reusable module, carve out `src/render_frame.zig` now containing both `pub const RenderCache = struct {...}` and `pub fn buildInstances(...)`. Import the new module from both `main.zig` and `capture.zig`, and add it to `build.zig` with `renderer_mod` as an import dependency. This is a necessary side-effect refactor, not gold-plating.
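
The commit message for this task describes the readback as BGRA→RGBA. Assuming the offscreen image uses a BGRA8 format and that the conversion happens on the host after the copy (neither is confirmed in this section), the swizzle itself could look like the sketch below. `swizzleBgraToRgba` is a hypothetical name, and the sketch assumes tightly packed rows with no row padding:

```zig
const std = @import("std");

/// Swap the B and R channels of tightly packed 8-bit-per-channel pixels in place.
/// Assumes pixels.len is a multiple of 4 and rows carry no padding.
fn swizzleBgraToRgba(pixels: []u8) void {
    std.debug.assert(pixels.len % 4 == 0);
    var i: usize = 0;
    while (i + 4 <= pixels.len) : (i += 4) {
        std.mem.swap(u8, &pixels[i], &pixels[i + 2]);
    }
}

test "swizzle swaps channels 0 and 2 and leaves G and A alone" {
    var px = [_]u8{ 1, 2, 3, 4, 5, 6, 7, 8 };
    swizzleBgraToRgba(&px);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 3, 2, 1, 4, 7, 6, 5, 8 }, &px);
}
```

If the readback buffer has a row pitch larger than `width * 4`, copy row by row into the packed `rgba` buffer first and swizzle afterwards.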

- [ ] **Step 6: End-to-end smoke — capture a one-line script**

Create a trivial script:

```bash
printf 'hello\r\n' > /tmp/capture_smoke.vt
```

Build and run:

```bash
zig build
./zig-out/bin/waystty --capture /tmp/capture_smoke.vt /tmp/capture_smoke.png
```

Expected: prints `capture: wrote /tmp/capture_smoke.png (WxH)`, and `/tmp/capture_smoke.png` exists.

```bash
file /tmp/capture_smoke.png
```

Expected: `PNG image data, 640 x 384, 8-bit/color RGBA, non-interlaced` (or whatever `80*cell_w × 24*cell_h` computes to on your font).
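
For reference, 640 x 384 corresponds to 8 x 16 px cells (80 × 8 = 640, 24 × 16 = 384). The check that `file` performs can also be done in-process by reading the PNG signature and the IHDR chunk. A minimal sketch; `pngSize` is a hypothetical helper and not part of `src/png.zig`:

```zig
const std = @import("std");

const PngSize = struct { width: u32, height: u32 };

/// Returns the dimensions stored in the IHDR chunk, or null if `bytes` does
/// not start with the PNG signature followed by an IHDR chunk.
fn pngSize(bytes: []const u8) ?PngSize {
    const signature = "\x89PNG\r\n\x1a\n";
    if (bytes.len < 24) return null;
    if (!std.mem.eql(u8, bytes[0..8], signature)) return null;
    // Bytes 8..12 hold the chunk length, 12..16 the chunk type.
    if (!std.mem.eql(u8, bytes[12..16], "IHDR")) return null;
    return .{
        .width = std.mem.readInt(u32, bytes[16..20], .big),
        .height = std.mem.readInt(u32, bytes[20..24], .big),
    };
}

test "reads 640x384 from a hand-built header" {
    const header = "\x89PNG\r\n\x1a\n" ++ "\x00\x00\x00\x0d" ++ "IHDR" ++
        "\x00\x00\x02\x80" ++ "\x00\x00\x01\x80";
    const size = pngSize(header).?;
    try std.testing.expectEqual(@as(u32, 640), size.width);
    try std.testing.expectEqual(@as(u32, 384), size.height);
}
```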

Open it in an image viewer and confirm "hello" is visible in the top-left.

If rendering is empty or garbled, investigate:
- Did the atlas get uploaded after adding the glyphs?
- Did `renderToOffscreen` actually execute the draw (check the command buffer reset + begin)?
- Are `viewport_size` / `cell_size` push constants correct?
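
A quick way to tell "nothing was drawn" apart from "something was drawn incorrectly" is to test whether the readback buffer is a single flat color. A minimal sketch, assuming the packed RGBA buffer from Step 5; `isUniform` is a hypothetical debugging helper:

```zig
const std = @import("std");

/// True when every pixel equals the first one, i.e. nothing was drawn over
/// the clear color. Expects tightly packed 4-byte pixels.
fn isUniform(rgba: []const u8) bool {
    if (rgba.len < 4) return true;
    const first = rgba[0..4];
    var i: usize = 4;
    while (i + 4 <= rgba.len) : (i += 4) {
        if (!std.mem.eql(u8, rgba[i .. i + 4], first)) return false;
    }
    return true;
}

test "flat buffers are uniform, mixed buffers are not" {
    try std.testing.expect(isUniform(&[_]u8{ 0, 0, 0, 255, 0, 0, 0, 255 }));
    try std.testing.expect(!isUniform(&[_]u8{ 0, 0, 0, 255, 255, 255, 255, 255 }));
}
```

A uniform buffer points at the draw or the atlas upload; a non-uniform but garbled one points at push constants or instance data.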

- [ ] **Step 7: Commit**

```bash
git add src/capture.zig src/main.zig build.zig src/wayland.zig src/render_frame.zig
git commit -m "$(cat <<'EOF'
Add --capture mode: render a VT script to PNG

Forces 80x24 grid at scale=1, waits for window visibility, plays
script through PTY, drains + settles, renders to offscreen VkImage,
reads back BGRA→RGBA, writes PNG.
EOF
)"
```

---

## Task 6: `imgdiff` tool (`src/tools/imgdiff.zig`)

Standalone executable: compare two RGBA PNGs; compute RMSE (normalized RGB) + per-pixel max; optionally write a side-by-side diff image.

**Files:**
- Create: `src/tools/imgdiff.zig`
- Modify: `build.zig` — add `imgdiff` executable + `b.step("imgdiff", ...)`

- [ ] **Step 1: Write the comparison function with unit tests first**

Create `src/tools/imgdiff.zig`:

```zig
const std = @import("std");
const png = @import("png");

pub const DiffResult = struct {
    rmse: f64,           // [0, 1]
    max_pixel: f64,      // [0, 1]
    pixel_count: usize,
};

pub fn compare(a: png.Image, b: png.Image) !DiffResult {
    if (a.width != b.width or a.height != b.height) return error.DimensionsDiffer;
    std.debug.assert(a.pixels.len == b.pixels.len);

    const px_count = @as(usize, a.width) * a.height;
    var sum_sq: f64 = 0;
    var max_d: f64 = 0;

    var i: usize = 0;
    while (i < px_count) : (i += 1) {
        const off = i * 4;
        const dr = (@as(f64, @floatFromInt(a.pixels[off + 0])) - @as(f64, @floatFromInt(b.pixels[off + 0]))) / 255.0;
        const dg = (@as(f64, @floatFromInt(a.pixels[off + 1])) - @as(f64, @floatFromInt(b.pixels[off + 1]))) / 255.0;
        const db = (@as(f64, @floatFromInt(a.pixels[off + 2])) - @as(f64, @floatFromInt(b.pixels[off + 2]))) / 255.0;
        const d_sq = (dr * dr + dg * dg + db * db) / 3.0;
        sum_sq += d_sq;
        const d = @sqrt(d_sq);
        if (d > max_d) max_d = d;
    }

    return .{
        .rmse = @sqrt(sum_sq / @as(f64, @floatFromInt(px_count))),
        .max_pixel = max_d,
        .pixel_count = px_count,
    };
}

test "identical images produce zero RMSE" {
    var pixels_a = [_]u8{ 10, 20, 30, 255, 40, 50, 60, 255 };
    var pixels_b = [_]u8{ 10, 20, 30, 255, 40, 50, 60, 255 };
    const a = png.Image{ .width = 2, .height = 1, .pixels = &pixels_a };
    const b = png.Image{ .width = 2, .height = 1, .pixels = &pixels_b };
    const r = try compare(a, b);
    try std.testing.expectEqual(@as(f64, 0.0), r.rmse);
    try std.testing.expectEqual(@as(f64, 0.0), r.max_pixel);
}

test "fully saturated difference produces rmse=1.0 and max=1.0" {
    var pixels_a = [_]u8{ 0, 0, 0, 255 };
    var pixels_b = [_]u8{ 255, 255, 255, 255 };
    const a = png.Image{ .width = 1, .height = 1, .pixels = &pixels_a };
    const b = png.Image{ .width = 1, .height = 1, .pixels = &pixels_b };
    const r = try compare(a, b);
    try std.testing.expectApproxEqAbs(@as(f64, 1.0), r.rmse, 1e-9);
    try std.testing.expectApproxEqAbs(@as(f64, 1.0), r.max_pixel, 1e-9);
}
```

- [ ] **Step 2: Add imgdiff to build.zig**

Add near the other executables:

```zig
    const imgdiff_mod = b.createModule(.{
        .root_source_file = b.path("src/tools/imgdiff.zig"),
        .target = target,
        .optimize = optimize,
    });
    imgdiff_mod.addImport("png", png_mod);
    const imgdiff_exe = b.addExecutable(.{
        .name = "imgdiff",
        .root_module = imgdiff_mod,
    });
    b.installArtifact(imgdiff_exe);

    const imgdiff_test_mod = b.createModule(.{
        .root_source_file = b.path("src/tools/imgdiff.zig"),
        .target = target,
        .optimize = optimize,
    });
    imgdiff_test_mod.addImport("png", png_mod);
    const imgdiff_tests = b.addTest(.{ .root_module = imgdiff_test_mod });
    test_step.dependOn(&b.addRunArtifact(imgdiff_tests).step);
```

- [ ] **Step 3: Run the compare-function tests**

Run: `zig build test 2>&1 | grep -E "imgdiff|rmse"`
Expected: PASS.

- [ ] **Step 4: Add `main`, CLI arg handling, and diff-image output**

Append to `src/tools/imgdiff.zig`:

```zig
pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    const args = try std.process.argsAlloc(alloc);
    defer std.process.argsFree(alloc, args);

    if (args.len < 3) {
        std.debug.print("usage: imgdiff <actual.png> <reference.png> [diff.png]\n", .{});
        std.process.exit(2);
    }
    const actual_path = args[1];
    const reference_path = args[2];
    const diff_path: ?[]const u8 = if (args.len >= 4) args[3] else null;

    const rmse_max = readFloatEnv("WAYSTTY_TEST_RMSE_MAX", 0.005);
    const pixel_max = readFloatEnv("WAYSTTY_TEST_PIXEL_MAX", 0.125);

    const actual_bytes = try std.fs.cwd().readFileAlloc(alloc, actual_path, 64 * 1024 * 1024);
    defer alloc.free(actual_bytes);
    const reference_bytes = try std.fs.cwd().readFileAlloc(alloc, reference_path, 64 * 1024 * 1024);
    defer alloc.free(reference_bytes);

    var actual = try png.decode(alloc, actual_bytes);
    defer actual.deinit(alloc);
    var reference = try png.decode(alloc, reference_bytes);
    defer reference.deinit(alloc);

    if (actual.width != reference.width or actual.height != reference.height) {
        std.debug.print("FAIL: dimensions differ ({}x{} vs {}x{})\n",
            .{ actual.width, actual.height, reference.width, reference.height });
        std.process.exit(3);
    }

    const r = try compare(actual, reference);
    const pass = r.rmse <= rmse_max and r.max_pixel <= pixel_max;

    if (pass) {
        std.debug.print("OK:   {s}  RMSE={d:.4}%  worst={d:.4}%\n",
            .{ reference_path, r.rmse * 100.0, r.max_pixel * 100.0 });
        std.process.exit(0);
    }

    std.debug.print("FAIL: {s}\n  RMSE: {d:.4}% (max {d:.4}%)\n  worst pixel: {d:.4}% (max {d:.4}%)\n",
        .{ reference_path, r.rmse * 100.0, rmse_max * 100.0, r.max_pixel * 100.0, pixel_max * 100.0 });

    if (diff_path) |p| {
        const diff_img = try makeDiffImage(alloc, actual, reference);
        defer alloc.free(diff_img.pixels);
        const out = try std.fs.cwd().createFile(p, .{});
        defer out.close();
        try png.encode(alloc, diff_img, out.writer());
        std.debug.print("  diff: {s}\n", .{p});
    }
    std.debug.print("  actual: {s}\n", .{actual_path});
    std.process.exit(1);
}

fn readFloatEnv(name: []const u8, default: f64) f64 {
    const val = std.posix.getenv(name) orelse return default;
    return std.fmt.parseFloat(f64, val) catch default;
}

fn makeDiffImage(alloc: std.mem.Allocator, a: png.Image, b: png.Image) !png.Image {
    // Side-by-side: [actual | reference | delta-heatmap]
    const w = a.width * 3;
    const h = a.height;
    const pixels = try alloc.alloc(u8, w * h * 4);
    var y: u32 = 0;
    while (y < h) : (y += 1) {
        const row_off = @as(usize, y) * w * 4;
        const a_off = @as(usize, y) * a.width * 4;
        // actual
        @memcpy(pixels[row_off .. row_off + a.width * 4], a.pixels[a_off .. a_off + a.width * 4]);
        // reference
        @memcpy(pixels[row_off + a.width * 4 .. row_off + 2 * a.width * 4], b.pixels[a_off .. a_off + a.width * 4]);
        // delta heatmap
        var x: u32 = 0;
        while (x < a.width) : (x += 1) {
            const off = a_off + x * 4;
            const dr = (@as(f64, @floatFromInt(a.pixels[off + 0])) - @as(f64, @floatFromInt(b.pixels[off + 0]))) / 255.0;
            const dg = (@as(f64, @floatFromInt(a.pixels[off + 1])) - @as(f64, @floatFromInt(b.pixels[off + 1]))) / 255.0;
            const db = (@as(f64, @floatFromInt(a.pixels[off + 2])) - @as(f64, @floatFromInt(b.pixels[off + 2]))) / 255.0;
            const d = @sqrt((dr * dr + dg * dg + db * db) / 3.0);
            const brightness: u8 = @intFromFloat(@min(255.0, d * 255.0 * 2.0)); // 2x gain for visibility
            const dst = row_off + 2 * a.width * 4 + x * 4;
            pixels[dst + 0] = brightness;
            pixels[dst + 1] = brightness;
            pixels[dst + 2] = brightness;
            pixels[dst + 3] = 255;
        }
    }
    return .{ .width = w, .height = h, .pixels = pixels };
}
```
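
To get a feel for what the default `WAYSTTY_TEST_PIXEL_MAX` of 0.125 allows, it helps to evaluate the per-pixel metric on a few concrete deltas. The sketch below re-implements the per-pixel part of `compare` as a standalone function (`pixelDelta` is a name introduced here for illustration only):

```zig
const std = @import("std");

/// Per-pixel delta as `compare` defines it: RMS of the normalized R, G, B
/// differences (alpha is ignored).
fn pixelDelta(a: [4]u8, b: [4]u8) f64 {
    var sum_sq: f64 = 0;
    for (0..3) |c| {
        const d = (@as(f64, @floatFromInt(a[c])) - @as(f64, @floatFromInt(b[c]))) / 255.0;
        sum_sq += d * d;
    }
    return @sqrt(sum_sq / 3.0);
}

test "what the 0.125 per-pixel threshold tolerates" {
    const black = [4]u8{ 0, 0, 0, 255 };
    // A uniform gray shift passes up to 31 levels (31/255 ≈ 0.1216).
    try std.testing.expect(pixelDelta(black, .{ 31, 31, 31, 255 }) <= 0.125);
    try std.testing.expect(pixelDelta(black, .{ 32, 32, 32, 255 }) > 0.125);
    // A single-channel shift passes up to 55 levels, since it is divided by sqrt(3).
    try std.testing.expect(pixelDelta(black, .{ 55, 0, 0, 255 }) <= 0.125);
    try std.testing.expect(pixelDelta(black, .{ 56, 0, 0, 255 }) > 0.125);
}
```

Because the metric averages over the three channels, a change confined to one channel is tolerated at roughly 1.7 times the magnitude of a change that hits all three.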

- [ ] **Step 5: Smoke-test the CLI**

```bash
zig build
./zig-out/bin/imgdiff /tmp/capture_smoke.png /tmp/capture_smoke.png
```

Expected: `OK:   /tmp/capture_smoke.png  RMSE=0.0000%  worst=0.0000%` (printed to stderr, since imgdiff reports via `std.debug.print`) and exit code 0.
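
The exit statuses that `main` uses are spread across several `std.process.exit` calls. Collecting them in an enum keeps them consistent for callers such as test-render. A minimal sketch; the enum and `exitWith` are a hypothetical tidy-up, while the numeric values are the ones already used in the listing in Step 4:

```zig
const std = @import("std");

/// Exit statuses used by imgdiff's `main`.
const ExitCode = enum(u8) {
    pass = 0, // both thresholds satisfied
    threshold_exceeded = 1, // RMSE or worst-pixel limit exceeded
    usage = 2, // too few arguments
    dimensions_differ = 3, // images are not the same size
};

fn exitWith(code: ExitCode) noreturn {
    std.process.exit(@intFromEnum(code));
}

test "exit codes keep their documented values" {
    try std.testing.expectEqual(@as(u8, 0), @intFromEnum(ExitCode.pass));
    try std.testing.expectEqual(@as(u8, 1), @intFromEnum(ExitCode.threshold_exceeded));
    try std.testing.expectEqual(@as(u8, 2), @intFromEnum(ExitCode.usage));
    try std.testing.expectEqual(@as(u8, 3), @intFromEnum(ExitCode.dimensions_differ));
}
```

An error returned from `main` (for example an unreadable input file) also ends the process with a non-zero status, so callers that only test for zero treat it as a failure too.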

- [ ] **Step 6: Commit**

```bash
git add src/tools/imgdiff.zig build.zig
git commit -m "$(cat <<'EOF'
Add imgdiff: RMSE + per-pixel-max PNG comparison

Standalone tool reused by test-render. Thresholds overridable via
WAYSTTY_TEST_RMSE_MAX and WAYSTTY_TEST_PIXEL_MAX.
EOF
)"
```

---

## Task 7: VT test scripts

Three initial scripts exercising distinct rendering features. All scripts must end with cursor-home (`\x1b[H`) for deterministic final state.

**Files:**
- Create: `tests/golden/scripts/basic_ascii.vt`
- Create: `tests/golden/scripts/bold_colors.vt`
- Create: `tests/golden/scripts/box_drawing.vt`

- [ ] **Step 1: Write `basic_ascii.vt`**

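Script: print the printable ASCII range (32–126), then newline and cursor-home. The range is 95 characters, so on the 80-column grid it wraps onto a second row (assuming autowrap is enabled, the usual default). The file is generated once with a Python one-liner, which is simpler than hand-escaping the bytes: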

```bash
mkdir -p tests/golden/scripts
python3 -c '
import sys
sys.stdout.buffer.write(b"\x1b[2J\x1b[H")  # clear + home
sys.stdout.buffer.write(bytes(range(32, 127)))
sys.stdout.buffer.write(b"\r\n")
sys.stdout.buffer.write(b"\x1b[H")          # home
' > tests/golden/scripts/basic_ascii.vt
```

(Python is a one-off author-time tool here; the generated `.vt` file is what's checked in.)

Confirm size: `wc -c tests/golden/scripts/basic_ascii.vt` should show 107 bytes (7 for clear + home, 95 printable characters, 2 for CR LF, 3 for the final home).

- [ ] **Step 2: Write `bold_colors.vt`**

```bash
python3 -c '
import sys
out = b"\x1b[2J\x1b[H"
attrs = [
    (b"\x1b[0m",  b"normal"),
    (b"\x1b[1m",  b"bold"),
    (b"\x1b[2m",  b"dim"),
    (b"\x1b[3m",  b"italic"),
    (b"\x1b[4m",  b"underline"),
    (b"\x1b[7m",  b"reverse"),
]
for esc, label in attrs:
    out += esc + label + b"\x1b[0m\r\n"
for fg in range(30, 38):
    out += b"\x1b[" + str(fg).encode() + b"m" + b"fg%d " % fg
out += b"\x1b[0m\r\n"
for bg in range(40, 48):
    out += b"\x1b[" + str(bg).encode() + b"m" + b"bg%d " % bg
out += b"\x1b[0m\r\n"
out += b"\x1b[H"
sys.stdout.buffer.write(out)
' > tests/golden/scripts/bold_colors.vt
```

- [ ] **Step 3: Write `box_drawing.vt`**

```bash
python3 -c '
import sys
# Build the script as str (the box-drawing literals are str) and encode once at the end.
# Mixing a bytes prefix with str pieces would raise TypeError.
out = "\x1b[2J\x1b[H"
# Box-drawing characters are U+2500..U+257F
out += "\u250C" + "\u2500"*10 + "\u2510\r\n"
for _ in range(3):
    out += "\u2502" + " " * 10 + "\u2502\r\n"
out += "\u2514" + "\u2500"*10 + "\u2518\r\n"
out += "\u2591\u2592\u2593\u2588 block chars\r\n"
out += "\x1b[H"
sys.stdout.buffer.write(out.encode("utf-8"))
' > tests/golden/scripts/box_drawing.vt
```

- [ ] **Step 4: Smoke-run each against capture**

```bash
zig build
for s in tests/golden/scripts/*.vt; do
    ./zig-out/bin/waystty --capture "$s" "/tmp/$(basename "$s" .vt).png"
done
```

Expected: each prints `capture: wrote ...` and a PNG is written. Open each PNG and eyeball it — this is the only time you'll visually inspect before setting goldens, so confirm:
- `basic_ascii.png`: the 95 printable characters, filling the top row and wrapping onto the second
- `bold_colors.png`: visible bold/dim/italic/underline/reverse attribute rows + foreground/background color bars
- `box_drawing.png`: a bordered box with block shades beneath

If any look wrong, fix the script (or investigate the renderer) before the next task.
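
The cursor-home rule from the top of this task is easy to check mechanically before committing. A minimal sketch; `endsWithCursorHome` is a hypothetical helper that a unit test could run over each script's bytes:

```zig
const std = @import("std");

/// True when the script's final bytes are CSI H (cursor home).
fn endsWithCursorHome(script: []const u8) bool {
    return std.mem.endsWith(u8, script, "\x1b[H");
}

test "scripts must finish with cursor home" {
    try std.testing.expect(endsWithCursorHome("\x1b[2J\x1b[Hhello\r\n\x1b[H"));
    try std.testing.expect(!endsWithCursorHome("\x1b[2J\x1b[Hhello\r\n"));
}
```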

- [ ] **Step 5: Commit the scripts**

```bash
git add tests/golden/scripts/
git commit -m "$(cat <<'EOF'
Add initial VT test scripts for render testing

basic_ascii, bold_colors, box_drawing — exercise distinct rendering
paths. Each clears screen, emits content, returns cursor home.
EOF
)"
```

---

## Task 8: `test-render` orchestrator

Standalone tool that for each `.vt` in `tests/golden/scripts/`: runs `waystty --capture`, runs imgdiff against the corresponding `tests/golden/reference/*.png`, and summarizes pass/fail. Continues on failure. Non-zero exit if any fail.

**Files:**
- Create: `src/tools/test_render.zig`
- Modify: `build.zig` — add test_render executable + `b.step("test-render", ...)`

- [ ] **Step 1: Write `src/tools/test_render.zig`**

```zig
const std = @import("std");

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    const mode_update = blk: {
        const m = std.posix.getenv("WAYSTTY_GOLDEN_UPDATE") orelse break :blk false;
        break :blk std.mem.eql(u8, m, "1");
    };

    try std.fs.cwd().makePath("tests/golden/output");

    var scripts_dir = try std.fs.cwd().openDir("tests/golden/scripts", .{ .iterate = true });
    defer scripts_dir.close();
    var it = scripts_dir.iterate();

    var passed: usize = 0;
    var failed: usize = 0;

    while (try it.next()) |entry| {
        if (entry.kind != .file) continue;
        if (!std.mem.endsWith(u8, entry.name, ".vt")) continue;

        const base = entry.name[0 .. entry.name.len - 3];
        const script_path = try std.fmt.allocPrint(alloc, "tests/golden/scripts/{s}.vt", .{base});
        defer alloc.free(script_path);
        const output_path = try std.fmt.allocPrint(alloc, "tests/golden/output/{s}.png", .{base});
        defer alloc.free(output_path);
        const reference_path = try std.fmt.allocPrint(alloc, "tests/golden/reference/{s}.png", .{base});
        defer alloc.free(reference_path);
        const diff_path = try std.fmt.allocPrint(alloc, "tests/golden/output/{s}.diff.png", .{base});
        defer alloc.free(diff_path);

        // Run waystty --capture
        const cap = try std.process.Child.run(.{
            .allocator = alloc,
            .argv = &.{ "zig-out/bin/waystty", "--capture", script_path, output_path },
        });
        defer alloc.free(cap.stdout);
        defer alloc.free(cap.stderr);
        if (cap.term != .Exited or cap.term.Exited != 0) {
            std.debug.print("FAIL: {s}: capture exited with {}\n  stderr: {s}\n",
                .{ base, cap.term, cap.stderr });
            failed += 1;
            continue;
        }

        if (mode_update) {
            try std.fs.cwd().makePath("tests/golden/reference");
            try std.fs.cwd().copyFile(output_path, std.fs.cwd(), reference_path, .{});
            std.debug.print("UPDATED: {s}\n", .{base});
            passed += 1;
            continue;
        }

        // Run imgdiff
        const dif = try std.process.Child.run(.{
            .allocator = alloc,
            .argv = &.{ "zig-out/bin/imgdiff", output_path, reference_path, diff_path },
        });
        defer alloc.free(dif.stdout);
        defer alloc.free(dif.stderr);
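        // imgdiff reports via std.debug.print, which writes to stderr, so
        // forward stderr here (stdout would be empty).
        std.debug.print("{s}", .{dif.stderr});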
        if (dif.term == .Exited and dif.term.Exited == 0) {
            passed += 1;
        } else {
            failed += 1;
        }
    }

    std.debug.print("\n=== test-render: {} passed, {} failed ===\n", .{ passed, failed });
    if (failed > 0) std.process.exit(1);
}
```
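
Directory iteration returns entries in whatever order the filesystem provides, so the per-script lines can appear in a different order from one machine to the next. If a stable order matters, collect the names first and sort them before running. A minimal sketch of just the sort step (how the names are collected is left out, and both function names are hypothetical):

```zig
const std = @import("std");

fn nameLessThan(_: void, a: []const u8, b: []const u8) bool {
    return std.mem.lessThan(u8, a, b);
}

/// Sort script names in place so the run order is byte-wise lexicographic.
fn sortScriptNames(names: [][]const u8) void {
    std.mem.sort([]const u8, names, {}, nameLessThan);
}

test "script names sort lexicographically" {
    var names = [_][]const u8{ "box_drawing.vt", "basic_ascii.vt", "bold_colors.vt" };
    sortScriptNames(&names);
    try std.testing.expectEqualStrings("basic_ascii.vt", names[0]);
    try std.testing.expectEqualStrings("bold_colors.vt", names[1]);
    try std.testing.expectEqualStrings("box_drawing.vt", names[2]);
}
```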

- [ ] **Step 2: Wire it into build.zig**

Add near the other tool executables:

```zig
    const test_render_mod = b.createModule(.{
        .root_source_file = b.path("src/tools/test_render.zig"),
        .target = target,
        .optimize = optimize,
    });
    const test_render_exe = b.addExecutable(.{
        .name = "test-render",
        .root_module = test_render_mod,
    });
    b.installArtifact(test_render_exe);

    const test_render_step = b.step("test-render", "Run all golden VT scripts and diff against references");
    // Make sure waystty + imgdiff are built first
    test_render_step.dependOn(b.getInstallStep());
    const test_render_run = b.addRunArtifact(test_render_exe);
    test_render_run.step.dependOn(b.getInstallStep());
    test_render_step.dependOn(&test_render_run.step);
```

Confirm build: `zig build`

- [ ] **Step 3: Test the orchestrator failure path**

With no reference directory yet, `zig build test-render` should report 3 failures (missing reference PNGs → imgdiff fails on readFile).

Run: `zig build test-render 2>&1 | tail -10`
Expected: summary shows 3 failed. Non-zero exit.

- [ ] **Step 4: Commit the orchestrator**

```bash
git add src/tools/test_render.zig build.zig
git commit -m "$(cat <<'EOF'
Add test-render orchestrator

Iterates tests/golden/scripts/*.vt, runs waystty --capture on each,
compares with imgdiff against reference PNGs. Continues on failure.
WAYSTTY_GOLDEN_UPDATE=1 copies output to reference instead.
EOF
)"
```

---

## Task 9: Generate and commit initial golden references

Run `test-render` in update mode to populate `tests/golden/reference/`, visually verify the results, commit.

**Files:**
- Create: `tests/golden/reference/basic_ascii.png`
- Create: `tests/golden/reference/bold_colors.png`
- Create: `tests/golden/reference/box_drawing.png`
- Modify: `.gitignore` — add `tests/golden/output/`

- [ ] **Step 1: Update `.gitignore`**

Append to `.gitignore`:

```
# test-render generated artifacts
tests/golden/output/
```

- [ ] **Step 2: Generate references**

```bash
WAYSTTY_GOLDEN_UPDATE=1 zig build test-render
```

Expected: `UPDATED: basic_ascii`, `UPDATED: bold_colors`, `UPDATED: box_drawing`.

- [ ] **Step 3: Visually inspect each reference**

```bash
ls -l tests/golden/reference/
xdg-open tests/golden/reference/basic_ascii.png
xdg-open tests/golden/reference/bold_colors.png
xdg-open tests/golden/reference/box_drawing.png
```

For each, confirm:
- Text is crisp, no garbled glyphs
- Colors match expectations (bold rows look bold, color bars show 8 distinct colors, etc.)
- Box characters form actual boxes

If any look wrong, debug the renderer or the script before committing.

- [ ] **Step 4: Confirm steady-state passes**

```bash
zig build test-render
```

Expected: `3 passed, 0 failed`.

- [ ] **Step 5: Tune thresholds if needed**

Re-run `zig build test-render` several times to confirm stability (no flaky failures from subpixel jitter). If any fail, either (a) the output is non-deterministic (bug — investigate) or (b) loosen the thresholds. The two easy knobs:

```bash
WAYSTTY_TEST_RMSE_MAX=0.01 WAYSTTY_TEST_PIXEL_MAX=0.2 zig build test-render
```

If you had to loosen the thresholds, bake the new defaults into `src/tools/imgdiff.zig` (the `readFloatEnv("WAYSTTY_TEST_RMSE_MAX", 0.005)` call) and commit that change.
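To make the two knobs concrete, here is a minimal sketch of what an RMSE threshold and a maximum-delta threshold measure. It assumes RGBA8 buffers with channel values scaled to 0..1 and takes the maximum per channel; the real `imgdiff` from Task 6 may normalize or aggregate differently. `DiffSketch`, `diffRgba8`, and `withinThresholds` are hypothetical names used only for illustration.

```zig
const std = @import("std");

/// Hypothetical result shape; the real `DiffResult` lives in imgdiff.zig.
const DiffSketch = struct { rmse: f64, pixel_max: f64 };

/// Compares two equally sized RGBA8 buffers, with channel values scaled to 0..1.
/// Assumption: the maximum is taken per channel, not per whole pixel.
fn diffRgba8(a: []const u8, b: []const u8) DiffSketch {
    std.debug.assert(a.len == b.len and a.len > 0);
    var sum_sq: f64 = 0.0;
    var max_delta: f64 = 0.0;
    for (a, b) |x, y| {
        const xf: f64 = @floatFromInt(x);
        const yf: f64 = @floatFromInt(y);
        const d = @abs(xf - yf) / 255.0;
        sum_sq += d * d;
        max_delta = @max(max_delta, d);
    }
    const n: f64 = @floatFromInt(a.len);
    return .{ .rmse = @sqrt(sum_sq / n), .pixel_max = max_delta };
}

/// A capture passes only if both metrics stay within their thresholds.
fn withinThresholds(r: DiffSketch, rmse_max: f64, pixel_max: f64) bool {
    return r.rmse <= rmse_max and r.pixel_max <= pixel_max;
}
```

RMSE catches broad, low-amplitude drift across the whole image, while the maximum catches a single badly wrong value that an average would hide. Loosening only one of the two keeps the other check meaningful.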

- [ ] **Step 6: Commit**

```bash
git add tests/golden/reference/ .gitignore
git commit -m "$(cat <<'EOF'
Commit initial golden reference PNGs

Generated from tests/golden/scripts/* and visually verified.
Regenerate with: WAYSTTY_GOLDEN_UPDATE=1 zig build test-render
EOF
)"
```

---

## Task 10: `bench-baseline` and `bench-check`

Two modes of the same tool. It runs the existing `WAYSTTY_BENCH=1` workload by shelling out to waystty, reads the resulting `FrameTimingStats` from a JSON sidecar file (see the first design decision below), and writes or compares `tests/bench/baseline.json`.

Two design decisions here:

1. **How to get stats out of a waystty run:** current bench prints a human-readable table. Add a sidecar flag `WAYSTTY_BENCH_JSON=/path/to/file.json` that also dumps the `BaselineRecord` to disk on exit. Cleaner than parsing the text table.

2. **What sha to store:** workload_sha = `sha256(bench_script string literal in main.zig)`; waystty_sha = `git rev-parse HEAD`; zig_version = captured at compile-time via `@import("builtin").zig_version_string`.

**Files:**
- Modify: `src/main.zig` — if `WAYSTTY_BENCH_JSON` is set, write `BaselineRecord` on exit
- Create: `src/tools/bench_baseline.zig`
- Modify: `build.zig` — add executable + build step

- [ ] **Step 1: Add the JSON dump to main.zig's bench exit path**

Find where `printFrameStats(stats)` is currently called in the bench-exit path in `main.zig` (grep for it). Right after that call, add:

```zig
    if (std.posix.getenv("WAYSTTY_BENCH_JSON")) |path| {
        const bench_stats_mod = @import("bench_stats");
        const rec = bench_stats_mod.BaselineRecord{
            .workload_sha = &sha256Hex(alloc, bench_script orelse ""),
            .zig_version = @import("builtin").zig_version_string,
            .waystty_sha = gitHead(alloc) catch "unknown",
            .frame_count = stats.frame_count,
            .sections = .{
                .snapshot = stats.snapshot,
                .row_rebuild = stats.row_rebuild,
                .atlas_upload = stats.atlas_upload,
                .instance_upload = stats.instance_upload,
                .gpu_submit = stats.gpu_submit,
            },
        };
        const f = std.fs.cwd().createFile(path, .{}) catch |err| {
            std.log.warn("bench_json write failed: {s} ({s})", .{ path, @errorName(err) });
            return;
        };
        defer f.close();
        bench_stats_mod.writeBaselineJson(alloc, rec, f.writer()) catch |err| {
            std.log.warn("bench_json serialize failed: {s}", .{@errorName(err)});
        };
    }
```

Add helper functions at the bottom of `main.zig`:

```zig
fn sha256Hex(alloc: std.mem.Allocator, input: []const u8) [64]u8 {
    _ = alloc;
    var digest: [32]u8 = undefined;
    std.crypto.hash.sha2.Sha256.hash(input, &digest, .{});
    // bytesToHex returns a fixed-size lowercase hex array; no formatter needed.
    return std.fmt.bytesToHex(digest, .lower);
}

fn gitHead(alloc: std.mem.Allocator) ![]const u8 {
    const r = try std.process.Child.run(.{ .allocator = alloc, .argv = &.{ "git", "rev-parse", "HEAD" } });
    defer alloc.free(r.stdout);
    defer alloc.free(r.stderr);
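    if (r.term != .Exited or r.term.Exited != 0) return alloc.dupe(u8, "unknown");
    // Copy the trimmed hash: r.stdout is freed when this function returns.
    // The caller owns the returned slice.
    return alloc.dupe(u8, std.mem.trim(u8, r.stdout, "\n \t"));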
}
```

Note: `sha256Hex` returns a fixed-size array by value, while the `.workload_sha` field stores a slice. The array must outlive `rec`: bind it to a local before building `rec`, or copy it with `alloc.dupe`.
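A minimal sketch of the `alloc.dupe` variant, assuming an owned slice is preferred over a scoped buffer. `sha256HexAlloc` is a hypothetical name, not an existing helper in `main.zig`; the caller is responsible for freeing the result.

```zig
const std = @import("std");

/// Returns the lowercase hex SHA-256 of `input` as a heap-allocated slice.
/// Caller owns the result and frees it with `alloc.free`.
fn sha256HexAlloc(alloc: std.mem.Allocator, input: []const u8) ![]u8 {
    var digest: [32]u8 = undefined;
    std.crypto.hash.sha2.Sha256.hash(input, &digest, .{});
    // bytesToHex returns a [64]u8 by value; dupe copies it onto the heap.
    const hex = std.fmt.bytesToHex(digest, .lower);
    return alloc.dupe(u8, &hex);
}
```

With this shape the slice can be stored in `.workload_sha` directly, at the cost of one allocation and a matching `alloc.free` once the record has been written.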

Run: `zig build` then `WAYSTTY_BENCH=1 WAYSTTY_BENCH_JSON=/tmp/bench.json ./zig-out/bin/waystty 2>/tmp/bench.log`
After the bench workload finishes, check `/tmp/bench.json` exists and contains valid JSON with non-zero `frame_count`.

- [ ] **Step 2: Write `bench_baseline.zig`**

Create `src/tools/bench_baseline.zig`:

```zig
const std = @import("std");
const bench_stats = @import("bench_stats");

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    const args = try std.process.argsAlloc(alloc);
    defer std.process.argsFree(alloc, args);

    const mode: enum { save, check } = if (args.len >= 2 and std.mem.eql(u8, args[1], "save"))
        .save
    else
        .check;

    const baseline_path = "tests/bench/baseline.json";
    const tmp_json = "/tmp/waystty-bench-current.json";

    try std.fs.cwd().makePath("tests/bench");

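    // Remove any stale result so a crashed run cannot be mistaken for a fresh one.
    std.fs.cwd().deleteFile(tmp_json) catch {};

    // Run waystty with WAYSTTY_BENCH=1 WAYSTTY_BENCH_JSON=<tmp>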
    var env = try std.process.getEnvMap(alloc);
    defer env.deinit();
    try env.put("WAYSTTY_BENCH", "1");
    try env.put("WAYSTTY_BENCH_JSON", tmp_json);

    const child = try std.process.Child.run(.{
        .allocator = alloc,
        .argv = &.{"zig-out/bin/waystty"},
        .env_map = &env,
    });
    defer alloc.free(child.stdout);
    defer alloc.free(child.stderr);

    const current_bytes = std.fs.cwd().readFileAlloc(alloc, tmp_json, 16 * 1024) catch |err| {
        std.debug.print("bench: no JSON output at {s}: {s}\n", .{ tmp_json, @errorName(err) });
        std.process.exit(2);
    };
    defer alloc.free(current_bytes);
    const current = try bench_stats.readBaselineJson(alloc, current_bytes);
    defer {
        alloc.free(current.workload_sha);
        alloc.free(current.zig_version);
        alloc.free(current.waystty_sha);
    }

    if (mode == .save) {
        const out = try std.fs.cwd().createFile(baseline_path, .{});
        defer out.close();
        try bench_stats.writeBaselineJson(alloc, current, out.writer());
        std.debug.print("bench: wrote {s}\n", .{baseline_path});
        return;
    }

    // check mode
    const baseline_bytes = std.fs.cwd().readFileAlloc(alloc, baseline_path, 16 * 1024) catch |err| {
        std.debug.print("bench: no baseline at {s}: {s}\n  run: zig build bench-baseline\n", .{ baseline_path, @errorName(err) });
        std.process.exit(2);
    };
    defer alloc.free(baseline_bytes);
    const baseline = try bench_stats.readBaselineJson(alloc, baseline_bytes);
    defer {
        alloc.free(baseline.workload_sha);
        alloc.free(baseline.zig_version);
        alloc.free(baseline.waystty_sha);
    }

    if (!std.mem.eql(u8, baseline.workload_sha, current.workload_sha)) {
        std.debug.print("WARN: bench script changed since baseline; consider regenerating\n", .{});
    }

    const pct = blk: {
        const v = std.posix.getenv("WAYSTTY_BENCH_REGRESSION_PCT") orelse break :blk 20.0;
        break :blk std.fmt.parseFloat(f64, v) catch 20.0;
    };

    var regressed = false;
    const sections = [_][]const u8{ "snapshot", "row_rebuild", "atlas_upload", "instance_upload", "gpu_submit" };
    inline for (sections, 0..) |name, i| {
        _ = i;
        const base_p99 = @field(baseline.sections, name).p99;
        const cur_p99 = @field(current.sections, name).p99;
        const delta_pct = if (base_p99 == 0) 0.0
            else ((@as(f64, @floatFromInt(cur_p99)) - @as(f64, @floatFromInt(base_p99))) / @as(f64, @floatFromInt(base_p99))) * 100.0;
        const status = if (delta_pct > pct) "REGRESSION" else "OK";
        if (delta_pct > pct) regressed = true;
        // Zig's format spec has no `+` sign flag, so the delta is right-aligned instead.
        std.debug.print("bench: {s:<16} p99 {d:>5}us (baseline {d:>5}us)  {d:>6.1}%  {s}\n",
            .{ name, cur_p99, base_p99, delta_pct, status });
    }

    if (regressed) std.process.exit(1);
}
```

Note the `inline for` over `sections` with string field access: `@field` requires a comptime-known field name, and `inline for` unrolls the loop so that each `name` is comptime-known. If `inline for` with a string-array approach doesn't compile in your Zig version, unroll the five section comparisons explicitly.
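As an alternative to a hand-written name array, the comparison can be sketched by iterating the struct's own fields with `std.meta.fields`. The `Section` and `Sections` types below are hypothetical two-field stand-ins for the real types in `bench_stats.zig`, and `deltaPct` and `anyRegressed` are illustrative names, not part of the plan's code.

```zig
const std = @import("std");

// Hypothetical stand-ins for the real types in bench_stats.zig.
const Section = struct { p99: u64 };
const Sections = struct {
    snapshot: Section,
    gpu_submit: Section,
};

/// Percentage growth of `cur` over `base`; 0 when there is no baseline value.
fn deltaPct(base: u64, cur: u64) f64 {
    if (base == 0) return 0.0;
    const b: f64 = @floatFromInt(base);
    const c: f64 = @floatFromInt(cur);
    return (c - b) / b * 100.0;
}

/// Returns true if any section's p99 grew by more than `limit_pct`.
fn anyRegressed(baseline: Sections, current: Sections, limit_pct: f64) bool {
    var regressed = false;
    // `inline for` unrolls at compile time, so `field.name` is comptime-known
    // and can be passed to `@field`.
    inline for (std.meta.fields(Sections)) |field| {
        const base = @field(baseline, field.name).p99;
        const cur = @field(current, field.name).p99;
        if (deltaPct(base, cur) > limit_pct) regressed = true;
    }
    return regressed;
}
```

Iterating the fields means a section added to the struct later is compared automatically, whereas the explicit name array keeps the report order under direct control.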

- [ ] **Step 3: Wire into build.zig**

```zig
    const bench_baseline_mod = b.createModule(.{
        .root_source_file = b.path("src/tools/bench_baseline.zig"),
        .target = target,
        .optimize = optimize,
    });
    bench_baseline_mod.addImport("bench_stats", bench_stats_mod);
    const bench_baseline_exe = b.addExecutable(.{
        .name = "bench-baseline",
        .root_module = bench_baseline_mod,
    });
    b.installArtifact(bench_baseline_exe);

    const bench_baseline_step = b.step("bench-baseline", "Save current frame-timing profile to tests/bench/baseline.json");
    const bench_baseline_run = b.addRunArtifact(bench_baseline_exe);
    bench_baseline_run.addArg("save");
    bench_baseline_run.step.dependOn(b.getInstallStep());
    bench_baseline_step.dependOn(&bench_baseline_run.step);

    const bench_check_step = b.step("bench-check", "Compare current frame timings against baseline");
    const bench_check_run = b.addRunArtifact(bench_baseline_exe);
    bench_check_run.addArg("check");
    bench_check_run.step.dependOn(b.getInstallStep());
    bench_check_step.dependOn(&bench_check_run.step);
```

- [ ] **Step 4: Generate the initial baseline**

```bash
zig build bench-baseline
cat tests/bench/baseline.json
```

Expected: `tests/bench/baseline.json` exists with non-zero `frame_count` and five section entries.

- [ ] **Step 5: Sanity-check `bench-check`**

```bash
zig build bench-check
```

Expected: every section prints with a small percentage delta. A section is marked `REGRESSION`, and the tool exits non-zero, only if its p99 exceeds the baseline by more than the threshold (20% by default). On a quiet machine this should pass cleanly.

Run it three times in a row to confirm stability. If it flakes, raise the default threshold or investigate a genuine noise source.

- [ ] **Step 6: Commit**

```bash
git add tests/bench/baseline.json src/main.zig src/tools/bench_baseline.zig build.zig
git commit -m "$(cat <<'EOF'
Add bench-baseline and bench-check

Baseline stores workload/zig/waystty SHA + per-section p99. Check
compares against baseline, flags sections exceeding 20% p99 growth.
Threshold overridable via WAYSTTY_BENCH_REGRESSION_PCT.
EOF
)"
```

---

## Task 11: Makefile targets + housekeeping

Bring the new tools under the familiar `make` UX, and gitignore stray test binaries at repo root.

**Files:**
- Modify: `Makefile`
- Modify: `.gitignore`

- [ ] **Step 1: Add Makefile targets**

Replace `Makefile` with:

```makefile
ZIG ?= zig
FLAMEGRAPH ?= flamegraph.pl
STACKCOLLAPSE ?= stackcollapse-perf.pl

.PHONY: build run test bench profile clean test-render golden-update bench-baseline bench-check

build:
	$(ZIG) build

run: build
	$(ZIG) build run

test:
	$(ZIG) build test

zig-out/bin/waystty: $(wildcard src/*.zig) $(wildcard src/tools/*.zig) $(wildcard shaders/*)
	$(ZIG) build

bench: zig-out/bin/waystty
	WAYSTTY_BENCH=1 ./zig-out/bin/waystty 2>bench.log || true
	@echo "--- frame timing ---"
	@grep -A 12 "waystty frame timing" bench.log || echo "(no timing data found)"

profile:
	$(ZIG) build -Doptimize=ReleaseSafe
	perf record -g -F 999 --no-inherit -o perf.data -- \
		sh -c 'WAYSTTY_BENCH=1 ./zig-out/bin/waystty 2>bench.log'
	perf script -i perf.data \
		| $(STACKCOLLAPSE) \
		| $(FLAMEGRAPH) > flamegraph.svg
	@echo "--- frame timing ---"
	@grep -A 12 "waystty frame timing" bench.log || echo "(no timing data found)"
	xdg-open flamegraph.svg

test-render:
	$(ZIG) build test-render

golden-update:
	WAYSTTY_GOLDEN_UPDATE=1 $(ZIG) build test-render

bench-baseline:
	$(ZIG) build bench-baseline

bench-check:
	$(ZIG) build bench-check

clean:
	rm -rf zig-out .zig-cache perf.data bench.log flamegraph.svg tests/golden/output
```

- [ ] **Step 2: Verify each target runs**

```bash
make test-render
make bench-check
```

Expected: both exit 0.

- [ ] **Step 3: Update `.gitignore` for stray binaries**

The working tree has untracked `test_io`, `test_io2`, `test_io3`, `test_sig`, `test_timer` binaries at repo root. They appear to be scratch compilation outputs. Append to `.gitignore`:

```
# Scratch test binaries (ad-hoc compilations)
/test_io
/test_io2
/test_io3
/test_sig
/test_timer
```

Confirm they no longer appear in `git status`.

- [ ] **Step 4: Final commit**

```bash
git add Makefile .gitignore
git commit -m "$(cat <<'EOF'
Add make targets for render + bench tests

make test-render, golden-update, bench-baseline, bench-check.
Gitignore stray test_* scratch binaries at repo root.
EOF
)"
```

---

## Verification

After all tasks complete:

```bash
make test         # existing unit tests still pass
make test-render  # 3 passed, 0 failed
make bench-check  # all sections OK vs baseline
```

All three should exit 0.

## Self-Review Notes

- **Spec coverage:** every spec section maps to a task. Capture mode → Task 5; imgdiff → Task 6; scripts → Task 7; orchestrator + goldens → Tasks 8-9; bench baseline/check → Task 10; Makefile → Task 11. Offscreen render target (critical spec fix) → Task 2. PNG codec dependency → Task 1.
- **Placeholders:** none. Every code step contains the actual code. Any "refactor if needed" notes (like the `renderer.zig` helper extraction in Task 2) are concrete — they identify what to extract and why.
- **Types consistent:** `FrameTimingStats`, `BaselineRecord`, `OffscreenTarget`, `Image`, `DiffResult` appear with consistent signatures across the tasks that use them.
- **Known API uncertainty:** `std.compress.flate` and `std.json.stringify` signatures occasionally drift between Zig minor versions. Tasks 1 and 4 call this out explicitly so the implementer substitutes the current Zig 0.15+ equivalent rather than copy-pasting blindly.