docs/superpowers/plans/2026-04-10-performance-benchmarking-implementation.md
# Performance Benchmarking Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add per-section frame timing instrumentation, a reproducible bench workload, and a perf/flamegraph target so we can measure responsiveness before and after fixing known bottlenecks.
**Architecture:** A 256-entry ring buffer of `FrameTiming` structs records microsecond timings for five render-loop sections. Stats are dumped to stderr on SIGUSR1 and clean exit. A `WAYSTTY_BENCH=1` env var swaps the user's shell for a fixed workload script. A `Makefile` provides `bench` and `profile` targets.
**Tech Stack:** Zig 0.15, `std.time.Timer`, `std.posix.sigaction`, `perf record`, `flamegraph.pl`/`stackcollapse-perf.pl`
---
## File Structure
- Modify: `src/main.zig`
- `FrameTiming` struct, `FrameTimingRing` ring buffer, `computeStats` helper, `formatStats` printer
- SIGUSR1 signal handler that sets an atomic flag
- Section timers wrapping each render-loop phase
- `WAYSTTY_BENCH` env var check in the shell-selection block
- Stats dump on clean exit
- Create: `Makefile`
- `bench` target: build + run with `WAYSTTY_BENCH=1`, extract stats from stderr
- `profile` target: build ReleaseSafe + `perf record` + flamegraph generation
### Task 1: Add FrameTiming struct and ring buffer with tests
**Files:**
- Modify: `src/main.zig`
- Test: `src/main.zig`
- [ ] **Step 1: Write the failing tests**
Add at the bottom of `src/main.zig`, after the existing test blocks:
```zig
test "FrameTiming.total sums all sections" {
const ft: FrameTiming = .{
.snapshot_us = 10,
.row_rebuild_us = 20,
.atlas_upload_us = 30,
.instance_upload_us = 40,
.gpu_submit_us = 50,
};
try std.testing.expectEqual(@as(u32, 150), ft.total());
}
test "FrameTimingRing records and wraps correctly" {
var ring = FrameTimingRing{};
try std.testing.expectEqual(@as(usize, 0), ring.count);
ring.push(.{ .snapshot_us = 1, .row_rebuild_us = 2, .atlas_upload_us = 3, .instance_upload_us = 4, .gpu_submit_us = 5 });
try std.testing.expectEqual(@as(usize, 1), ring.count);
try std.testing.expectEqual(@as(u32, 1), ring.entries[0].snapshot_us);
// Fill to capacity
for (1..FrameTimingRing.capacity) |i| {
ring.push(.{ .snapshot_us = @intCast(i + 1), .row_rebuild_us = 0, .atlas_upload_us = 0, .instance_upload_us = 0, .gpu_submit_us = 0 });
}
try std.testing.expectEqual(FrameTimingRing.capacity, ring.count);
// One more wraps around — overwrites entries[0], head advances to 1
ring.push(.{ .snapshot_us = 999, .row_rebuild_us = 0, .atlas_upload_us = 0, .instance_upload_us = 0, .gpu_submit_us = 0 });
try std.testing.expectEqual(FrameTimingRing.capacity, ring.count);
// Newest entry is at (head + capacity - 1) % capacity = 0
try std.testing.expectEqual(@as(u32, 999), ring.entries[0].snapshot_us);
// head has advanced past the overwritten slot
try std.testing.expectEqual(@as(usize, 1), ring.head);
}
test "FrameTimingRing.orderedSlice returns entries in insertion order after wrap" {
var ring = FrameTimingRing{};
// Push capacity + 3 entries so the ring wraps
for (0..FrameTimingRing.capacity + 3) |i| {
ring.push(.{ .snapshot_us = @intCast(i), .row_rebuild_us = 0, .atlas_upload_us = 0, .instance_upload_us = 0, .gpu_submit_us = 0 });
}
var buf: [FrameTimingRing.capacity]FrameTiming = undefined;
const ordered = ring.orderedSlice(&buf);
try std.testing.expectEqual(FrameTimingRing.capacity, ordered.len);
// First entry should be the 4th pushed (index 3), last should be capacity+2
try std.testing.expectEqual(@as(u32, 3), ordered[0].snapshot_us);
try std.testing.expectEqual(@as(u32, FrameTimingRing.capacity + 2), ordered[ordered.len - 1].snapshot_us);
}
```
- [ ] **Step 2: Run test to verify it fails**
Run: `zig build test 2>&1 | head -20`
Expected: FAIL with `FrameTiming` not found.
- [ ] **Step 3: Implement FrameTiming and FrameTimingRing**
Add above the test blocks in `src/main.zig`:
```zig
const FrameTiming = struct {
snapshot_us: u32 = 0,
row_rebuild_us: u32 = 0,
atlas_upload_us: u32 = 0,
instance_upload_us: u32 = 0,
gpu_submit_us: u32 = 0,
fn total(self: FrameTiming) u32 {
return self.snapshot_us +
self.row_rebuild_us +
self.atlas_upload_us +
self.instance_upload_us +
self.gpu_submit_us;
}
};
const FrameTimingRing = struct {
const capacity = 256;
entries: [capacity]FrameTiming = [_]FrameTiming{.{}} ** capacity,
head: usize = 0,
count: usize = 0,
fn push(self: *FrameTimingRing, timing: FrameTiming) void {
const idx = if (self.count < capacity) self.count else self.head;
self.entries[idx] = timing;
if (self.count < capacity) {
self.count += 1;
} else {
self.head = (self.head + 1) % capacity;
}
}
/// Return a slice of valid entries in insertion order.
/// Caller must provide a scratch buffer of `capacity` entries.
fn orderedSlice(self: *const FrameTimingRing, buf: *[capacity]FrameTiming) []const FrameTiming {
if (self.count < capacity) {
return self.entries[0..self.count];
}
// Ring has wrapped — copy from head..end then 0..head
const tail_len = capacity - self.head;
@memcpy(buf[0..tail_len], self.entries[self.head..capacity]);
@memcpy(buf[tail_len..capacity], self.entries[0..self.head]);
return buf[0..capacity];
}
};
```
- [ ] **Step 4: Run test to verify it passes**
Run: `zig build test 2>&1 | tail -5`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add src/main.zig
git commit -m "Add FrameTiming struct and ring buffer"
```
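The wrap arithmetic above can be cross-checked outside Zig. Below is a minimal Python mirror of `push`/`orderedSlice` (a sketch for verification only, not part of the codebase; a single int stands in for each `FrameTiming`):

```python
class FrameTimingRing:
    """Mirror of the Zig struct: entries fill from index 0 until full,
    after which head marks the oldest slot and advances on each overwrite."""
    capacity = 256

    def __init__(self):
        self.entries = [0] * self.capacity
        self.head = 0
        self.count = 0

    def push(self, value):
        # Same index rule as the Zig push: append while filling,
        # overwrite the slot at head once full.
        idx = self.count if self.count < self.capacity else self.head
        self.entries[idx] = value
        if self.count < self.capacity:
            self.count += 1
        else:
            self.head = (self.head + 1) % self.capacity

    def ordered(self):
        # Insertion-order view: head..end, then 0..head, like orderedSlice.
        if self.count < self.capacity:
            return self.entries[: self.count]
        return self.entries[self.head :] + self.entries[: self.head]


ring = FrameTimingRing()
for i in range(ring.capacity + 3):
    ring.push(i)

# After capacity + 3 pushes, the oldest surviving entry is the 4th pushed (3),
# matching the orderedSlice test expectations above.
assert ring.ordered()[0] == 3
assert ring.ordered()[-1] == ring.capacity + 2
assert ring.head == 3
```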
### Task 2: Add stats computation and formatting with tests
**Files:**
- Modify: `src/main.zig`
- Test: `src/main.zig`
- [ ] **Step 1: Write the failing tests**
Add after the Task 1 tests in `src/main.zig`:
```zig
test "FrameTimingStats computes min/avg/p99/max correctly" {
var ring = FrameTimingRing{};
// Push 100 frames with snapshot_us = 1..100
for (0..100) |i| {
ring.push(.{
.snapshot_us = @intCast(i + 1),
.row_rebuild_us = 0,
.atlas_upload_us = 0,
.instance_upload_us = 0,
.gpu_submit_us = 0,
});
}
const stats = computeFrameStats(&ring);
try std.testing.expectEqual(@as(u32, 1), stats.snapshot.min);
try std.testing.expectEqual(@as(u32, 100), stats.snapshot.max);
try std.testing.expectEqual(@as(u32, 50), stats.snapshot.avg);
// p99 of 1..100 = value at index 98 (0-based) = 99
try std.testing.expectEqual(@as(u32, 99), stats.snapshot.p99);
try std.testing.expectEqual(@as(usize, 100), stats.frame_count);
}
test "FrameTimingStats handles empty ring" {
var ring = FrameTimingRing{};
const stats = computeFrameStats(&ring);
try std.testing.expectEqual(@as(usize, 0), stats.frame_count);
try std.testing.expectEqual(@as(u32, 0), stats.snapshot.min);
}
```
- [ ] **Step 2: Run test to verify it fails**
Run: `zig build test 2>&1 | head -20`
Expected: FAIL with `computeFrameStats` not found.
- [ ] **Step 3: Implement stats computation and formatting**
Add after the `FrameTimingRing` definition:
```zig
const SectionStats = struct {
min: u32 = 0,
avg: u32 = 0,
p99: u32 = 0,
max: u32 = 0,
};
const FrameTimingStats = struct {
snapshot: SectionStats = .{},
row_rebuild: SectionStats = .{},
atlas_upload: SectionStats = .{},
instance_upload: SectionStats = .{},
gpu_submit: SectionStats = .{},
total: SectionStats = .{},
frame_count: usize = 0,
};
fn computeSectionStats(values: []u32) SectionStats {
if (values.len == 0) return .{};
std.mem.sort(u32, values, {}, std.sort.asc(u32));
var sum: u64 = 0;
for (values) |v| sum += v;
const p99_idx = if (values.len <= 1) 0 else ((values.len - 1) * 99) / 100;
return .{
.min = values[0],
.avg = @intCast(sum / values.len),
.p99 = values[p99_idx],
.max = values[values.len - 1],
};
}
fn computeFrameStats(ring: *const FrameTimingRing) FrameTimingStats {
if (ring.count == 0) return .{};
var ordered_buf: [FrameTimingRing.capacity]FrameTiming = undefined;
const entries = ring.orderedSlice(&ordered_buf);
const n = entries.len;
var snapshot_vals: [FrameTimingRing.capacity]u32 = undefined;
var row_rebuild_vals: [FrameTimingRing.capacity]u32 = undefined;
var atlas_upload_vals: [FrameTimingRing.capacity]u32 = undefined;
var instance_upload_vals: [FrameTimingRing.capacity]u32 = undefined;
var gpu_submit_vals: [FrameTimingRing.capacity]u32 = undefined;
var total_vals: [FrameTimingRing.capacity]u32 = undefined;
for (entries, 0..) |e, i| {
snapshot_vals[i] = e.snapshot_us;
row_rebuild_vals[i] = e.row_rebuild_us;
atlas_upload_vals[i] = e.atlas_upload_us;
instance_upload_vals[i] = e.instance_upload_us;
gpu_submit_vals[i] = e.gpu_submit_us;
total_vals[i] = e.total();
}
return .{
.snapshot = computeSectionStats(snapshot_vals[0..n]),
.row_rebuild = computeSectionStats(row_rebuild_vals[0..n]),
.atlas_upload = computeSectionStats(atlas_upload_vals[0..n]),
.instance_upload = computeSectionStats(instance_upload_vals[0..n]),
.gpu_submit = computeSectionStats(gpu_submit_vals[0..n]),
.total = computeSectionStats(total_vals[0..n]),
.frame_count = n,
};
}
fn printFrameStats(stats: FrameTimingStats) void {
    // std.debug.print writes to stderr and cannot fail, which sidesteps
    // the stdio writer API that was reworked in Zig 0.15
    // (std.io.getStdErr() no longer exists there).
    std.debug.print(
        \\
        \\=== waystty frame timing ({d} frames) ===
        \\{s:<20}{s:>6}{s:>6}{s:>6}{s:>6} (us)
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\----------------------------------------------------
        \\{s:<20}{d:>6}{d:>6}{d:>6}{d:>6}
        \\
    , .{
        stats.frame_count,
        "section", "min", "avg", "p99", "max",
        "snapshot", stats.snapshot.min, stats.snapshot.avg, stats.snapshot.p99, stats.snapshot.max,
        "row_rebuild", stats.row_rebuild.min, stats.row_rebuild.avg, stats.row_rebuild.p99, stats.row_rebuild.max,
        "atlas_upload", stats.atlas_upload.min, stats.atlas_upload.avg, stats.atlas_upload.p99, stats.atlas_upload.max,
        "instance_upload", stats.instance_upload.min, stats.instance_upload.avg, stats.instance_upload.p99, stats.instance_upload.max,
        "gpu_submit", stats.gpu_submit.min, stats.gpu_submit.avg, stats.gpu_submit.p99, stats.gpu_submit.max,
        "total", stats.total.min, stats.total.avg, stats.total.p99, stats.total.max,
    });
}
```
- [ ] **Step 4: Run test to verify it passes**
Run: `zig build test 2>&1 | tail -5`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add src/main.zig
git commit -m "Add frame timing stats computation and formatting"
```
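As a sanity check on the nearest-rank index formula `((values.len - 1) * 99) / 100` used by `computeSectionStats`, the same integer arithmetic in Python (verification sketch only):

```python
def p99_index(n: int) -> int:
    # Integer nearest-rank index, matching the Zig expression
    # ((values.len - 1) * 99) / 100 with truncating division.
    return 0 if n <= 1 else ((n - 1) * 99) // 100

# For 100 sorted samples 1..100 the index is 98 and the value 99,
# matching the expectation in the Task 2 test.
assert p99_index(100) == 98
assert p99_index(1) == 0
assert p99_index(256) == 252  # full ring: index 252 of 0..255
```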
### Task 3: Add SIGUSR1 signal handler
**Files:**
- Modify: `src/main.zig`
- [ ] **Step 1: Add the signal flag and handler**
Add below the `FrameTimingRing` and stats code in `src/main.zig`:
```zig
var sigusr1_received: std.atomic.Value(bool) = std.atomic.Value(bool).init(false);
fn sigusr1Handler(_: c_int) callconv(.c) void {
sigusr1_received.store(true, .release);
}
fn installSigusr1Handler() void {
    const act = std.posix.Sigaction{
        .handler = .{ .handler = sigusr1Handler },
        .mask = std.posix.sigemptyset(),
        // Sigaction.flags is an integer bitmask, not a struct literal;
        // SA.RESTART keeps interrupted syscalls transparent to the loop.
        .flags = std.posix.SA.RESTART,
    };
    std.posix.sigaction(std.posix.SIG.USR1, &act, null);
}
```
- [ ] **Step 2: Wire into runTerminal**
In `runTerminal`, right before the `// === main loop ===` comment (line 205), add:
```zig
// === frame timing ===
var frame_ring = FrameTimingRing{};
installSigusr1Handler();
```
Inside the main loop, right after `clearConsumedDirtyFlags` (line 534), add:
```zig
// Check for SIGUSR1 stats dump request
if (sigusr1_received.swap(false, .acq_rel)) {
printFrameStats(computeFrameStats(&frame_ring));
}
```
Right after the main loop (after the `while` block ends, before `_ = try ctx.vkd.deviceWaitIdle`), add:
```zig
// Dump timing stats on exit
printFrameStats(computeFrameStats(&frame_ring));
```
- [ ] **Step 3: Verify it compiles**
Run: `zig build 2>&1 | tail -5`
Expected: BUILD SUCCESS (no test run needed — signal handling is not unit-testable)
- [ ] **Step 4: Commit**
```bash
git add src/main.zig
git commit -m "Add SIGUSR1 handler for frame timing stats dump"
```
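The handler-sets-a-flag, loop-polls-and-clears shape used above is the standard async-signal-safe pattern. A Python sketch of the same structure (a plain flag stands in for the Zig atomic; the short loop stands in for the render loop in `runTerminal`):

```python
import os
import signal

stats_requested = False

def on_sigusr1(signum, frame):
    # Only set a flag in the handler; the real work (stats dump)
    # happens in the main loop, outside signal context.
    global stats_requested
    stats_requested = True

signal.signal(signal.SIGUSR1, on_sigusr1)
os.kill(os.getpid(), signal.SIGUSR1)  # simulate `kill -USR1 $(pgrep waystty)`

dumped = []
for _ in range(3):  # stand-in for the render loop
    if stats_requested:
        stats_requested = False  # the Zig plan uses swap(false, .acq_rel)
        dumped.append("stats")   # printFrameStats(computeFrameStats(...)) in the plan

assert dumped == ["stats"]  # exactly one dump per signal
```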
### Task 4: Wire section timers into the render loop
**Files:**
- Modify: `src/main.zig`
This task wraps each render-loop section with `std.time.Timer` and pushes a `FrameTiming` entry after each rendered frame.
- [ ] **Step 1: Add timer helper**
Add near the other helper functions in `src/main.zig`:
```zig
fn usFromTimer(timer: std.time.Timer) u32 {
    var t = timer; // Timer.read takes *Timer, so work on a mutable copy
    const ns = t.read();
    const us = ns / std.time.ns_per_us;
    return std.math.cast(u32, us) orelse std.math.maxInt(u32);
}
```
- [ ] **Step 2: Instrument the render loop**
In `runTerminal`, replace the render section. The existing code between `// === render ===` (line 357) and `clearConsumedDirtyFlags` (line 534) gets wrapped with timers. Add a `var frame_timing: FrameTiming = .{};` before `// === render ===` and instrument each section:
**snapshot section** — wrap `try term.snapshot();` (line 359):
```zig
var frame_timing: FrameTiming = .{};
// === render ===
const previous_cursor = term.render_state.cursor;
var section_timer = std.time.Timer.start() catch unreachable;
try term.snapshot();
frame_timing.snapshot_us = usFromTimer(section_timer);
```
**row_rebuild section** — wrap the dirty-row rebuild work (from the `var rows_rebuilt` declaration through the cursor rebuild block). Start the timer at the top:
```zig
section_timer = std.time.Timer.start() catch unreachable;
```
Right before `// Re-upload atlas if new glyphs were added` (line 452):
```zig
frame_timing.row_rebuild_us = usFromTimer(section_timer);
```
**atlas_upload section** — wrap the atlas upload block:
```zig
section_timer = std.time.Timer.start() catch unreachable;
// Re-upload atlas if new glyphs were added
if (atlas.dirty) {
try ctx.uploadAtlas(atlas.pixels);
atlas.dirty = false;
render_cache.layout_dirty = true;
}
frame_timing.atlas_upload_us = usFromTimer(section_timer);
```
**instance_upload section** — wrap the upload plan + upload blocks:
```zig
section_timer = std.time.Timer.start() catch unreachable;
```
Right before `const baseline_coverage = renderer.coverageVariantParams(.baseline);` (line 517):
```zig
frame_timing.instance_upload_us = usFromTimer(section_timer);
```
**gpu_submit section** — wrap `ctx.drawCells(...)`:
```zig
section_timer = std.time.Timer.start() catch unreachable;
const baseline_coverage = renderer.coverageVariantParams(.baseline);
ctx.drawCells(
render_cache.total_instance_count,
.{ @floatFromInt(cell_w), @floatFromInt(cell_h) },
default_bg,
baseline_coverage,
) catch |err| switch (err) {
error.OutOfDateKHR => {
_ = try ctx.vkd.deviceWaitIdle(ctx.device);
const buf_w = window.width * @as(u32, @intCast(geom.buffer_scale));
const buf_h = window.height * @as(u32, @intCast(geom.buffer_scale));
try ctx.recreateSwapchain(buf_w, buf_h);
render_pending = true;
continue;
},
else => return err,
};
frame_timing.gpu_submit_us = usFromTimer(section_timer);
```
**Push timing entry** — right after the gpu_submit timer read, before `clearConsumedDirtyFlags`:
```zig
frame_ring.push(frame_timing);
```
- [ ] **Step 3: Verify it compiles**
Run: `zig build 2>&1 | tail -5`
Expected: BUILD SUCCESS
- [ ] **Step 4: Run tests to verify nothing broke**
Run: `zig build test 2>&1 | tail -5`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add src/main.zig
git commit -m "Instrument render loop with per-section frame timers"
```
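`usFromTimer` truncates nanoseconds to whole microseconds and saturates at `maxInt(u32)` (roughly 71 minutes), so a stalled section can never wrap the counter. The conversion mirrored in Python as a cross-check:

```python
U32_MAX = 2**32 - 1
NS_PER_US = 1_000

def us_from_ns(ns: int) -> int:
    # Truncate to whole microseconds, saturating at the u32 ceiling
    # instead of wrapping, matching std.math.cast(u32, us) orelse maxInt(u32).
    us = ns // NS_PER_US
    return us if us <= U32_MAX else U32_MAX

assert us_from_ns(1_500) == 1  # 1.5 us truncates to 1
assert us_from_ns(0) == 0
assert us_from_ns((U32_MAX + 1) * NS_PER_US) == U32_MAX  # saturates, no wrap
```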
### Task 5: Add WAYSTTY_BENCH shell override
**Files:**
- Modify: `src/main.zig`
- [ ] **Step 1: Replace the shell selection block**
In `runTerminal`, the current shell selection code (lines 190-194) is:
```zig
const shell: [:0]const u8 = blk: {
const shell_env = std.posix.getenv("SHELL") orelse "/bin/sh";
break :blk try alloc.dupeZ(u8, shell_env);
};
defer alloc.free(shell);
```
Replace it with:
```zig
const shell: [:0]const u8 = blk: {
if (std.posix.getenv("WAYSTTY_BENCH") != null) {
break :blk try alloc.dupeZ(u8, "/bin/sh");
}
const shell_env = std.posix.getenv("SHELL") orelse "/bin/sh";
break :blk try alloc.dupeZ(u8, shell_env);
};
defer alloc.free(shell);
const bench_script: ?[:0]const u8 = if (std.posix.getenv("WAYSTTY_BENCH") != null)
"echo warmup; sleep 0.2; seq 1 50000; find /usr/lib -name '*.so' 2>/dev/null | head -500; yes 'hello world' | head -2000; exit 0"
else
null;
```
- [ ] **Step 2: Pass bench script as shell arg when set**
Replace the `pty.Pty.spawn` call (line 196) with:
```zig
var p = try pty.Pty.spawn(.{
.cols = cols,
.rows = rows,
.shell = shell,
.shell_args = if (bench_script) |script| &.{ "-c", script } else null,
});
```
- [ ] **Step 3: Update pty.zig to accept shell_args**
In `src/pty.zig`, modify the `SpawnOptions` struct (line 18) to add `shell_args`:
```zig
pub const SpawnOptions = struct {
cols: u16,
rows: u16,
shell: [:0]const u8,
shell_args: ?[]const [:0]const u8 = null,
};
```
In the `spawn` function, replace the `argv` construction (line 40) with:
```zig
if (opts.shell_args) |args| {
    var argv_buf: [16:null]?[*:0]const u8 = .{null} ** 16;
    std.debug.assert(args.len + 1 <= argv_buf.len); // argv[0] = shell, then args
    argv_buf[0] = opts.shell.ptr;
    for (args, 1..) |arg, i| {
        argv_buf[i] = arg.ptr;
    }
    std.posix.execveZ(opts.shell.ptr, &argv_buf, std.c.environ) catch {};
} else {
var argv = [_:null]?[*:0]const u8{ opts.shell.ptr, null };
std.posix.execveZ(opts.shell.ptr, &argv, std.c.environ) catch {};
}
```
- [ ] **Step 4: Verify it compiles**
Run: `zig build 2>&1 | tail -5`
Expected: BUILD SUCCESS
- [ ] **Step 5: Run tests**
Run: `zig build test 2>&1 | tail -5`
Expected: PASS
- [ ] **Step 6: Commit**
```bash
git add src/main.zig src/pty.zig
git commit -m "Add WAYSTTY_BENCH env var for bench workload"
```
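The override above amounts to exec'ing `/bin/sh -c '<script>'` instead of the login shell. A Python sketch of the same argv shape (using a trivial stand-in script, not the real bench workload):

```python
import subprocess

# argv mirrors what pty.Pty.spawn builds: argv[0] = shell, then "-c", script.
script = "seq 1 3"  # stand-in; the plan passes the fixed WAYSTTY_BENCH workload
out = subprocess.run(
    ["/bin/sh", "-c", script],
    capture_output=True,
    text=True,
    check=True,
).stdout

assert out.split() == ["1", "2", "3"]
```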
### Task 6: Create Makefile with bench and profile targets
**Files:**
- Create: `Makefile`
- [ ] **Step 1: Create the Makefile**
Create `Makefile` in the project root:
```makefile
ZIG ?= zig
FLAMEGRAPH ?= flamegraph.pl
STACKCOLLAPSE ?= stackcollapse-perf.pl
.PHONY: build run test bench profile clean
build:
$(ZIG) build
run: build
$(ZIG) build run
test:
$(ZIG) build test
zig-out/bin/waystty: $(wildcard src/*.zig) $(wildcard shaders/*)
$(ZIG) build
bench: zig-out/bin/waystty
WAYSTTY_BENCH=1 ./zig-out/bin/waystty 2>bench.log || true
@echo "--- frame timing ---"
@grep -A 12 "waystty frame timing" bench.log || echo "(no timing data found)"
profile:
$(ZIG) build -Doptimize=ReleaseSafe
perf record -g -F 999 --no-inherit -o perf.data -- \
sh -c 'WAYSTTY_BENCH=1 ./zig-out/bin/waystty 2>bench.log'
perf script -i perf.data \
| $(STACKCOLLAPSE) \
| $(FLAMEGRAPH) > flamegraph.svg
@echo "--- frame timing ---"
@grep -A 12 "waystty frame timing" bench.log || echo "(no timing data found)"
xdg-open flamegraph.svg
clean:
rm -rf zig-out .zig-cache perf.data bench.log flamegraph.svg
```
- [ ] **Step 2: Verify bench target syntax**
Run: `make -n bench`
Expected: prints the commands that would run (dry run), no syntax errors.
- [ ] **Step 3: Verify profile target syntax**
Run: `make -n profile`
Expected: prints the commands that would run (dry run), no syntax errors.
- [ ] **Step 4: Commit**
```bash
git add Makefile
git commit -m "Add Makefile with bench and profile targets"
```
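The `grep -A 12` in the `bench` target just prints the marker line plus the next 12 lines of the stats table. The same slice in Python, against a stand-in log (illustrative only; the real table comes from `printFrameStats`):

```python
log = """\
some unrelated stderr noise
=== waystty frame timing (240 frames) ===
section                min   avg   p99   max (us)
snapshot                 1    50    99   100
trailing line
"""

lines = log.splitlines()
# Find the marker line, then keep it plus up to 12 following lines,
# mirroring `grep -A 12 "waystty frame timing" bench.log`.
start = next(i for i, l in enumerate(lines) if "waystty frame timing" in l)
block = lines[start : start + 13]

assert block[0].startswith("=== waystty frame timing")
assert "snapshot" in block[2]
```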
### Task 7: Full verification
**Files:**
- Test: `src/main.zig`, `src/pty.zig`
- [ ] **Step 1: Run the full test suite**
Run: `zig build test`
Expected: PASS
- [ ] **Step 2: Manual smoke test — normal run**
Run: `zig build run`
Expected:
- Terminal opens and works normally.
- On Ctrl+D / exit, frame timing stats print to stderr.
- [ ] **Step 3: Manual smoke test — SIGUSR1**
In one terminal: `zig build run`
In another terminal: `kill -USR1 $(pgrep waystty)`
Expected: frame timing stats print to stderr of the running waystty.
- [ ] **Step 4: Manual smoke test — bench**
Run: `make bench`
Expected:
- waystty opens, runs the bench workloads, exits.
- `bench.log` contains frame timing stats.
- Stats are printed to the console.
- [ ] **Step 5: Commit if any fixups were needed**
```bash
git add src/main.zig src/pty.zig Makefile
git commit -m "Fix verification issues for performance benchmarking"
```
## Self-Review
- **Spec coverage:**
- Ring buffer: Task 1
- Stats computation (min/avg/p99/max): Task 2
- SIGUSR1 trigger: Task 3
- Section timers: Task 4
- WAYSTTY_BENCH shell override: Task 5
- Makefile bench target: Task 6
- Makefile profile target: Task 6
- Clean exit stats dump: Task 3
- **Placeholder scan:** No TBD/TODO markers. All code blocks are complete.
- **Type consistency:**
- `FrameTiming` defined in Task 1, used in Tasks 2-4
- `FrameTimingRing` defined in Task 1, used in Tasks 2-4
- `computeFrameStats` defined in Task 2, called in Task 3
- `printFrameStats` defined in Task 2, called in Task 3
- `usFromTimer` defined in Task 4, used in Task 4
- `SpawnOptions.shell_args` added in Task 5, used in Task 5
- `sigusr1_received` and `installSigusr1Handler` defined in Task 3, used in Tasks 3-4