# Frame-Callback Throttling & Wayland Loop Rearchitecture

**Date:** 2026-04-16
**Status:** Approved design
**Motivation:** waystty freezes when its window is moved to a hidden sway workspace. Root cause: the render path makes blocking Vulkan WSI calls (`vkWaitForFences`, `vkAcquireNextImageKHR`) with infinite timeouts while the compositor holds swapchain buffers without releasing them on an unmapped surface. This is a symptom of a deeper architectural gap — waystty renders eagerly on dirty state rather than pacing via `wl_surface.frame` callbacks, which is the canonical Wayland pattern.

## Goals

1. **Eliminate the freeze.** waystty must stay responsive (keyboard, pty, pointer, clipboard) when its surface is hidden or suspended.
2. **Follow the canonical Wayland client pattern.** Render only when the compositor signals readiness via frame callback. Treat compositor signals (enter/leave, configure states, frame callbacks) as authoritative for visibility and pacing.
3. **Consolidate duplication.** Today there are four near-identical Wayland loops (main terminal, text-coverage compare, draw-smoke, benchmark). They've already drifted. Extract a shared readiness primitive.
4. **Preserve testability.** The pacing logic should be unit-testable without a live compositor.
5. **Preserve benchmark validity.** Per-frame section timings (`snapshot_us`, `row_rebuild_us`, `atlas_upload_us`, `instance_upload_us`, `gpu_submit_us`) remain identical under the throttle. Wall-clock aggregates become vsync-capped; an opt-out env var (`WAYSTTY_BENCH_UNTHROTTLED=1`) preserves raw-throughput measurement at the cost of freeze-safety.

## Non-goals

- Rewriting the terminal state machine, selection model, or PTY handling.
- Adding damage tracking (partial commits) — out of scope, but this rearchitecture doesn't foreclose it.
- Multithreaded rendering.
- Adopting `wp_fractional_scale_v1` / `wp_viewporter` — integer scales only (matches current behavior; see MEMORY notes).
- `wp_presentation_time` feedback — bench measurements rely on per-frame section timings, not presentation latency.
- `cursor-shape-v1`, `xdg-activation-v1`, `xdg-decoration` — none present today, none introduced here.

## Architecture

Three layers, each independently testable:

```
┌───────────────────────────────────────────────┐
│  app code (main, text-compare, smoke, bench)  │  mutates state, owns render fn
└───────────────────────────────────────────────┘
                        │ uses
┌───────────────────────────────────────────────┐
│  FrameLoop (new: src/frame_loop.zig)          │  pacing + readiness
│  - pending_callback, armed                    │
│  - waitForWork, canRender, commitRender       │
└───────────────────────────────────────────────┘
                        │ uses
┌───────────────────────────────────────────────┐
│  Surface lifecycle (wayland.zig, extended)    │  configure, enter/leave, suspended
│  - SurfaceState { configured, suspended,      │
│                   pending_configure, tracker }│
└───────────────────────────────────────────────┘
```

**Key invariants:**

- `FrameLoop` never touches Vulkan or terminal state. It is a pure readiness primitive over a wl_display + wl_surface.
- `canRender()` returns `armed && state.visible()`, where `state.visible() == configured && !suspended && entered_outputs > 0`. All four conditions gate every render.
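- `commitRender()` is called by the app *after* it has presented new content. On Wayland, the WSI present performs the `wl_surface.commit`. `commitRender()` then requests the next `wl_surface.frame()` callback and flips `armed` to false. `wl_surface.frame` is double-buffered state that only takes effect on the *next* `wl_surface.commit`, and the present's commit has already happened. For that reason `commitRender()` follows the request with its own `wl_surface.commit()`. Without it, the request would wait for a commit that the throttle itself prevents.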
- State transitions that change visibility (enter/leave, configure, suspended flag) notify the FrameLoop via `onSurfaceHidden()` / `onSurfaceShown()`. These update the pending-callback bookkeeping but do not force a render.

## Components

### `src/frame_loop.zig` (new, ~200 LOC)

```zig
pub const FrameLoop = struct {
    display: *wl.Display,
    surface: *wl.Surface,
    state: *const SurfaceState,  // borrowed

    pending_callback: ?*wl.Callback = null,
    armed: bool = true,  // first render needs no callback; visibility still gates it

    pub fn init(display: *wl.Display, surface: *wl.Surface, state: *const SurfaceState) FrameLoop;
    pub fn deinit(self: *FrameLoop) void;  // destroys pending callback if any

    // Blocks until wl_display or any extra_fd is readable, or timeout_ms elapses.
    // Dispatches any wl events that arrive. Safe to call even if not armed —
    // visibility/state changes are still processed.
    pub fn waitForWork(self: *FrameLoop, extra: []std.posix.pollfd, timeout_ms: i32) !void;

    pub fn canRender(self: *const FrameLoop) bool;

    // Caller has already presented (and thereby committed) new content.
    // Requests the next frame callback, commits again so the double-buffered
    // frame request takes effect, and flips armed=false.
    pub fn commitRender(self: *FrameLoop) !void;

    // State-transition hooks called from Window listeners.
    // onSurfaceHidden drops the pending callback but leaves `armed` unchanged;
    // canRender() will still be false because state.visible() is false.
    // onSurfaceShown sets armed=true unconditionally (the compositor may or
    // may not redeliver a pre-hide callback; re-arming is idempotent).
    pub fn onSurfaceHidden(self: *FrameLoop) void;
    pub fn onSurfaceShown(self: *FrameLoop) void;

    // Recovery path for OUT_OF_DATE: no commit happened, so bypass the callback gate.
    pub fn forceArm(self: *FrameLoop) void;
};
```
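The callback bookkeeping above can be modeled without any Wayland types. The following is a minimal sketch under stated assumptions, not the module itself. Callback handles are plain integers (a hypothetical `CallbackId`). Visibility is passed in as a bool. A counter stands in for `surface.frame()` plus `setListener`.

```zig
const std = @import("std");

/// Hypothetical stand-in for a `*wl.Callback` handle.
const CallbackId = u32;

/// Pure model of FrameLoop's pacing state; it makes no Wayland or Vulkan calls.
const FrameLoopModel = struct {
    pending_callback: ?CallbackId = null,
    armed: bool = true, // no callback to wait for before the first frame
    next_id: CallbackId = 1, // hypothetical id source replacing surface.frame()

    fn canRender(self: *const FrameLoopModel, visible: bool) bool {
        return self.armed and visible;
    }

    /// After the app has presented: record the new callback and disarm.
    fn commitRender(self: *FrameLoopModel) CallbackId {
        const id = self.next_id;
        self.next_id += 1;
        self.pending_callback = id;
        self.armed = false;
        return id;
    }

    /// Mirrors the `wl_callback.done` listener, including the identity check.
    fn onDone(self: *FrameLoopModel, id: CallbackId) void {
        const pending = self.pending_callback orelse return;
        if (pending != id) return; // stale: not the callback we are waiting on
        self.pending_callback = null;
        self.armed = true;
    }

    /// Real code destroys the wl_callback here; `armed` is left unchanged.
    fn onSurfaceHidden(self: *FrameLoopModel) void {
        self.pending_callback = null;
    }

    fn onSurfaceShown(self: *FrameLoopModel) void {
        self.armed = true;
    }

    fn forceArm(self: *FrameLoopModel) void {
        self.armed = true;
    }
};
```

Keeping the model free of protocol objects is what lets the unit tests listed under Testing drive it directly.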

### `src/wayland.zig` additions

```zig
pub const SurfaceState = struct {
    configured: bool = false,
    suspended: bool = false,
    tracker: *ScaleTracker,

    pub fn visible(self: *const SurfaceState) bool {
        return self.configured and
            !self.suspended and
            self.tracker.enteredCount() > 0;
    }
};

// Added to ScaleTracker:
pub fn enteredCount(self: *const ScaleTracker) usize;
```
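The visibility predicate can be exercised in isolation by swapping the tracker for a stub. In this sketch `TrackerStub` is hypothetical: it exposes only the `enteredCount()` method named above and nothing else of `ScaleTracker`.

```zig
const std = @import("std");

/// Hypothetical stand-in for ScaleTracker; only enteredCount() matters here.
const TrackerStub = struct {
    entered: usize = 0,

    fn enteredCount(self: *const TrackerStub) usize {
        return self.entered;
    }
};

const SurfaceState = struct {
    configured: bool = false,
    suspended: bool = false,
    tracker: *const TrackerStub,

    /// Visible only when configured, not suspended, and on at least one output.
    fn visible(self: *const SurfaceState) bool {
        return self.configured and
            !self.suspended and
            self.tracker.enteredCount() > 0;
    }
};
```

Because `SurfaceState` borrows the tracker by pointer, a later `enter`/`leave` is reflected in `visible()` with no extra synchronization in this single-threaded design.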

**`ack_configure` stays inline** (as today at wayland.zig:1044). The earlier draft proposed deferring ack until the next render commit; on review, deferral adds protocol risk (configure serials pile up while hidden, some compositors treat long ack delays as hostile) without any real batching benefit. Inline ack on receipt is spec-compliant, well-tested in the current codebase, and makes `SurfaceState` simpler.

Listener changes:

- `xdgSurfaceListener` continues to call `ackConfigure` inline; additionally sets `state.configured = true` on first configure.
- `xdgToplevelListener.configure` scans `cfg.states` for `.suspended`, sets `state.suspended` accordingly, and (like enter/leave) calls `onSurfaceShown` / `onSurfaceHidden` when `visible()` flips.
- `surfaceListener.enter/leave` continues to update the scale tracker and additionally invokes the FrameLoop's `onSurfaceShown` / `onSurfaceHidden` when the visibility boolean transitions.
- `wm_base` bind bumped from version 5 to `min(advertised, 6)` in `registryListener` (wayland.zig:1092). Binding above the advertised version is a protocol error. The `.suspended` state is a v6 feature, and compositors that advertise an older version never set it. For those compositors, frame callbacks plus enter/leave still throttle correctly on their own (belt-and-suspenders).
- `scale_generation` (wayland.zig:1055, :1060) is unchanged. Its purpose remains "was there a visibility/scale transition since last check" and it is still consumed only by the main loop. A scale change while hidden increments the counter; the deferred resize path (see below) picks it up when visible.
- `xdg_wm_base.ping` is already handled by the existing `wmBaseListener`. `FrameLoop.waitForWork` calls `dispatchPending` and flushes on every iteration regardless of `armed`, so pongs go out promptly even while hidden.
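The two protocol-facing pieces of these listener changes are sketched below: the version negotiation and the `.suspended` scan. zig-wayland hands the listener `cfg.states` as a wire array. This sketch assumes it has already been viewed as a slice. `ToplevelState` is a hypothetical local enum mirroring the xdg-shell wire values, in which `suspended` is 9.

```zig
const std = @import("std");

/// Mirrors xdg_toplevel.state wire values; `suspended` (9) arrived in version 6.
const ToplevelState = enum(u32) {
    maximized = 1,
    fullscreen = 2,
    resizing = 3,
    activated = 4,
    tiled_left = 5,
    tiled_right = 6,
    tiled_top = 7,
    tiled_bottom = 8,
    suspended = 9,
    _, // non-exhaustive: newer compositors may send values we don't know
};

const wm_base_max_version: u32 = 6;

/// Never bind above what the compositor advertises.
fn wmBaseBindVersion(advertised: u32) u32 {
    return if (advertised < wm_base_max_version) advertised else wm_base_max_version;
}

/// True when the configure event lists `.suspended`; absent means not suspended.
fn hasSuspended(states: []const ToplevelState) bool {
    for (states) |s| {
        if (s == .suspended) return true;
    }
    return false;
}
```

Recomputing `suspended` from scratch on every configure, rather than toggling it, matches the protocol: each `xdg_toplevel.configure` carries the complete current state set.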

### `src/main.zig` main-loop body

Shrinks from ~350 LOC to ~150. **All Vulkan-touching work is gated on `canRender()`** — not just `drawCells`, but also `deviceWaitIdle`, `recreateSwapchain`, and the `rebuildFaceForScale` path (which itself calls `deviceWaitIdle`). Without this gate, a configure arriving on a hidden surface would still trigger swapchain teardown/recreate and block on in-flight acquires held by the compositor.

Shape:

```zig
while (!window.should_close and pty.isChildAlive()) {
    // Don't sleep if a frame is both allowed and owed (e.g. right after
    // forceArm); otherwise block until wl/pty/key-repeat activity.
    const timeout: i32 = if (dirty and frame_loop.canRender()) 0 else repeat_timeout;
    try frame_loop.waitForWork(&extra_fds, timeout);
    applyPtyOutput(...);              // non-Vulkan: reads pty, updates term
    applyKeyboardEvents(...);         // non-Vulkan
    applyPointerEvents(...);          // non-Vulkan
    observeResize(...);               // non-Vulkan: records pending resize from
                                      //   listeners into local state
    if (!frame_loop.canRender()) continue;  // hidden or awaiting callback: no Vulkan work
    if (dirty) {
        applyPendingResize(...);      // Vulkan: deviceWaitIdle + recreateSwapchain
                                      //   + rebuildFaceForScale if scale changed
        renderFrame(...) catch |err| switch (err) {
            // renderFrame reports SUBOPTIMAL as OutOfDateKHR too.
            error.OutOfDateKHR => {
                try ctx.recreateSwapchain(...);
                frame_loop.forceArm();
                continue;             // dirty stays true; next wait uses timeout 0
            },
            else => return err,
        };
        try frame_loop.commitRender();
        dirty = false;
    }
}

`observeResize` detects `window.width/height/bufferScale` changes and records a "resize pending" flag + the new values; it does not call any Vulkan API. `applyPendingResize` performs the actual Vulkan work and runs only when `canRender()`. This separation guarantees no blocking call executes on a hidden surface.
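The split between `observeResize` and `applyPendingResize` amounts to a small record that remembers the latest requested size without touching Vulkan. The sketch below assumes a hypothetical `PendingResize` type and a `Size` triple of width, height, and buffer scale. The real code may keep these as loose locals in `main.zig`.

```zig
const std = @import("std");

const Size = struct {
    width: u32,
    height: u32,
    scale: u32,

    fn eql(a: Size, b: Size) bool {
        return a.width == b.width and a.height == b.height and a.scale == b.scale;
    }
};

/// Hypothetical record kept by observeResize; it holds no Vulkan handles.
const PendingResize = struct {
    applied: Size, // dimensions the swapchain was last built for
    pending: ?Size = null, // latest requested dimensions, if they differ

    /// Non-Vulkan: safe to call every iteration, even while hidden.
    fn observe(self: *PendingResize, current: Size) void {
        self.pending = if (current.eql(self.applied)) null else current;
    }

    /// Called only when canRender(): returns the size to rebuild for, marks it applied.
    fn take(self: *PendingResize) ?Size {
        const size = self.pending orelse return null;
        self.applied = size;
        self.pending = null;
        return size;
    }
};
```

Several configures that arrive while hidden collapse into one `take()` on the next visible iteration. This is the "catch up in one go" behavior described under Data flow.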

Bench, text-compare, and draw-smoke modes follow the same structure with their own render function and extra_fds.

## Data flow

**Startup.** Registry roundtrip → wm_base v6 bound → outputs discovered → window created → initial empty surface commit → `xdg_surface.configure` is acked inline and sets `configured=true` → first loop iteration: `armed=true`, `configured=true`, `entered>0` once the compositor places the surface on an output → first render commits with content → `commitRender` requests the first frame callback.

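**Steady-state typing.** pty_fd becomes readable → `waitForWork` returns → `applyPtyOutput` writes to the terminal → dirty=true. While the previous frame's callback is still pending, `canRender` is false and the loop re-enters the wait. `wl_callback.done` is dispatched inside `waitForWork` → `armed=true`. That iteration renders and commits, and `commitRender` requests the next callback, disarming again.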

**Workspace hidden.** sway sends `wl_surface.leave` for all outputs (and/or `xdg_toplevel.configure` with `.suspended` on v6). Window listener detects visibility transition and calls `frame_loop.onSurfaceHidden()`, which destroys the pending callback. pty activity continues; dirty flips to true repeatedly; `canRender` is false every iteration (state.visible()=false). No Vulkan calls. Loop remains responsive to keyboard/pointer/pty/clipboard.

**Workspace visible again.** `wl_surface.enter` (and/or suspended cleared). Window listener calls `frame_loop.onSurfaceShown()` → `armed=true`. Next iteration renders and requests a fresh callback. Normal pacing resumes.

**Resize while awaiting callback.** `xdg_surface.configure` is acked inline by the listener; `xdg_toplevel.configure` updates width/height. `observeResize` records a pending resize; dirty flips true. `canRender` is false (callback still pending) — wait. Callback fires → `canRender` true → `applyPendingResize` runs Vulkan work → `renderFrame` → `commitRender`.

**Configure while hidden.** Listener acks inline (no-op from the client's perspective beyond sending the ack). `observeResize` records the new dimensions. `canRender` is false — no Vulkan work runs. When the surface becomes visible again, `applyPendingResize` executes, catching up the swapchain/atlas to the compositor-configured size in one go before the next render.

**OUT_OF_DATE.** `renderFrame` returns `error.OutOfDateKHR`. No commit happened; `deviceWaitIdle` + `recreateSwapchain`; `forceArm()`; keep `dirty=true`; `continue`. Next iteration re-renders at new swapchain dims.

## Error handling

- **Wayland disconnect / protocol error.** `readEvents` / `dispatchPending` errors propagate up. Main loop treats as fatal (same failure class as child pty death). No recovery.
- **Vulkan OUT_OF_DATE / SUBOPTIMAL.** Handled in the app render function: rebuild swapchain, `forceArm`, retry. SUBOPTIMAL treated as OUT_OF_DATE.
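- **Pending callback when surface hidden.** `onSurfaceHidden` destroys the outstanding wl_callback client-side. The compositor may never fire it while the surface is hidden. Destroying it keeps a late `done` from flipping `armed` after the loop has moved on.
- **Hidden→visible transition.** `onSurfaceShown` unconditionally sets `armed=true`. We can't know whether the compositor still has a callback queued for us. Re-arming is idempotent because `armed` is purely client-side state.
- **Stale frame callbacks.** `wl_callback` has no destroy request, so `wl_callback.destroy()` only frees the client-side proxy. The compositor may therefore still send `done` for a callback the client has already destroyed. libwayland-client discards events addressed to destroyed proxies, so such an event should not reach the listener. The `wl_callback.done` listener still verifies identity before acting, as a cheap defense against any superseded callback that was never destroyed: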
  ```zig
  fn frameCallbackListener(cb: *wl.Callback, _: wl.Callback.Event, loop: *FrameLoop) void {
      if (loop.pending_callback != cb) return;  // stale — we already moved on
      cb.destroy();
      loop.pending_callback = null;
      loop.armed = true;
  }
  ```
- **FrameLoop deinit.** Destroys pending callback before surface teardown. Order: `FrameLoop.deinit → Window.deinit`.
- **Thread safety.** Single-threaded. All mutation happens in the main thread; listeners run synchronously inside `dispatchPending`.
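The render-error policy in this list reduces to a three-way decision. The sketch below models it with hypothetical enums (`PresentResult`, `Action`) rather than real `VkResult` values. One nuance applies to SUBOPTIMAL: when it comes back from the present itself, the frame *was* presented, so `forceArm` simply permits an immediate re-render at the new size.

```zig
const std = @import("std");

/// Hypothetical summary of what renderFrame observed from acquire/present.
const PresentResult = enum { success, suboptimal, out_of_date, other_error };

const Action = enum { commit, recreate_and_force_arm, fatal };

fn actionFor(result: PresentResult) Action {
    return switch (result) {
        .success => .commit,
        // SUBOPTIMAL is treated like OUT_OF_DATE: rebuild, forceArm, retry.
        .suboptimal, .out_of_date => .recreate_and_force_arm,
        // Anything else propagates (`else => return err` in the main loop).
        .other_error => .fatal,
    };
}
```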

## Testing

### Unit tests — `FrameLoop`

FrameLoop is parameterized over a `DisplayOps` interface (a struct of fn pointers for `prepareRead`, `readEvents`, `dispatchPending`, `flush`, `surface.frame`, `surface.commit`, and `callback.setListener`). Production wraps `*wl.Display` + `*wl.Surface` + `*wl.Callback` thinly; tests inject a mock that synthesizes `done` events by calling the stored listener directly. Budget: ~80 LOC of indirection including a `MockCallback` shim type (zig-wayland's `*wl.Callback` is an opaque concrete type, not an interface, so the mock's callback handle is a separate type that satisfies the interface). Unlocks:

- `initial state: armed, no pending callback`
- `commitRender stores pending callback and flips armed=false`
- `simulated callback.done flips armed=true and clears pending`
- `onSurfaceHidden destroys pending callback without firing`
- `onSurfaceShown force-arms regardless of previous state`
- `canRender requires armed && state.visible()`
- `forceArm bypasses the callback gate`
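The fn-pointer indirection can be as small as an opaque context plus a few function pointers. The sketch below is hypothetical throughout: `DisplayOps` is trimmed to two operations, and `MockDisplay` and `Loop` are illustrative names. It shows how a mock records calls for the `onSurfaceHidden destroys pending callback without firing` case.

```zig
const std = @import("std");

/// Hypothetical two-operation slice of the DisplayOps interface.
const DisplayOps = struct {
    ctx: *anyopaque,
    requestFrame: *const fn (ctx: *anyopaque) u32,
    destroyCallback: *const fn (ctx: *anyopaque, id: u32) void,
};

/// Test double: hands out increasing callback ids and counts destroys.
const MockDisplay = struct {
    next_id: u32 = 1,
    destroyed: u32 = 0,

    fn requestFrame(ctx: *anyopaque) u32 {
        const self: *MockDisplay = @ptrCast(@alignCast(ctx));
        const id = self.next_id;
        self.next_id += 1;
        return id;
    }

    fn destroyCallback(ctx: *anyopaque, id: u32) void {
        const self: *MockDisplay = @ptrCast(@alignCast(ctx));
        _ = id; // a fuller mock could record which id was destroyed
        self.destroyed += 1;
    }

    fn ops(self: *MockDisplay) DisplayOps {
        return .{ .ctx = self, .requestFrame = &requestFrame, .destroyCallback = &destroyCallback };
    }
};

/// Minimal loop that talks to the display only through DisplayOps.
const Loop = struct {
    ops: DisplayOps,
    pending: ?u32 = null,
    armed: bool = true,

    fn commitRender(self: *Loop) void {
        self.pending = self.ops.requestFrame(self.ops.ctx);
        self.armed = false;
    }

    fn onSurfaceHidden(self: *Loop) void {
        if (self.pending) |id| self.ops.destroyCallback(self.ops.ctx, id);
        self.pending = null;
    }
};
```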

### Unit tests — `SurfaceState` / `ScaleTracker`

- `visible requires configured && !suspended && enteredCount > 0`
- `suspended set/cleared based on xdg_toplevel.configure.states`
- `enteredCount reflects entered-output set` (extends existing tracker tests)
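For `enteredCount`, the behavior worth pinning down is that enter is idempotent and leave ignores unknown outputs. The sketch below uses a hypothetical fixed-capacity `EnteredOutputs` set keyed by output id. The real `ScaleTracker` storage may differ.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity entered-output set.
const EnteredOutputs = struct {
    ids: [8]u32 = undefined,
    len: usize = 0,

    fn indexOf(self: *const EnteredOutputs, id: u32) ?usize {
        for (self.ids[0..self.len], 0..) |existing, i| {
            if (existing == id) return i;
        }
        return null;
    }

    /// wl_surface.enter: a duplicate enter does not inflate the count.
    fn enter(self: *EnteredOutputs, id: u32) error{TooManyOutputs}!void {
        if (self.indexOf(id) != null) return;
        if (self.len == self.ids.len) return error.TooManyOutputs;
        self.ids[self.len] = id;
        self.len += 1;
    }

    /// wl_surface.leave: unknown outputs are ignored; swap-remove keeps ids dense.
    fn leave(self: *EnteredOutputs, id: u32) void {
        const i = self.indexOf(id) orelse return;
        self.len -= 1;
        self.ids[i] = self.ids[self.len];
    }

    fn enteredCount(self: *const EnteredOutputs) usize {
        return self.len;
    }
};
```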

### Integration test — hidden-freeze regression

Two forms, both supported:

1. **Synthetic automated variant** (no compositor required). Uses the mock `DisplayOps` from the unit tests. Simulates: startup → commit frame → simulated `onSurfaceHidden` → pty write loop (100 iterations, asserts loop body executes each time without blocking) → simulated `onSurfaceShown` → simulated `wl_callback.done` → assert `canRender()` true → simulated render. Runs in the normal `zig build test` pass.

2. **Manual mode** under sway with two workspaces. Opt-in mode `--hidden-freeze-regression` (gated like `--text-coverage-compare`): spawns waystty, prints "move this window to another workspace; I will flood pty for 5s; move it back", waits for stdin confirmation, runs the flood, exits 0 on responsiveness. Documented in the mode's help text.
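The synthetic variant boils down to running the loop body with Wayland and Vulkan replaced by counters. The sketch below uses a hypothetical `Harness` that folds `canRender`, `renderFrame`, and `commitRender` into a single `iterate` step. It walks through the hide, flood, show, and callback sequence from step 1.

```zig
const std = @import("std");

/// Hypothetical synthetic harness: the loop body with I/O replaced by counters.
const Harness = struct {
    visible: bool = true,
    armed: bool = true,
    pending: ?u32 = null,
    next_id: u32 = 1,
    dirty: bool = true,
    renders: usize = 0,

    /// One pass of the main-loop body after waitForWork has returned.
    fn iterate(self: *Harness, pty_bytes: usize) void {
        if (pty_bytes > 0) self.dirty = true; // applyPtyOutput
        if (!(self.armed and self.visible)) return; // canRender gate
        if (!self.dirty) return;
        self.renders += 1; // renderFrame (simulated)
        self.pending = self.next_id; // commitRender
        self.next_id += 1;
        self.armed = false;
        self.dirty = false;
    }

    fn hide(self: *Harness) void {
        self.visible = false;
        self.pending = null; // onSurfaceHidden
    }

    fn show(self: *Harness) void {
        self.visible = true;
        self.armed = true; // onSurfaceShown
    }

    fn callbackDone(self: *Harness, id: u32) void {
        const pending = self.pending orelse return;
        if (pending != id) return;
        self.pending = null;
        self.armed = true;
    }
};
```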

### Preserved coverage

All existing tests in `main.zig`, `wayland.zig`, `scale_tracker.zig`, `vt.zig`, `pty.zig` continue to pass unmodified. The dirty-row / selection / PTY / VT paths are untouched — only the driver around them moves.

### Benchmark mode

Per-frame section timings (`snapshot_us`, `row_rebuild_us`, `atlas_upload_us`, `instance_upload_us`, `gpu_submit_us`) measure work *inside* a frame. They stay identical under the throttle because they don't depend on how often frames are scheduled. Aggregate wall-clock numbers ("frames per wall-second", average end-to-end loop time) become capped at the display refresh rate (typically ~60–144 Hz). That cap makes them useless for comparing changes that only affect per-frame cost.

Escape hatch: `WAYSTTY_BENCH_UNTHROTTLED=1` bypasses `FrameLoop` in bench mode and reverts to today's eager loop for wall-clock measurements. Unthrottled bench is explicitly not freeze-safe — workspace-change during an unthrottled bench will still deadlock. Benchmark output includes a header line indicating which mode was used.
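The opt-out check and header line can be kept as pure functions of the raw environment value. This is a sketch under two assumptions: only the exact string `"1"` opts out, and the header wording (hypothetical here) just needs to name the mode. The caller would pass whatever its existing environment lookup returns, or null if the variable is unset.

```zig
const std = @import("std");

const BenchMode = enum { throttled, unthrottled };

/// `value` is the raw WAYSTTY_BENCH_UNTHROTTLED contents, or null if unset.
fn benchMode(value: ?[]const u8) BenchMode {
    const v = value orelse return .throttled;
    return if (std.mem.eql(u8, v, "1")) .unthrottled else .throttled;
}

/// Hypothetical header wording; the spec only requires that the mode be stated.
fn benchHeader(mode: BenchMode) []const u8 {
    return switch (mode) {
        .throttled => "bench mode: throttled (frame-callback paced)",
        .unthrottled => "bench mode: unthrottled (WAYSTTY_BENCH_UNTHROTTLED=1, not freeze-safe)",
    };
}
```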

## Scope

Files touched:

- `src/frame_loop.zig` (new, ~200 LOC + tests)
- `src/wayland.zig` (+ `SurfaceState`, + `enteredCount`, listener changes, wm_base v5 → v6) — ~100 LOC delta
- `src/scale_tracker.zig` (+ `enteredCount`) — ~10 LOC delta
- `src/main.zig` — four loop sites refactored, ~500 LOC net delta (mostly reduction)

No changes to: `src/renderer.zig`, `src/vt.zig`, `src/pty.zig`, `src/font.zig`, `src/config.zig`, shaders.

## Rollout

Single-commit change is too large. Plan to split into ordered steps:

1. Add `SurfaceState` + `enteredCount` + tests (no behavior change yet).
2. Bump wm_base to v6; add `suspended` handling.
3. Introduce FrameLoop module + tests (not yet used).
4. Migrate main terminal loop to FrameLoop; verify manual test (switch workspaces, no freeze).
5. Migrate text-coverage-compare loop.
6. Migrate draw-smoke loop.
7. Migrate benchmark loop; verify bench output still readable.

Each step compiles and passes tests. Step 4 is the earliest point the freeze is fixed; 5–7 complete the duplication cleanup.

## Open questions

None as of approval. All design decisions resolved in brainstorming + review:

- Gating signal: pure frame-callback + visibility flags, no polling fallback.
- Benchmark: throttled by default; `WAYSTTY_BENCH_UNTHROTTLED=1` opts out (not freeze-safe).
- Configure ack: inline on receipt (reverted from an earlier "defer to commit" draft — deferral added protocol risk without benefit).
- xdg_toplevel `.suspended`: added as secondary signal, not required (compositors without v6 still handled by frame callbacks).
- All Vulkan work (including `deviceWaitIdle`, `recreateSwapchain`, `rebuildFaceForScale`) gated on `canRender()`, not just `drawCells`.
- `wl_callback.done` listener verifies callback identity before acting, to defend against stale callbacks already on the wire at destroy time.