f5d1aa15

Add dirty-row rendering design spec
a73x 2026-04-08 18:12
diff --git a/docs/superpowers/specs/2026-04-08-dirty-row-rendering-design.md b/docs/superpowers/specs/2026-04-08-dirty-row-rendering-design.md
new file mode 100644
index 0000000..ff2cfcb
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-08-dirty-row-rendering-design.md
@@ -0,0 +1,171 @@
# Dirty-Row Rendering Design

## Goal

Reduce terminal rendering cost for small updates by making CPU-side instance generation and GPU instance-buffer uploads proportional to changed rows instead of total grid size.

## Current Problem

The event loop no longer spins at 60 Hz when idle, but any dirty terminal frame still does too much work:

- `term.snapshot()` is followed by a full traversal of every visible cell.
- The code rebuilds one flat `instances` array from scratch on every redraw.
- The full instance buffer is uploaded to Vulkan even if only one row changed.

This means a single character echo still pays near-full-frame CPU and buffer-copy cost.

## Chosen Approach

Keep the existing single-pipeline, contiguous-instance-buffer renderer, but insert a row cache between `RenderState` and Vulkan uploads.

The cache stores per-row instance slices plus a small cursor slice. On each redraw:

1. Snapshot terminal render state.
2. Inspect `term.render_state.dirty`.
3. For `.full`, rebuild every cached row; for `.partial`, rebuild only the rows whose `row_data.items(.dirty)[y]` is `true`.
4. Repack cached row slices into one contiguous CPU-side instance array only when row lengths or ordering changed.
5. When the layout is stable, upload only the changed byte ranges to the GPU instance buffer; otherwise upload the full packed buffer.
6. Draw using the existing `drawCells` path with one contiguous instance span.

This keeps the current renderer architecture intact while removing most per-frame work for localized updates.
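
Step 3 can be sketched as a small planning function. This is a minimal illustration, not the implementation: the `Dirty` enum and the name `planRowRebuilds` are hypothetical stand-ins, and `row_dirty` plays the role of `row_data.items(.dirty)`.

```zig
const std = @import("std");

/// Hypothetical stand-in for the states of `term.render_state.dirty`.
const Dirty = enum { none, partial, full };

/// Marks in `out` which rows need their cache rebuilt.
/// `row_dirty` stands in for `row_data.items(.dirty)`.
fn planRowRebuilds(dirty: Dirty, row_dirty: []const bool, out: []bool) void {
    std.debug.assert(row_dirty.len == out.len);
    for (row_dirty, out) |is_dirty, *rebuild| {
        rebuild.* = switch (dirty) {
            .none => false, // nothing changed, keep every cached row
            .partial => is_dirty, // trust the per-row flag
            .full => true, // ignore the flags and rebuild everything
        };
    }
}
```

Keeping the decision in a pure function like this makes the planning tests listed under Testing straightforward to write without a terminal or GPU.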

## Data Structures

Add a render-cache layer in `main.zig`:

- `RowInstanceCache`
  - `instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `gpu_offset_instances: u32`
  - `gpu_len_instances: u32`
- `RenderCache`
  - `rows: []RowInstanceCache`
  - `cursor_instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `packed_instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `total_instance_count: u32`
  - `layout_dirty: bool`

Rules:

- Each terminal row owns its own instance list.
- `gpu_offset_instances` tracks where that row currently lives inside the packed GPU buffer.
- `gpu_len_instances` tracks how many instances that row currently occupies there.
- If rebuilding a row changes its instance count, mark `layout_dirty = true`.
- Cursor instances stay separate so cursor movement or visibility changes can be handled without forcing row cache rebuilds.

## Update Flow

### Full redraw

Triggered when:

- `term.render_state.dirty == .full`
- Grid dimensions changed
- Atlas upload or swapchain recreation invalidates cached assumptions

Behavior:

- Rebuild every row cache from current render state.
- Rebuild cursor cache.
- Repack all rows plus cursor into `packed_instances`.
- Upload the full packed buffer.
- Refresh all row offsets and lengths.
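
The repack step might look roughly like the sketch below. It is simplified under stated assumptions: `Instance` is a placeholder for `renderer.Instance`, a plain slice stands in for each `std.ArrayListUnmanaged`, and `dst` is assumed to be large enough for all rows plus the cursor.

```zig
const std = @import("std");

/// Hypothetical placeholder; the real `renderer.Instance` layout differs.
const Instance = extern struct { x: f32, y: f32 };

/// Simplified row cache: a slice stands in for the ArrayListUnmanaged.
const RowInstanceCache = struct {
    instances: []const Instance,
    gpu_offset_instances: u32 = 0,
    gpu_len_instances: u32 = 0,
};

/// Copies every row, then the cursor, into `dst` back to back and refreshes
/// each row's offset and length. Returns the total instance count.
fn repack(rows: []RowInstanceCache, cursor: []const Instance, dst: []Instance) u32 {
    var next: usize = 0;
    for (rows) |*row| {
        const n = row.instances.len;
        @memcpy(dst[next..][0..n], row.instances);
        row.gpu_offset_instances = @intCast(next);
        row.gpu_len_instances = @intCast(n);
        next += n;
    }
    // The cursor slice goes last so row offsets never depend on it.
    @memcpy(dst[next..][0..cursor.len], cursor);
    return @intCast(next + cursor.len);
}
```

Placing the cursor after the last row is a choice made for this sketch; the spec only requires that rows and cursor end up in one contiguous span.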

### Partial redraw

Triggered when `term.render_state.dirty == .partial`.

Behavior:

- Inspect `row_data.items(.dirty)`.
- Rebuild only dirty row caches.
- Rebuild the cursor cache if the cursor moved, its visibility changed, or the old or new cursor row is dirty.
- If any rebuilt row changed instance count, set `layout_dirty = true`.
- If `layout_dirty == true`, repack all rows and upload the full packed buffer.
- Otherwise, overwrite only the changed row byte ranges and cursor byte range in the GPU instance buffer.
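
The choice between a full upload and ranged overwrites can be sketched as follows. All names here (`Range`, `RowSpan`, `UploadPlan`, `planUpload`) are hypothetical, and the cursor range is left out to keep the example short.

```zig
const std = @import("std");

/// A contiguous span of the instance buffer, measured in instances.
const Range = struct { offset: u32, len: u32 };

/// Hypothetical per-row bookkeeping for this sketch.
const RowSpan = struct { range: Range, rebuilt: bool, count_changed: bool };

const UploadPlan = union(enum) {
    /// Layout shifted: repack and upload the whole packed buffer.
    full,
    /// Layout stable: how many ranges were written to the caller's buffer.
    ranges: usize,
};

fn planUpload(rows: []const RowSpan, out: []Range) UploadPlan {
    // Any rebuilt row with a new instance count moves every later row.
    for (rows) |row| {
        if (row.rebuilt and row.count_changed) return .full;
    }
    var n: usize = 0;
    for (rows) |row| {
        if (!row.rebuilt) continue;
        out[n] = row.range;
        n += 1;
    }
    return .{ .ranges = n };
}
```

Each returned range converts to a byte range by multiplying `offset` and `len` by `@sizeOf(Instance)`.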

### No redraw

If terminal state is not dirty and no window/swapchain event forces a frame, skip both cache work and draw submission.

## Renderer Changes

`renderer.zig` gains a partial instance upload helper:

- `uploadInstanceRange(offset_instances: u32, instances: []const Instance) !void`

Behavior:

- Ensure the buffer capacity still covers the last written instance, `offset_instances + instances.len`.
- Map only the needed memory range.
- Copy the provided slice into the instance buffer at `offset_instances * @sizeOf(Instance)`.
- Unmap memory.

Keep `uploadInstances` for full-buffer writes.

No pipeline or draw-call structure changes are required.
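
The copy at the heart of `uploadInstanceRange` can be illustrated without Vulkan. In this sketch `mapped` stands in for the mapped instance-buffer memory, `Instance` is a placeholder layout, and the function name and the `OutOfCapacity` error are hypothetical; the real helper would map and unmap the allocation around the copy and decide for itself how to handle insufficient capacity.

```zig
const std = @import("std");

/// Hypothetical placeholder for `renderer.Instance`.
const Instance = extern struct { x: f32, y: f32, glyph: u32 };

/// Writes `instances` into `mapped` starting at instance index
/// `offset_instances`. `mapped` stands in for mapped GPU memory.
fn writeInstanceRange(
    mapped: []u8,
    offset_instances: u32,
    instances: []const Instance,
) error{OutOfCapacity}!void {
    const byte_offset = @as(usize, offset_instances) * @sizeOf(Instance);
    const bytes = std.mem.sliceAsBytes(instances);
    // Capacity check: the write must end inside the buffer.
    if (byte_offset + bytes.len > mapped.len) return error.OutOfCapacity;
    @memcpy(mapped[byte_offset..][0..bytes.len], bytes);
}
```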

## Row Rebuild Logic

Extract the current per-cell emission logic into a reusable row builder:

- `rebuildRowInstances(...)`

Inputs:

- row index
- row cell data from `term.render_state`
- font metrics
- atlas
- default background

Outputs:

- Replaces the contents of exactly one `RowInstanceCache.instances`
- Returns whether the row's instance count changed

This preserves the existing glyph/background behavior and minimizes risk.
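
A heavily simplified row builder is sketched below to show how the "count changed" result falls out. The `Cell`, `Kind`, and `Instance` types and the emission rules (one background instance per cell with a background, one glyph instance per non-blank cell) are assumptions made for illustration, not the existing per-cell logic. `out` is assumed to hold at least two instances per cell.

```zig
const std = @import("std");

// Hypothetical types for this sketch only.
const Cell = struct { codepoint: u21, has_bg: bool };
const Kind = enum { background, glyph };
const Instance = struct { col: u32, row: u32, kind: Kind };

const RowResult = struct { len: usize, count_changed: bool };

/// Rebuilds one row into `out` and reports whether its instance count
/// differs from `prev_len`, the count from the previous build.
fn rebuildRow(y: u32, cells: []const Cell, prev_len: usize, out: []Instance) RowResult {
    var n: usize = 0;
    for (cells, 0..) |cell, x| {
        const col: u32 = @intCast(x);
        if (cell.has_bg) {
            out[n] = .{ .col = col, .row = y, .kind = .background };
            n += 1;
        }
        // Assumption: blank cells emit no glyph instance.
        if (cell.codepoint != 0 and cell.codepoint != ' ') {
            out[n] = .{ .col = col, .row = y, .kind = .glyph };
            n += 1;
        }
    }
    return .{ .len = n, .count_changed = n != prev_len };
}
```

Under these assumed rules, overwriting one glyph with another keeps the layout stable, while typing into a blank cell changes the count and would mark the layout dirty.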

## Cursor Handling

Cursor rendering should not force whole-frame rebuilds.

Plan:

- Track the previous cursor viewport position and visibility in the cache.
- Rebuild `cursor_instances` on every render pass in which cursor state may have changed.
- If cursor slice length changes, mark layout dirty.
- Otherwise overwrite only its GPU range.
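
The invalidation check could be expressed as a small predicate. This is a sketch under assumptions: `CursorState` and `cursorNeedsRebuild` are hypothetical names, and `row_dirty` again stands in for `row_data.items(.dirty)`.

```zig
const std = @import("std");

/// Hypothetical snapshot of what the cache remembers about the cursor.
const CursorState = struct { x: u16, y: u16, visible: bool };

/// Returns true when `cursor_instances` should be rebuilt.
fn cursorNeedsRebuild(prev: ?CursorState, next: CursorState, row_dirty: []const bool) bool {
    // No previous state recorded yet: always build.
    const old = prev orelse return true;
    // Position or visibility changed.
    if (!std.meta.eql(old, next)) return true;
    // Same position, but the row underneath was rewritten.
    return row_dirty[next.y];
}
```

When the cursor has moved, the predicate already returns `true`, so the old and new rows only need an explicit dirty check in the unchanged-position case.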

## Error Handling

- If glyph atlas insertion occurs while rebuilding a row, continue using the existing atlas-dirty path.
- Any atlas growth/upload marks the frame for a full packed upload after the atlas transfer succeeds.
- Swapchain `OutOfDateKHR` keeps the current behavior: recreate swapchain and request a full redraw.

## Testing

Add tests before implementation for:

- Dirty-row planning logic chooses full rebuild for `.full`.
- Dirty-row planning logic selects only flagged rows for `.partial`.
- Layout remains stable when a rebuilt row keeps the same instance count.
- Layout becomes dirty when a rebuilt row changes instance count.
- Packing offsets remain contiguous after full repack.

Verification after implementation:

- `zig build test`
- Manual smoke test: run a shell, type on a single line, and confirm that echo and redraw stay responsive.
- Manual smoke test: scrolling and full-screen clears still redraw correctly.

## Non-Goals

- Multi-draw rendering
- Sparse GPU allocation per row
- Shader or Vulkan pipeline changes
- Scrollback virtualization changes

## Risks

- The cursor can move without many rows being dirty, so cursor cache invalidation must be handled explicitly.
- Row count changes on resize must fully rebuild caches and offsets.
- If dirty flags are consumed incorrectly, rows may stop updating; tests should focus on rebuild planning and offset stability.