docs/superpowers/specs/2026-04-08-dirty-row-rendering-design.md
# Dirty-Row Rendering Design

## Goal

Reduce terminal rendering cost for small updates by making CPU-side instance generation and GPU instance-buffer uploads proportional to changed rows instead of total grid size.

## Current Problem

The event loop no longer spins at 60 Hz when idle, but any dirty terminal frame still does too much work:

- `term.snapshot()` is followed by a full traversal of every visible cell.
- The code rebuilds one flat `instances` array from scratch on every redraw.
- The full instance buffer is uploaded to Vulkan even if only one row changed.

This means a single character echo still pays near-full-frame CPU and buffer-copy cost.

## Chosen Approach

Keep the existing single-pipeline, contiguous-instance-buffer renderer, but insert a row cache between `RenderState` and Vulkan uploads.

The cache stores per-row instance slices plus a small cursor slice. On each redraw:

1. Snapshot terminal render state.
2. Inspect `term.render_state.dirty`.
3. Rebuild all cached rows only for `.full`, or rebuild only rows whose `row_data.items(.dirty)[y]` is `true` for `.partial`.
4. Repack cached row slices into one contiguous CPU-side instance array only when row lengths or ordering changed.
5. Upload only the byte ranges that changed in the GPU instance buffer when layout remains stable.
6. Draw using the existing `drawCells` path with one contiguous instance span.

This keeps the current renderer architecture intact while removing most per-frame work for localized updates.

## Data Structures

Add a render-cache layer in `main.zig`:

- `RowInstanceCache`
  - `instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `gpu_offset_instances: u32`
  - `gpu_len_instances: u32`
- `RenderCache`
  - `rows: []RowInstanceCache`
  - `cursor_instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `packed_instances: std.ArrayListUnmanaged(renderer.Instance)`
  - `total_instance_count: u32`
  - `layout_dirty: bool`

Rules:

- Each terminal row owns its own instance list.
- `gpu_offset_instances` tracks where that row currently lives inside the packed GPU buffer.
- If rebuilding a row changes its instance count, mark `layout_dirty = true`.
- Cursor instances stay separate so cursor movement or visibility changes can be handled without forcing row cache rebuilds.

## Update Flow

### Full redraw

Triggered when:

- `term.render_state.dirty == .full`
- Grid dimensions changed
- Atlas upload or swapchain recreation invalidates cached assumptions

Behavior:

- Rebuild every row cache from current render state.
- Rebuild cursor cache.
- Repack all rows plus cursor into `packed_instances`.
- Upload the full packed buffer.
- Refresh all row offsets and lengths.

### Partial redraw

Triggered when `term.render_state.dirty == .partial`.

Behavior:

- Inspect `row_data.items(.dirty)`.
- Rebuild only dirty row caches.
- Rebuild cursor cache if cursor moved, changed visibility, or if old/new cursor rows are dirty.
- If any rebuilt row changed instance count, set `layout_dirty = true`.
- If `layout_dirty == true`, repack all rows and upload the full packed buffer.
- Otherwise, overwrite only the changed row byte ranges and cursor byte range in the GPU instance buffer.

### Dirty-flag lifecycle

Dirty flags are consumed by the renderer cache and must be cleared only after cache state has been brought in sync with the current render state.

Rules:

- Read `term.render_state.dirty` and `row_data.items(.dirty)` immediately after `term.snapshot()`.
- Complete all required row-cache rebuilds, cursor-cache rebuilds, and any repack decisions before clearing flags.
- Clear `term.render_state.dirty` and any consumed row dirty flags only after CPU cache state is updated successfully.
- If a rebuild, repack, atlas insertion, or upload-preparation step fails, do not clear dirty flags for that frame.

This prevents both permanent “always dirty” behavior and lost updates caused by clearing too early.
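
The row-selection and flag-clearing rules above can be sketched in isolation from the renderer. This is a minimal illustration, not the planned implementation: `Dirty` is a hypothetical stand-in for the type of `term.render_state.dirty` (the `.clean` state in particular is an assumption), plain `bool` slices stand in for `row_data.items(.dirty)`, and `planRowRebuilds`, `finishFrame`, and `cache_ok` are illustrative names.

```zig
const std = @import("std");

// Hypothetical stand-in for the type of `term.render_state.dirty`.
// The real enum may name its "nothing changed" state differently.
const Dirty = enum { clean, partial, full };

/// Marks in `rebuild` which rows must be rebuilt this frame and returns
/// how many were selected. `row_dirty` plays the role of
/// `row_data.items(.dirty)`.
fn planRowRebuilds(dirty: Dirty, row_dirty: []const bool, rebuild: []bool) usize {
    std.debug.assert(row_dirty.len == rebuild.len);
    var count: usize = 0;
    for (row_dirty, rebuild) |is_dirty, *out| {
        out.* = switch (dirty) {
            .full => true, // every row, regardless of per-row flags
            .partial => is_dirty, // only rows flagged dirty
            .clean => false,
        };
        if (out.*) count += 1;
    }
    return count;
}

/// Clears consumed flags only after the cache work for this frame
/// succeeded. `cache_ok` is a hypothetical summary of the rebuild,
/// repack, and upload-preparation steps.
fn finishFrame(dirty: *Dirty, row_dirty: []bool, rebuilt: []const bool, cache_ok: bool) void {
    std.debug.assert(row_dirty.len == rebuilt.len);
    if (!cache_ok) return; // keep all flags so the next frame retries
    for (row_dirty, rebuilt) |*flag, was_rebuilt| {
        if (was_rebuilt) flag.* = false;
    }
    dirty.* = .clean;
}
```

Keeping the plan in a separate `rebuild` slice means the terminal's own dirty flags are never mutated until `finishFrame` runs, which is one way to satisfy the "clear only after success" rule.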

### No redraw

If terminal state is not dirty and no window/swapchain event forces a frame, skip both cache work and draw submission.

## Renderer Changes

`renderer.zig` gains a partial instance upload helper:

- `uploadInstanceRange(offset_instances: u32, instances: []const Instance) !void`

Behavior:

- Ensure total buffer capacity still covers the highest written instance.
- Map only the needed memory range.
- Copy the provided slice into the instance buffer at `offset_instances * @sizeOf(Instance)`.
- Unmap memory.

Important fallback rule:

- If capacity growth is required, `uploadInstanceRange` must not attempt to preserve prior GPU contents implicitly.
- Buffer growth forces `layout_dirty = true`, followed by full repack and a full-buffer upload from `packed_instances`.
- Partial range uploads are only valid when the underlying GPU buffer remains allocated and existing offsets stay valid.

Keep `uploadInstances` for full-buffer writes. No pipeline or draw-call structure changes are required.

## Row Rebuild Logic

Extract the current per-cell emission logic into a reusable row builder:

- `rebuildRowInstances(...)`

Inputs:

- row index
- row cell data from `term.render_state`
- font metrics
- atlas
- default background

Outputs:

- Replaces exactly one `RowInstanceCache.instances`
- Returns whether the row length changed

This preserves the existing glyph/background behavior and minimizes risk.

## Cursor Handling

Cursor rendering should not force whole-frame rebuilds.

Plan:

- Track the previous cursor viewport position and visibility in the cache.
- Rebuild `cursor_instances` each render pass where cursor state may have changed.
- If cursor slice length changes, mark layout dirty.
- Otherwise overwrite only its GPU range.

## Error Handling

- If glyph atlas insertion occurs while rebuilding a row, continue using the existing atlas-dirty path.
- Any atlas growth/upload marks the frame for a full packed upload after the atlas transfer succeeds.
- Swapchain `OutOfDateKHR` keeps the current behavior: recreate swapchain and request a full redraw.

## Testing

Add tests before implementation for:

- Dirty-row planning logic chooses full rebuild for `.full`.
- Dirty-row planning logic selects only flagged rows for `.partial`.
- Layout remains stable when a rebuilt row keeps the same instance count.
- Layout becomes dirty when a rebuilt row changes instance count.
- Packing offsets remain contiguous after full repack.
- Cursor-only updates rebuild cursor cache without requiring unrelated row rebuilds.
- Instance-buffer growth during a partial update forces full repack/full upload fallback.
- Dirty flags are cleared only after cache refresh planning succeeds.

Verification after implementation:

- `zig build test`
- Manual smoke test: run shell, type in one line, confirm redraw responsiveness remains good.
- Manual smoke test: scrolling and full-screen clears still redraw correctly.

## Non-Goals

- Multi-draw rendering
- Sparse GPU allocation per row
- Shader or Vulkan pipeline changes
- Scrollback virtualization changes

## Risks

- The cursor can move without many rows being dirty, so cursor cache invalidation must be handled explicitly.
- Row count changes on resize must fully rebuild caches and offsets.
- If dirty flags are consumed incorrectly, rows may stop updating; tests should focus on rebuild planning and offset stability.
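
Since the risks above center on offset stability, the packing and upload-fallback rules from the Data Structures and Renderer Changes sections can be sketched as pure bookkeeping with no Vulkan calls. This is a hedged illustration only: `RowSlot` is a simplified, hypothetical stand-in for `RowInstanceCache` (a plain `len` replaces `instances.items.len`), `repackOffsets` and `chooseUpload` are illustrative names, and placing the cursor slice after the last row is an assumption the spec does not state.

```zig
const std = @import("std");

// Simplified, hypothetical stand-in for `RowInstanceCache`.
const RowSlot = struct {
    len: u32, // current instance count of the row's cache
    gpu_offset_instances: u32 = 0,
    gpu_len_instances: u32 = 0,
};

/// Assigns contiguous offsets to every row in order, then places the
/// cursor slice after the last row (an assumption of this sketch).
/// Returns the total instance count of the packed buffer.
fn repackOffsets(rows: []RowSlot, cursor_len: u32, cursor_offset: *u32) u32 {
    var next: u32 = 0;
    for (rows) |*row| {
        row.gpu_offset_instances = next;
        row.gpu_len_instances = row.len;
        next += row.len;
    }
    cursor_offset.* = next;
    return next + cursor_len;
}

const Upload = enum { full, ranges };

/// Chooses between a full-buffer upload and per-range uploads.
fn chooseUpload(layout_dirty: bool, total_instances: u32, gpu_capacity_instances: u32) Upload {
    // Buffer growth does not preserve prior GPU contents, so it is
    // treated exactly like a layout change.
    if (layout_dirty or total_instances > gpu_capacity_instances) return .full;
    return .ranges;
}

test "repack keeps offsets contiguous" {
    var rows = [_]RowSlot{ .{ .len = 3 }, .{ .len = 0 }, .{ .len = 5 } };
    var cursor_offset: u32 = 0;
    const total = repackOffsets(&rows, 1, &cursor_offset);
    try std.testing.expect(rows[2].gpu_offset_instances == 3);
    try std.testing.expect(cursor_offset == 8 and total == 9);
}
```

A range upload for row `y` would then target byte offset `rows[y].gpu_offset_instances * @sizeOf(Instance)`, which is valid only while `chooseUpload` keeps returning `.ranges`.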