# Incremental Atlas Upload Design

## Goal

Reduce atlas upload cost from full-texture re-upload (~1.7ms avg, 3.6ms peak) to near-zero for steady-state frames by uploading only new glyph rows and precomputing the common ASCII set at startup.

## Current Problem

Every time a new glyph is rasterized into the atlas, `uploadAtlas` does three expensive things: it re-uploads the entire atlas texture (1024x1024 = 1MB at 1x, 2048x2048 = 4MB at 2x) through a freshly allocated staging buffer, it transitions the image layout from `UNDEFINED` (which allows the driver to discard the existing image contents), and it calls `queueWaitIdle` (a CPU stall). Bench data shows this accounts for 61% of average frame time.

## Two Complementary Changes

### 1. Atlas precomputation

Rasterize printable ASCII (codepoints 32–126, 95 characters) into the atlas at startup, before the first frame renders. Do a single full upload of the warm atlas. This removes the cold-start spike for typical sessions, because most terminal content uses only these characters.

### 2. Incremental upload

For glyphs added after startup (Unicode, CJK, symbols), upload only the new rows instead of the entire texture.

## Dirty-Region Tracking

Add two fields to the `Atlas` struct:
- `last_uploaded_y: u32`: initialized to 0. Rows `[0, last_uploaded_y)` of the GPU atlas are known to match the CPU copy.
- `needs_full_upload: bool`: set to `true` by `init()` and `reset()`. Cleared after a full upload completes.

The dirty region is always a horizontal band spanning the full atlas width:
- `y_start` = `last_uploaded_y`
- `y_end` = `cursor_y + row_height`

After a successful upload, set `last_uploaded_y = cursor_y` (NOT `cursor_y + row_height`). This ensures the in-progress row is always re-uploaded on the next frame if new glyphs are added to it at new X positions. The cost of re-uploading one row (~20KB for a 20px row in a 1024-wide atlas) is negligible.

Once the packing cursor wraps to a new row, the next upload covers the finished row one last time together with the new row. After that, `last_uploaded_y` has moved past the finished row, so it is not uploaded again.

On `reset()` (DPI/scale change), set `last_uploaded_y = 0` and `needs_full_upload = true`.

If the band is empty (`y_start >= y_end`), skip the upload and just clear `atlas.dirty`, since no pixels actually changed.
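The tracking rules above can be sketched as a small, self-contained model. This is not the real `Atlas` from `src/font.zig`: only the fields named in this section are modeled, the `Band`, `dirtyBand`, and `markUploaded` names are hypothetical, and it is an assumption that `reset()` also rewinds `cursor_y` and `row_height`.

```zig
const std = @import("std");

// Hypothetical stand-in for the real Atlas; only the tracking state is modeled.
const Atlas = struct {
    cursor_y: u32 = 0,
    row_height: u32 = 0,
    last_uploaded_y: u32 = 0,
    needs_full_upload: bool = true,

    const Band = struct { y_start: u32, y_end: u32 };

    // Rows [y_start, y_end) that still need uploading, or null if none.
    fn dirtyBand(self: Atlas) ?Band {
        const y_end = self.cursor_y + self.row_height;
        if (self.last_uploaded_y >= y_end) return null;
        return Band{ .y_start = self.last_uploaded_y, .y_end = y_end };
    }

    // Call after a successful upload. The in-progress row is deliberately
    // left dirty so later glyphs added to it are picked up.
    fn markUploaded(self: *Atlas) void {
        self.last_uploaded_y = self.cursor_y;
        self.needs_full_upload = false;
    }

    // Assumption: a DPI/scale reset also rewinds the packing cursor.
    fn reset(self: *Atlas) void {
        self.cursor_y = 0;
        self.row_height = 0;
        self.last_uploaded_y = 0;
        self.needs_full_upload = true;
    }
};
```

With a 20px first row and an 18px second row, the band is `0..20` while the first row fills, `0..38` on the first upload after the cursor wraps, and `20..38` from then on.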

## Renderer Changes

Replace `uploadAtlas(pixels)` with `uploadAtlasRegion(pixels, y_start, y_end, full)`, which behaves as follows.

### Persistent staging buffer

Allocate once at `Context.init`, sized to hold the full atlas (1024x1024 = 1MB, fixed regardless of DPI). Reuse across frames. Free at `Context.deinit`. No per-frame alloc/free.

### Partial staging copy

Only copy the dirty band of pixels into the staging buffer. At one byte per pixel (1024x1024 = 1MB), the byte range is `y_start * atlas_width` up to, but not including, `y_end * atlas_width`.
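A minimal sketch of that byte-range calculation, assuming a tightly packed atlas at one byte per pixel. The helper name `dirtyBytes` is hypothetical, and the sketch only selects the source slice. Where the band lands inside the staging buffer is left open here.

```zig
const std = @import("std");

// Returns the CPU-side pixels covering rows [y_start, y_end).
// Assumes one byte per pixel and no row padding.
fn dirtyBytes(pixels: []const u8, atlas_width: u32, y_start: u32, y_end: u32) []const u8 {
    std.debug.assert(y_start <= y_end);
    // Widen before multiplying so large atlases cannot overflow u32.
    const begin = @as(usize, y_start) * atlas_width;
    const end = @as(usize, y_end) * atlas_width;
    return pixels[begin..end];
}
```

For a 20px band in a 1024-wide atlas this selects 20 * 1024 = 20480 bytes, which matches the ~20KB figure quoted earlier.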

### Layout transition preserves contents

- Incremental upload: `SHADER_READ_ONLY_OPTIMAL → TRANSFER_DST_OPTIMAL` (preserves existing GPU data)
- Full upload (after reset or initial): `UNDEFINED → TRANSFER_DST_OPTIMAL` (discards, no preservation needed)

The `needs_full_upload` flag controls which transition is used.
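As an illustration, the selection reduces to a single branch. The `ImageLayout` enum below is a local stand-in for the Vulkan layouts named above, not the enum from any particular binding, and `preCopyOldLayout` is a hypothetical helper name.

```zig
const std = @import("std");

// Local stand-in for the three Vulkan image layouts this design touches.
const ImageLayout = enum {
    undefined_layout,
    shader_read_only_optimal,
    transfer_dst_optimal,
};

// Old layout for the barrier issued before the copy. A full upload may
// discard the image contents; an incremental one must preserve them.
fn preCopyOldLayout(full: bool) ImageLayout {
    return if (full)
        ImageLayout.undefined_layout
    else
        ImageLayout.shader_read_only_optimal;
}

test "full flag selects the old layout" {
    try std.testing.expect(preCopyOldLayout(true) == ImageLayout.undefined_layout);
    try std.testing.expect(preCopyOldLayout(false) == ImageLayout.shader_read_only_optimal);
}
```

Keeping this as a pure function is one way to make the "full-upload flag selects the old layout" unit test runnable without a Vulkan device.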

### Post-copy barrier

After the `BufferImageCopy`, transition back: `TRANSFER_DST_OPTIMAL → SHADER_READ_ONLY_OPTIMAL`. This is required for both full and incremental uploads (same as the existing code).

### Partial image copy

The `BufferImageCopy` region targets only the dirty rows:
- `image_offset = { .x = 0, .y = y_start, .z = 0 }`
- `image_extent = { .width = atlas_width, .height = y_end - y_start, .depth = 1 }`
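The region arithmetic can be sketched without a Vulkan dependency. `Offset3D`, `Extent3D`, and `CopyRegion` are local stand-ins that mirror the shape of the corresponding Vulkan structs, and `dirtyCopyRegion` is a hypothetical helper. The buffer offset is deliberately omitted because it depends on where the band is placed in the staging buffer.

```zig
const std = @import("std");

// Local stand-ins shaped like VkOffset3D / VkExtent3D.
const Offset3D = struct { x: i32, y: i32, z: i32 };
const Extent3D = struct { width: u32, height: u32, depth: u32 };
const CopyRegion = struct { image_offset: Offset3D, image_extent: Extent3D };

// Builds the image-side part of the copy for rows [y_start, y_end).
fn dirtyCopyRegion(atlas_width: u32, y_start: u32, y_end: u32) CopyRegion {
    std.debug.assert(y_start < y_end);
    return CopyRegion{
        // The band always spans the full width, so x stays at 0.
        .image_offset = Offset3D{ .x = 0, .y = @intCast(y_start), .z = 0 },
        .image_extent = Extent3D{
            .width = atlas_width,
            .height = y_end - y_start,
            .depth = 1,
        },
    };
}
```

For a band of rows `20..38` in a 1024-wide atlas, this yields an offset of `(0, 20, 0)` and an extent of `1024 x 18 x 1`.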

### Remove queueWaitIdle

Replace with a dedicated transfer fence. At the start of `uploadAtlasRegion`, if the fence from a prior transfer is still unsignaled, wait on it before writing to the staging buffer or re-recording the command buffer. This prevents corruption if two uploads happen in consecutive frames. Reset the fence, then submit the transfer command with the fence attached so that the GPU signals it when the transfer completes.

This is still a win over `queueWaitIdle` because the fence only waits on the single transfer command, not the entire graphics queue.

## Caller-Side Wiring (main.zig)

### Startup precompute

After `Atlas.init` and before the main loop, rasterize codepoints 32–126 into the atlas, then do a single full upload via the existing `uploadAtlas` path.
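A rough sketch of the warm-up loop, using a fake atlas so the example stands alone. `FakeAtlas`, `getOrInsert`, and `precomputeAscii` are hypothetical names. The real call is whatever `src/font.zig` uses to rasterize a glyph on a cache miss.

```zig
const std = @import("std");

// Hypothetical stand-in: counts insertions instead of rasterizing.
const FakeAtlas = struct {
    inserted: u32 = 0,
    dirty: bool = false,

    // Assumed shape of the "rasterize on miss" call; the real one may differ.
    fn getOrInsert(self: *FakeAtlas, codepoint: u21) !void {
        _ = codepoint;
        self.inserted += 1;
        self.dirty = true;
    }
};

// Warm the atlas with printable ASCII, codepoints 32 through 126 inclusive.
fn precomputeAscii(atlas: *FakeAtlas) !void {
    var cp: u21 = 32;
    while (cp <= 126) : (cp += 1) {
        try atlas.getOrInsert(cp);
    }
}
```

The loop leaves the atlas dirty on purpose, so the single full upload that follows picks up all 95 glyphs at once.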

### Render loop

Replace:
```zig
if (atlas.dirty) {
    try ctx.uploadAtlas(atlas.pixels);
    atlas.dirty = false;
    render_cache.layout_dirty = true;
}
```

With:
```zig
if (atlas.dirty) {
    const y_start = atlas.last_uploaded_y;
    const y_end = atlas.cursor_y + atlas.row_height;
    if (y_start < y_end) {
        try ctx.uploadAtlasRegion(
            atlas.pixels,
            y_start,
            y_end,
            atlas.needs_full_upload,
        );
        atlas.last_uploaded_y = atlas.cursor_y;
        atlas.needs_full_upload = false;
        render_cache.layout_dirty = true;
    }
    atlas.dirty = false;
}
```

## Files Changed

- `src/font.zig` — add `last_uploaded_y` and `needs_full_upload` fields to `Atlas`, reset them in `reset()`
- `src/renderer.zig` — add persistent staging buffer, `uploadAtlasRegion` method, dedicated transfer fence and command buffer
- `src/main.zig` — startup precompute loop, render-loop wiring change

## Testing

### Unit tests (font.zig)

- `last_uploaded_y` starts at 0 and `needs_full_upload` starts `true` after `init()`
- After inserting a glyph, dirty region is `0..cursor_y + row_height`
- After `reset()`, `last_uploaded_y` resets to 0 and `needs_full_upload` is `true`

### Unit tests (renderer.zig)

- `uploadAtlasRegion` byte offset/length calculation: `y_start * width` to `y_end * width`
- Full-upload flag selects `UNDEFINED` vs `SHADER_READ_ONLY_OPTIMAL` as the old layout

### Manual smoke tests

- Startup shows text correctly (precomputed atlas works)
- Typing a rare Unicode character (`echo "★"`) renders correctly (incremental upload works)
- DPI change still works (full re-upload after reset)
- `make bench` shows atlas_upload dropping from ~1700us to near-zero steady state

## Future Consideration

Precomputing box-drawing (U+2500–U+257F) and block element (U+2580–U+259F) characters would improve first-render for TUI apps like tmux, htop, and tree. Not needed for this phase — the incremental upload handles them correctly on first appearance.

## Non-Goals

- Atlas resizing (atlas is fixed at 1024x1024 regardless of DPI, returns `AtlasFull` error if exhausted)
- Double-buffered atlas images (overkill for a terminal)
- Async transfer queue (single queue is sufficient)