a73x

docs/superpowers/specs/2026-04-12-commit-issue-link-design.md

Ref:   Size: 18.5 KiB

# Auto-Link Commits to Issues via `Issue:` Trailer

**Status:** design approved
**Date:** 2026-04-12

## Summary

During `git-collab sync`, walk all local branches and look for commits whose messages contain an `Issue: <prefix>` git trailer. For each such commit, resolve the prefix to an issue and emit a new `issue.commit_link` event on that issue's DAG. The operation is idempotent from a single machine: re-running sync against unchanged history produces no new events. Across machines, temporary duplication is possible if two contributors emit a link event for the same (issue, commit) before either has pushed; the renderer dedups by commit SHA so the timeline never shows the same link twice.

## Motivation

Today, commits and issues live in parallel. A contributor can mention an issue ID in a commit message, but nothing structured ties them together — the collab view of an issue has no idea which commits reference it. Auto-linking closes that gap without requiring any new command, UI, or manual step beyond writing a trailer.

Explicitly out of scope: auto-resolving or auto-closing issues from commits. This change records links only.

## User Experience

A contributor writes a commit message like:

```
Fix off-by-one in patch walker

The revwalk was skipping the final commit on branches that
diverged from main before the last sync.

Issue: a3f9
```

They run `git-collab sync`. Output includes:

```
Linked 1 commit(s) to issues.
```

Later, `git-collab issue show a3f9` shows a new line in the timeline:

```
· linked 4b2e1cd "Fix off-by-one in patch walker" by bob (linked by alice, 2h ago)
```

Showing both the commit author (`bob`) and the event author (`alice`) matters on shared checkouts and CI boxes: the person who runs sync isn't always the person who wrote the commit, and the timeline needs to reflect that honestly.

If the contributor forgets the trailer on the first commit but adds it on a later commit, running sync again picks up just the new one. Re-running sync against unchanged history does nothing and prints nothing.

## Architecture

One new module, one new event variant, one insertion point in the sync pipeline, one new case in the issue-timeline renderer.

### Data flow per sync

1. `sync()` completes `reconcile_refs("issues", ...)` and `reconcile_refs("patches", ...)` as today. After reconcile, the local issue DAGs contain every link event that has been pushed to the remote, so the dedup cache we're about to build reflects the merged remote state.
2. Before `collect_push_refs`, `sync()` calls `commit_link::scan_and_link(&repo, &author, &sk)`.
3. `scan_and_link` builds a revwalk seeded from every `refs/heads/*` tip, deduplicating shared ancestors in a per-sync `HashSet<Oid>`. If there are no local branches (e.g. detached HEAD immediately after `git checkout <sha>`) the revwalk is empty and `scan_and_link` is a silent no-op. Shallow clones are fine: the revwalk stops at the shallow boundary, so we link whatever is locally reachable, and deepening the clone and re-syncing picks up anything we missed.
4. For each commit, the module parses `Issue:` trailers from the commit message.
5. For each trailer value, it resolves the prefix via `state::resolve_issue_ref`. Unresolved or ambiguous prefixes become one-line stderr warnings and are skipped. If the resolved ref lives under `refs/collab/archive/issues/...`, the module warns and skips — archived issues are a frozen namespace and should not accrue new events.
6. For each resolved (non-archived) issue, the module lazily loads the issue's existing event log into a per-issue `HashSet<String>` of already-linked commit SHAs. If this commit's SHA is not in the set, it emits a new `IssueCommitLink { commit }` event via `dag::append_event` and adds the SHA to the set.
7. New events are picked up by the subsequent `collect_push_refs` and pushed to the remote in the same sync.

### Dedup — why two sets

**Revwalk dedup (`HashSet<Oid>`) — within a single sync.** Branches share ancestors. Without this set, `main`'s history would be walked once per branch tip. It's a perf optimization; correctness is unaffected.

**Per-issue linked-SHA dedup (`HashMap<IssueRef, HashSet<String>>`) — across sync runs.** A commit with `Issue: abc` will keep matching on every future sync — we need to not emit a new link event every time. The source of truth is the issue's own event DAG: if it already contains an `IssueCommitLink` for this commit, we skip. The first time a given issue is matched in a sync, we walk its DAG once to build the set, cache it, and use the cache for the rest of the sync. New emissions are inserted into the cache so two trailers pointing at the same issue in one sync don't both try to emit.

The cache key is the *resolved ref name* returned by `state::resolve_issue_ref`, not a reconstructed `refs/collab/issues/<id>` string. This matters because resolution may return either an active or archived ref; using the resolved name keeps the cache, `collect_linked_shas`, and `dag::append_event` all talking about the same ref. (Archived refs are then filtered out entirely per step 5 above; the cache keying is still the right shape for when active refs collide with archived ones.)

Net effect on a single machine: each (issue, commit) pair produces exactly one `issue.commit_link` event, ever, regardless of how many branches contain the commit or how often sync runs.

**Cross-machine duplication is possible and tolerated.** If Alice emits a link event that fails to push, and Bob syncs from a different machine before Alice's next sync, Bob's `collect_linked_shas` won't see Alice's pending event (it isn't on the remote yet), and Bob will emit a second link for the same (issue, commit). Both events eventually merge into the DAG. This is correct for the event model but would produce a duplicated timeline entry, so the renderer dedups by commit SHA at render time (see "Issue timeline renderer" below).

## Components

### `src/commit_link.rs` (new)

```rust
pub fn scan_and_link(
    repo: &Repository,
    author: &Author,
    sk: &SigningKey,
) -> Result<usize, Error>;  // returns count of events emitted

fn parse_issue_trailers(message: &str) -> Vec<String>;
fn collect_linked_shas(repo: &Repository, issue_ref: &str) -> Result<HashSet<String>, Error>;
```

`parse_issue_trailers` is pure and string-only: no git, no I/O. It implements the strict trailer-block rules in the "Trailer Parsing Rules" section below. Easy to unit-test with a table of fixtures.

`collect_linked_shas` walks the DAG for a single issue ref and returns every SHA mentioned in an existing `IssueCommitLink` event. Called lazily on first match per issue; result cached in `scan_and_link`'s local `HashMap` keyed by the *resolved ref name* (not a reconstructed path).

`scan_and_link` orchestrates: build a `git2::Revwalk`, push every `refs/heads/*` tip, iterate, dedup via `HashSet<Oid>`, per commit call `parse_issue_trailers`, for each trailer value call `state::resolve_issue_ref`, skip archived refs with a warning, populate the per-issue cache on demand (keyed by resolved ref name), and emit via `dag::append_event` using the same resolved ref name.

### `src/event.rs`

Add one variant to the `Action` enum (currently ending at line 105):

```rust
#[serde(rename = "issue.commit_link")]
IssueCommitLink { commit: String },
```

Serialization follows the existing `rename` pattern. No `alias` needed — this is a new type.

### `src/sync.rs`

Insert one call after `reconcile_refs(&repo, "patches", ...)` at line 365, before `collect_push_refs`:

```rust
match commit_link::scan_and_link(&repo, &author, &sk) {
    Ok(n) if n > 0 => println!("Linked {} commit(s) to issues.", n),
    Ok(_) => {}
    Err(e) => eprintln!("warning: commit link scan failed: {}", e),
}
```

Sync proceeds to `collect_push_refs` regardless of the result. Any events that were successfully emitted before a failure still get pushed in the same sync.

### Issue timeline renderer

Locate the existing match statement that renders events for `issue show`. Add an arm for `IssueCommitLink` that prints:

```
· linked <short-sha> "<commit subject>" by <commit-author> (linked by <event-author>, <rel-time>)
```

- `<short-sha>`: first 7 characters of the commit SHA.
- `<commit subject>`: first line of the commit message, truncated to 60 characters (with `…` on truncation).
- `<commit-author>`: `commit.author().name()` from the git commit object — the person who wrote the code.
- `<event-author>` and `<rel-time>`: from the event's existing `author` and `timestamp` fields — the person whose sync emitted the link event. On a single-user checkout these are usually the same; on shared checkouts and CI boxes they differ, and showing both is the honest rendering.

If the commit has been GC'd or isn't available locally at render time, fall back to:

```
· linked <short-sha> (commit <short-sha> not in local repo) (linked by <event-author>, <rel-time>)
```

**Render-time dedup.** Before rendering, collect `IssueCommitLink` events into a `HashMap<commit_sha, Event>` keyed by commit SHA, keeping the earliest (lowest `clock`, then earliest `timestamp`) event per key. Only the kept event is rendered. This absorbs the cross-machine duplication case described in the dedup section: two machines each emitting a link for the same (issue, commit) yields two DAG events but exactly one timeline line.

## Trailer Parsing Rules

**Strict trailer block, matching git's own definition.** Split the commit message on blank lines to get paragraphs. Take the final paragraph. The final paragraph is a trailer block **only if every non-empty line in it is trailer-shaped** — that is, matches `^[A-Za-z][A-Za-z0-9-]*\s*:\s*\S.*$` (a token, colon, non-empty value). If *any* line in the final paragraph is not trailer-shaped, the final paragraph is prose, not a trailer block, and we extract zero trailers from the message. If the whole message is one paragraph with no blank lines, the same rule applies to the whole message.

This rule deliberately diverges from "final paragraph contains any trailer-looking line" (which would false-positive on a paragraph like `Thanks to Bob for the catch.\nIssue: a3f9`) and aligns with `git interpret-trailers --parse` semantics. Contributors who want to link an issue must use a dedicated trailer block separated from prose by a blank line — the same convention they already use for `Signed-off-by:`.

**Line match within a confirmed trailer block** = `(?i)^\s*issue\s*:\s*(\S+)\s*$`. Only the exact key `Issue`, case-insensitive. The value must be a single non-whitespace token followed by optional trailing whitespace and end-of-line — nothing else. `Issue: abc fixes thing` does **not** match (the trailing ` fixes thing` is rejected). That's deliberate: anything that looks like a loose commentary on the issue should not silently become a prefix lookup that warns every sync forever. Not `Issues`, not `Fixes`, not `Closes`. Scope is deliberately tight; other keys can be added in follow-up work if needed.

**One issue prefix per line.** No comma-separated lists. If a contributor wants to link two issues, they write two `Issue:` lines. The `(\S+)` match enforces this at parse time.

**Multiple trailers on one commit** are all honored. Each resolves independently; each produces its own `IssueCommitLink` event on its respective issue.

**Resolution** goes through `state::resolve_issue_ref`, which already handles prefix matching and ambiguity:

- unknown prefix → `warning: commit <sha>: Issue: <val> — no such issue, skipping`
- ambiguous prefix → `warning: commit <sha>: Issue: <val> — ambiguous (matches N issues), skipping`
- resolved to an archived ref → `warning: commit <sha>: Issue: <val> — issue is archived, skipping`
- resolved to an active ref → proceed to dedup and emit

**Not parsed**: prose mentions in the commit body like "this fixes issue abc". Only the structured trailer form. That's the point of picking a trailer — no false positives from commit messages that happen to mention an issue ID.

## Error Handling

**Philosophy: `scan_and_link` must never break sync.** Commits with trailers are a nice-to-have; losing the ability to push collab state because of a parse hiccup would be a bad trade.

**Inside `scan_and_link`** — per-commit and per-issue errors are caught, turned into one-line stderr warnings prefixed `warning:`, and iteration continues:

- Commit object fails to load → warn, skip commit.
- Trailer resolves to unknown or ambiguous issue prefix → warn with commit SHA and trailer value, skip that trailer (other trailers on the same commit are still processed).
- Reading an issue's existing event log fails → warn, mark that issue as poisoned in the per-sync cache so we don't retry on every subsequent match *within this sync run*. Poisoning is not persisted; the next sync starts fresh and tries again.
- `dag::append_event` fails for one issue → warn with issue ID and commit SHA, continue. Other emissions in the same sync are unaffected.

`scan_and_link` returns `Result<usize, Error>` but the only errors that actually propagate are "couldn't even start" failures: repo handle invalid, HEAD missing, revwalk construction failed. Everything else is absorbed.

**At the sync call site**, the `Err` branch logs and continues. Sync reaches `collect_push_refs` regardless. Any link events that were emitted before the failure still get pushed.

**Warnings are one-line, `warning:` prefix, stderr**, matching the existing sync output style.

**Deliberately not handled:**
- Duplicate-warning suppression. An ambiguous or unknown prefix will warn on every sync until the contributor amends the trailer. That's the right incentive.
- Retrying failed dag writes. The next sync naturally retries via a cache miss.

## Testing

### Unit tests — `tests/commit_link_test.rs` (new)

Pure parser, table-driven over message → expected trailers:

- No trailer block → `[]`
- Single `Issue: abc` in a pure trailer-block final paragraph → `["abc"]`
- Case variants (`issue:`, `ISSUE :`, `  Issue:  abc  `) → `["abc"]`
- Two `Issue:` lines in a pure trailer block → `["abc", "def"]`
- `Issue:` in body but not the final paragraph → `[]`
- `Issues: abc` (wrong key) → `[]`
- Prose mention `"fixes issue abc"` → `[]`
- Message with no blank lines, the whole message is a pure trailer block containing `Issue: abc` → `["abc"]`
- Empty message → `[]`
- **Mixed final paragraph** (`Thanks to Bob for the catch.\nIssue: a3f9`) → `[]` — final paragraph contains a prose line, so it's not a trailer block and we extract nothing
- **Trailing whitespace-only final paragraph** after a real trailer block → `["abc"]` — empty trailing paragraphs don't shadow the previous trailer block
- **Pure trailer block with mixed keys** (`Signed-off-by: alice <a@x>\nIssue: abc`) → `["abc"]` — `Signed-off-by:` is trailer-shaped so the paragraph still qualifies as a trailer block
- **Value with trailing garbage** (`Issue: abc fixes thing`) → `[]` — `(\S+)` with anchored `\s*$` rejects the second token
- **Value with only whitespace** (`Issue:   `) → `[]` — no captured token

### Integration tests — `tests/sync_test.rs` (extending existing patterns)

1. **Happy path** — create issue `foo`, commit with `Issue: foo` trailer, run sync, assert the issue ref has one new `IssueCommitLink` event with the right SHA.
2. **Idempotency** — run sync twice against the same history, assert exactly one link event exists.
3. **Multiple branches sharing an ancestor** — same linked commit reachable from two branches, assert exactly one link event (exercises both dedup sets).
4. **Multi-issue trailer** — commit with two `Issue:` lines, assert both issues get one event each.
5. **Unknown prefix** — `Issue: zzz` where no issue matches, assert sync completes successfully, no event emitted, warning on stderr.
6. **Ambiguous prefix** — two issues sharing a prefix, trailer uses that prefix, assert no event, warning, sync still succeeds.
7. **Prior manual link then re-sync** — seed the issue's DAG with an `IssueCommitLink` for SHA X, add a new commit with `Issue: foo`, sync, assert only the new event is added.
8. **Remote-originated link dedup** — simulate a remote that already holds an `IssueCommitLink` for commit X (set up via a second repo, push, then clone/fetch into the test repo). Run sync locally with the same commit X present in local history. After `reconcile_refs` the merged log contains the remote event; assert `scan_and_link` sees it via `collect_linked_shas` and does not emit a duplicate.
9. **Archived issue** — archive an issue, add a commit with `Issue: <archived-id>`, sync, assert no event emitted, warning on stderr, archived issue's DAG unchanged, sync still succeeds.
10. **Detached HEAD** — check out a bare SHA with no branches pointing at it, add a commit with `Issue: foo` (orphaned), sync, assert no event emitted (revwalk from `refs/heads/*` is empty) and sync succeeds without error.
11. **Commit-author vs event-author rendering** — emit a link event where the commit author email differs from the git user running the test. Render the issue; assert the one-line format shows both names distinctly.
12. **Render-time dedup** — seed an issue's DAG with two `IssueCommitLink` events for the same commit SHA from different event authors. Render the issue; assert exactly one link line appears, showing the earliest event.

### Renderer

Add a case to the existing `issue show` event-rendering test. Assert the one-line format renders as specified. The exact test file will be located during implementation.

### Not tested

- Performance on very large histories. The O(history) walk is an accepted cost per the depth-handling decision: parsing is microsecond-cheap, and only commits with actual trailers pay any I/O cost beyond that.
- Concurrent syncs. Already covered by the existing `SyncLock` in `sync.rs`.

## Alternatives Considered

1. **Commit-msg hook for immediate linking.** Rejected: requires per-machine install via `git-collab init`, fails open when not installed, and duplicates the sync path for a feature that naturally belongs at publish time.
2. **Persistent scan cursor (last-scanned OID per branch) to bound re-walk work.** Rejected as speculative optimization. Parsing a commit message is microseconds; the per-issue dedup cache is `O(events_on_issue)`; the only real cost is trailer-bearing commits, which are rare.
3. **Cap scan at N commits from each tip.** Rejected: bounded but lossy. Silently misses trailers on older commits for contributors who sync late.
4. **Also parse `Fixes:` / `Closes:` keys and auto-close issues.** Explicitly out of scope. The user asked for link tracking only, and mixing status mutation into the scan would broaden the feature surface considerably.