1d8f7b57
Revise commit-link spec after independent review
alex emery 2026-04-12 05:48
Tightens trailer parsing to strict git-style blocks with a single-token value regex, adds archived-issue handling, documents cross-machine duplication and render-time dedup, shows both commit author and event author in the timeline, and covers the gaps with new test cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md b/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md index 424a0b3..231a57a 100644 --- a/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md +++ b/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md @@ -5,7 +5,7 @@ ## Summary During `git-collab sync`, walk all local branches and look for commits whose messages contain an `Issue: <prefix>` git trailer. For each such commit, resolve the prefix to an issue and emit a new `issue.commit_link` event on that issue's DAG. The operation is idempotent: each (issue, commit) pair produces exactly one link event across the lifetime of the repository. During `git-collab sync`, walk all local branches and look for commits whose messages contain an `Issue: <prefix>` git trailer. For each such commit, resolve the prefix to an issue and emit a new `issue.commit_link` event on that issue's DAG. The operation is idempotent from a single machine: re-running sync against unchanged history produces no new events. Across machines, temporary duplication is possible if two contributors emit a link event for the same (issue, commit) before either has pushed; the renderer dedups by commit SHA so the timeline never shows the same link twice. ## Motivation @@ -35,9 +35,11 @@ Linked 1 commit(s) to issues. Later, `git-collab issue show a3f9` shows a new line in the timeline: ``` · linked 4b2e1cd "Fix off-by-one in patch walker" — alice, 2h ago · linked 4b2e1cd "Fix off-by-one in patch walker" by bob (linked by alice, 2h ago) ``` Showing both the commit author (`bob`) and the event author (`alice`) matters on shared checkouts and CI boxes: the person who runs sync isn't always the person who wrote the commit, and the timeline needs to reflect that honestly. If the contributor forgets the trailer on the first commit but adds it on a later commit, running sync again picks up just the new one. Re-running sync against unchanged history does nothing and prints nothing. ## Architecture @@ -46,21 +48,25 @@ One new module, one new event variant, one insertion point in the sync pipeline, ### Data flow per sync 1. `sync()` completes `reconcile_refs("issues", ...)` and `reconcile_refs("patches", ...)` as today. 1. `sync()` completes `reconcile_refs("issues", ...)` and `reconcile_refs("patches", ...)` as today. After reconcile, the local issue DAGs contain every link event that has been pushed to the remote, so the dedup cache we're about to build reflects the merged remote state. 2. Before `collect_push_refs`, `sync()` calls `commit_link::scan_and_link(&repo, &author, &sk)`. 3. `scan_and_link` walks every commit reachable from `refs/heads/*`, deduplicating shared ancestors in a per-sync `HashSet<Oid>`. 3. `scan_and_link` builds a revwalk seeded from every `refs/heads/*` tip, deduplicating shared ancestors in a per-sync `HashSet<Oid>`. If there are no local branches (e.g. detached HEAD immediately after `git checkout <sha>`) the revwalk is empty and `scan_and_link` is a silent no-op. Shallow clones are fine: the revwalk stops at the shallow boundary, so we link whatever is locally reachable, and deepening the clone and re-syncing picks up anything we missed. 4. For each commit, the module parses `Issue:` trailers from the commit message. 5. For each trailer value, it resolves the prefix via `state::resolve_issue_ref`. Unresolved or ambiguous prefixes become one-line stderr warnings and are skipped. 6. For each resolved issue, the module lazily loads the issue's existing event log into a per-issue `HashSet<String>` of already-linked commit SHAs. If this commit's SHA is not in the set, it emits a new `IssueCommitLink { commit }` event via `dag::append_event` and adds the SHA to the set. 5. For each trailer value, it resolves the prefix via `state::resolve_issue_ref`. Unresolved or ambiguous prefixes become one-line stderr warnings and are skipped. If the resolved ref lives under `refs/collab/archive/issues/...`, the module warns and skips — archived issues are a frozen namespace and should not accrue new events. 6. For each resolved (non-archived) issue, the module lazily loads the issue's existing event log into a per-issue `HashSet<String>` of already-linked commit SHAs. If this commit's SHA is not in the set, it emits a new `IssueCommitLink { commit }` event via `dag::append_event` and adds the SHA to the set. 7. New events are picked up by the subsequent `collect_push_refs` and pushed to the remote in the same sync. ### Dedup — why two sets **Revwalk dedup (`HashSet<Oid>`) — within a single sync.** Branches share ancestors. Without this set, `main`'s history would be walked once per branch tip. It's a perf optimization; correctness is unaffected. **Per-issue linked-SHA dedup (`HashMap<IssueId, HashSet<String>>`) — across sync runs.** A commit with `Issue: abc` will keep matching on every future sync — we need to not emit a new link event every time. The source of truth is the issue's own event DAG: if it already contains an `IssueCommitLink` for this commit, we skip. The first time a given issue is matched in a sync, we walk its DAG once to build the set, cache it, and use the cache for the rest of the sync. New emissions are inserted into the cache so two trailers pointing at the same issue in one sync don't both try to emit. **Per-issue linked-SHA dedup (`HashMap<IssueRef, HashSet<String>>`) — across sync runs.** A commit with `Issue: abc` will keep matching on every future sync — we need to not emit a new link event every time. The source of truth is the issue's own event DAG: if it already contains an `IssueCommitLink` for this commit, we skip. The first time a given issue is matched in a sync, we walk its DAG once to build the set, cache it, and use the cache for the rest of the sync. New emissions are inserted into the cache so two trailers pointing at the same issue in one sync don't both try to emit. The cache key is the *resolved ref name* returned by `state::resolve_issue_ref`, not a reconstructed `refs/collab/issues/<id>` string. This matters because resolution may return either an active or archived ref; using the resolved name keeps the cache, `collect_linked_shas`, and `dag::append_event` all talking about the same ref. (Archived refs are then filtered out entirely per step 5 above; the cache keying is still the right shape for when active refs collide with archived ones.) Net effect on a single machine: each (issue, commit) pair produces exactly one `issue.commit_link` event, ever, regardless of how many branches contain the commit or how often sync runs. Net effect: each (issue, commit) pair produces exactly one `issue.commit_link` event, ever, regardless of how many branches contain the commit or how often sync runs. **Cross-machine duplication is possible and tolerated.** If Alice emits a link event that fails to push, and Bob syncs from a different machine before Alice's next sync, Bob's `collect_linked_shas` won't see Alice's pending event (it isn't on the remote yet), and Bob will emit a second link for the same (issue, commit). Both events eventually merge into the DAG. This is correct for the event model but would produce a duplicated timeline entry, so the renderer dedups by commit SHA at render time (see "Issue timeline renderer" below). ## Components @@ -77,11 +83,11 @@ fn parse_issue_trailers(message: &str) -> Vec<String>; fn collect_linked_shas(repo: &Repository, issue_ref: &str) -> Result<HashSet<String>, Error>; ``` `parse_issue_trailers` is pure and string-only: no git, no I/O. It splits the message on blank lines, takes the final paragraph as the trailer block, iterates its lines, matches `(?i)^\s*issue\s*:\s*(\S.*?)\s*$`, and returns the captured values. Easy to unit-test with a table of fixtures. `parse_issue_trailers` is pure and string-only: no git, no I/O. It implements the strict trailer-block rules in the "Trailer Parsing Rules" section below. Easy to unit-test with a table of fixtures. `collect_linked_shas` walks the DAG for a single issue ref and returns every SHA mentioned in an existing `IssueCommitLink` event. Called lazily on first match per issue; result cached in `scan_and_link`'s local `HashMap`. `collect_linked_shas` walks the DAG for a single issue ref and returns every SHA mentioned in an existing `IssueCommitLink` event. Called lazily on first match per issue; result cached in `scan_and_link`'s local `HashMap` keyed by the *resolved ref name* (not a reconstructed path). `scan_and_link` orchestrates: build a `git2::Revwalk`, push every `refs/heads/*` tip, iterate, dedup via `HashSet<Oid>`, per commit call `parse_issue_trailers`, for each trailer value call `state::resolve_issue_ref`, populate the per-issue cache on demand, and emit via `dag::append_event`. `scan_and_link` orchestrates: build a `git2::Revwalk`, push every `refs/heads/*` tip, iterate, dedup via `HashSet<Oid>`, per commit call `parse_issue_trailers`, for each trailer value call `state::resolve_issue_ref`, skip archived refs with a warning, populate the per-issue cache on demand (keyed by resolved ref name), and emit via `dag::append_event` using the same resolved ref name. ### `src/event.rs` @@ -113,26 +119,31 @@ Sync proceeds to `collect_push_refs` regardless of the result. Any events that w Locate the existing match statement that renders events for `issue show`. Add an arm for `IssueCommitLink` that prints: ``` · linked <short-sha> "<commit subject>" — <author>, <rel-time> · linked <short-sha> "<commit subject>" by <commit-author> (linked by <event-author>, <rel-time>) ``` - `<short-sha>`: first 7 characters of the commit SHA. - `<commit subject>`: first line of the commit message, truncated to 60 characters (with `…` on truncation). - `<author>` and `<rel-time>`: from the event's existing `author` and `timestamp` fields (not the commit's — we want to show who linked it and when, which is the sync author). - `<commit-author>`: `commit.author().name()` from the git commit object — the person who wrote the code. - `<event-author>` and `<rel-time>`: from the event's existing `author` and `timestamp` fields — the person whose sync emitted the link event. On a single-user checkout these are usually the same; on shared checkouts and CI boxes they differ, and showing both is the honest rendering. If the commit has been GC'd or isn't available locally at render time: If the commit has been GC'd or isn't available locally at render time, fall back to: ``` · linked <short-sha> (commit not available) · linked <short-sha> (commit <short-sha> not in local repo) (linked by <event-author>, <rel-time>) ``` **Render-time dedup.** Before rendering, collect `IssueCommitLink` events into a `HashMap<commit_sha, Event>` keyed by commit SHA, keeping the earliest (lowest `clock`, then earliest `timestamp`) event per key. Only the kept event is rendered. This absorbs the cross-machine duplication case described in the dedup section: two machines each emitting a link for the same (issue, commit) yields two DAG events but exactly one timeline line. ## Trailer Parsing Rules **Trailer block** = the final paragraph of the commit message, where paragraphs are separated by blank lines. This matches git's own trailer convention. If the message contains no blank lines, the entire message is the trailer block. **Strict trailer block, matching git's own definition.** Split the commit message on blank lines to get paragraphs. Take the final paragraph. The final paragraph is a trailer block **only if every non-empty line in it is trailer-shaped** — that is, matches `^[A-Za-z][A-Za-z0-9-]*\s*:\s*\S.*$` (a token, colon, non-empty value). If *any* line in the final paragraph is not trailer-shaped, the final paragraph is prose, not a trailer block, and we extract zero trailers from the message. If the whole message is one paragraph with no blank lines, the same rule applies to the whole message. This rule deliberately diverges from "final paragraph contains any trailer-looking line" (which would false-positive on a paragraph like `Thanks to Bob for the catch.\nIssue: a3f9`) and aligns with `git interpret-trailers --parse` semantics. Contributors who want to link an issue must use a dedicated trailer block separated from prose by a blank line — the same convention they already use for `Signed-off-by:`. **Line match** = `(?i)^\s*issue\s*:\s*(\S.*?)\s*$`. Only the exact key `Issue`, case-insensitive. Not `Issues`, not `Fixes`, not `Closes`. Scope is deliberately tight; other keys can be added in follow-up work if needed. **Line match within a confirmed trailer block** = `(?i)^\s*issue\s*:\s*(\S+)\s*$`. Only the exact key `Issue`, case-insensitive. The value must be a single non-whitespace token followed by optional trailing whitespace and end-of-line — nothing else. `Issue: abc fixes thing` does **not** match (the trailing ` fixes thing` is rejected). That's deliberate: anything that looks like a loose commentary on the issue should not silently become a prefix lookup that warns every sync forever. Not `Issues`, not `Fixes`, not `Closes`. Scope is deliberately tight; other keys can be added in follow-up work if needed. **One issue prefix per line.** No comma-separated lists. If a contributor wants to link two issues, they write two `Issue:` lines. This avoids ambiguity about where a prefix ends. **One issue prefix per line.** No comma-separated lists. If a contributor wants to link two issues, they write two `Issue:` lines. The `(\S+)` match enforces this at parse time. **Multiple trailers on one commit** are all honored. Each resolves independently; each produces its own `IssueCommitLink` event on its respective issue. @@ -140,7 +151,8 @@ If the commit has been GC'd or isn't available locally at render time: - unknown prefix → `warning: commit <sha>: Issue: <val> — no such issue, skipping` - ambiguous prefix → `warning: commit <sha>: Issue: <val> — ambiguous (matches N issues), skipping` - resolved → proceed to dedup and emit - resolved to an archived ref → `warning: commit <sha>: Issue: <val> — issue is archived, skipping` - resolved to an active ref → proceed to dedup and emit **Not parsed**: prose mentions in the commit body like "this fixes issue abc". Only the structured trailer form. That's the point of picking a trailer — no false positives from commit messages that happen to mention an issue ID. @@ -152,7 +164,7 @@ If the commit has been GC'd or isn't available locally at render time: - Commit object fails to load → warn, skip commit. - Trailer resolves to unknown or ambiguous issue prefix → warn with commit SHA and trailer value, skip that trailer (other trailers on the same commit are still processed). - Reading an issue's existing event log fails → warn, mark that issue as poisoned in the cache so we don't retry on every subsequent match in this sync. - Reading an issue's existing event log fails → warn, mark that issue as poisoned in the per-sync cache so we don't retry on every subsequent match *within this sync run*. Poisoning is not persisted; the next sync starts fresh and tries again. - `dag::append_event` fails for one issue → warn with issue ID and commit SHA, continue. Other emissions in the same sync are unaffected. `scan_and_link` returns `Result<usize, Error>` but the only errors that actually propagate are "couldn't even start" failures: repo handle invalid, HEAD missing, revwalk construction failed. Everything else is absorbed. @@ -172,14 +184,19 @@ If the commit has been GC'd or isn't available locally at render time: Pure parser, table-driven over message → expected trailers: - No trailer block → `[]` - Single `Issue: abc` in final paragraph → `["abc"]` - Single `Issue: abc` in a pure trailer-block final paragraph → `["abc"]` - Case variants (`issue:`, `ISSUE :`, ` Issue: abc `) → `["abc"]` - Two `Issue:` lines in the trailer block → `["abc", "def"]` - Two `Issue:` lines in a pure trailer block → `["abc", "def"]` - `Issue:` in body but not the final paragraph → `[]` - `Issues: abc` (wrong key) → `[]` - Prose mention `"fixes issue abc"` → `[]` - Message with no blank lines, single `Issue: abc` line → `["abc"]` - Message with no blank lines, the whole message is a pure trailer block containing `Issue: abc` → `["abc"]` - Empty message → `[]` - **Mixed final paragraph** (`Thanks to Bob for the catch.\nIssue: a3f9`) → `[]` — final paragraph contains a prose line, so it's not a trailer block and we extract nothing - **Trailing whitespace-only final paragraph** after a real trailer block → `["abc"]` — empty trailing paragraphs don't shadow the previous trailer block - **Pure trailer block with mixed keys** (`Signed-off-by: alice <a@x>\nIssue: abc`) → `["abc"]` — `Signed-off-by:` is trailer-shaped so the paragraph still qualifies as a trailer block - **Value with trailing garbage** (`Issue: abc fixes thing`) → `[]` — `(\S+)` with anchored `\s*$` rejects the second token - **Value with only whitespace** (`Issue: `) → `[]` — no captured token ### Integration tests — `tests/sync_test.rs` (extending existing patterns) @@ -190,6 +207,11 @@ Pure parser, table-driven over message → expected trailers: 5. **Unknown prefix** — `Issue: zzz` where no issue matches, assert sync completes successfully, no event emitted, warning on stderr. 6. **Ambiguous prefix** — two issues sharing a prefix, trailer uses that prefix, assert no event, warning, sync still succeeds. 7. **Prior manual link then re-sync** — seed the issue's DAG with an `IssueCommitLink` for SHA X, add a new commit with `Issue: foo`, sync, assert only the new event is added. 8. **Remote-originated link dedup** — simulate a remote that already holds an `IssueCommitLink` for commit X (set up via a second repo, push, then clone/fetch into the test repo). Run sync locally with the same commit X present in local history. After `reconcile_refs` the merged log contains the remote event; assert `scan_and_link` sees it via `collect_linked_shas` and does not emit a duplicate. 9. **Archived issue** — archive an issue, add a commit with `Issue: <archived-id>`, sync, assert no event emitted, warning on stderr, archived issue's DAG unchanged, sync still succeeds. 10. **Detached HEAD** — check out a bare SHA with no branches pointing at it, add a commit with `Issue: foo` (orphaned), sync, assert no event emitted (revwalk from `refs/heads/*` is empty) and sync succeeds without error. 11. **Commit-author vs event-author rendering** — emit a link event where the commit author email differs from the git user running the test. Render the issue; assert the one-line format shows both names distinctly. 12. **Render-time dedup** — seed an issue's DAG with two `IssueCommitLink` events for the same commit SHA from different event authors. Render the issue; assert exactly one link line appears, showing the earliest event. ### Renderer