3a3cf2a9
Add design spec for auto-linking commits to issues
alex emery 2026-04-12 05:39
Records the brainstormed design for parsing `Issue:` trailers during sync and emitting idempotent link events on the matching issue DAG. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md b/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md new file mode 100644 index 0000000..424a0b3 --- /dev/null +++ b/docs/superpowers/specs/2026-04-12-commit-issue-link-design.md @@ -0,0 +1,208 @@ # Auto-Link Commits to Issues via `Issue:` Trailer **Status:** design approved **Date:** 2026-04-12 ## Summary During `git-collab sync`, walk all local branches and look for commits whose messages contain an `Issue: <prefix>` git trailer. For each such commit, resolve the prefix to an issue and emit a new `issue.commit_link` event on that issue's DAG. The operation is idempotent: each (issue, commit) pair produces exactly one link event across the lifetime of the repository. ## Motivation Today, commits and issues live in parallel. A contributor can mention an issue ID in a commit message, but nothing structured ties them together — the collab view of an issue has no idea which commits reference it. Auto-linking closes that gap without requiring any new command, UI, or manual step beyond writing a trailer. Explicitly out of scope: auto-resolving or auto-closing issues from commits. This change records links only. ## User Experience A contributor writes a commit message like: ``` Fix off-by-one in patch walker The revwalk was skipping the final commit on branches that diverged from main before the last sync. Issue: a3f9 ``` They run `git-collab sync`. Output includes: ``` Linked 1 commit(s) to issues. ``` Later, `git-collab issue show a3f9` shows a new line in the timeline: ``` · linked 4b2e1cd "Fix off-by-one in patch walker" — alice, 2h ago ``` If the contributor forgets the trailer on the first commit but adds it on a later commit, running sync again picks up just the new one. Re-running sync against unchanged history does nothing and prints nothing. ## Architecture One new module, one new event variant, one insertion point in the sync pipeline, one new case in the issue-timeline renderer. ### Data flow per sync 1. `sync()` completes `reconcile_refs("issues", ...)` and `reconcile_refs("patches", ...)` as today. 2. Before `collect_push_refs`, `sync()` calls `commit_link::scan_and_link(&repo, &author, &sk)`. 3. `scan_and_link` walks every commit reachable from `refs/heads/*`, deduplicating shared ancestors in a per-sync `HashSet<Oid>`. 4. For each commit, the module parses `Issue:` trailers from the commit message. 5. For each trailer value, it resolves the prefix via `state::resolve_issue_ref`. Unresolved or ambiguous prefixes become one-line stderr warnings and are skipped. 6. For each resolved issue, the module lazily loads the issue's existing event log into a per-issue `HashSet<String>` of already-linked commit SHAs. If this commit's SHA is not in the set, it emits a new `IssueCommitLink { commit }` event via `dag::append_event` and adds the SHA to the set. 7. New events are picked up by the subsequent `collect_push_refs` and pushed to the remote in the same sync. ### Dedup — why two sets **Revwalk dedup (`HashSet<Oid>`) — within a single sync.** Branches share ancestors. Without this set, `main`'s history would be walked once per branch tip. It's a perf optimization; correctness is unaffected. **Per-issue linked-SHA dedup (`HashMap<IssueId, HashSet<String>>`) — across sync runs.** A commit with `Issue: abc` will keep matching on every future sync — we need to not emit a new link event every time. The source of truth is the issue's own event DAG: if it already contains an `IssueCommitLink` for this commit, we skip. The first time a given issue is matched in a sync, we walk its DAG once to build the set, cache it, and use the cache for the rest of the sync. New emissions are inserted into the cache so two trailers pointing at the same issue in one sync don't both try to emit. Net effect: each (issue, commit) pair produces exactly one `issue.commit_link` event, ever, regardless of how many branches contain the commit or how often sync runs. ## Components ### `src/commit_link.rs` (new) ```rust pub fn scan_and_link( repo: &Repository, author: &Author, sk: &SigningKey, ) -> Result<usize, Error>; // returns count of events emitted fn parse_issue_trailers(message: &str) -> Vec<String>; fn collect_linked_shas(repo: &Repository, issue_ref: &str) -> Result<HashSet<String>, Error>; ``` `parse_issue_trailers` is pure and string-only: no git, no I/O. It splits the message on blank lines, takes the final paragraph as the trailer block, iterates its lines, matches `(?i)^\s*issue\s*:\s*(\S.*?)\s*$`, and returns the captured values. Easy to unit-test with a table of fixtures. `collect_linked_shas` walks the DAG for a single issue ref and returns every SHA mentioned in an existing `IssueCommitLink` event. Called lazily on first match per issue; result cached in `scan_and_link`'s local `HashMap`. `scan_and_link` orchestrates: build a `git2::Revwalk`, push every `refs/heads/*` tip, iterate, dedup via `HashSet<Oid>`, per commit call `parse_issue_trailers`, for each trailer value call `state::resolve_issue_ref`, populate the per-issue cache on demand, and emit via `dag::append_event`. ### `src/event.rs` Add one variant to the `Action` enum (currently ending at line 105): ```rust #[serde(rename = "issue.commit_link")] IssueCommitLink { commit: String }, ``` Serialization follows the existing `rename` pattern. No `alias` needed — this is a new type. ### `src/sync.rs` Insert one call after `reconcile_refs(&repo, "patches", ...)` at line 365, before `collect_push_refs`: ```rust match commit_link::scan_and_link(&repo, &author, &sk) { Ok(n) if n > 0 => println!("Linked {} commit(s) to issues.", n), Ok(_) => {} Err(e) => eprintln!("warning: commit link scan failed: {}", e), } ``` Sync proceeds to `collect_push_refs` regardless of the result. Any events that were successfully emitted before a failure still get pushed in the same sync. ### Issue timeline renderer Locate the existing match statement that renders events for `issue show`. Add an arm for `IssueCommitLink` that prints: ``` · linked <short-sha> "<commit subject>" — <author>, <rel-time> ``` - `<short-sha>`: first 7 characters of the commit SHA. - `<commit subject>`: first line of the commit message, truncated to 60 characters (with `…` on truncation). - `<author>` and `<rel-time>`: from the event's existing `author` and `timestamp` fields (not the commit's — we want to show who linked it and when, which is the sync author). If the commit has been GC'd or isn't available locally at render time: ``` · linked <short-sha> (commit not available) ``` ## Trailer Parsing Rules **Trailer block** = the final paragraph of the commit message, where paragraphs are separated by blank lines. This matches git's own trailer convention. If the message contains no blank lines, the entire message is the trailer block. **Line match** = `(?i)^\s*issue\s*:\s*(\S.*?)\s*$`. Only the exact key `Issue`, case-insensitive. Not `Issues`, not `Fixes`, not `Closes`. Scope is deliberately tight; other keys can be added in follow-up work if needed. **One issue prefix per line.** No comma-separated lists. If a contributor wants to link two issues, they write two `Issue:` lines. This avoids ambiguity about where a prefix ends. **Multiple trailers on one commit** are all honored. Each resolves independently; each produces its own `IssueCommitLink` event on its respective issue. **Resolution** goes through `state::resolve_issue_ref`, which already handles prefix matching and ambiguity: - unknown prefix → `warning: commit <sha>: Issue: <val> — no such issue, skipping` - ambiguous prefix → `warning: commit <sha>: Issue: <val> — ambiguous (matches N issues), skipping` - resolved → proceed to dedup and emit **Not parsed**: prose mentions in the commit body like "this fixes issue abc". Only the structured trailer form. That's the point of picking a trailer — no false positives from commit messages that happen to mention an issue ID. ## Error Handling **Philosophy: `scan_and_link` must never break sync.** Commits with trailers are a nice-to-have; losing the ability to push collab state because of a parse hiccup would be a bad trade. **Inside `scan_and_link`** — per-commit and per-issue errors are caught, turned into one-line stderr warnings prefixed `warning:`, and iteration continues: - Commit object fails to load → warn, skip commit. - Trailer resolves to unknown or ambiguous issue prefix → warn with commit SHA and trailer value, skip that trailer (other trailers on the same commit are still processed). - Reading an issue's existing event log fails → warn, mark that issue as poisoned in the cache so we don't retry on every subsequent match in this sync. - `dag::append_event` fails for one issue → warn with issue ID and commit SHA, continue. Other emissions in the same sync are unaffected. `scan_and_link` returns `Result<usize, Error>` but the only errors that actually propagate are "couldn't even start" failures: repo handle invalid, HEAD missing, revwalk construction failed. Everything else is absorbed. **At the sync call site**, the `Err` branch logs and continues. Sync reaches `collect_push_refs` regardless. Any link events that were emitted before the failure still get pushed. **Warnings are one-line, `warning:` prefix, stderr**, matching the existing sync output style. **Deliberately not handled:** - Duplicate-warning suppression. An ambiguous or unknown prefix will warn on every sync until the contributor amends the trailer. That's the right incentive. - Retrying failed dag writes. The next sync naturally retries via a cache miss. ## Testing ### Unit tests — `tests/commit_link_test.rs` (new) Pure parser, table-driven over message → expected trailers: - No trailer block → `[]` - Single `Issue: abc` in final paragraph → `["abc"]` - Case variants (`issue:`, `ISSUE :`, ` Issue: abc `) → `["abc"]` - Two `Issue:` lines in the trailer block → `["abc", "def"]` - `Issue:` in body but not the final paragraph → `[]` - `Issues: abc` (wrong key) → `[]` - Prose mention `"fixes issue abc"` → `[]` - Message with no blank lines, single `Issue: abc` line → `["abc"]` - Empty message → `[]` ### Integration tests — `tests/sync_test.rs` (extending existing patterns) 1. **Happy path** — create issue `foo`, commit with `Issue: foo` trailer, run sync, assert the issue ref has one new `IssueCommitLink` event with the right SHA. 2. **Idempotency** — run sync twice against the same history, assert exactly one link event exists. 3. **Multiple branches sharing an ancestor** — same linked commit reachable from two branches, assert exactly one link event (exercises both dedup sets). 4. **Multi-issue trailer** — commit with two `Issue:` lines, assert both issues get one event each. 5. **Unknown prefix** — `Issue: zzz` where no issue matches, assert sync completes successfully, no event emitted, warning on stderr. 6. **Ambiguous prefix** — two issues sharing a prefix, trailer uses that prefix, assert no event, warning, sync still succeeds. 7. **Prior manual link then re-sync** — seed the issue's DAG with an `IssueCommitLink` for SHA X, add a new commit with `Issue: foo`, sync, assert only the new event is added. ### Renderer Add a case to the existing `issue show` event-rendering test. Assert the one-line format renders as specified. The exact test file will be located during implementation. ### Not tested - Performance on very large histories. The O(history) walk is an accepted cost per the depth-handling decision: parsing is microsecond-cheap, and only commits with actual trailers pay any I/O cost beyond that. - Concurrent syncs. Already covered by the existing `SyncLock` in `sync.rs`. ## Alternatives Considered 1. **Commit-msg hook for immediate linking.** Rejected: requires per-machine install via `git-collab init`, fails open when not installed, and duplicates the sync path for a feature that naturally belongs at publish time. 2. **Persistent scan cursor (last-scanned OID per branch) to bound re-walk work.** Rejected as speculative optimization. Parsing a commit message is microseconds; the per-issue dedup cache is `O(events_on_issue)`; the only real cost is trailer-bearing commits, which are rare. 3. **Cap scan at N commits from each tip.** Rejected: bounded but lossy. Silently misses trailers on older commits for contributors who sync late. 4. **Also parse `Fixes:` / `Closes:` keys and auto-close issues.** Explicitly out of scope. The user asked for link tracking only, and mixing status mutation into the scan would broaden the feature surface considerably.