# Render README on Repo Overview
**Date:** 2026-04-12
**Status:** Approved
## Goal
On the repo overview page (`GET /{repo}`), render the repository's README from
HEAD above the existing patches / issues / recent-commits sections, so visitors
landing on a repo see its description without having to browse the file tree.
## Scope
In scope:
- Lookup of a README file at the root of HEAD's tree.
- Markdown rendering for `README.md` (case-insensitive).
- Plain-text fallback for `README` and `README.txt`.
- HTML sanitization of rendered markdown.
- Size cap with a "too large" notice.
- Graceful no-op when no README exists or anything fails.
Out of scope (deferred, easy to add later):
- README rendering on the tree-root page.
- Relative-link rewriting (e.g. `./screenshot.png`, links to sibling files).
- Syntax highlighting in fenced code blocks.
- `.rst`, `.org`, AsciiDoc.
- Per-branch README selection on the overview page.
## File Lookup
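Walk the root tree of `HEAD`'s commit once, comparing each entry's lowercased
name against the candidates below. The list is a precedence order: when several
candidates are present, the earliest one in the list wins, regardless of the
order in which the tree yields its entries: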
1. `readme.md`
2. `readme`
3. `readme.txt`
If no match, the overview page renders exactly as it does today — no README
block, no error, no log noise. A missing README is the common case.
The lookup is HEAD-only. Branch / ref selection is out of scope.
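The precedence rule can be sketched without any `git2` types by operating on the root entry names alone. This is a minimal illustration, not the planned implementation: `ReadmeKind`, `CANDIDATES`, and `pick_readme` are hypothetical names, and the caller is assumed to pass only the names of root-level entries.

```rust
/// How a matched README should be rendered (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadmeKind {
    Markdown,
    Plain,
}

/// Candidate names, lowercased, in precedence order (from the list above).
const CANDIDATES: [(&str, ReadmeKind); 3] = [
    ("readme.md", ReadmeKind::Markdown),
    ("readme", ReadmeKind::Plain),
    ("readme.txt", ReadmeKind::Plain),
];

/// Single pass over the root entry names. Returns the index of the winning
/// entry and how to render it, or `None` when nothing matches.
fn pick_readme(root_names: &[&str]) -> Option<(usize, ReadmeKind)> {
    // (candidate rank, entry index) of the best match seen so far.
    let mut best: Option<(usize, usize)> = None;
    for (idx, name) in root_names.iter().enumerate() {
        let lower = name.to_ascii_lowercase();
        if let Some(rank) = CANDIDATES.iter().position(|(c, _)| *c == lower) {
            // A lower rank means higher precedence; on equal rank the
            // earlier tree entry is kept.
            if best.map_or(true, |(r, _)| rank < r) {
                best = Some((rank, idx));
            }
        }
    }
    best.map(|(rank, idx)| (idx, CANDIDATES[rank].1))
}
```

Tracking the best rank seen so far keeps the walk to one pass while still letting `README.md` win over a `README.txt` that appears earlier in the tree.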
## Rendering
- `.md` match → `pulldown-cmark` (with tables, strikethrough, and task-list
  extensions enabled) → HTML string → `ammonia` with a tightened policy:
  - Start from `ammonia::Builder::default()`.
  - **Restrict `<img>` URL schemes to `http` and `https` only** (no `data:`,
    no `javascript:`). `Builder::url_schemes` sets the allowed schemes for
    every `href` and `src` attribute, links included; if the restriction
    should apply to image sources only, use `Builder::attribute_filter`
    instead.
  - Defaults already strip `<script>`, `<style>`, `<iframe>`, inline event
    handlers, and `javascript:` URLs in `<a href>`.
- `readme` / `.txt` match → HTML-escape the contents and wrap in `<pre>`.
- Binary blob (per `git2::Blob::is_binary`) → treat as missing, return `None`.
- Decode blob bytes with **strict `std::str::from_utf8`**. Invalid UTF-8 →
treat as missing, return `None`. (No lossy decoding — keeps invariants
simple and avoids rendering replacement-character soup.)
- **Uncompressed blob size** (`git2::Blob::size`) > **512 KiB** → return a
fixed "README too large to render" notice containing a link to the
corresponding blob view. The blob link uses the resolved default branch
name from `head_branch_name(&repo)`, not the literal string `HEAD`.
- **Post-render HTML cap**: after `pulldown-cmark` produces HTML, if the
output exceeds **2 MiB**, discard it and return the same "too large"
notice. Guards against markdown bombs (deeply nested lists, large tables,
reference-link expansion) where a small source produces enormous HTML.
The rendered output is **already-safe HTML** by the time it leaves the
renderer, so the template can mark it `|safe` without further escaping.
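The non-markdown decisions above (binary check, size cap, strict UTF-8 decode, plain-text escaping, post-render cap) can be sketched with the standard library only. The names `Precheck`, `precheck`, `escape_html`, `render_plain`, and `within_html_cap` are hypothetical, and the order of the checks (binary first, then size, then decode) is an assumption; the spec does not say which rule wins when a blob is both binary and oversized.

```rust
/// Caps taken from this spec.
const MAX_BLOB_BYTES: usize = 512 * 1024;
const MAX_HTML_BYTES: usize = 2 * 1024 * 1024;

/// Outcome of the pre-render checks (hypothetical type).
#[derive(Debug, PartialEq)]
enum Precheck<'a> {
    /// Binary or invalid UTF-8: behave as if no README exists.
    Missing,
    /// Source exceeds the 512 KiB cap: show the fixed notice.
    TooLarge,
    /// Valid UTF-8 within the cap: safe to hand to a renderer.
    Text(&'a str),
}

/// `is_binary` stands in for the result of `git2::Blob::is_binary`.
fn precheck(bytes: &[u8], is_binary: bool) -> Precheck<'_> {
    if is_binary {
        return Precheck::Missing;
    }
    if bytes.len() > MAX_BLOB_BYTES {
        return Precheck::TooLarge;
    }
    // Strict decode: any invalid byte sequence rejects the whole blob.
    match std::str::from_utf8(bytes) {
        Ok(text) => Precheck::Text(text),
        Err(_) => Precheck::Missing,
    }
}

/// Escapes the five characters that are significant in HTML text and
/// attribute values.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Plain-text path: escape, then wrap in `<pre>`.
fn render_plain(text: &str) -> String {
    format!("<pre>{}</pre>", escape_html(text))
}

/// Post-render guard: `false` means the caller should discard the HTML and
/// return the "too large" notice instead.
fn within_html_cap(html: &str) -> bool {
    html.len() <= MAX_HTML_BYTES
}
```

Checking the size before decoding avoids validating a large blob that will be rejected anyway.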
## Components
### `src/server/http/repo/readme.rs` (new)
A pure module with no Axum / state dependencies, so it can be unit-tested
against fixture repos.
```rust
pub struct RenderedReadme {
    pub html: String,
}
pub fn load_readme(repo: &git2::Repository) -> Option<RenderedReadme>;
```
`load_readme` performs the tree walk, classification (markdown vs plain vs
binary vs oversized), rendering, and sanitization. Every `git2` error inside
this function maps to `None` — README rendering must never break the overview
page.
### `src/server/http/repo/overview.rs` (modified)
- Add `readme: Option<RenderedReadme>` to `OverviewTemplate`.
- Call `readme::load_readme(&repo)` after opening the repo and before building
the template.
### `src/server/http/templates/repo_overview.html` (modified)
Add a single new top section, before the existing patches/issues grid,
wrapped in a bordered card with a small header so a README starting with its
own `<h1>` doesn't visually collide with the page chrome:
```html
{% if let Some(r) = readme %}
<div class="card readme">
  <h3 style="margin-top: 0;">README</h3>
  <div class="readme-body">{{ r.html|safe }}</div>
</div>
{% endif %}
```
Card styling matches the visual treatment of the existing patches/issues
panels. Minimal inline styles, no new CSS file.
### `src/server/http/repo/mod.rs` (modified)
`pub mod readme;` and re-export `RenderedReadme` if needed by `overview.rs`.
## Dependencies
Add to the root `Cargo.toml`:
- `pulldown-cmark = "0.12"`: implementer to confirm whether to keep the
default features or set `default-features = false` for build-time hygiene.
HTML output (the `html` feature) is part of the default set, so it must be
re-enabled explicitly if defaults are turned off.
- `ammonia = "4"`
Both are pure-Rust with no system dependencies.
## Error Handling
- Missing README → `None` → template skips the block. **Silent** (common case).
- Invalid UTF-8 in the blob → `None`. Silent (treat as binary).
- Binary blob → `None`. Silent.
- Oversized blob (uncompressed > 512 KiB) → `Some(RenderedReadme { html: <fixed notice> })`.
- Oversized rendered HTML (post-render > 2 MiB) → same fixed notice.
- Markdown parser and ammonia sanitizer are both infallible by API.
- **Unexpected `git2` failure** after we've already resolved HEAD (e.g. tree
walk fails, or a blob OID present in the tree fails to load) → `None`,
but **log once at `warn`** with the repo name and the underlying error.
These should never happen on a healthy repo; silent swallowing here would
be a debugging tax.
The README block must never cause a 500 on the overview page. The rest of
the page renders fine without it.
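One way to keep the "unexpected failure → `None` plus one warning" rule in a single place is a small adapter applied at each fallible step. This is a sketch under assumptions: `warn_and_skip` is a hypothetical helper, and `eprintln!` stands in for a `warn`-level call to whatever logging crate the project uses, which this spec does not name.

```rust
use std::fmt::Display;

/// Converts an unexpected failure into `None`, emitting one warning that
/// names the repo, the step that failed, and the underlying error.
fn warn_and_skip<T, E: Display>(
    repo_name: &str,
    step: &str,
    result: Result<T, E>,
) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            // Stand-in for a `warn`-level log call (assumption).
            eprintln!("warn: README load failed for {repo_name} during {step}: {err}");
            None
        }
    }
}
```

Inside `load_readme`, each post-HEAD `git2` call could then be written as `warn_and_skip(name, "tree walk", result)?`, so no error type ever reaches the overview handler. The silent cases (missing README, binary blob, invalid UTF-8) would bypass this helper and return `None` directly.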
## Testing
### Unit tests (in `readme.rs`)
Each test creates a tmp `git2::Repository::init`, writes a tree with the
required blobs, commits, and points HEAD at it.
**Lookup**
1. **No README** → `load_readme` returns `None`.
2. **`Readme.MD`** (mixed case) → found and rendered as markdown.
3. **Lookup precedence**: tree contains both `README.md` and `README.txt` →
markdown wins.
4. **Lookup precedence**: tree contains `README` and `README.txt` → `README`
wins (matches the documented order).
5. **Mixed-case precedence**: tree contains `README.md` (uppercase) and
`readme.txt` (lowercase) → markdown still wins.
6. **Nested README is NOT matched**: tree contains `docs/README.md` only
(no root README) → returns `None`. Guards against accidental recursive
walk.
7. **Symlink entry** (mode `0o120000`) named `README.md` → `load_readme`
returns `None` (the link is not followed).
**Rendering & sanitization**
8. **`README.md` with `# Title\n\nbody`** → HTML contains `<h1>Title</h1>`.
9. **Plain `README` containing `<script>alert(1)</script>`** → result is a
`<pre>` block containing the escaped text `&lt;script&gt;`, no live tag.
10. **`README.md` containing raw `<script>alert(1)</script>`** → sanitizer
strips the tag; no `<script>` substring in output.
11. **`README.md` containing `<a href="javascript:alert(1)">x</a>`** →
rendered link has no `javascript:` href (ammonia drops the attr or tag).
12. **`README.md` containing `<img src=x onerror=alert(1)>`** → no `onerror`
attribute in the output.
13. **`README.md` containing `<iframe src="https://evil.example/">`** → no
`<iframe>` in the output.
14. **`README.md` containing `<img src="data:image/png;base64,AAAA">`** →
`data:` URL stripped (per the tightened `img` URL-scheme policy);
either the `src` attr is removed or the whole `<img>` tag is dropped.
This is the regression guard for the must-fix from review.
**Size & content edge cases**
15. **Empty `README.md`** (0 bytes) → returns `Some` with an empty (or
near-empty) rendered body. Pick one and assert it; do not return `None`.
16. **`README.md` of 600 KiB** → result is the "too large" notice with a
link to the blob view; markdown parser is NOT invoked.
17. **Markdown bomb**: small source (e.g. a deeply nested list or a long
reference-link expansion) whose rendered HTML exceeds the 2 MiB
post-render cap → result is the "too large" notice. Test must construct
a source that actually trips the cap, not just assert the cap value.
18. **Binary blob named `README.md`** (e.g. `\xff\xfe\x00...`) → `None`.
19. **Invalid UTF-8 (not flagged as binary by git2)**: blob containing valid
text plus a stray `\x80` byte → `None` (strict UTF-8 decode).
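For test 17, one candidate source is a table with a wide header and many short body rows. The sketch below only builds the source; it assumes the renderer pads each short row with empty cells up to the header width, as the GFM table rules describe, so the test must still confirm against the real renderer that the output exceeds the cap. `table_bomb` is a hypothetical helper name.

```rust
/// Builds a markdown table with `columns` header cells and `rows` body rows
/// that each contain a single cell. Source size grows as
/// `6 * columns + 4 * rows + 4` bytes, while a renderer that pads short rows
/// emits on the order of `columns * rows` cells.
fn table_bomb(columns: usize, rows: usize) -> String {
    let mut src = String::new();

    // Header row: |a|a|...|
    src.push('|');
    for _ in 0..columns {
        src.push_str("a|");
    }
    src.push('\n');

    // Delimiter row: |---|---|...|
    src.push('|');
    for _ in 0..columns {
        src.push_str("---|");
    }
    src.push('\n');

    // Body rows with one cell each.
    for _ in 0..rows {
        src.push_str("|a|\n");
    }
    src
}
```

With `table_bomb(1000, 1000)` the source is about 10 KB, well under the 512 KiB source cap, so the test exercises the post-render cap rather than the blob-size check.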
### Integration test
If `src/server/http/` already has an HTTP test harness (Axum `Router` with
`tower::ServiceExt::oneshot` against a temp repo), add one test:
- Init a repo with `README.md`, request `GET /{repo}`, assert response body
contains the rendered README block (e.g. an `<h1>` from the source).
If no such harness exists yet, the unit tests are sufficient — leave the
integration test as a follow-up rather than scaffolding a harness for one test.
## Acceptance
- Visiting `/{repo}` on a repo with a `README.md` shows rendered markdown
above patches/issues/commits.
- Visiting `/{repo}` on a repo with no README shows the page exactly as
before.
- A README containing `<script>` cannot execute JS in the browser.
- A 1 MiB README does not crash or hang the page; the user sees the
"too large" notice with a link to the blob view.
- `cargo test` and `cargo clippy` pass.