Benchmarks → Docs Pipeline

This note records the Phase 1 design + Phase 2 implementation work for surfacing Criterion micro-benchmarks inside the mdBook site. It follows the flow described in AGENTS.md: tickets → specs → code/tests → data.

Layout, naming, retention

  • Raw output lives under target/criterion/... while cargo bench runs. scripts/rust-bench.sh copies only the criterion subtree into data/bench/criterion/ (Git LFS) right after the run so we keep curated JSON without polluting data/ with arbitrary build detritus.
  • Build junk (data/bench/release, tmp/, .rustc_info.json) stays untracked via .gitignore. If Cargo drops new cache directories, add them here before running benches on a ticket.
  • Derived tables land in docs/assets/bench/ as <timestamp>_<group>.csv plus a Markdown twin for mdBook. Each CSV gets a .run.json sidecar with git_commit, UTC timestamp, host/rust info, and the row count. Small symlinks (current.csv, current_<group>.csv, current_<group>.md) point at the latest export so docs can {{#include}} a stable path.
  • Retention: the Python stage keeps the most recent five exports per group by default (value stored in configs/bench/docs_local.json). Older CSV/MD pairs + their .run.json sidecars are deleted to avoid asset bloat while still keeping short-term history for review diffs.

How to run

  1. Make sure the benches you want exist (currently crates/viterbo/benches/poly2_bench.rs) and hydrate Git LFS for data/** if needed.
  2. Run the benches through the wrapper (safe-wrapped):
    bash scripts/safe.sh --timeout 300 -- bash scripts/rust-bench.sh
    # optional envs: BENCH_EXPORT_DIR=/tmp/bench, BENCH_RUN_POSTPROCESS=1 (to immediately run the stage)
    
    This writes raw Criterion JSON to target/criterion, rsyncs the curated snapshot into data/bench/criterion, and leaves Cargo caches alone.
  3. Refresh the docs tables via the Python stage (defaults live in configs/bench/docs_local.json):
    bash scripts/safe.sh --timeout 120 -- uv run python -m viterbo.bench.stage_docs \
      --config configs/bench/docs_local.json
    # add --bench-root /custom/path or --keep 10 if you need overrides
    
  4. Commit the tiny files in docs/assets/bench/ together with the data/bench/criterion/** snapshot (Git LFS handles the bulk). If you want the wrapper to handle step 3 automatically, export BENCH_RUN_POSTPROCESS=1 when invoking scripts/rust-bench.sh.

scripts/reproduce.sh now runs both steps (bench + docs stage) unconditionally so every thesis build and mdBook render derives from freshly generated measurements. Whenever a ticket adds a new artifact or visualization, update reproduce.sh in the same PR.

Latest snapshot

The Markdown fragment below is generated by python -m viterbo.bench.stage_docs and pulled in verbatim so reviewers always see the freshest numbers without copy/paste.

| bench | parameter | samples | min (ns) | mean (ns) | stddev (ns) |
|---|---:|---:|---:|---:|---:|
| halfspace_intersection | 0 | 50 | 4.571 | 4.977 | 0.346 |
| halfspace_intersection | 10 | 50 | 598.399 | 671.668 | 20.930 |
| halfspace_intersection | 20 | 50 | 1106.233 | 1155.103 | 28.151 |
| halfspace_intersection | 50 | 50 | 2572.787 | 2707.089 | 67.563 |
| halfspace_intersection | 100 | 50 | 5070.971 | 5218.790 | 145.902 |
| push_forward_strict | 0 | 50 | 13.587 | 13.865 | 0.199 |
| push_forward_strict | 10 | 50 | 493.217 | 516.936 | 15.231 |
| push_forward_strict | 20 | 50 | 1710.680 | 1807.478 | 70.256 |
| push_forward_strict | 50 | 50 | 6042.619 | 6140.683 | 88.946 |
| push_forward_strict | 100 | 50 | 14986.763 | 16127.213 | 801.691 |

Updated 2025-11-11 01:41:11Z · commit 585a129 · host ab5b4864ef14 · rustc 1.91.1 (ed61e7d7e 2025-11-07)

Interpretation cheat sheet

  • In the snapshot above, halfspace_intersection grows roughly linearly with the number of halfspaces m, while push_forward_strict grows faster than linearly; the gap between them widens with m (push_forward_strict is actually slightly faster at m=10 but roughly 3× slower at m=100) because it performs an extra affine transform before the set operation.
  • For tiny polytopes (m ≤ 10) both kernels stay in the sub-microsecond regime, which makes it viable to run exhaustive smoke tests inside CI; by m=100 we are in the ~5 μs (intersection) and ~16 μs (push-forward) range, which is still cheap for batched evaluation.
  • The stddev columns are low relative to the mean for larger inputs, which indicates the batches are deterministic enough that storing a single snapshot per commit is meaningful.
  • If you see samples < 100, the sample count was capped so the run fits the configured measurement window; rerun with a larger --sample-size or --measurement-time (passed to cargo bench after --) if you need denser samples.
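To make the scaling claim concrete, here is a quick arithmetic check over the mean column of the snapshot table above (all values in ns, copied from the table):

```python
# Mean times (ns) per halfspace count m, copied from the snapshot table.
intersection = {0: 4.977, 10: 671.668, 20: 1155.103, 50: 2707.089, 100: 5218.790}
push_forward = {0: 13.865, 10: 516.936, 20: 1807.478, 50: 6140.683, 100: 16127.213}

# push_forward_strict relative to halfspace_intersection at each m:
# the gap widens from below 1x at m=10 to roughly 3x at m=100.
ratios = {m: round(push_forward[m] / intersection[m], 2) for m in intersection}
print(ratios)
```

Re-running this check against each fresh snapshot is a cheap way to notice when a kernel change shifts the relative cost of the two operations.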

Script internals (reference)

  • viterbo.bench.stage_docs parses estimates.json and sample.json for each <group>/<bench>/<param> tuple, computes min, mean, stddev, and copies system metadata from git, platform, and rustc --version.
  • Output schema matches the CSV header, so CSVs remain diffable while Markdown renders nicely inside mdBook; both variants share the same timestamp + provenance file.
  • Symlinks are relative so Git diffs stay stable and docs can just use the include macro for the “current” snapshot. For example (shown literally, not executed): {{# include ../assets/bench/current_<group>.md}} (note the space after # prevents mdBook from treating this as a real include).
  • Use uv run python -m viterbo.bench.stage_docs --config configs/bench/docs_local.json --keep 10 if you need a longer breadcrumb trail before trimming old exports.