Why We Can't Retroactively Benchmark NBA First Round 2026 — And Why That's The Point

A fair question landed within hours of us publishing the NBA Playoffs calibration benchmark:

"NBA playoffs already started — why isn't First Round included?"

The honest answer is the whole reason pre-committed benchmarks exist. Writing it down here so the answer is in our own voice, not someone else's guess.

The technical reason: we can't fairly snapshot games that already tipped

Our pre-committed benchmark specifies a T-60-minute snapshot window: for each eligible game, both our pregame win probability (WP) and Polymarket's mid-price are captured 60 minutes before tip-off. Each capture is appended as a JSONL row and never rewritten.
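To make the T-60 rule concrete, here is a minimal sketch of how one snapshot row could be built. The field names, the game ID, the tip-off time, and the two probabilities are all made up for illustration; the real manifest defines the actual schema.

```python
import json
from datetime import datetime, timedelta, timezone

# The T-60 lead time described above.
SNAPSHOT_LEAD = timedelta(minutes=60)


def snapshot_time(tipoff: datetime) -> datetime:
    """Return the moment at which both prices should be captured."""
    return tipoff - SNAPSHOT_LEAD


def build_row(game_id: str, tipoff: datetime, model_wp: float, market_mid: float) -> str:
    """Serialize one snapshot as a single JSONL line.

    Field names are hypothetical, not the real manifest schema.
    """
    row = {
        "game_id": game_id,
        "tipoff_utc": tipoff.isoformat(),
        "snapshot_utc": snapshot_time(tipoff).isoformat(),
        "model_wp": model_wp,
        "polymarket_mid": market_mid,
    }
    # json.dumps emits no raw newlines, so the row stays on one line.
    return json.dumps(row, sort_keys=True)


# Illustrative values only.
tip = datetime(2026, 5, 5, 23, 0, tzinfo=timezone.utc)
line = build_row("example-game-1", tip, 0.61, 0.58)
print(line)
```

The point of the sketch is that the snapshot time is derived from tip-off, not chosen by us. Once tip-off has passed, there is no honest way to produce that row.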

For any game that tipped off before we committed the manifest, that snapshot simply doesn't exist. There are only two ways to conjure one after the fact, and both break the integrity of the benchmark:

Post-hoc imputation — "re-run" our model on historical team state and claim that's what it would have predicted pre-game. This is academically indefensible: any retrospective number benefits from knowledge we didn't have at T-60, and no one watching should trust a number we produced after seeing the outcome.
Cherry-picking stored logs — we do have WP snapshot files from April 19 onward, but the snapshots were captured for internal calibration purposes with different timing semantics (earliest in-quarter snapshot, not T-60), different game-ID keying (Polymarket IDs, not ESPN IDs), and varying sample coverage. Retrofitting those into a "benchmark" after the fact would be dressing internal ops telemetry as a commitment we never made.

Neither option lets us publish a number you should trust. The Conf Semis boundary is where the pre-committed window begins, so that is where our numbers start being verifiable.

The philosophical reason: this is exactly what pre-commitment is FOR

The whole purpose of broadcasting our manifest's SHA-256 hash to Polygon was to make it structurally impossible for us to change the rules after the fact.

If we quietly extended scope backwards to include First Round games our internal logs made us look good on, we'd be breaking the very mechanism we're asking you to trust.

The correct response to "why isn't First Round included" is not "good point, let us add it." The correct response is "because the rules are frozen." That's what pre-commitment means.

What we're doing INSTEAD

Three follow-up actions already shipped alongside this post:

1. NHL Playoffs benchmark — same pattern, launching today

We committed a separate NHL Playoffs benchmark covering Conf Semis through Stanley Cup Finals. Same rules, same metric, same on-chain hash pattern, different sport. The NHL First Round is also in progress, and for the same reason, we're not retroactively snapshotting it.

2. MLB June 2026 sample — faster data cycle

For sports with longer seasons, waiting for the playoffs is impractical. We committed an MLB June regular-season sample benchmark: 100 MLB games across June 2026, pre-committed today with a separate on-chain hash. That gives us a faster cycle than waiting for the World Series in October, while keeping the same pre-commitment discipline.

3. Persistent pregame logging, going forward

The reason we couldn't cleanly retrofit the NBA First Round was partly a storage design gap: our live WP snapshots are optimized for live in-game recalibration, not T-60 pregame comparison. We're now also writing every pregame prediction to a dedicated benchmark_predictions_{slug}.jsonl whenever a manifest's eligibility window is active. That append-only log, combined with the on-chain hash, is the actual audit surface.
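As a rough sketch of that logging rule, assuming a manifest exposes a start and end time for its eligibility window: the function names, the window dates, and the row fields below are hypothetical, and using the URL slug as the file slug is also an assumption.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def log_path(directory: str, slug: str) -> str:
    """Build the per-benchmark log file path."""
    return os.path.join(directory, f"benchmark_predictions_{slug}.jsonl")


def window_active(now: datetime, start: datetime, end: datetime) -> bool:
    """True when `now` falls inside the manifest's eligibility window."""
    return start <= now <= end


def log_prediction(directory, slug, row, now, start, end) -> bool:
    """Append one prediction row if the window is active; never rewrite."""
    if not window_active(now, start, end):
        return False
    # Mode "a" only ever adds to the end of the file.
    with open(log_path(directory, slug), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, sort_keys=True) + "\n")
    return True


# Illustrative run in a temporary directory, with made-up window dates.
demo_dir = tempfile.mkdtemp()
start = datetime(2026, 5, 1, tzinfo=timezone.utc)
end = datetime(2026, 6, 30, tzinfo=timezone.utc)
inside = log_prediction(
    demo_dir, "nba-playoffs-2026", {"game_id": "g1", "model_wp": 0.61},
    datetime(2026, 5, 5, tzinfo=timezone.utc), start, end,
)
outside = log_prediction(
    demo_dir, "nba-playoffs-2026", {"game_id": "g0", "model_wp": 0.55},
    datetime(2026, 4, 20, tzinfo=timezone.utc), start, end,
)
print(inside, outside)
```

A prediction made outside the window is simply not written, which mirrors the rule in this post: no window, no benchmark row.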

The uncomfortable corollary

You should trust us exactly as far as our pre-commitments go. Any informal claim outside a manifest's window — "our model was hitting 72% through First Round" — is marketing. Even if it's true. Even if we mean it.

The only way for us to be trustworthy about First Round performance would be to have pre-committed to a First Round benchmark before April 19. We didn't, so we won't make claims about it.

What you can watch from here

The NBA pre-committed benchmark: zenhodl.net/benchmarks/nba-playoffs-2026 — first resolved game ~May 5
The NHL pre-committed benchmark: zenhodl.net/benchmarks/nhl-playoffs-2026 — first resolved game ~May 4
The MLB sample benchmark: zenhodl.net/benchmarks/mlb-june-2026-sample — starts June 1

Three separate pre-commits. Three separate on-chain hashes. Three separate scoreboards. If ZenHodl ever edits any of the three manifests, the hash on the served file will stop matching the hash in the Polygon transaction data field. Anyone can verify.
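For anyone who wants to run that check, a minimal sketch follows. It assumes the transaction's data field holds nothing but the hex-encoded SHA-256 digest (optionally prefixed with 0x); the function names and the sample manifest bytes are hypothetical. Fetching the served manifest and the transaction is left out.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Hex SHA-256 digest of the manifest exactly as served, byte for byte."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def matches_onchain(manifest_bytes: bytes, tx_data_hex: str) -> bool:
    """Compare the served manifest's digest to the hex string from the tx data field.

    Assumes the data field contains only the digest, with an optional 0x prefix.
    """
    onchain = tx_data_hex.strip().lower()
    if onchain.startswith("0x"):
        onchain = onchain[2:]
    return manifest_digest(manifest_bytes) == onchain


# Illustrative bytes only, not a real manifest.
original = b'{"rules": "frozen"}'
committed = "0x" + manifest_digest(original)
tampered = b'{"rules": "edited"}'

print(matches_onchain(original, committed))
print(matches_onchain(tampered, committed))
```

Hash the raw bytes of the file as downloaded. Re-serializing the JSON first can change whitespace or key order and produce a different digest even when nothing meaningful changed.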

No retroactive wins. No post-hoc moves. Just three public bets with the rules frozen on a blockchain before the first game tips off.

That's the whole product.