We just published the Transparency Index showing that of 14 sports forecasters tracked, only one publishes a numeric Expected Calibration Error. That's a fair positioning win, but the obvious rebuttal is "transparency is nice — does your model actually beat anyone?"
We're going to find out, in front of you, with no edit path.
The bet
Starting May 5, 2026, every NBA Conference Semifinal, Conference Final, and NBA Finals game tipping before June 25 will appear on a public scoreboard at zenhodl.net/benchmarks/nba-playoffs-2026.
For each game, no later than 60 minutes before tip-off (T-60), two predictions are captured:
- ZenHodl's pregame win probability — straight from our production NBA model
- Polymarket's market mid-price — the consensus probability implied by traders wagering real capital on the outcome
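For reference, "mid-price" here is the midpoint of the best bid and best ask on the outcome's order book; on a binary contract priced between 0 and 1, it reads directly as an implied probability. A one-line sketch with illustrative numbers (not an actual market snapshot):

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Midpoint of the order book. On 0-1 priced binary contracts,
    this midpoint reads directly as an implied win probability."""
    return (best_bid + best_ask) / 2

# Hypothetical book: 0.62 bid / 0.64 ask -> ~0.63 implied probability
print(mid_price(0.62, 0.64))
```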
After the game settles, both predictions are scored on Expected Calibration Error (the metric the entire transparency index is built on), plus Brier score, log loss, and accuracy. Bootstrap 95% confidence intervals included so you can see when the gap is real and when it's noise.
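For readers who want to check our arithmetic, the four scores have standard definitions. A minimal sketch (function names and sample numbers are illustrative, not ZenHodl's actual pipeline):

```python
import numpy as np

def brier(p, y):
    # Mean squared error between forecast probability and 0/1 outcome
    return float(np.mean((p - y) ** 2))

def log_loss(p, y, eps=1e-15):
    # Negative mean log-likelihood, clipped to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def accuracy(p, y):
    # Fraction of games where the >=50% side won
    return float(np.mean((p >= 0.5) == (y == 1)))

def ece(p, y, n_bins=10):
    # Expected Calibration Error, equal-width bins:
    # weighted mean of |avg forecast - observed win rate| per bin
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(total)

p = np.array([0.7, 0.6, 0.9, 0.3])   # illustrative pregame win probabilities
y = np.array([1, 0, 1, 0])           # settled outcomes
print(brier(p, y), log_loss(p, y), accuracy(p, y), ece(p, y))
```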
Estimated 30-40 games over ~7 weeks. Win or lose, every prediction is logged before tip-off and never edited.
Why Polymarket is a brutal opponent
A solo operator beating Massey Ratings, KenPom, or 538 would be a real model achievement. Beating the live betting market is the canonical hedge-fund-grade demonstration. The market price isn't a smarter model — it's the aggregate of every smart model, plus every sharp trader's overlay. Closing-line value (CLV) is what professional sports analytics shops are measured against. We're benchmarking the open-line equivalent: T-60-minute consensus.
If we beat the market on calibration, that's the actual value proposition of our API. If we don't, that's also useful information — for you and for us.
How we make this real (not just claimed)
The benchmark page is easy. The trust device is the on-chain commitment.
Before the first eligible game, we do three things:
- Publish the manifest — a frozen JSON document specifying which games count, when snapshots are taken, what metrics we use, and how ties / outages are handled.
- Compute its SHA-256 hash — over a canonical (sorted-key, no-whitespace) UTF-8 serialization, so the hash stays stable across formatting.
- Broadcast that hash to Polygon — as a 0-MATIC self-transfer from our trading wallet (0xc2a2D9267F13A7eFb2B1B527eCB4D8240a7823a0), with the 32-byte hash sitting in the transaction's data field.
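The canonical-serialization step above can be reproduced in a few lines. This sketch shows one standard way to do it (our exact code may differ, but any sorted-key, no-whitespace UTF-8 encoding has the same stability property — reformatting the file never changes the hash):

```python
import hashlib
import json

def manifest_hash(manifest: dict) -> str:
    # Canonical form: keys sorted, no whitespace, UTF-8 bytes.
    # Pretty-printing or reordering the JSON file leaves this unchanged.
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Illustrative fragment only -- not the real manifest
a = {"window": ["2026-05-05", "2026-06-25"], "snapshot": "T-60"}
b = {"snapshot": "T-60", "window": ["2026-05-05", "2026-06-25"]}  # same data, new key order
assert manifest_hash(a) == manifest_hash(b)
print(manifest_hash(a))
```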
The transaction is permanent. The hash is permanent. If we ever edit the manifest after that point, the live file's hash won't match the on-chain receipt. Anyone can verify by running:
curl -s https://zenhodl.net/benchmarks/nba-playoffs-2026/manifest.json | sha256sum
# compare to the data field of the linked Polygon tx
The Polygon tx URL appears at the top of the benchmark page once broadcast.
What's in the rules — the exact things we're committing to
A few specifics from the manifest:
- Window: Games tipping between 2026-05-05 and 2026-06-25 inclusive
- Snapshot timing: No later than T-60 minutes before tip-off
- Metrics: ECE (10 equal-width bins) as the headline metric, plus Brier score, log loss, and accuracy
- Confidence intervals: 95% bootstrap with 1,000 resamples on ECE
- Excluded games: If either source is unavailable at T-60, the game is excluded from both models' metrics — no opportunistic counting
- Model version disclosure: Each prediction row records the ZenHodl model version. We commit to publishing version IDs alongside per-game results so post-hoc retrains can't quietly invalidate prior numbers
- We publish when we lose: explicit clause. A loss appears in the same table, with the same Brier score column and the same red/green color coding as a win
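The bootstrap interval named in the rules above can be sketched in a few lines: resample games with replacement, recompute ECE on each resample, and take the 2.5th/97.5th percentiles. Helper names and sample data here are illustrative:

```python
import numpy as np

def ece(p, y, n_bins=10):
    # Equal-width-bin Expected Calibration Error
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    return float(sum(
        (bins == b).mean() * abs(p[bins == b].mean() - y[bins == b].mean())
        for b in range(n_bins) if (bins == b).any()
    ))

def bootstrap_ci(p, y, n_resamples=1000, alpha=0.05, seed=0):
    # Resample games with replacement; percentile interval on ECE
    rng = np.random.default_rng(seed)
    n = len(p)
    stats = [ece(p[idx], y[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_resamples))]
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Illustrative forecasts/outcomes, not benchmark data
p = np.array([0.72, 0.55, 0.81, 0.40, 0.66, 0.58, 0.90, 0.35])
y = np.array([1, 0, 1, 0, 1, 1, 1, 0])
lo, hi = bootstrap_ci(p, y)
print(f"ECE 95% CI: [{lo:.3f}, {hi:.3f}]")
```

With only 30-40 games, this interval will be wide — which is exactly why we publish it rather than a bare point estimate.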
What this won't be
- Not a Polymarket trading strategy. Pure forecasting comparison; no execution.
- Not a closing-line value test. We snapshot at T-60, not at the close.
- Not a regular-season measure. NBA Conference Semis and beyond only — the small-sample, high-stakes window is the one that actually matters to people who consume win probabilities.
- Not a sweep across sports. NBA only. Deliberately narrow scope; if we expand, we'll pre-commit each new benchmark separately with its own on-chain hash.
What survives a loss
If our model finishes the window with worse ECE than Polymarket — and it might — three things will be true:
- The loss is published on the same page where any win would have been
- The on-chain hash still proves the rules didn't change mid-window
- We learn exactly where the gap is, by round, by team strength, by score margin
That's worth more than a win we could have rigged.
Where to follow it
- Live scoreboard: zenhodl.net/benchmarks/nba-playoffs-2026 (updates after every game)
- Raw prediction log: /benchmarks/nba-playoffs-2026/raw.jsonl (append-only)
- Resolved results log: /benchmarks/nba-playoffs-2026/results.jsonl (append-only)
- Manifest: /benchmarks/nba-playoffs-2026/manifest.json
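JSONL keeps the logs trivially append-only: one self-contained JSON object per line, so a new row never disturbs rows already written. A tiny parsing sketch — the field names below are hypothetical; the real schema is whatever the published manifest defines:

```python
import json

# Hypothetical rows -- actual field names come from the manifest, not this sketch
raw_line = '{"game_id": "hypothetical-001", "zenhodl_p": 0.64, "polymarket_mid": 0.61}'
result_line = '{"game_id": "hypothetical-001", "winner_home": true}'

def read_jsonl(text: str):
    # One JSON object per non-empty line; appends never rewrite old rows
    return [json.loads(line) for line in text.splitlines() if line.strip()]

rows = read_jsonl(raw_line + "\n" + result_line)
print(rows[0]["zenhodl_p"], rows[1]["winner_home"])
```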
If you want the same NBA pregame WP feed our model serves into this benchmark, the API is at zenhodl.net/pricing. 7-day free trial. Same calibration we're being judged on right here.
If you think the rules are unfair, tell us — before the on-chain broadcast. After that, even we can't change them.