ZenHodl vs Polymarket Consensus
ZenHodl vs Polymarket Consensus — NBA Playoffs 2026 Calibration Benchmark tracks 2026 Conference Semifinals through Finals (inclusive). Same eligible games, same snapshot rule, same metrics. Updated as predictions and results are written.
Pre-commitment proof
Served file hash matches on-chain commit
Reproduce: curl -s https://zenhodl.net/benchmarks/nba-playoffs-2026/manifest.json | sha256sum
Block 85956686 on Polygon. Broadcast 2026-04-24T12:23:34 UTC. The hash above appears in the tx's data field.
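The same check can be scripted. The sketch below hashes the manifest bytes exactly as served and looks for that digest in the transaction's data field. It assumes the data field is available to you as a hex string (for example, copied from a block explorer) and that it embeds the raw digest; the function names are illustrative and not part of any ZenHodl tooling.

```python
import hashlib
import urllib.request

# URL taken from the manifest's "publication.manifest_file" entry.
MANIFEST_URL = "https://zenhodl.net/benchmarks/nba-playoffs-2026/manifest.json"


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_commit(data: bytes, tx_data_hex: str) -> bool:
    """Check whether the file's digest appears in the transaction data.

    Assumption: tx_data_hex is the tx data field as hex (an optional
    '0x' prefix is tolerated) and contains the raw 32-byte digest.
    """
    cleaned = tx_data_hex.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return sha256_hex(data) in cleaned


def fetch_manifest(url: str = MANIFEST_URL) -> bytes:
    """Download the manifest bytes as served, without re-serializing the JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```

A typical call would be `matches_commit(fetch_manifest(), tx_data)` with `tx_data` copied from the Polygon transaction. Hashing the raw bytes matters: parsing and re-dumping the JSON could change whitespace or key order and produce a different digest.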
Live leaderboard
Reliability diagram
Each marker plots one bin: the bin's average predicted probability against its observed win rate. Marker size scales with the number of games in the bin. Points above the diagonal mean predictions in that bucket were too pessimistic; points below mean they were too optimistic. The closer the points hug the diagonal across the chart, the better calibrated the model. A tiny y-jitter (±0.012) is applied so that ZenHodl (offset up) and Polymarket Consensus (offset down) markers remain distinguishable when both bins share the same observed rate; hover over any point for the true value.
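As a rough illustration of how the points on such a diagram can be derived, the sketch below groups predictions into 10 equal-width bins and computes ECE as the manifest words it (sum over bins of |bin_avg_pred - bin_avg_outcome|, weighted by bin sample fraction). The bin-edge convention (a probability of exactly 1.0 goes into the last bin) and the function names are assumptions, not the site's actual code.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (avg_pred, observed_rate, count) for each non-empty bin."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in acc if n > 0]


def ece(preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Expected Calibration Error: bin gaps weighted by bin sample fraction."""
    total = len(preds)
    return sum(
        abs(avg_p - rate) * n / total
        for avg_p, rate, n in reliability_bins(preds, outcomes, n_bins)
    )
```

Here `outcomes` holds 1 when the side whose probability is quoted won and 0 otherwise. Each tuple from `reliability_bins` corresponds to one marker: x is the average prediction, y is the observed rate, and the count drives marker size.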
Resolved games
| Game | Date | Score | ZenHodl WP | Polymarket Consensus WP | Outcome | ZenHodl Brier | Polymarket Consensus Brier |
|---|---|---|---|---|---|---|---|
| CLE @ NYK | 2026-05-20 | 104-115 | 0.478 ✗ | 0.675 ✓ | NYK W | 0.273 | 0.106 |
| SAS @ OKC | 2026-05-19 | 122-115 | 0.294 ✓ | 0.675 ✗ | SAS W | 0.086 | 0.456 |
| CLE @ DET | 2026-05-18 | 125-94 | 0.456 ✓ | 0.635 ✗ | CLE W | 0.207 | 0.403 |
| SAS @ MIN | 2026-05-16 | 139-109 | 0.456 ✓ | 0.325 ✓ | SAS W | 0.208 | 0.106 |
| DET @ CLE | 2026-05-15 | 115-94 | 0.456 ✓ | 0.615 ✗ | DET W | 0.208 | 0.378 |
| CLE @ DET | 2026-05-13 | 117-113 | 0.470 ✓ | 0.615 ✗ | CLE W | 0.221 | 0.378 |
| MIN @ SAS | 2026-05-12 | 97-126 | 0.459 ✗ | 0.485 ✗ | SAS W | 0.292 | 0.265 |
| OKC @ LAL | 2026-05-12 | 115-110 | 0.459 ✓ | 0.175 ✓ | OKC W | 0.211 | 0.031 |
| DET @ CLE | 2026-05-12 | 103-112 | 0.460 ✗ | 0.585 ✓ | CLE W | 0.292 | 0.172 |
| SAS @ MIN | 2026-05-10 | 109-114 | 0.468 ✗ | 0.355 ✗ | MIN W | 0.283 | 0.416 |
| NYK @ PHI | 2026-05-10 | 144-114 | 0.484 ✓ | 0.475 ✓ | NYK W | 0.234 | 0.226 |
| OKC @ LAL | 2026-05-10 | 131-108 | 0.433 ✓ | 0.235 ✓ | OKC W | 0.188 | 0.055 |
| DET @ CLE | 2026-05-09 | 108-116 | 0.480 ✗ | 0.615 ✓ | CLE W | 0.270 | 0.148 |
| SAS @ MIN | 2026-05-08 | 115-108 | 0.484 ✓ | 0.345 ✓ | SAS W | 0.234 | 0.119 |
| NYK @ PHI | 2026-05-08 | 108-94 | 0.484 ✓ | 0.555 ✗ | NYK W | 0.234 | 0.308 |
| OKC @ LAL | 2026-05-07 | 125-107 | 0.433 ✓ | 0.125 ✓ | OKC W | 0.188 | 0.016 |
| DET @ CLE | 2026-05-07 | 107-97 | 0.480 ✓ | 0.395 ✓ | DET W | 0.231 | 0.156 |
| NYK @ PHI | 2026-05-06 | 108-102 | 0.484 ✓ | 0.375 ✓ | NYK W | 0.234 | 0.141 |
| SAS @ MIN | 2026-05-06 | 133-95 | 0.484 ✓ | 0.225 ✓ | SAS W | 0.234 | 0.051 |
| DET @ CLE | 2026-05-05 | 111-101 | 0.480 ✓ | 0.455 ✓ | DET W | 0.231 | 0.207 |
| OKC @ LAL | 2026-05-05 | 108-90 | 0.433 ✓ | 0.115 ✓ | OKC W | 0.188 | 0.013 |
| NYK @ PHI | 2026-05-05 | 137-98 | 0.552 ✗ | 0.285 ✓ | NYK W | 0.305 | 0.081 |
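The Brier column can be recomputed from the rows above. The sketch below assumes each WP value is the home team's win probability (the team listed after "@"); the page does not state this, but it is consistent with the listed scores and marks. The per-game Brier score is then the squared gap between that probability and the outcome (1 if the home team won, 0 otherwise). Function names are illustrative.

```python
def brier(p_home: float, home_won: bool) -> float:
    """Per-game Brier score: squared error of the home win probability."""
    y = 1.0 if home_won else 0.0
    return (p_home - y) ** 2


def picked_winner(p_home: float, home_won: bool) -> bool:
    """True when the side given more than 50% actually won (the check mark)."""
    return (p_home > 0.5) == home_won


# SAS @ OKC, 2026-05-19: the away team (SAS) won.
zenhodl = brier(0.294, home_won=False)      # about 0.086
polymarket = brier(0.675, home_won=False)   # about 0.456
```

Recomputing other rows from the displayed probabilities can differ from the table in the third decimal, presumably because the page rounds the probabilities for display and scores the unrounded values.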
Excluded games (1)
Per the manifest's tie-handling rule, any game where either source was unavailable at snapshot time is excluded from both models' metrics. Excluded games are listed here so you can see the rule was applied rather than silently hidden.
| Game | Tip-off | ZenHodl WP | Polymarket Consensus WP | Reason |
|---|---|---|---|---|
| MIN @ SAS | 2026-05-17T04:00 | 0.450 | — | polymarket_unavailable · midpoint_fetch_failed_home |
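A minimal sketch of how the exclusion rule could be applied at snapshot time, assuming a missing quote is represented as `None`. The two status strings come from the manifest; the `"ok"` label, the function names, and the choice to report `zenhodl_unavailable` first when both sources are missing are hypothetical.

```python
from typing import Optional


def midpoint(best_bid: Optional[float], best_ask: Optional[float]) -> Optional[float]:
    """YES-side mid price, (best bid + best ask) / 2, or None if a side is missing."""
    if best_bid is None or best_ask is None:
        return None
    return (best_bid + best_ask) / 2


def snapshot_status(zenhodl_wp: Optional[float], polymarket_wp: Optional[float]) -> str:
    """Classify a game at T-60; anything other than 'ok' is excluded for both models."""
    if zenhodl_wp is None:
        return "zenhodl_unavailable"
    if polymarket_wp is None:
        return "polymarket_unavailable"
    return "ok"  # hypothetical label for an eligible game


# MIN @ SAS, 2026-05-17: ZenHodl had 0.450, the Polymarket midpoint fetch failed.
status = snapshot_status(0.450, midpoint(None, None))
```

Scoring only rows whose status is `"ok"` keeps the comparison on identical games for both sources, which is the point of excluding a game from both sets of metrics rather than one.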
Snapshotted, awaiting result (1)
Full manifest (the rules)
{
"metrics": {
"auxiliary": [
"Brier score",
"Log loss",
"Accuracy"
],
"confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
"ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
"headline": "Expected Calibration Error (ECE), 10 equal-width bins"
},
"model_versioning": {
"policy": "ZenHodl model weights as deployed at T-60 of each game are what counts. Each prediction row in raw.jsonl includes the model version ID so post-hoc retrains do not invalidate prior predictions.",
"retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
},
"publication": {
"live_url": "https://zenhodl.net/benchmarks/nba-playoffs-2026",
"manifest_file": "https://zenhodl.net/benchmarks/nba-playoffs-2026/manifest.json",
"raw_data_jsonl": "https://zenhodl.net/benchmarks/nba-playoffs-2026/raw.jsonl",
"we_publish_when_we_lose": true
},
"published_at": "2026-05-04T20:00:00Z",
"rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
"scope": {
"first_eligible_game_after": "2026-05-05T00:00:00Z",
"last_eligible_game_before": "2026-06-25T00:00:00Z",
"sport": "NBA",
"window": "2026 Conference Semifinals through Finals (inclusive)"
},
"snapshot": {
"matching": "Each NBA game matched to its Polymarket market by team names + game date. Matching script published in this repo so the join is auditable.",
"polymarket_source": "Polymarket NBA game-winner market YES-side mid price (best bid + best ask) / 2, fetched from clob.polymarket.com",
"tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
"timing": "Both predictions captured no later than T-60 minutes before official tip-off",
"zenhodl_source": "ZenHodl pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027NBA\u0027)"
},
"title": "ZenHodl vs Polymarket Consensus \u2014 NBA Playoffs 2026 Calibration Benchmark",
"version": "1.1",
"why_polymarket": "Polymarket\u0027s market price represents the consensus probability of every smart-money trader actively wagering real capital. Beating it on calibration is the canonical hedge-fund-grade benchmark for any sports forecasting model, equivalent to closing-line value (CLV) in traditional sports analytics."
}
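The manifest calls for a 95% bootstrap CI on ECE with 1000 resamples but does not say how the interval is constructed. The sketch below assumes a percentile bootstrap that resamples games with replacement; the seed, the function names, and the bin-edge convention are illustrative choices, not the published scoring code.

```python
import random
from typing import Sequence, Tuple


def ece(preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """ECE over equal-width bins, weighted by bin sample fraction."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    total = len(preds)
    return sum(abs(sp / n - sy / n) * n / total for sp, sy, n in acc if n > 0)


def bootstrap_ece_ci(
    preds: Sequence[float],
    outcomes: Sequence[int],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for ECE (assumed method)."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    for _ in range(n_resamples):
        # Resample whole games with replacement, keeping each (pred, outcome) pair.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(ece([preds[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

With only a couple of dozen resolved games, most of the 10 bins hold few or no games, so an interval computed this way is likely to be wide; that is a property of the sample size rather than of either model.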
Why this benchmark?
Polymarket's market price reflects the consensus probability implied by traders wagering real capital. Beating it on calibration is a demanding benchmark for any sports forecasting model, comparable to closing-line value (CLV) in traditional sports analytics.
The point is to find out transparently. We may win, we may lose, but the rows and scoring rules stay published.
If we lose, the loss appears here, in the same row, with the same Brier score. The manifest commits us to publishing that outcome — there is no edit path that changes it without invalidating the on-chain hash.