ZenHodl vs Polymarket Consensus: NHL Playoffs 2026 Calibration Benchmark
This benchmark tracks the 2026 Stanley Cup Playoffs from the Conference Semifinals through the Stanley Cup Finals (inclusive). ZenHodl and Polymarket Consensus are scored on the same eligible games, with the same snapshot rule and the same metrics. This page updates as predictions and results are written.
Pre-commitment proof
Served file hash matches on-chain commit
Reproduce: curl -s https://zenhodl.net/benchmarks/nhl-playoffs-2026/manifest.json | sha256sum
The manifest's SHA-256 hash was committed in Polygon block 85958367, broadcast at 2026-04-24T13:19:37 UTC. The hash appears in that transaction's data field.
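The shell one-liner only prints the digest. Comparing it with the transaction is left to the reader. Below is a minimal Python sketch of both steps. It assumes you already have the raw manifest bytes and the transaction's hex data field, for example copied from a Polygon block explorer. The page does not say how the hash is encoded in that field, so the hypothetical helper digest_in_tx_data accepts either form: the raw 32 digest bytes, or the digest written out as ASCII hex.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """SHA-256 hex digest of the manifest bytes exactly as served.

    Hash the raw response body: re-serializing the JSON (key order,
    whitespace, escape sequences) changes the bytes and the digest.
    """
    return hashlib.sha256(manifest_bytes).hexdigest()


def digest_in_tx_data(digest_hex: str, tx_data_hex: str) -> bool:
    """Return True if the digest appears in a transaction's hex data field.

    Assumption: the encoding inside the data field is not documented on
    the page, so accept either the raw 32 digest bytes or the digest
    written out as ASCII hex text (lowercase or uppercase).
    """
    data = bytes.fromhex(tx_data_hex.removeprefix("0x"))
    raw = bytes.fromhex(digest_hex)
    as_text = (digest_hex.lower().encode("ascii"), digest_hex.upper().encode("ascii"))
    return raw in data or any(t in data for t in as_text)
```

In practice the bytes would come from fetching the manifest URL, e.g. with urllib.request.urlopen(url).read(). Hashing a pretty-printed or re-serialized copy of the JSON will not reproduce the committed digest.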
Reliability diagram
Each marker is one bin. Its horizontal position is the bin's average predicted probability, and its vertical position is the bin's observed win rate. Marker size scales with the number of games in the bin. A point above the diagonal means predictions in that bin were too low: the outcome happened more often than forecast. A point below means they were too high. The closer the points sit to the diagonal across the chart, the better calibrated the model. A small y-jitter (±0.012) offsets ZenHodl markers up and Polymarket Consensus markers down, so both stay distinguishable when their bins share the same observed rate. Hover over any point for the true value.
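To make the binning concrete, here is a minimal sketch of how reliability-diagram points could be computed from (probability, 0/1 outcome) pairs, using the 10 equal-width bins named in the manifest. The bin-edge convention and the helper names (BinPoint, reliability_points) are assumptions for illustration. This is not ZenHodl's plotting code.

```python
from dataclasses import dataclass


@dataclass
class BinPoint:
    avg_pred: float       # x-position: mean predicted probability in the bin
    observed_rate: float  # y-position: share of games where the event happened
    count: int            # number of games, used to scale marker size


def reliability_points(preds, outcomes, n_bins=10):
    """Return one BinPoint per non-empty equal-width bin, lowest bin first.

    Assumption: bin i covers [i/n_bins, (i+1)/n_bins), and a probability
    of exactly 1.0 falls in the last bin.
    """
    totals = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum_pred, sum_outcome, count]
    for p, y in zip(preds, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        totals[i][0] += p
        totals[i][1] += y
        totals[i][2] += 1
    return [BinPoint(sp / n, sy / n, n) for sp, sy, n in totals if n > 0]
```

Empty bins produce no marker, which is why sparse playoff samples often show gaps along the diagonal.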
Resolved games
WP is the win probability assigned to the home team (listed second in each matchup), and scores are away-home. ✓ or ✗ marks whether the team favored by that probability won. A lower Brier score is better.

| Game (away @ home) | Date | Score (away-home) | ZenHodl WP | Polymarket Consensus WP | Outcome | ZenHodl Brier | Polymarket Consensus Brier |
|---|---|---|---|---|---|---|---|
| VGK @ CAR | 2026-06-05 | 3-4 | 0.539 ✓ | 0.585 ✓ | CAR W | 0.212 | 0.172 |
| VGK @ CAR | 2026-06-03 | 5-4 | 0.539 ✗ | 0.595 ✗ | VGK W | 0.291 | 0.354 |
| MTL @ CAR | 2026-05-30 | 1-6 | 0.743 ✓ | 0.695 ✓ | CAR W | 0.066 | 0.093 |
| MTL @ CAR | 2026-05-23 | 2-3 | 0.743 ✓ | 0.645 ✓ | CAR W | 0.066 | 0.126 |
| VGK @ COL | 2026-05-23 | 3-1 | 0.539 ✗ | 0.605 ✗ | VGK W | 0.291 | 0.366 |
| MTL @ CAR | 2026-05-22 | 6-2 | 0.743 ✗ | 0.645 ✗ | MTL W | 0.552 | 0.416 |
| VGK @ COL | 2026-05-21 | 4-2 | 0.539 ✗ | 0.605 ✗ | VGK W | 0.291 | 0.366 |
| MTL @ BUF | 2026-05-18 | 3-2 | 0.539 ✗ | 0.515 ✗ | MTL W | 0.291 | 0.265 |
| VGK @ ANA | 2026-05-15 | 5-1 | 0.201 ✓ | 0.505 ✗ | VGK W | 0.041 | 0.255 |
| MTL @ BUF | 2026-05-14 | 6-3 | 0.538 ✗ | 0.515 ✗ | MTL W | 0.289 | 0.265 |
| MIN @ COL | 2026-05-14 | 3-4 | 0.538 ✓ | 0.665 ✓ | COL W | 0.213 | 0.112 |
| COL @ MIN | 2026-05-12 | 5-2 | 0.484 ✓ | 0.435 ✓ | COL W | 0.235 | 0.189 |
| VGK @ ANA | 2026-05-11 | 3-4 | 0.201 ✗ | 0.515 ✓ | ANA W | 0.638 | 0.235 |
| COL @ MIN | 2026-05-10 | 1-5 | 0.484 ✗ | 0.445 ✗ | MIN W | 0.266 | 0.308 |
| CAR @ PHI | 2026-05-09 | 3-2 | 0.370 ✓ | 0.365 ✓ | CAR W | 0.137 | 0.133 |
| VGK @ ANA | 2026-05-09 | 6-2 | 0.201 ✓ | 0.505 ✗ | VGK W | 0.041 | 0.255 |
| MTL @ BUF | 2026-05-08 | 5-1 | 0.538 ✗ | 0.545 ✗ | MTL W | 0.289 | 0.297 |
| PHI @ CAR | 2026-05-08 | 1-4 | 0.564 ✓ | 0.375 ✗ | CAR W | 0.190 | 0.391 |
| VGK @ ANA | 2026-05-07 | 1-3 | 0.201 ✗ | 0.615 ✓ | ANA W | 0.638 | 0.148 |
| COL @ MIN | 2026-05-06 | 5-2 | 0.484 ✓ | 0.355 ✓ | COL W | 0.235 | 0.126 |
| VGK @ ANA | 2026-05-05 | 3-1 | 0.312 ✓ | 0.385 ✓ | VGK W | 0.098 | 0.148 |
| CAR @ PHI | 2026-05-04 | 3-2 | 0.398 ✓ | 0.305 ✓ | CAR W | 0.158 | 0.093 |
| COL @ MIN | 2026-05-04 | 9-6 | 0.503 ✗ | 0.355 ✓ | COL W | 0.253 | 0.126 |
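Every resolved row above is consistent with two readings. WP is the home team's win probability, and each Brier entry is (WP − home_won)². For example, ZenHodl's 0.743 for MTL @ CAR on 2026-05-22, with MTL winning, gives 0.743² ≈ 0.552. Polymarket's 0.585 for CAR on 2026-06-05, with CAR winning, gives (1 − 0.585)² ≈ 0.172. Small mismatches in other rows come from the displayed WP being rounded. The sketch below computes the manifest's three auxiliary metrics over (WP, home_won) rows. Two details are assumptions, because the manifest does not pin them down: log loss uses the natural log, and WP > 0.5 counts as picking the home team.

```python
import math


def brier(p_home: float, home_won: int) -> float:
    """Squared error between the home-win probability and the 0/1 result."""
    return (p_home - home_won) ** 2


def log_loss(p_home: float, home_won: int, eps: float = 1e-15) -> float:
    """Negative log-likelihood of the result (natural log is an assumption)."""
    p = min(max(p_home, eps), 1 - eps)  # clip so log(0) never occurs
    return -math.log(p) if home_won else -math.log(1 - p)


def summarize(rows):
    """Mean Brier, mean log loss, and accuracy over (p_home, home_won) rows.

    Assumption: a probability above 0.5 counts as picking the home team.
    """
    n = len(rows)
    return {
        "brier": sum(brier(p, y) for p, y in rows) / n,
        "log_loss": sum(log_loss(p, y) for p, y in rows) / n,
        "accuracy": sum((p > 0.5) == bool(y) for p, y in rows) / n,
    }
```

Running summarize on the ZenHodl column and on the Polymarket column separately gives side-by-side auxiliary numbers. The headline comparison is still ECE.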
Snapshotted, awaiting result (1)
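The manifest's snapshot section fixes three things. First, the Polymarket price is the mid of best bid and best ask. Second, both sources are captured no later than T-60 before puck drop. Third, a game missing either source is excluded from both models' metrics, with a status recorded in raw.jsonl. A minimal sketch of those rules follows. The Snapshot record, the helper names, the "ok" label for an eligible game, and the check order when both sources are missing are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# T-60: both prices must be captured at least 60 minutes before puck drop.
SNAPSHOT_LEAD = timedelta(minutes=60)


@dataclass
class Snapshot:
    puck_drop: datetime
    captured_at: datetime
    zenhodl_wp: Optional[float]  # None if ZenHodl produced no pregame WP
    best_bid: Optional[float]    # Polymarket best bid; None if unavailable
    best_ask: Optional[float]    # Polymarket best ask; None if unavailable


def polymarket_mid(best_bid: float, best_ask: float) -> float:
    """Mid price as defined in the manifest: (best bid + best ask) / 2."""
    return (best_bid + best_ask) / 2


def captured_on_time(s: Snapshot) -> bool:
    """True if the snapshot was taken no later than T-60."""
    return s.captured_at <= s.puck_drop - SNAPSHOT_LEAD


def snapshot_status(s: Snapshot) -> str:
    """Status for raw.jsonl; the two *_unavailable names come from the manifest.

    A game with either source missing is excluded from BOTH models' metrics.
    Assumption: Polymarket is checked first if both are missing, and "ok"
    is a hypothetical label for an eligible game.
    """
    if s.best_bid is None or s.best_ask is None:
        return "polymarket_unavailable"
    if s.zenhodl_wp is None:
        return "zenhodl_unavailable"
    return "ok"
```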
The full manifest (the rules):
{
"metrics": {
"auxiliary": [
"Brier score",
"Log loss",
"Accuracy"
],
"confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
"ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
"headline": "Expected Calibration Error (ECE), 10 equal-width bins",
"overtime_rule": "Regulation, overtime, and shootout outcomes all count as the final winner. No tie logic."
},
"model_versioning": {
"policy": "ZenHodl model weights as deployed at T-60 of each game are what counts. Each prediction row in raw.jsonl includes the model version ID so post-hoc retrains do not invalidate prior predictions.",
"retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
},
"publication": {
"live_url": "https://zenhodl.net/benchmarks/nhl-playoffs-2026",
"manifest_file": "https://zenhodl.net/benchmarks/nhl-playoffs-2026/manifest.json",
"raw_data_jsonl": "https://zenhodl.net/benchmarks/nhl-playoffs-2026/raw.jsonl",
"we_publish_when_we_lose": true
},
"published_at": "2026-04-24T12:50:00Z",
"rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
"scope": {
"first_eligible_game_after": "2026-05-04T00:00:00Z",
"last_eligible_game_before": "2026-06-25T00:00:00Z",
"sport": "NHL",
"window": "2026 Stanley Cup Playoffs Conference Semifinals through Stanley Cup Finals (inclusive)"
},
"snapshot": {
"matching": "Each NHL game matched to its Polymarket market by team names + game date. Matching script published in this repo so the join is auditable.",
"polymarket_source": "Polymarket NHL game-winner market mid price (best bid + best ask) / 2, fetched from clob.polymarket.com",
"tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
"timing": "Both predictions captured no later than T-60 minutes before official puck drop",
"zenhodl_source": "ZenHodl NHL pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027NHL\u0027)"
},
"title": "ZenHodl vs Polymarket Consensus \u2014 NHL Playoffs 2026 Calibration Benchmark",
"version": "1.0",
"why_polymarket": "Polymarket\u0027s mid-price is the consensus probability of every smart-money trader actively wagering real capital on the game outcome. Beating it on calibration is the canonical hedge-fund-grade benchmark for a sports forecasting model.",
"why_stanley_cup_conf_semis": "Similar structure to the NBA benchmark. Starting at Conference Semifinals (round 2) gives a clean pre-commit boundary after the first round winners are known, while preserving a ~35-game sample."
}
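The headline metric in the manifest is ECE over 10 equal-width bins, reported with a 95% bootstrap CI from 1000 resamples. The sketch below implements the ece_formula as written: for each bin, |mean prediction − mean outcome|, weighted by the bin's share of games. Three details are assumptions because the manifest does not specify them: the bin-edge convention, the percentile-bootstrap variant, and the fixed seed.

```python
import random
import statistics


def ece(preds, outcomes, n_bins=10):
    """Expected Calibration Error with equal-width bins.

    Sum over non-empty bins of |mean prediction - mean outcome|, weighted
    by the fraction of games in the bin. Assumption: bin i covers
    [i/n_bins, (i+1)/n_bins), with 1.0 placed in the last bin.
    """
    n = len(preds)
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    count = [0] * n_bins
    for p, y in zip(preds, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        sum_p[i] += p
        sum_y[i] += y
        count[i] += 1
    return sum(
        (c / n) * abs(sp / c - sy / c)
        for sp, sy, c in zip(sum_p, sum_y, count)
        if c > 0
    )


def ece_bootstrap_ci(preds, outcomes, n_resamples=1000, seed=0):
    """95% percentile-bootstrap CI for ECE (variant and seed are assumptions)."""
    rng = random.Random(seed)
    idx = range(len(preds))
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(idx, k=len(preds))  # resample games with replacement
        stats.append(ece([preds[i] for i in sample], [outcomes[i] for i in sample]))
    cuts = statistics.quantiles(stats, n=40)  # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles
```

For example, ece(preds, outcomes) on one source's (WP, home_won) pairs gives the point estimate, and ece_bootstrap_ci(preds, outcomes) gives the interval published alongside it. With only a few dozen games, expect the interval to be wide.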
Why this benchmark?
Polymarket's mid-price is a market consensus probability, set by traders wagering real capital on the game outcome. Beating it on calibration is a demanding benchmark for a sports forecasting model.
The point is to find out, transparently, whether ZenHodl's pregame probabilities are better calibrated than the market's. We may win or we may lose, but the rows and scoring rules stay published.
If we lose, the loss appears here, in the same table, scored by the same Brier rule. The manifest commits us to publishing that outcome (we_publish_when_we_lose is true). That commitment cannot be edited out, because any change to the manifest would stop its hash from matching the on-chain commit.
7-day free trial. Same NHL pregame WP feed. Same calibration we're being judged on right here.