Collecting · Pre-committed · NHL

ZenHodl vs Polymarket Consensus

This calibration benchmark covers the 2026 Stanley Cup Playoffs from the Conference Semifinals through the Stanley Cup Finals (inclusive). Both forecasters are scored on the same eligible games, with the same snapshot rule and the same metrics. The page is updated as predictions and results are written.

Sibling benchmark on the same NHL games: ZenHodl vs Pinnacle Closing Line →

Pre-commitment proof

Served file hash matches on-chain commit
Manifest SHA-256 (served right now)
ecd611c8b3955f98d25a4c4e5c182b253ef2c8af36f8d636c862f12bc1806b25

Reproduce: curl -s https://zenhodl.net/benchmarks/nhl-playoffs-2026/manifest.json | sha256sum

On-chain commitment
0x1d480ac8e6cbb9ec3f091d0488e9b42cabb7d81fc292bf7c2a7c4a44307cd959 ↗
on-chain SHA: ecd611c8b3955f98d25a4c4e5c182b253ef2c8af36f8d636c862f12bc1806b25

Block 85958367 on Polygon. Broadcast 2026-04-24T13:19:37 UTC. The hash above appears in the tx's data field.
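The same check can be scripted. The following is a minimal sketch, assuming the manifest bytes have already been downloaded (for example with the curl command above) and that the expected hash was copied by hand from the transaction's data field. The helper names `sha256_hex` and `manifest_matches` are hypothetical and not part of any ZenHodl tooling.

```python
import hashlib

# Hash copied from the on-chain commitment shown above.
ONCHAIN_SHA256 = "ecd611c8b3955f98d25a4c4e5c182b253ef2c8af36f8d636c862f12bc1806b25"


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def manifest_matches(manifest_bytes: bytes, onchain_hex: str = ONCHAIN_SHA256) -> bool:
    """Compare the served file's hash with the committed hash.

    The bytes must be hashed exactly as served: parsing and
    re-serializing the JSON would change whitespace and the digest.
    """
    expected = onchain_hex.lower()
    if expected.startswith("0x"):
        expected = expected[2:]
    return sha256_hex(manifest_bytes) == expected


# Usage (assumes manifest.json was saved locally beforehand):
# with open("manifest.json", "rb") as fh:
#     print(manifest_matches(fh.read()))
```

Reading the file in binary mode matters here, since any newline translation would alter the digest.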

Live leaderboard

n=23 resolved · raw=24 · last refresh 21:05:12 UTC
ZenHodl
Production NHL model · ZenHodl NHL pregame win probability via internal SignalEngine.get_pregame_predictions('NHL')
🟢
ECE (lower is better)
0.232
95% CI [0.129, 0.425]
Brier
0.251
Log loss
0.702
Accuracy
52.2%
POLYMARKET CONSENSUS
Live mid-price · the wisdom of every smart-money trader on the venue
🌀
ECE (lower is better)
0.175
95% CI [0.094, 0.367]
Brier
0.228
Log loss
0.647
Accuracy
52.2%

Reliability diagram

Predicted probability vs actual home-win rate, in 10 equal-width bins. Diagonal = perfect calibration.

Each marker is one bin's average. Marker size scales with the number of games in the bin. Points above the diagonal mean the home team won more often than predicted in that bin (predictions too low); points below mean it won less often than predicted (predictions too high). The closer the points hug the diagonal across the chart, the better calibrated the model. A tiny y-jitter (±0.012) is applied so ZenHodl (offset up) and Polymarket Consensus (offset down) markers remain distinguishable when both bins share the same observed rate; hover any point for the true value.
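The points on such a diagram can be computed as sketched below. This is an illustration rather than the site's plotting code: the function name `reliability_bins` is hypothetical, and the convention that a prediction of exactly 1.0 falls into the top bin is an assumption.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted prob, observed home-win rate, count) per non-empty bin.

    preds    -- predicted home-win probabilities in [0, 1]
    outcomes -- 1 if the home team won, else 0
    """
    # Per-bin accumulators: [sum of predictions, sum of outcomes, count]
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        # Equal-width bins; clamp p == 1.0 into the last bin (assumed convention).
        idx = min(int(p * n_bins), n_bins - 1)
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in acc if n > 0]


# Three illustrative predictions: two in the 0.5-0.6 bin, one in 0.7-0.8.
points = reliability_bins([0.539, 0.539, 0.743], [1, 0, 1])
for mean_pred, win_rate, count in points:
    print(f"pred={mean_pred:.3f} observed={win_rate:.3f} n={count}")
```

Each returned tuple corresponds to one marker: the first value is the x-coordinate, the second the y-coordinate, and the count drives the marker size.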

Resolved games

Game | Date | Score (away-home) | ZenHodl WP | Polymarket Consensus WP | Outcome | ZH Brier | Polymarket Consensus Brier
VGK @ CAR | 2026-06-05 | 3-4 | 0.539 ✓ | 0.585 ✓ | CAR W | 0.212 | 0.172
VGK @ CAR | 2026-06-03 | 5-4 | 0.539 ✗ | 0.595 ✗ | VGK W | 0.291 | 0.354
MTL @ CAR | 2026-05-30 | 1-6 | 0.743 ✓ | 0.695 ✓ | CAR W | 0.066 | 0.093
MTL @ CAR | 2026-05-23 | 2-3 | 0.743 ✓ | 0.645 ✓ | CAR W | 0.066 | 0.126
VGK @ COL | 2026-05-23 | 3-1 | 0.539 ✗ | 0.605 ✗ | VGK W | 0.291 | 0.366
MTL @ CAR | 2026-05-22 | 6-2 | 0.743 ✗ | 0.645 ✗ | MTL W | 0.552 | 0.416
VGK @ COL | 2026-05-21 | 4-2 | 0.539 ✗ | 0.605 ✗ | VGK W | 0.291 | 0.366
MTL @ BUF | 2026-05-18 | 3-2 | 0.539 ✗ | 0.515 ✗ | MTL W | 0.291 | 0.265
VGK @ ANA | 2026-05-15 | 5-1 | 0.201 ✓ | 0.505 ✗ | VGK W | 0.041 | 0.255
MTL @ BUF | 2026-05-14 | 6-3 | 0.538 ✗ | 0.515 ✗ | MTL W | 0.289 | 0.265
MIN @ COL | 2026-05-14 | 3-4 | 0.538 ✓ | 0.665 ✓ | COL W | 0.213 | 0.112
COL @ MIN | 2026-05-12 | 5-2 | 0.484 ✓ | 0.435 ✓ | COL W | 0.235 | 0.189
VGK @ ANA | 2026-05-11 | 3-4 | 0.201 ✗ | 0.515 ✓ | ANA W | 0.638 | 0.235
COL @ MIN | 2026-05-10 | 1-5 | 0.484 ✗ | 0.445 ✗ | MIN W | 0.266 | 0.308
CAR @ PHI | 2026-05-09 | 3-2 | 0.370 ✓ | 0.365 ✓ | CAR W | 0.137 | 0.133
VGK @ ANA | 2026-05-09 | 6-2 | 0.201 ✓ | 0.505 ✗ | VGK W | 0.041 | 0.255
MTL @ BUF | 2026-05-08 | 5-1 | 0.538 ✗ | 0.545 ✗ | MTL W | 0.289 | 0.297
PHI @ CAR | 2026-05-08 | 1-4 | 0.564 ✓ | 0.375 ✗ | CAR W | 0.190 | 0.391
VGK @ ANA | 2026-05-07 | 1-3 | 0.201 ✗ | 0.615 ✓ | ANA W | 0.638 | 0.148
COL @ MIN | 2026-05-06 | 5-2 | 0.484 ✓ | 0.355 ✓ | COL W | 0.235 | 0.126
VGK @ ANA | 2026-05-05 | 3-1 | 0.312 ✓ | 0.385 ✓ | VGK W | 0.098 | 0.148
CAR @ PHI | 2026-05-04 | 3-2 | 0.398 ✓ | 0.305 ✓ | CAR W | 0.158 | 0.093
COL @ MIN | 2026-05-04 | 9-6 | 0.503 ✗ | 0.355 ✓ | COL W | 0.253 | 0.126
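The per-game columns can be spot-checked by hand. The sketch below assumes each WP value is the home team's win probability, which is consistent with the reliability diagram's "home-win rate" and with the listed Brier scores, and it assumes the ✓/✗ mark means the side given more than 50% won. Function names are hypothetical, and treating exactly 0.5 as a home pick is an arbitrary choice.

```python
import math


def brier(p_home: float, home_won: bool) -> float:
    """Squared error between the home-win probability and the 0/1 outcome."""
    y = 1.0 if home_won else 0.0
    return (p_home - y) ** 2


def log_loss(p_home: float, home_won: bool) -> float:
    """Negative log of the probability assigned to what actually happened."""
    return -math.log(p_home if home_won else 1.0 - p_home)


def correct_pick(p_home: float, home_won: bool) -> bool:
    """True when the favored side (>= 0.5 counts as home, by assumption) won."""
    return (p_home >= 0.5) == home_won


# VGK @ CAR on 2026-06-05: home team CAR won, ZenHodl 0.539, Polymarket 0.585.
print(round(brier(0.539, True), 4))   # close to the listed 0.212
print(round(brier(0.585, True), 4))   # close to the listed 0.172
# MTL @ CAR on 2026-05-22: home team CAR lost, ZenHodl 0.743.
print(correct_pick(0.743, False))     # False, matching the cross mark
```

Small differences in the third decimal are expected if the displayed probabilities are rounded before publication, which this sketch cannot confirm.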

Snapshotted, awaiting result (1)

TBL @ MTL · puck drop 2026-05-04T00:00 · ZH 0.503 · PM 0.405
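A snapshot row like the one above combines two inputs described in the manifest: the Polymarket mid price, defined as (best bid + best ask) / 2, and the ZenHodl probability, with the game excluded for both sides if either is unavailable at T-60. A minimal sketch of that rule follows. The function names, the "ok" status string, and the bid/ask numbers are hypothetical; only the two "unavailable" status strings come from the manifest, and the order in which the two sources are checked is an assumption.

```python
from typing import Optional


def mid_price(best_bid: float, best_ask: float) -> float:
    """Midpoint of the best bid and best ask, per the manifest's formula."""
    return (best_bid + best_ask) / 2.0


def snapshot_status(
    zenhodl_wp: Optional[float],
    best_bid: Optional[float],
    best_ask: Optional[float],
) -> str:
    """Classify a T-60 snapshot; any missing source excludes the game for both."""
    if zenhodl_wp is None:
        return "zenhodl_unavailable"
    if best_bid is None or best_ask is None:
        return "polymarket_unavailable"
    return "ok"  # hypothetical label for an eligible snapshot


# Hypothetical order-book quotes, for illustration only.
print(snapshot_status(0.503, 0.40, 0.41))   # ok
print(snapshot_status(0.503, None, 0.41))   # polymarket_unavailable
print(round(mid_price(0.40, 0.41), 3))      # 0.405
```

Excluding a game from both sides, rather than scoring only the available forecaster, keeps the two metric sets on an identical sample.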
Full manifest (the rules)
{
  "metrics": {
    "auxiliary": [
      "Brier score",
      "Log loss",
      "Accuracy"
    ],
    "confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
    "ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
    "headline": "Expected Calibration Error (ECE), 10 equal-width bins",
    "overtime_rule": "Regulation, overtime, and shootout outcomes all count as the final winner. No tie logic."
  },
  "model_versioning": {
    "policy": "ZenHodl model weights as deployed at T-60 of each game are what counts. Each prediction row in raw.jsonl includes the model version ID so post-hoc retrains do not invalidate prior predictions.",
    "retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
  },
  "publication": {
    "live_url": "https://zenhodl.net/benchmarks/nhl-playoffs-2026",
    "manifest_file": "https://zenhodl.net/benchmarks/nhl-playoffs-2026/manifest.json",
    "raw_data_jsonl": "https://zenhodl.net/benchmarks/nhl-playoffs-2026/raw.jsonl",
    "we_publish_when_we_lose": true
  },
  "published_at": "2026-04-24T12:50:00Z",
  "rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
  "scope": {
    "first_eligible_game_after": "2026-05-04T00:00:00Z",
    "last_eligible_game_before": "2026-06-25T00:00:00Z",
    "sport": "NHL",
    "window": "2026 Stanley Cup Playoffs Conference Semifinals through Stanley Cup Finals (inclusive)"
  },
  "snapshot": {
    "matching": "Each NHL game matched to its Polymarket market by team names + game date. Matching script published in this repo so the join is auditable.",
    "polymarket_source": "Polymarket NHL game-winner market mid price (best bid + best ask) / 2, fetched from clob.polymarket.com",
    "tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
    "timing": "Both predictions captured no later than T-60 minutes before official puck drop",
    "zenhodl_source": "ZenHodl NHL pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027NHL\u0027)"
  },
  "title": "ZenHodl vs Polymarket Consensus \u2014 NHL Playoffs 2026 Calibration Benchmark",
  "version": "1.0",
  "why_polymarket": "Polymarket\u0027s mid-price is the consensus probability of every smart-money trader actively wagering real capital on the game outcome. Beating it on calibration is the canonical hedge-fund-grade benchmark for a sports forecasting model.",
  "why_stanley_cup_conf_semis": "Similar structure to the NBA benchmark. Starting at Conference Semifinals (round 2) gives a clean pre-commit boundary after the first round winners are known, while preserving a ~35-game sample."
}
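The headline metric and its confidence interval, as described in the manifest's `ece_formula` and `confidence_interval` fields, can be sketched as follows. The manifest does not say which bootstrap variant is used, so the percentile method, the fixed seed, and the handling of a prediction of exactly 1.0 are all assumptions; `ece` and `bootstrap_ci` are hypothetical names.

```python
import random
from typing import Sequence, Tuple


def ece(preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Sum over bins of |mean prediction - mean outcome|, weighted by bin share."""
    n = len(preds)
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum preds, sum outcomes, count]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 (assumed)
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return sum(abs(sp / c - sy / c) * (c / n) for sp, sy, c in acc if c > 0)


def bootstrap_ci(
    preds: Sequence[float],
    outcomes: Sequence[int],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for ECE (the variant is an assumption)."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    for _ in range(n_resamples):
        # Resample games with replacement, keeping each (pred, outcome) pair intact.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(ece([preds[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Three illustrative games; the real inputs would be the rows in raw.jsonl.
p, y = [0.539, 0.539, 0.743], [1, 0, 1]
print(round(ece(p, y), 4))
print(bootstrap_ci(p, y))
```

With only a few dozen games the interval is wide, which is why the leaderboard publishes it next to the point estimate.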

Why this benchmark?

Polymarket's mid-price is the consensus probability of every smart-money trader actively wagering real capital on the game outcome. Beating it on calibration is the canonical hedge-fund-grade benchmark for a sports forecasting model.

The point is to find out, transparently, whether the model is better calibrated than the market. We may win, we may lose, but the rows and scoring rules stay published.

If we lose, the loss appears here, in the same row, with the same Brier score. The manifest commits us to publishing that outcome — there is no edit path that changes it without invalidating the on-chain hash.
