ZenHodl vs Polymarket Consensus
The MLB June 2026 Regular-Season Sample tracks the first 100 MLB regular-season games starting on or after 2026-06-01 for which both ZenHodl and Polymarket markets are available at T-60 (60 minutes before official first pitch). Both sources are scored on the same eligible games, with the same snapshot rule and the same metrics. The page is updated as predictions and results are written.
Pre-commitment proof
Served file hash matches on-chain commit
Reproduce: curl -s https://zenhodl.net/benchmarks/mlb-june-2026-sample/manifest.json | sha256sum
Block 85958373 on Polygon. Broadcast 2026-04-24T13:19:48 UTC. The resulting hash appears in the transaction's data field.
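The comparison can also be scripted. The following is a minimal sketch, not ZenHodl's own tooling: it hashes the manifest bytes and looks for the digest in a transaction's data field supplied as a hex string. The function names are hypothetical, and how the digest is encoded in the data field (raw bytes or ASCII hex text) is an assumption, so both are checked.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def hash_in_tx_data(digest_hex: str, tx_data_hex: str) -> bool:
    """Check whether a digest occurs in a transaction's hex-encoded data field.

    Assumption: the commit stores the digest either as raw bytes or as
    ASCII hex text somewhere in the data field; both forms are checked.
    """
    data = tx_data_hex.lower()
    if data.startswith("0x"):
        data = data[2:]
    digest = digest_hex.lower()
    ascii_form = digest.encode("ascii").hex()  # digest written out as text
    return digest in data or ascii_form in data


# Stand-in bytes for illustration; in practice use the exact bytes served
# at the manifest URL, because any re-serialization changes the hash.
manifest_bytes = b'{"version": "1.0"}'
digest = sha256_hex(manifest_bytes)
print(hash_in_tx_data(digest, "0x" + digest))
```

The hash must be taken over the served bytes exactly as downloaded; pretty-printing or re-encoding the JSON first would produce a different digest.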
Live leaderboard
Reliability diagram
Each marker is one bin's average. Marker size scales with the number of games in the bin. Points above the diagonal mean the outcome occurred more often than predicted in that bin (predictions were too pessimistic); points below mean it occurred less often than predicted (predictions were too confident). The closer the points hug the diagonal across the chart, the better calibrated the model. A small y-jitter (±0.012) offsets ZenHodl markers up and Polymarket Consensus markers down so they remain distinguishable when both share the same observed rate in a bin; hover over any point for the true value.
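As a rough illustration of how the diagram points and the headline metric relate, the sketch below groups predictions into 10 equal-width bins, computes Expected Calibration Error as the sample-weighted sum of |bin average prediction - bin average outcome| (the formula stated in the manifest), and adds a bootstrap interval with 1000 resamples. It is not the benchmark's scoring code. The function names are hypothetical, the percentile method for the interval is an assumption (the manifest does not say which bootstrap variant is used), and predictions are assumed to be probabilities for one designated side with outcomes coded 1 when that side won.

```python
import random
from typing import List, Sequence, Tuple


def reliability_bins(
    preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (avg_pred, observed_rate, count) for each non-empty bin."""
    if len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must have the same length")
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        acc[i][0] += p
        acc[i][1] += y
        acc[i][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in acc if n > 0]


def ece(preds: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Sum over bins of |avg_pred - observed_rate| weighted by bin fraction."""
    total = len(preds)
    if total == 0:
        raise ValueError("need at least one prediction")
    bins = reliability_bins(preds, outcomes, n_bins)
    return sum(abs(avg_p - rate) * n / total for avg_p, rate, n in bins)


def bootstrap_ece_ci(
    preds: Sequence[float],
    outcomes: Sequence[int],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """95% percentile bootstrap interval for ECE (assumed variant)."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample games with replacement
        stats.append(ece([preds[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lo_idx = (25 * n_resamples) // 1000
    hi_idx = max((975 * n_resamples) // 1000 - 1, 0)
    return stats[lo_idx], stats[hi_idx]


# Toy values for illustration only, not benchmark data.
toy_preds = [0.25, 0.25, 0.85, 0.85]
toy_outcomes = [0, 1, 1, 1]
print(reliability_bins(toy_preds, toy_outcomes))
print(ece(toy_preds, toy_outcomes))
```

Each tuple returned by `reliability_bins` corresponds to one marker on the diagram: the first value is the x-position, the second the y-position before jitter, and the third drives the marker size.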
Resolved games
| Game | Date | Score (away-home) | ZenHodl WP | Polymarket Consensus WP | Outcome | ZenHodl Brier | Polymarket Consensus Brier |
|---|---|---|---|---|---|---|---|
| LAA @ LAD | 2026-06-07 | 2-9 | 0.676 ✓ | 0.665 ✓ | LAD W | 0.105 | 0.112 |
| MIL @ COL | 2026-06-07 | 7-1 | 0.417 ✓ | 0.375 ✓ | MIL W | 0.174 | 0.141 |
| CLE @ TEX | 2026-06-07 | 6-0 | 0.500 ✗ | 0.575 ✗ | CLE W | 0.250 | 0.331 |
| CIN @ STL | 2026-06-07 | 5-6 | 0.548 ✓ | 0.575 ✓ | STL W | 0.204 | 0.181 |
| BOS @ NYY | 2026-06-07 | 1-6 | 0.535 ✓ | 0.575 ✓ | NYY W | 0.216 | 0.181 |
| SEA @ DET | 2026-06-07 | 4-0 | 0.475 ✓ | 0.515 ✗ | SEA W | 0.226 | 0.265 |
| BAL @ TOR | 2026-06-07 | 4-6 | 0.515 ✓ | 0.555 ✓ | TOR W | 0.235 | 0.198 |
| CHW @ PHI | 2026-06-07 | 6-3 | 0.530 ✗ | 0.595 ✗ | CHW W | 0.281 | 0.354 |
| PIT @ ATL | 2026-06-07 | 3-6 | 0.581 ✓ | 0.575 ✓ | ATL W | 0.175 | 0.181 |
| LAA @ LAD | 2026-06-07 | 2-9 | 0.663 ✓ | 0.755 ✓ | LAD W | 0.113 | 0.060 |
| MIL @ COL | 2026-06-07 | 7-1 | 0.427 ✓ | 0.265 ✓ | MIL W | 0.182 | 0.070 |
| BOS @ NYY | 2026-06-06 | 5-3 | 0.535 ✗ | 0.515 ✗ | BOS W | 0.286 | 0.265 |
| CLE @ TEX | 2026-06-06 | 2-3 | 0.506 ✓ | 0.525 ✓ | TEX W | 0.244 | 0.226 |
| PIT @ ATL | 2026-06-06 | 3-6 | 0.568 ✓ | 0.505 ✓ | ATL W | 0.187 | 0.245 |
| CHW @ PHI | 2026-06-06 | 6-8 | 0.544 ✓ | 0.555 ✓ | PHI W | 0.208 | 0.198 |
| BAL @ TOR | 2026-06-06 | 13-3 | 0.507 ✗ | 0.515 ✗ | BAL W | 0.257 | 0.265 |
| CIN @ STL | 2026-06-06 | 3-10 | 0.536 ✓ | 0.545 ✓ | STL W | 0.215 | 0.207 |
| SEA @ DET | 2026-06-06 | 3-7 | 0.482 ✗ | 0.465 ✗ | DET W | 0.268 | 0.286 |
| LAA @ LAD | 2026-06-06 | 0-1 | 0.650 ✓ | 0.635 ✓ | LAD W | 0.123 | 0.133 |
| MIL @ COL | 2026-06-06 | 9-7 | 0.438 ✓ | 0.415 ✓ | MIL W | 0.192 | 0.172 |
| CIN @ STL | 2026-06-06 | 3-10 | 0.526 ✓ | 0.575 ✓ | STL W | 0.225 | 0.181 |
| CLE @ TEX | 2026-06-06 | 2-3 | 0.498 ✗ | 0.465 ✗ | TEX W | 0.252 | 0.286 |
| PIT @ ATL | 2026-06-05 | 3-6 | 0.555 ✓ | 0.575 ✓ | ATL W | 0.198 | 0.181 |
| BAL @ TOR | 2026-06-05 | 13-3 | 0.514 ✗ | 0.575 ✗ | BAL W | 0.265 | 0.331 |
| BOS @ NYY | 2026-06-05 | 5-3 | 0.551 ✗ | 0.565 ✗ | BOS W | 0.303 | 0.319 |
| CHW @ PHI | 2026-06-05 | 6-8 | 0.533 ✓ | 0.635 ✓ | PHI W | 0.218 | 0.133 |
| SEA @ DET | 2026-06-05 | 3-7 | 0.468 ✗ | 0.465 ✗ | DET W | 0.283 | 0.286 |
| LAD @ ARI | 2026-06-05 | 2-3 | 0.485 ✗ | 0.445 ✗ | ARI W | 0.265 | 0.308 |
| PIT @ HOU | 2026-06-05 | 5-1 | 0.508 ✗ | 0.505 ✗ | PIT W | 0.258 | 0.255 |
| TOR @ ATL | 2026-06-04 | 3-7 | 0.594 ✓ | 0.695 ✓ | ATL W | 0.165 | 0.093 |
| BAL @ BOS | 2026-06-04 | 1-8 | 0.510 ✓ | 0.525 ✓ | BOS W | 0.240 | 0.226 |
| CLE @ NYY | 2026-06-04 | 5-4 | 0.507 ✗ | 0.605 ✗ | CLE W | 0.257 | 0.366 |
| LAD @ ARI | 2026-06-04 | 7-0 | 0.492 ✓ | 0.345 ✓ | LAD W | 0.242 | 0.119 |
| COL @ LAA | 2026-06-04 | 4-11 | 0.506 ✓ | 0.575 ✓ | LAA W | 0.244 | 0.181 |
| PIT @ HOU | 2026-06-04 | 9-11 | 0.500 ✓ | 0.435 ✗ | HOU W | 0.249 | 0.319 |
| TEX @ STL | 2026-06-03 | 7-4 | 0.514 ✗ | 0.515 ✗ | TEX W | 0.264 | 0.265 |
| TOR @ ATL | 2026-06-03 | 3-4 | 0.580 ✓ | 0.575 ✓ | ATL W | 0.176 | 0.181 |
| CLE @ NYY | 2026-06-03 | 9-4 | 0.515 ✗ | 0.565 ✗ | CLE W | 0.265 | 0.319 |
| BAL @ BOS | 2026-06-03 | 4-2 | 0.502 ✗ | 0.575 ✗ | BAL W | 0.253 | 0.331 |
| NYM @ SEA | 2026-06-03 | 3-8 | 0.581 ✓ | 0.565 ✓ | SEA W | 0.176 | 0.189 |
| CHW @ MIN | 2026-06-03 | 4-6 | 0.504 ✓ | 0.575 ✓ | MIN W | 0.246 | 0.181 |
| LAD @ ARI | 2026-06-03 | 6-5 | 0.498 ✓ | 0.475 ✓ | LAD W | 0.248 | 0.226 |
| NYM @ SEA | 2026-06-03 | 3-8 | 0.567 ✓ | 0.565 ✓ | SEA W | 0.187 | 0.189 |
| COL @ LAA | 2026-06-03 | 8-2 | 0.513 ✗ | 0.595 ✗ | COL W | 0.263 | 0.354 |
| PIT @ HOU | 2026-06-03 | 10-6 | 0.507 ✗ | 0.515 ✗ | PIT W | 0.257 | 0.265 |
| CLE @ NYY | 2026-06-02 | 9-4 | 0.524 ✗ | 0.655 ✗ | CLE W | 0.274 | 0.429 |
| TEX @ STL | 2026-06-02 | 2-1 | 0.523 ✗ | 0.495 ✓ | TEX W | 0.274 | 0.245 |
| CHW @ MIN | 2026-06-02 | 6-9 | 0.496 ✗ | 0.465 ✗ | MIN W | 0.254 | 0.286 |
| TOR @ ATL | 2026-06-02 | 3-4 | 0.567 ✓ | 0.525 ✓ | ATL W | 0.188 | 0.226 |
| BAL @ BOS | 2026-06-02 | 4-2 | 0.509 ✗ | 0.555 ✗ | BAL W | 0.259 | 0.308 |
| COL @ LAA | 2026-06-02 | 9-8 | 0.521 ✗ | 0.645 ✗ | COL W | 0.272 | 0.416 |
| NYM @ SEA | 2026-06-02 | 2-3 | 0.554 ✓ | 0.545 ✓ | SEA W | 0.199 | 0.207 |
| LAD @ ARI | 2026-06-02 | 1-4 | 0.489 ✗ | 0.405 ✗ | ARI W | 0.261 | 0.354 |
| TEX @ STL | 2026-06-01 | 2-1 | 0.535 ✗ | 0.465 ✓ | TEX W | 0.286 | 0.216 |
| CHW @ MIN | 2026-06-01 | 6-9 | 0.487 ✗ | 0.565 ✓ | MIN W | 0.263 | 0.189 |
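The Brier columns appear consistent with the squared error between the listed WP, read as the home team's win probability, and the result. For the first row (LAA @ LAD, ZenHodl WP 0.676, home team won): (1 - 0.676)^2 ≈ 0.105. For the second row (MIL @ COL, ZenHodl WP 0.417, away team won): 0.417^2 ≈ 0.174. The sketch below reproduces those values and adds log loss, one of the manifest's auxiliary metrics. The home-team reading is an inference from the rows shown rather than a stated rule, the function names are hypothetical, and the clipping constant in `log_loss` is an assumption.

```python
import math


def brier(p_home: float, home_won: bool) -> float:
    """Squared error between the home win probability and the 0/1 result."""
    outcome = 1.0 if home_won else 0.0
    return (p_home - outcome) ** 2


def log_loss(p_home: float, home_won: bool, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the result; eps guards against log(0)."""
    p = min(max(p_home, eps), 1.0 - eps)
    return -math.log(p if home_won else 1.0 - p)


# LAA @ LAD, 2026-06-07: home team (LAD) won.
print(round(brier(0.676, True), 3))   # 0.105 (ZenHodl)
print(round(brier(0.665, True), 3))   # 0.112 (Polymarket Consensus)

# MIL @ COL, 2026-06-07: away team (MIL) won.
print(round(brier(0.417, False), 3))  # 0.174 (ZenHodl)
```

Because the displayed probabilities are rounded to three decimals, recomputed values can differ from the published column in the last digit for some rows.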
Snapshotted, awaiting result (12)
Read the full manifest (the rules) ↓
{
"metrics": {
"auxiliary": [
"Brier score",
"Log loss",
"Accuracy"
],
"confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
"ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
"extra_innings_rule": "The winning team at the end of the game (regardless of inning count) is the outcome. No ties.",
"headline": "Expected Calibration Error (ECE), 10 equal-width bins"
},
"model_versioning": {
"policy": "ZenHodl MLB model weights as deployed at T-60 of each game are what counts.",
"retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
},
"publication": {
"live_url": "https://zenhodl.net/benchmarks/mlb-june-2026-sample",
"manifest_file": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/manifest.json",
"raw_data_jsonl": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/raw.jsonl",
"we_publish_when_we_lose": true
},
"published_at": "2026-04-24T12:50:00Z",
"rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
"sample_size_justification": "100 games resolves enough of the Polymarket spectrum (20-30%, 40-60%, 70-80% bins) to estimate ECE with a CI of approximately \u00b10.02 at 95% confidence, comparable to the NBA playoffs benchmark sample size.",
"scope": {
"first_eligible_game_after": "2026-06-01T00:00:00Z",
"last_eligible_game_before": "2026-06-30T23:59:59Z",
"max_games": 100,
"sport": "MLB",
"window": "First 100 MLB regular-season games tipping on or after 2026-06-01 for which both ZenHodl and Polymarket markets are available at T-60"
},
"snapshot": {
"matching": "Each MLB game matched to its Polymarket market by team names + game date from slug.",
"polymarket_source": "Polymarket MLB game-winner market mid price (best bid + best ask) / 2, fetched from clob.polymarket.com. Tip-off time extracted from Polymarket event slug pattern mlb-{home}-{away}-YYYY-MM-DD \u2014 not from market endDate (which is market resolution, not game start).",
"tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
"timing": "Both predictions captured no later than T-60 minutes before official first pitch",
"zenhodl_source": "ZenHodl MLB pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027MLB\u0027)"
},
"title": "ZenHodl vs Polymarket Consensus \u2014 MLB June 2026 Regular-Season Sample",
"version": "1.0",
"why_mlb_regular_season": "Regular-season MLB offers a large, liquid Polymarket market for nearly every game. A 100-game June sample provides enough data points for meaningful ECE confidence intervals in roughly 30 days, enabling a faster pre-committed test cycle than waiting for October playoffs."
}
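The snapshot rules in the manifest can be sketched in a few lines. This is an illustration under stated assumptions, not the production pipeline: the function names, the lowercase alphanumeric team-code format, the example slug, and the `late_snapshot` and `eligible` labels are hypothetical. Only the mid-price formula, the slug pattern `mlb-{home}-{away}-YYYY-MM-DD`, the T-60 cutoff, and the two `*_unavailable` statuses come from the manifest.

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Tuple

# Slug pattern from the manifest; the team-code character set is assumed.
SLUG_RE = re.compile(
    r"^mlb-(?P<home>[a-z0-9]+)-(?P<away>[a-z0-9]+)-(?P<date>\d{4}-\d{2}-\d{2})$"
)


def mid_price(best_bid: float, best_ask: float) -> float:
    """Polymarket mid price: (best bid + best ask) / 2."""
    return (best_bid + best_ask) / 2.0


def parse_slug(slug: str) -> Tuple[str, str, date]:
    """Extract (home, away, game_date) from an event slug."""
    m = SLUG_RE.match(slug)
    if m is None:
        raise ValueError(f"unrecognized slug: {slug!r}")
    return m["home"], m["away"], date.fromisoformat(m["date"])


def snapshot_status(
    zenhodl_wp: Optional[float],
    polymarket_mid: Optional[float],
    captured_at: datetime,
    first_pitch: datetime,
) -> str:
    """Classify a game for scoring; a missing source excludes it for both."""
    # The order of these two checks when both sources are missing is an
    # assumption; the manifest does not specify it.
    if polymarket_mid is None:
        return "polymarket_unavailable"
    if zenhodl_wp is None:
        return "zenhodl_unavailable"
    if captured_at > first_pitch - timedelta(minutes=60):
        return "late_snapshot"  # hypothetical label, not in the manifest
    return "eligible"  # hypothetical label, not in the manifest


# Hypothetical example values.
first_pitch = datetime(2026, 6, 7, 20, 10, tzinfo=timezone.utc)
captured = first_pitch - timedelta(minutes=60)
print(parse_slug("mlb-lad-laa-2026-06-07"))
print(snapshot_status(0.676, mid_price(0.66, 0.67), captured, first_pitch))
```

Taking the game date from the slug rather than from the market's end date follows the manifest's note that the end date reflects market resolution, not game start.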
Why this benchmark?
Regular-season MLB offers a large, liquid Polymarket market for nearly every game. A 100-game June sample provides enough data points for meaningful ECE confidence intervals in roughly 30 days, enabling a faster pre-committed test cycle than waiting for October playoffs.
The point is to find out transparently. We may win, we may lose, but the rows and scoring rules stay published.
If we lose, the loss appears here, in the same row, with the same Brier score. The manifest commits us to publishing that outcome — there is no edit path that changes it without invalidating the on-chain hash.