Collecting · Pre-committed · MLB

ZenHodl vs Polymarket Consensus

The MLB June 2026 Regular-Season Sample tracks the first 100 MLB regular-season games starting on or after 2026-06-01 for which both a ZenHodl prediction and a Polymarket market are available at T-60 (60 minutes before first pitch). Both sources are scored on the same eligible games, with the same snapshot rule and the same metrics. The page is updated as predictions and results are written.

Pre-commitment proof

Served file hash matches on-chain commit
Manifest SHA-256 (served right now)
08c8cb205021af6a16e7f5c57cde9aa651116680bac3afc5720791f35858572c

Reproduce: curl -s https://zenhodl.net/benchmarks/mlb-june-2026-sample/manifest.json | sha256sum

On-chain commitment
0x274aa7d53413aab225913e5f79b1e941b19cf967c4c8b24768f5d60437cf81ec
on-chain SHA: 08c8cb205021af6a16e7f5c57cde9aa651116680bac3afc5720791f35858572c

Block 85958373 on Polygon. Broadcast 2026-04-24T13:19:48 UTC. The hash above appears in the tx's data field.
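
The same check can be scripted. The following is a minimal sketch in Python, assuming the manifest bytes have already been downloaded and the transaction's data field has been copied from a block explorer as a hex string. The function names are illustrative and not part of any ZenHodl tooling, and the exact encoding of the hash inside the data field is an assumption, so both plausible encodings are accepted.

```python
import hashlib

# Hash published on this page (served manifest and on-chain commitment).
EXPECTED_SHA256 = "08c8cb205021af6a16e7f5c57cde9aa651116680bac3afc5720791f35858572c"


def manifest_sha256(manifest_bytes: bytes) -> str:
    """Hex SHA-256 of the manifest exactly as served (no re-serialisation)."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def commitment_in_tx_data(tx_data_hex: str, digest_hex: str) -> bool:
    """True if the digest appears in the transaction's data field.

    Assumption: the field may carry the digest either as raw bytes
    (shown hex-encoded) or as ASCII text of the hex string.
    """
    data = tx_data_hex.lower()
    if data.startswith("0x"):
        data = data[2:]
    as_ascii = digest_hex.encode("ascii").hex()
    return digest_hex in data or as_ascii in data


def verify(manifest_bytes: bytes, tx_data_hex: str,
           expected: str = EXPECTED_SHA256) -> bool:
    """Served file must hash to the expected value, and that value must be on-chain."""
    digest = manifest_sha256(manifest_bytes)
    return digest == expected and commitment_in_tx_data(tx_data_hex, digest)
```

Hashing the raw bytes matters: parsing the JSON and re-serialising it before hashing can change whitespace or key order and produce a different digest.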

Live leaderboard

n=55 resolved · raw=97 · last refresh 22:05:10 UTC
ZenHodl
Production MLB model · ZenHodl MLB pregame win probability via internal SignalEngine.get_pregame_predictions('MLB')
🟢
ECE (lower is better)
0.039
95% CI [0.026, 0.186]
Brier
0.230
Log loss
0.653
Accuracy
54.5%
POLYMARKET CONSENSUS
Live mid-price · the wisdom of every smart-money trader on the venue
🌀
ECE (lower is better)
0.082
95% CI [0.048, 0.224]
Brier
0.237
Log loss
0.666
Accuracy
56.4%
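
The manifest defines the headline metric as ECE over 10 equal-width bins, with a 95% bootstrap CI from 1000 resamples. The following is a rough sketch of that calculation in plain Python. The function names are hypothetical, and the percentile interval is an assumption, since the manifest does not say which bootstrap variant is used.

```python
import random


def ece(preds, outcomes, n_bins=10):
    """Expected Calibration Error with equal-width bins on [0, 1].

    Sum over bins of |mean prediction - mean outcome|, weighted by the
    fraction of games that fall in the bin.
    """
    n = len(preds)
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum_pred, sum_outcome, count]
    for p, y in zip(preds, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    total = 0.0
    for sum_p, sum_y, count in sums:
        if count:
            total += (count / n) * abs(sum_p / count - sum_y / count)
    return total


def bootstrap_ece_ci(preds, outcomes, n_resamples=1000, alpha=0.05,
                     n_bins=10, seed=0):
    """Percentile bootstrap interval for ECE, resampling games with replacement."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(ece([preds[i] for i in idx],
                         [outcomes[i] for i in idx], n_bins))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi
```

Here `preds` would be the home-win probabilities from one source and `outcomes` the matching 1/0 home-win results; both sources are scored on the same list of games.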

Reliability diagram

Predicted probability vs actual home-win rate, in 10 equal-width bins. Diagonal = perfect calibration.

Each marker is one bin's average, and marker size scales with the number of games in the bin. Points above the diagonal mean the home team won more often than predicted in that bin (predictions too low); points below mean it won less often than predicted (predictions too high). The closer the points sit to the diagonal across the chart, the better calibrated the model. A small y-jitter (±0.012) is applied so that ZenHodl (offset up) and Polymarket Consensus (offset down) markers remain distinguishable when both land on the same observed rate in a bin; hover over any point for the true value.
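
The points on such a chart can be computed as follows. This is a sketch, not the page's actual plotting code: `reliability_points` is a hypothetical helper, and jitter and marker sizing are left to the plotting layer.

```python
def reliability_points(preds, outcomes, n_bins=10):
    """Return one (mean_pred, observed_rate, count) tuple per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    points = []
    for members in bins:
        if members:
            k = len(members)
            mean_pred = sum(p for p, _ in members) / k
            observed = sum(y for _, y in members) / k
            points.append((mean_pred, observed, k))
    return points


# Five ZenHodl rows from 2026-06-07 (home-win probability, 1 if home won).
points = reliability_points([0.676, 0.417, 0.500, 0.548, 0.535],
                            [1, 0, 0, 1, 1])
for mean_pred, observed, count in points:
    side = "above" if observed > mean_pred else "on or below"
    print(f"{mean_pred:.3f} -> {observed:.3f} (n={count}, {side} the diagonal)")
```

With so few games the bins are nearly empty, so a point's distance from the diagonal says little until more results accumulate.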

Resolved games

Game | Date | Score | ZenHodl WP | Polymarket Consensus WP | Outcome | ZH Brier | Polymarket Consensus Brier
LAA @ LAD | 2026-06-07 | 2-9 | 0.676 ✓ | 0.665 ✓ | LAD W | 0.105 | 0.112
MIL @ COL | 2026-06-07 | 7-1 | 0.417 ✓ | 0.375 ✓ | MIL W | 0.174 | 0.141
CLE @ TEX | 2026-06-07 | 6-0 | 0.500 ✗ | 0.575 ✗ | CLE W | 0.250 | 0.331
CIN @ STL | 2026-06-07 | 5-6 | 0.548 ✓ | 0.575 ✓ | STL W | 0.204 | 0.181
BOS @ NYY | 2026-06-07 | 1-6 | 0.535 ✓ | 0.575 ✓ | NYY W | 0.216 | 0.181
SEA @ DET | 2026-06-07 | 4-0 | 0.475 ✓ | 0.515 ✗ | SEA W | 0.226 | 0.265
BAL @ TOR | 2026-06-07 | 4-6 | 0.515 ✓ | 0.555 ✓ | TOR W | 0.235 | 0.198
CHW @ PHI | 2026-06-07 | 6-3 | 0.530 ✗ | 0.595 ✗ | CHW W | 0.281 | 0.354
PIT @ ATL | 2026-06-07 | 3-6 | 0.581 ✓ | 0.575 ✓ | ATL W | 0.175 | 0.181
LAA @ LAD | 2026-06-07 | 2-9 | 0.663 ✓ | 0.755 ✓ | LAD W | 0.113 | 0.060
MIL @ COL | 2026-06-07 | 7-1 | 0.427 ✓ | 0.265 ✓ | MIL W | 0.182 | 0.070
BOS @ NYY | 2026-06-06 | 5-3 | 0.535 ✗ | 0.515 ✗ | BOS W | 0.286 | 0.265
CLE @ TEX | 2026-06-06 | 2-3 | 0.506 ✓ | 0.525 ✓ | TEX W | 0.244 | 0.226
PIT @ ATL | 2026-06-06 | 3-6 | 0.568 ✓ | 0.505 ✓ | ATL W | 0.187 | 0.245
CHW @ PHI | 2026-06-06 | 6-8 | 0.544 ✓ | 0.555 ✓ | PHI W | 0.208 | 0.198
BAL @ TOR | 2026-06-06 | 13-3 | 0.507 ✗ | 0.515 ✗ | BAL W | 0.257 | 0.265
CIN @ STL | 2026-06-06 | 3-10 | 0.536 ✓ | 0.545 ✓ | STL W | 0.215 | 0.207
SEA @ DET | 2026-06-06 | 3-7 | 0.482 ✗ | 0.465 ✗ | DET W | 0.268 | 0.286
LAA @ LAD | 2026-06-06 | 0-1 | 0.650 ✓ | 0.635 ✓ | LAD W | 0.123 | 0.133
MIL @ COL | 2026-06-06 | 9-7 | 0.438 ✓ | 0.415 ✓ | MIL W | 0.192 | 0.172
CIN @ STL | 2026-06-06 | 3-10 | 0.526 ✓ | 0.575 ✓ | STL W | 0.225 | 0.181
CLE @ TEX | 2026-06-06 | 2-3 | 0.498 ✗ | 0.465 ✗ | TEX W | 0.252 | 0.286
PIT @ ATL | 2026-06-05 | 3-6 | 0.555 ✓ | 0.575 ✓ | ATL W | 0.198 | 0.181
BAL @ TOR | 2026-06-05 | 13-3 | 0.514 ✗ | 0.575 ✗ | BAL W | 0.265 | 0.331
BOS @ NYY | 2026-06-05 | 5-3 | 0.551 ✗ | 0.565 ✗ | BOS W | 0.303 | 0.319
CHW @ PHI | 2026-06-05 | 6-8 | 0.533 ✓ | 0.635 ✓ | PHI W | 0.218 | 0.133
SEA @ DET | 2026-06-05 | 3-7 | 0.468 ✗ | 0.465 ✗ | DET W | 0.283 | 0.286
LAD @ ARI | 2026-06-05 | 2-3 | 0.485 ✗ | 0.445 ✗ | ARI W | 0.265 | 0.308
PIT @ HOU | 2026-06-05 | 5-1 | 0.508 ✗ | 0.505 ✗ | PIT W | 0.258 | 0.255
TOR @ ATL | 2026-06-04 | 3-7 | 0.594 ✓ | 0.695 ✓ | ATL W | 0.165 | 0.093
BAL @ BOS | 2026-06-04 | 1-8 | 0.510 ✓ | 0.525 ✓ | BOS W | 0.240 | 0.226
CLE @ NYY | 2026-06-04 | 5-4 | 0.507 ✗ | 0.605 ✗ | CLE W | 0.257 | 0.366
LAD @ ARI | 2026-06-04 | 7-0 | 0.492 ✓ | 0.345 ✓ | LAD W | 0.242 | 0.119
COL @ LAA | 2026-06-04 | 4-11 | 0.506 ✓ | 0.575 ✓ | LAA W | 0.244 | 0.181
PIT @ HOU | 2026-06-04 | 9-11 | 0.500 ✓ | 0.435 ✗ | HOU W | 0.249 | 0.319
TEX @ STL | 2026-06-03 | 7-4 | 0.514 ✗ | 0.515 ✗ | TEX W | 0.264 | 0.265
TOR @ ATL | 2026-06-03 | 3-4 | 0.580 ✓ | 0.575 ✓ | ATL W | 0.176 | 0.181
CLE @ NYY | 2026-06-03 | 9-4 | 0.515 ✗ | 0.565 ✗ | CLE W | 0.265 | 0.319
BAL @ BOS | 2026-06-03 | 4-2 | 0.502 ✗ | 0.575 ✗ | BAL W | 0.253 | 0.331
NYM @ SEA | 2026-06-03 | 3-8 | 0.581 ✓ | 0.565 ✓ | SEA W | 0.176 | 0.189
CHW @ MIN | 2026-06-03 | 4-6 | 0.504 ✓ | 0.575 ✓ | MIN W | 0.246 | 0.181
LAD @ ARI | 2026-06-03 | 6-5 | 0.498 ✓ | 0.475 ✓ | LAD W | 0.248 | 0.226
NYM @ SEA | 2026-06-03 | 3-8 | 0.567 ✓ | 0.565 ✓ | SEA W | 0.187 | 0.189
COL @ LAA | 2026-06-03 | 8-2 | 0.513 ✗ | 0.595 ✗ | COL W | 0.263 | 0.354
PIT @ HOU | 2026-06-03 | 10-6 | 0.507 ✗ | 0.515 ✗ | PIT W | 0.257 | 0.265
CLE @ NYY | 2026-06-02 | 9-4 | 0.524 ✗ | 0.655 ✗ | CLE W | 0.274 | 0.429
TEX @ STL | 2026-06-02 | 2-1 | 0.523 ✗ | 0.495 ✓ | TEX W | 0.274 | 0.245
CHW @ MIN | 2026-06-02 | 6-9 | 0.496 ✗ | 0.465 ✗ | MIN W | 0.254 | 0.286
TOR @ ATL | 2026-06-02 | 3-4 | 0.567 ✓ | 0.525 ✓ | ATL W | 0.188 | 0.226
BAL @ BOS | 2026-06-02 | 4-2 | 0.509 ✗ | 0.555 ✗ | BAL W | 0.259 | 0.308
COL @ LAA | 2026-06-02 | 9-8 | 0.521 ✗ | 0.645 ✗ | COL W | 0.272 | 0.416
NYM @ SEA | 2026-06-02 | 2-3 | 0.554 ✓ | 0.545 ✓ | SEA W | 0.199 | 0.207
LAD @ ARI | 2026-06-02 | 1-4 | 0.489 ✗ | 0.405 ✗ | ARI W | 0.261 | 0.354
TEX @ STL | 2026-06-01 | 2-1 | 0.535 ✗ | 0.465 ✓ | TEX W | 0.286 | 0.216
CHW @ MIN | 2026-06-01 | 6-9 | 0.487 ✗ | 0.565 ✓ | MIN W | 0.263 | 0.189
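
Each row's Brier value can be reproduced from its win probability and outcome. The sketch below assumes the WP columns are home-team win probabilities, which is the reading the published Brier values are consistent with. `score_game` is a hypothetical helper, and counting an exact 0.5 as a home pick is an arbitrary choice made here, not a documented rule.

```python
import math


def score_game(home_wp: float, home_won: bool):
    """Per-game Brier score, log loss and pick correctness."""
    y = 1.0 if home_won else 0.0
    brier = (home_wp - y) ** 2
    # Clip so a (hypothetical) forecast of exactly 0 or 1 cannot hit log(0).
    p = min(max(home_wp, 1e-12), 1 - 1e-12)
    log_loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # Correct when the favoured side won; 0.5 is counted as a home pick.
    correct = (home_wp >= 0.5) == home_won
    return brier, log_loss, correct


# LAA @ LAD, 2026-06-07: ZenHodl 0.676, home team (LAD) won.
brier, log_loss, correct = score_game(0.676, True)
print(round(brier, 3), correct)  # 0.105 True

# CHW @ PHI, 2026-06-07: ZenHodl 0.530, away team (CHW) won.
brier, log_loss, correct = score_game(0.530, False)
print(round(brier, 3), correct)  # 0.281 False
```

The leaderboard's Brier, log loss and accuracy figures are then the averages of these per-game values over the resolved games.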

Snapshotted, awaiting result (12)

DET @ TBR tip 2026-06-01T22:40 ZH 0.601 PM 0.595
MIA @ WSN tip 2026-06-01T22:45 ZH 0.541 PM 0.575
KCR @ CIN tip 2026-06-01T23:10 ZH 0.517 PM 0.545
SFG @ MIL tip 2026-06-01T23:40 ZH 0.583 PM 0.565
DET @ TBR tip 2026-06-02T22:40 ZH 0.586 PM 0.565
SDP @ PHI tip 2026-06-02T22:40 ZH 0.534 PM 0.585
MIA @ WSN tip 2026-06-02T22:45 ZH 0.533 PM 0.515
KCR @ CIN tip 2026-06-02T23:10 ZH 0.512 PM 0.525
SFG @ MIL tip 2026-06-02T23:40 ZH 0.588 PM 0.695
OAK @ CHC tip 2026-06-03T00:05 ZH 0.514 PM 0.535
MIA @ WSN tip 2026-06-03T17:05 ZH 0.527 PM 0.495
DET @ TBR tip 2026-06-03T17:10 ZH 0.573 PM 0.575

Full manifest (the rules)

{
  "metrics": {
    "auxiliary": [
      "Brier score",
      "Log loss",
      "Accuracy"
    ],
    "confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
    "ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
    "extra_innings_rule": "The winning team at the end of the game (regardless of inning count) is the outcome. No ties.",
    "headline": "Expected Calibration Error (ECE), 10 equal-width bins"
  },
  "model_versioning": {
    "policy": "ZenHodl MLB model weights as deployed at T-60 of each game are what counts.",
    "retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
  },
  "publication": {
    "live_url": "https://zenhodl.net/benchmarks/mlb-june-2026-sample",
    "manifest_file": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/manifest.json",
    "raw_data_jsonl": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/raw.jsonl",
    "we_publish_when_we_lose": true
  },
  "published_at": "2026-04-24T12:50:00Z",
  "rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
  "sample_size_justification": "100 games resolves enough of the Polymarket spectrum (20-30%, 40-60%, 70-80% bins) to estimate ECE with a CI of approximately \u00b10.02 at 95% confidence, comparable to the NBA playoffs benchmark sample size.",
  "scope": {
    "first_eligible_game_after": "2026-06-01T00:00:00Z",
    "last_eligible_game_before": "2026-06-30T23:59:59Z",
    "max_games": 100,
    "sport": "MLB",
    "window": "First 100 MLB regular-season games tipping on or after 2026-06-01 for which both ZenHodl and Polymarket markets are available at T-60"
  },
  "snapshot": {
    "matching": "Each MLB game matched to its Polymarket market by team names + game date from slug.",
    "polymarket_source": "Polymarket MLB game-winner market mid price (best bid + best ask) / 2, fetched from clob.polymarket.com. Tip-off time extracted from Polymarket event slug pattern mlb-{home}-{away}-YYYY-MM-DD \u2014 not from market endDate (which is market resolution, not game start).",
    "tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
    "timing": "Both predictions captured no later than T-60 minutes before official first pitch",
    "zenhodl_source": "ZenHodl MLB pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027MLB\u0027)"
  },
  "title": "ZenHodl vs Polymarket Consensus \u2014 MLB June 2026 Regular-Season Sample",
  "version": "1.0",
  "why_mlb_regular_season": "Regular-season MLB offers a large, liquid Polymarket market for nearly every game. A 100-game June sample provides enough data points for meaningful ECE confidence intervals in roughly 30 days, enabling a faster pre-committed test cycle than waiting for October playoffs."
}
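
The snapshot rules in the manifest can be sketched in a few lines of Python. Everything here is illustrative: the function names, the `"ok"` status, the example slug and the assumption that team codes are lowercase letters are not taken from the manifest, and the order of checks when both sources are missing is a guess.

```python
import re
from datetime import date

# Slug pattern as written in the manifest: mlb-{home}-{away}-YYYY-MM-DD.
SLUG_RE = re.compile(r"^mlb-([a-z]+)-([a-z]+)-(\d{4})-(\d{2})-(\d{2})$")


def mid_price(best_bid: float, best_ask: float) -> float:
    """Mid price as defined in the manifest: (best bid + best ask) / 2."""
    return (best_bid + best_ask) / 2


def parse_slug(slug: str):
    """Split an event slug into (home, away, game_date)."""
    m = SLUG_RE.match(slug)
    if m is None:
        raise ValueError(f"unrecognised slug: {slug!r}")
    home, away, y, mo, d = m.groups()
    return home, away, date(int(y), int(mo), int(d))


def snapshot_status(zenhodl_wp, polymarket_mid):
    """Status for a game at T-60; any status other than "ok" excludes it for both sources."""
    if polymarket_mid is None:
        return "polymarket_unavailable"
    if zenhodl_wp is None:
        return "zenhodl_unavailable"
    return "ok"  # hypothetical label for a usable snapshot


print(mid_price(0.57, 0.58))
print(parse_slug("mlb-lad-laa-2026-06-07"))  # hypothetical slug
print(snapshot_status(0.676, None))
```

A book quoted one cent wide, such as 0.57 bid and 0.58 ask, gives a mid of about 0.575, which may explain why the Polymarket values on this page all end in 5 at the third decimal.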

Why this benchmark?

Regular-season MLB offers a large, liquid Polymarket market for nearly every game. A 100-game June sample provides enough data points for meaningful ECE confidence intervals in roughly 30 days, enabling a faster pre-committed test cycle than waiting for October playoffs.

The point is to find out, transparently, which source is better calibrated. We may win, we may lose, but the rows and scoring rules stay published.

If we lose, the loss appears here, in the same rows, with the same Brier scores. The manifest commits us to publishing that outcome, and there is no way to edit the manifest without invalidating the on-chain hash.
