Live · Pre-committed · NBA Playoffs 2026

ZenHodl vs Polymarket Consensus

A pre-committed, on-chain-anchored head-to-head calibration benchmark covering every NBA Conference Semifinal, Conference Final, and NBA Finals game from May 5 through June 25, 2026. Our model versus the live betting market. Same games, same instant, same metrics. Updated after every game.

Pre-commitment proof

Manifest SHA-256
08c8cb205021af6a16e7f5c57cde9aa651116680bac3afc5720791f35858572c

Reproduce: curl -s https://zenhodl.net/benchmarks/nba-playoffs-2026/manifest.json | sha256sum

On-chain commitment
0x274aa7d53413aab225913e5f79b1e941b19cf967c4c8b24768f5d60437cf81ec

Block 85958373 on Polygon. Broadcast 2026-04-24T13:19:48 UTC. The manifest SHA-256 above appears in the transaction's data field.
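
The verification described above can be sketched in Python. This is a minimal sketch under stated assumptions, not ZenHodl's own tooling: the helper names `sha256_hex` and `hash_in_tx_data` are hypothetical, and because the page does not say how the hash is encoded inside the transaction's data field, the second helper assumes it is stored either as 32 raw bytes or as 64 ASCII hex characters. Fetching the manifest bytes and the transaction input (for example with a Polygon JSON-RPC `eth_getTransactionByHash` call, whose result carries an `input` field) is left out so the sketch stays offline.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes served, equivalent to `sha256sum`."""
    return hashlib.sha256(data).hexdigest()


def hash_in_tx_data(manifest_hash: str, tx_input_hex: str) -> bool:
    """Return True if the manifest hash appears in the transaction input data.

    Assumption: the hash is embedded either as 32 raw bytes or as a
    64-character ASCII hex string. Both encodings are checked.
    """
    hex_body = tx_input_hex[2:] if tx_input_hex.lower().startswith("0x") else tx_input_hex
    data = bytes.fromhex(hex_body)
    needle = manifest_hash.lower()
    as_raw_bytes = bytes.fromhex(needle)      # hash stored as raw bytes
    as_ascii_text = needle.encode("ascii")    # hash stored as readable hex text
    return as_raw_bytes in data or as_ascii_text in data
```

Hash the bytes exactly as served. Parsing and re-serializing the JSON before hashing would change whitespace or key order and produce a different digest.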

Live leaderboard

n=0 resolved · last refresh 19:35:01 UTC
ZenHodl 🟢
Production NBA model · ECE 4.39% on NCAAMB regular season holdout
ECE (lower is better) · Brier · Log loss · Accuracy: no values yet (n=0)

Polymarket Consensus 🌀
Live mid-price · the wisdom of every smart-money trader on the venue
ECE (lower is better) · Brier · Log loss · Accuracy: no values yet (n=0)

Read the full manifest (the rules) ↓
{
  "metrics": {
    "auxiliary": [
      "Brier score",
      "Log loss",
      "Accuracy"
    ],
    "confidence_interval": "95% bootstrap CI on ECE with 1000 resamples, published alongside point estimate",
    "ece_formula": "Sum over bins of |bin_avg_pred - bin_avg_outcome| weighted by bin sample fraction",
    "extra_innings_rule": "The winning team at the end of the game (regardless of inning count) is the outcome. No ties.",
    "headline": "Expected Calibration Error (ECE), 10 equal-width bins"
  },
  "model_versioning": {
    "policy": "ZenHodl MLB model weights as deployed at T-60 of each game are what counts.",
    "retrains_during_window": "Permitted. Disclosed in the per-game row\u0027s model_version field."
  },
  "publication": {
    "live_url": "https://zenhodl.net/benchmarks/mlb-june-2026-sample",
    "manifest_file": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/manifest.json",
    "raw_data_jsonl": "https://zenhodl.net/benchmarks/mlb-june-2026-sample/raw.jsonl",
    "we_publish_when_we_lose": true
  },
  "published_at": "2026-04-24T12:50:00Z",
  "rule_changes": "Once this manifest\u0027s SHA-256 hash is broadcast on Polygon, the rules above are frozen. If ZenHodl edits this file at any later point, the on-chain hash will not match the served file. Anyone can verify by hashing the served manifest.json and comparing to the on-chain transaction data field.",
  "sample_size_justification": "100 games resolves enough of the Polymarket spectrum (20-30%, 40-60%, 70-80% bins) to estimate ECE with a CI of approximately \u00b10.02 at 95% confidence, comparable to the NBA playoffs benchmark sample size.",
  "scope": {
    "first_eligible_game_after": "2026-06-01T00:00:00Z",
    "last_eligible_game_before": "2026-06-30T23:59:59Z",
    "max_games": 100,
    "sport": "MLB",
    "window": "First 100 MLB regular-season games tipping on or after 2026-06-01 for which both ZenHodl and Polymarket markets are available at T-60"
  },
  "snapshot": {
    "matching": "Each MLB game matched to its Polymarket market by team names + game date from slug.",
    "polymarket_source": "Polymarket MLB game-winner market mid price (best bid + best ask) / 2, fetched from clob.polymarket.com. Tip-off time extracted from Polymarket event slug pattern mlb-{home}-{away}-YYYY-MM-DD \u2014 not from market endDate (which is market resolution, not game start).",
    "tie_handling": "If either source is unavailable at T-60, the game is excluded from BOTH model\u0027s metrics. Recorded with status=\u0027polymarket_unavailable\u0027 or \u0027zenhodl_unavailable\u0027 in the public raw.jsonl.",
    "timing": "Both predictions captured no later than T-60 minutes before official first pitch",
    "zenhodl_source": "ZenHodl MLB pregame win probability via internal SignalEngine.get_pregame_predictions(\u0027MLB\u0027)"
  },
  "title": "ZenHodl vs Polymarket Consensus \u2014 MLB June 2026 Regular-Season Sample",
  "version": "1.0",
  "why_mlb_regular_season": "Regular-season MLB offers a large, liquid Polymarket market for nearly every game. A 100-game June sample provides enough data points for meaningful ECE confidence intervals in roughly 30 days, enabling a faster pre-committed test cycle than waiting for October playoffs."
}
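
The manifest's headline metric and its confidence interval can be sketched as follows. This is an illustrative sketch, not the production scoring code: the function names are hypothetical, the bins are assumed to be half-open intervals of width 0.1 with a prediction of exactly 1.0 placed in the last bin, and the interval is assumed to be a percentile bootstrap (the manifest says only "95% bootstrap CI" with 1000 resamples).

```python
import random


def ece(preds, outcomes, n_bins=10):
    """Expected Calibration Error: sum over bins of
    |bin_avg_pred - bin_avg_outcome| weighted by the bin's sample fraction."""
    n = len(preds)
    if n == 0:
        raise ValueError("no resolved games to score")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_pred = sum(p for p, _ in b) / len(b)
        avg_outcome = sum(y for _, y in b) / len(b)
        total += abs(avg_pred - avg_outcome) * len(b) / n
    return total


def bootstrap_ci(preds, outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ECE (assumed method)."""
    n = len(preds)
    if n == 0:
        raise ValueError("no resolved games to resample")
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Resample games with replacement, keeping each (pred, outcome) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(ece([preds[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi
```

Here `preds` is a list of win probabilities for one side and `outcomes` the matching 0/1 results. Both entrants would be scored on the same list of games so the two ECE values are directly comparable.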

Why benchmark against the market?

Polymarket's mid-price is the consensus probability of every smart-money trader actively wagering real capital on the outcome. It's the toughest opponent we could pick — not because it's a smarter model, but because it aggregates every smart model.
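
As a rough illustration of the snapshot rules in the manifest, the mid-price and the slug-based game date could be computed like this. The function names and the example slug are hypothetical. The sketch assumes prices are quoted as probabilities between 0 and 1 and that the slug ends in a `YYYY-MM-DD` date, as the manifest's slug pattern suggests. It does not call the Polymarket API.

```python
import re
from datetime import date


def mid_price(best_bid: float, best_ask: float) -> float:
    """Mid-price as defined in the manifest: (best bid + best ask) / 2."""
    if not (0.0 <= best_bid <= best_ask <= 1.0):
        raise ValueError("expected 0 <= best_bid <= best_ask <= 1")
    return (best_bid + best_ask) / 2


# Matches a trailing -YYYY-MM-DD at the end of an event slug.
_SLUG_DATE = re.compile(r"-(\d{4}-\d{2}-\d{2})$")


def game_date_from_slug(slug: str) -> date:
    """Extract the game date from the slug rather than from the market end date."""
    match = _SLUG_DATE.search(slug)
    if match is None:
        raise ValueError(f"no trailing date in slug: {slug!r}")
    return date.fromisoformat(match.group(1))
```

For example, `mid_price(0.61, 0.63)` gives roughly 0.62, and a made-up slug such as `"mlb-nyy-bos-2026-06-01"` yields the date 2026-06-01.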

A solo operator beating the market on calibration would be a strong demonstration that the underlying model has an edge. We may not beat it. The point is to find out, transparently, in front of you.

If we lose, the loss appears here, in the same leaderboard, judged by the same metrics. The manifest commits us to publishing that outcome: no edit can change the rules without invalidating the on-chain hash.
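
For readers who want to check the auxiliary numbers themselves, the three auxiliary metrics the manifest lists (Brier score, log loss, accuracy) can be sketched with their standard textbook definitions. The manifest does not spell these out, so the clipping constant `eps` and the 0.5 decision threshold for accuracy are assumptions, and the function names are hypothetical.

```python
import math


def brier(preds, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def log_loss(preds, outcomes, eps=1e-15):
    """Mean negative log-likelihood, clipping predictions away from 0 and 1."""
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


def accuracy(preds, outcomes, threshold=0.5):
    """Share of games where the side at or above the threshold actually won."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(preds, outcomes))
    return hits / len(preds)
```

Lower is better for Brier score and log loss, and higher is better for accuracy. All three are computed over the same resolved games for both entrants.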

Try the same model live →

7-day free trial. Same NBA pregame WP feed. Same calibration we're being judged on right here.