How we build independent win probability models for prediction markets
Current live platform coverage: NBA, NCAAMB, NCAAWB, CFB, NFL, NHL, MLB. Research and course extensions are called out separately below.
Current Proof Snapshot
59.3% win rate across 361 validation trades in 10 sports
Methodology explains the model design. Validation shows the current exported backtest snapshot, and results show live production outcomes.
ZenHodl Weekly
One weekly email for builders, traders, and researchers: live results, one model insight, and product updates.
Most sports analytics services derive "fair value" by devigging sportsbook lines — averaging Pinnacle, FanDuel, and DraftKings odds. This gives you a consensus probability that tracks the market by construction. It's useful for sports betting (finding +EV against soft books), but it cannot find prediction market mispricings — because the output already agrees with the market.
ZenHodl models are trained on game state only: score differential, seconds remaining, period, Elo ratings, and sport-specific features. No odds, no lines, no market prices are used as inputs. This makes our output genuinely independent from the market — when our fair probability diverges from the Polymarket ask price, that divergence is a real signal, not noise.
The tradeoff: our models can have a worse Brier score than market-derived models in absolute forecasting terms. But independence is what creates trading value when the output is validated against real market prices. See the validation page for the latest exported backtest snapshot and live results for current production performance.
| | Market-Derived | ZenHodl (Independent) |
|---|---|---|
| Inputs | Sportsbook odds/lines | Score, time, Elo only |
| Output | Tracks market by construction | Genuinely independent |
| Brier Score | Better in absolute terms | Can be worse |
| Trading Value | Zero (agrees with market) | Possible when independently validated |
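To make the market-derived approach concrete, here is a minimal sketch of devigging a two-way line: convert each side's American odds to an implied probability, then normalize so the pair sums to 1. The function names and the specific -150/+130 line are illustrative, not taken from any real feed.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the raw implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def devig(home_odds: int, away_odds: int) -> float:
    """Remove the vig by normalizing the two implied probabilities.

    Returns the "fair" home win probability. This is the market-derived
    approach described above: the output tracks the books by construction.
    """
    p_home = implied_prob(home_odds)
    p_away = implied_prob(away_odds)
    return p_home / (p_home + p_away)


# A -150 / +130 line: the raw probabilities sum past 1.0 (the vig);
# normalizing yields a consensus probability of roughly 0.58.
fair_home = devig(-150, +130)
```

Note that nothing in this computation can ever disagree with the books it averages, which is exactly why it has no edge against a market that already reflects those books.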
We scrape ESPN's play-by-play API across the core sports we model. Each game produces hundreds of snapshots — one per score change or significant event.
Sports: NBA, NCAAMB, NCAAWB, CFB, NFL, NHL, MLB
Data is stored as Apache Parquet files. One row = one game state (score, period, clock, ESPN WP, outcome label).
Module 1 of our course teaches you to build this exact scraper.
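The one-row-per-game-state shape can be sketched as a small flattening step. The field names below (`homeScore`, `clockSeconds`, and so on) only loosely mirror ESPN's JSON; the real endpoint and schema are assumptions, and in production the rows would be persisted with something like `pandas.DataFrame(rows).to_parquet(path)`.

```python
def snapshot_rows(game_id: str, plays: list, home_won: int) -> list:
    """One dict per score change or significant event, outcome label attached.

    Illustrative sketch only: field names approximate ESPN's play-by-play
    JSON and are not the production schema.
    """
    rows = []
    for p in plays:
        rows.append({
            "game_id": game_id,
            "home_score": p["homeScore"],
            "away_score": p["awayScore"],
            "period": p["period"],
            "clock_seconds": p["clockSeconds"],
            "home_won": home_won,  # same label on every snapshot of the game
        })
    return rows
```

The key design point is that the outcome label is repeated on every snapshot, so a single game contributes hundreds of labeled training examples at different game states.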
Each game state is featurized with 13–20 variables, depending on the sport. We deliberately keep the feature set small: overfitting to noise destroys trading value.
| Feature | Sports | Description |
|---|---|---|
| score_diff | All | home_score − away_score |
| seconds_remaining | All | Total game seconds left |
| period | All | Current period/half/inning |
| time_fraction | All | Fraction of game elapsed (0→1) |
| elo_diff | All | Home Elo − Away Elo |
| pregame_wp | All | ESPN pre-game win probability (fixed prior) |
| score_diff_x_tf | All | Lead × time elapsed (interaction) |
| score_diff_sq | All | Lead² (quadratic, captures blowouts) |
| is_home_batting | MLB | 1 if home team is batting |
| down, distance | CFB/NFL | Football situation |
| yard_line | CFB/NFL | Field position |
| possession_home | CFB/NFL | 1 if home has the ball |
| pace features | NBA | total_score, ortg_diff, drtg_diff |
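The shared features in the table above can be built with a few lines per snapshot. This is a minimal sketch: the `total_seconds` default (a 48-minute NBA regulation game) and the exact dict layout are illustrative assumptions.

```python
def shared_features(home_score, away_score, seconds_remaining,
                    period, elo_home, elo_away, pregame_wp,
                    total_seconds=2880):
    """Build the cross-sport feature dict from one game-state snapshot.

    Interaction and quadratic terms are computed exactly as named in
    the feature table: score_diff_x_tf and score_diff_sq.
    """
    score_diff = home_score - away_score
    time_fraction = 1.0 - seconds_remaining / total_seconds
    return {
        "score_diff": score_diff,
        "seconds_remaining": seconds_remaining,
        "period": period,
        "time_fraction": time_fraction,
        "elo_diff": elo_home - elo_away,
        "pregame_wp": pregame_wp,
        "score_diff_x_tf": score_diff * time_fraction,  # lead matters more late
        "score_diff_sq": score_diff ** 2,               # captures blowouts
    }
```

The interaction term is what lets a tree model learn that an 8-point lead means very different things with 36 minutes left versus 2.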
We use sport-specific models — no one-size-fits-all approach. The sections below distinguish the current live platform from adjacent research and course material.
The live platform covers NBA, NCAAMB, NCAAWB, CFB, NFL, NHL, MLB with sport-specific models, post-hoc isotonic calibration (ECE ≤ 0.002), and a four-layer prediction pipeline: base model, team/player overlays, isotonic calibration, and live rolling recalibration.
Split-Phase XGBoost with 16 features including team offensive/defensive ratings (ORtg, DRtg), pace, momentum (scoring runs over last 2 and 5 minutes), Elo ratings, and interaction terms (score_diff × time_fraction). NBA Brier: 0.124, ECE: 0.002. Trained on 5,285 games (2021–2026) with walk-forward season-based splits. Post-hoc isotonic recalibration on the 2024–25 season.
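The post-hoc isotonic recalibration step can be sketched with the pool-adjacent-violators algorithm: fit a monotone step function from raw model probabilities to observed outcome rates on a held-out season. This minimal fit/predict pair is illustrative; production code would more likely use scikit-learn's `IsotonicRegression`.

```python
def isotonic_fit(probs, outcomes):
    """Pool-adjacent-violators: merge adjacent blocks until the mapped
    values are non-decreasing in the raw probability."""
    pairs = sorted(zip(probs, outcomes))
    merged = []  # each block: [sum_of_outcomes, count, x_start]
    for x, y in pairs:
        merged.append([y, 1, x])
        while len(merged) > 1 and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]:
            s, n, _ = merged.pop()          # violation: pool into previous block
            merged[-1][0] += s
            merged[-1][1] += n
    return [(x, s / n) for s, n, x in merged]  # (breakpoint, calibrated value)


def isotonic_predict(model, p):
    """Step-function lookup: value of the last block starting at or below p."""
    value = model[0][1]
    for x, v in model:
        if p >= x:
            value = v
    return value
```

Because the mapping is constrained to be monotone, it can fix systematic over- or under-confidence without reordering the model's rankings.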
Injury overlay: 58 NBA star players tracked in real-time via ESPN's injury API (10-minute cache). Each player has a pre-computed impact factor (e.g., Jokic: 10%, LeBron: 8%, Curry: 9%). When a player is OUT, the model subtracts their impact; QUESTIONABLE halves the adjustment. Total cap: ±15%.
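The overlay arithmetic described above reduces to a few lines. The impact factors below are the ones quoted in the text; the function signature, the status strings, and the clamping order are illustrative assumptions.

```python
IMPACT = {"Jokic": 0.10, "LeBron": 0.08, "Curry": 0.09}  # example factors from the text


def injury_adjustment(statuses: dict, cap: float = 0.15) -> float:
    """Total win-probability adjustment for one team's injury report.

    OUT subtracts the player's full impact, QUESTIONABLE subtracts half;
    the summed adjustment is clamped to +/- cap (15% per the text).
    Sketch only: the production API and status handling are assumptions.
    """
    total = 0.0
    for player, status in statuses.items():
        impact = IMPACT.get(player, 0.0)
        if status == "OUT":
            total -= impact
        elif status == "QUESTIONABLE":
            total -= impact / 2
    return max(-cap, min(cap, total))
```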
Model selection uses a hybrid criterion: near-best Brier score AND best trading value (c/trade) on a held-out backtest window. This prevents selecting models that look good on calibration metrics but produce worse P&L.
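The hybrid criterion can be expressed as a two-stage filter: shortlist models within a tolerance of the best Brier score, then pick the shortlist member with the highest cents-per-trade. The tolerance value and the candidate dict layout here are illustrative assumptions.

```python
def select_model(candidates, brier_tolerance=0.002):
    """Hybrid selection sketch: near-best Brier AND best trading value.

    candidates: list of dicts with "brier" and "cents_per_trade" keys,
    both measured on the held-out backtest window.
    """
    best_brier = min(m["brier"] for m in candidates)
    near_best = [m for m in candidates if m["brier"] <= best_brier + brier_tolerance]
    return max(near_best, key=lambda m: m["cents_per_trade"])
```

A model with a marginally worse Brier score but materially better P&L wins the shortlist, while a model that trades well only because it is badly calibrated never makes the shortlist at all.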
XGBoost with 17 features including all basketball features plus hockey-specific metrics: power play %, penalty kill %, save %, faceoff win %, and penalty minutes differential. NHL Brier: 0.157 (improved 23.6% from 0.205), ECE: 0.002. Trained on 4,225 games.
Injury overlay: 44 NHL skaters tracked via ESPN (goalies excluded — handled by the separate goalie quality adjustment). Star impact factors: McDavid 10%, Draisaitl 8%, MacKinnon 9%, etc.
Goalie & shot overlay: Starting goalie detection adjusts for goalie quality. Expected goals (xG), Corsi, and power play state provide real-time shot quality signals. Combined cap: ±12%.
Ensemble model with starting pitcher ERA, WHIP, K/9 as base features. MLB Brier: 0.151, ECE: 0.002. Three post-model overlays stack on top of the base prediction.
XGBoost model trained on 2,949 matches with 56,000 in-game snapshots. Replaced the original analytical Poisson model (9.4% Brier improvement). Features: score differential, time fraction, Elo difference, total goals, second-half flag, and league-specific effects (one-hot encoded). Isotonic calibration on the last 20% of matches. AUC: 0.830.
Both esports are live on the platform with dedicated XGBoost models and active trading.
CS2: 15 features including round differential, Elo, map-specific win rates (per team per map), recent form, head-to-head record, and Elo momentum. Economy data from bo3.gg and HLTV scorebot provides equipment value and consecutive-loss tracking as post-model overlays. Entry restricted to underdogs (20–42c) with 20c+ minimum edge after a live P&L audit found the model overperforms on underdog lines and underperforms on favorites.
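The CS2 entry restriction is a simple gate on the ask price and the model's edge. A minimal sketch, with the function name and signature as assumptions (prices in cents, model probability in [0, 1]):

```python
def cs2_entry_ok(model_prob: float, ask_cents: int,
                 lo: int = 20, hi: int = 42, min_edge_cents: int = 20) -> bool:
    """Entry filter matching the restriction described above: only
    underdog asks between 20c and 42c, and only when the model's fair
    value exceeds the ask by at least 20c."""
    edge_cents = model_prob * 100 - ask_cents
    return lo <= ask_cents <= hi and edge_cents >= min_edge_cents
```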
LoL: 20 features including gold differential, kills, towers, dragons (absolute count + soul eligibility + soul obtained), barons, inhibitor advantage, Elo, series score, and best-of format. Dragon soul features are critical — the 4th dragon grants a massive team-wide buff that shifts win probability by 15–25%. Brier: 0.123, AUC: 0.861. Trained on 21,636 snapshots across 4,200 games.
Hierarchical analytical model: player-specific serve win rates (by surface) → point probability → game probability → set probability → match probability. Serve rates are computed from 13,174 ATP matches (2020–2024) using the Sackmann dataset, giving us player-specific rates for hard, clay, and grass surfaces instead of tour averages. Elo ratings are surface-aware (4,606 players). The live recalibrator auto-activates after 50 resolved trades.
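The first link in that chain (point probability to game probability) has a standard closed form, including the geometric sum over deuce. This sketch shows only that step, not the full match hierarchy:

```python
def game_prob(p: float) -> float:
    """Probability the server wins a game, given point-win probability p.

    First term: winning to 0, 15, or 30. Second term: reaching deuce
    (3-3 in points) and then winning two points in a row, summed as a
    geometric series. Sketch of the analytical chain described above.
    """
    q = 1.0 - p
    deuce = 20 * p**3 * q**3 * (p**2 / (1 - 2 * p * q))
    return p**4 * (1 + 4 * q + 10 * q**2) + deuce
```

The hierarchy amplifies small serve edges: a 65% point winner holds serve well over 80% of the time, which is why surface-specific serve rates matter so much more than tour averages.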
Every prediction passes through a four-layer correction pipeline after the base model:
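Composed end to end, the four layers (base model, team/player overlays, isotonic calibration, live rolling recalibration) reduce to a function chain. All signatures here are illustrative assumptions, not the production interface:

```python
def predict(base_model, overlays, calibrator, recalibrator, features):
    """Four-layer pipeline sketch: each layer consumes the previous
    layer's probability, and the final output is clamped to [0, 1]."""
    p = base_model(features)            # layer 1: sport-specific model
    for overlay in overlays:            # layer 2: injury/goalie/etc. adjustments
        p = overlay(p, features)
    p = calibrator(p)                   # layer 3: post-hoc isotonic calibration
    p = recalibrator(p)                 # layer 4: live rolling recalibration
    return min(1.0, max(0.0, p))
```

Keeping the layers separate means an overlay or recalibrator can be audited, capped, or disabled without retraining the base model.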
Spread and total models require a different setup from moneyline markets: regression on remaining margin or total, then a distributional layer to convert that forecast into cover/over probabilities.
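A minimal sketch of that distributional layer, assuming normally distributed residuals with standard deviation sigma. The sign convention (a home-favorite line quoted negative, e.g. -6.5) is also an assumption:

```python
from math import erf, sqrt

def cover_prob(pred_remaining_margin: float, current_margin: float,
               spread: float, sigma: float) -> float:
    """Convert a remaining-margin regression forecast into a cover
    probability: P(final home margin > -spread) under a normal
    distribution centered on the forecast."""
    final_margin_mean = current_margin + pred_remaining_margin
    z = (-spread - final_margin_mean) / sigma
    normal_cdf = 0.5 * (1 + erf(z / sqrt(2)))
    return 1 - normal_cdf
```

The same structure handles totals by swapping margin for combined score and the spread for the total line; only the regression target and residual spread change.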
Our backtests are designed to avoid the common mistakes that inflate results.
We believe in showing failures alongside successes.
Read the proof, inspect live results, or go straight to the live platform.