How We Predict NBA Games with Machine Learning

Our NBA prediction model processes live game states and outputs a calibrated win probability, updated on every score change, timeout, and substitution. Here's exactly how it works under the hood.

The Architecture

The prediction pipeline has four layers, each correcting the output of the one before it:

Base XGBoost model — trained on 40,000+ NBA game snapshots (2021-2026)
Team stats overlay — live offensive/defensive ratings (ORtg, DRtg, pace)
Injury overlay — 58 tracked star players via ESPN's injury API
Isotonic calibration — post-hoc probability correction (ECE = 0.002)

When LeBron gets ruled out 30 minutes before tip-off, our system automatically adjusts the Lakers' win probability by his impact factor (8%) before the game even starts.
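One way to picture the four-layer structure is as a chain of functions, each taking the current home-win probability and returning a corrected one. This is only a sketch: the names `run_pipeline`, `team_stats_overlay`, `injury_overlay`, and `calibration`, the context keys, and the 0.005 coefficient are all made up for illustration and are not our production code.

```python
from typing import Callable, Mapping, Sequence

# A layer takes the current home-win probability plus game context
# and returns a corrected probability.
Layer = Callable[[float, Mapping[str, float]], float]


def clamp(p: float) -> float:
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, p))


def run_pipeline(base_prob: float, context: Mapping[str, float],
                 layers: Sequence[Layer]) -> float:
    """Apply each layer in order to the output of the previous one."""
    p = clamp(base_prob)
    for layer in layers:
        p = clamp(layer(p, context))
    return p


# Hypothetical stand-ins for the real overlays.
def team_stats_overlay(p: float, ctx: Mapping[str, float]) -> float:
    # Illustrative only: nudge by an invented 0.005 per point of net rating gap.
    return p + 0.005 * ctx.get("net_rating_diff", 0.0)


def injury_overlay(p: float, ctx: Mapping[str, float]) -> float:
    # Subtract a precomputed home-team injury penalty (assumed context key).
    return p - ctx.get("home_injury_penalty", 0.0)


def calibration(p: float, ctx: Mapping[str, float]) -> float:
    # Identity placeholder where a fitted calibration map would go.
    return p


final_prob = run_pipeline(
    0.60,  # pretend output of the base model
    {"net_rating_diff": 4.0, "home_injury_penalty": 0.08},
    [team_stats_overlay, injury_overlay, calibration],
)
```

Keeping every layer behind the same signature means an overlay can be swapped or disabled without touching the others.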

The Training Data

We train on every NBA game from 2021-22 through 2025-26 — approximately 5,285 games with ~300 snapshots each. Each snapshot captures:

Score differential (home - away)
Time remaining (seconds + period)
Elo rating difference (our continuously-updated team strength metric)
Pregame win probability (pre-tip baseline)
Team stats: offensive rating, defensive rating, pace
Momentum features: scoring runs over last 2 and 5 minutes
Interaction terms: score_diff × time_fraction (a 10-point lead matters more with 2 minutes left than with 40 minutes left)
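
To make the snapshot concrete, here is a minimal sketch of how a few of these features could be assembled. The `Snapshot` fields and `build_features` helper are hypothetical names, and it assumes `time_fraction` means the fraction of regulation already played, which is what makes the interaction term grow as the clock runs down.

```python
from dataclasses import dataclass

REGULATION_SECONDS = 48 * 60  # four 12-minute NBA periods


@dataclass
class Snapshot:
    home_score: int
    away_score: int
    seconds_remaining: int  # seconds left in regulation
    elo_diff: float  # home Elo minus away Elo
    pregame_home_prob: float


def build_features(s: Snapshot) -> dict:
    score_diff = s.home_score - s.away_score
    # Assumption: time_fraction is the share of regulation elapsed,
    # 0.0 at tip-off and 1.0 at the final buzzer (overtime is clamped to 1.0).
    elapsed = 1.0 - s.seconds_remaining / REGULATION_SECONDS
    time_fraction = min(1.0, max(0.0, elapsed))
    return {
        "score_diff": score_diff,
        "seconds_remaining": s.seconds_remaining,
        "elo_diff": s.elo_diff,
        "pregame_home_prob": s.pregame_home_prob,
        "time_fraction": time_fraction,
        "score_diff_x_time_fraction": score_diff * time_fraction,
    }


# The same 10-point lead, with 40 minutes left versus 2 minutes left.
early = build_features(Snapshot(20, 10, 40 * 60, 0.0, 0.5))
late = build_features(Snapshot(100, 90, 2 * 60, 0.0, 0.5))
```

With these definitions the interaction term is far larger for the late snapshot than for the early one, even though the raw lead is identical.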

The model is a Split-Phase XGBoost — it learns different patterns for early-game (where Elo dominance matters most) and late-game (where score differential dominates).
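
The split-phase idea can be sketched as a small router that sends each snapshot to one of two models depending on how much of the game has been played. Everything here is illustrative: the 0.5 split point is an assumption, and the two toy models (an Elo-style expected score and a logistic on score differential) are stand-ins for the trained XGBoost boosters.

```python
import math
from typing import Callable, Mapping

Model = Callable[[Mapping[str, float]], float]


class SplitPhaseModel:
    """Route each snapshot to an early-game or late-game model."""

    def __init__(self, early_model: Model, late_model: Model,
                 split_at: float = 0.5):  # split point is a hypothetical choice
        self.early_model = early_model
        self.late_model = late_model
        self.split_at = split_at

    def predict(self, features: Mapping[str, float]) -> float:
        if features["time_fraction"] < self.split_at:
            return self.early_model(features)
        return self.late_model(features)


def toy_early_model(f: Mapping[str, float]) -> float:
    # Standard Elo expected-score curve: team strength drives the estimate.
    return 1.0 / (1.0 + 10 ** (-f["elo_diff"] / 400.0))


def toy_late_model(f: Mapping[str, float]) -> float:
    # Logistic on the scoreboard; the divisor 5.0 is invented for illustration.
    return 1.0 / (1.0 + math.exp(-f["score_diff"] / 5.0))


model = SplitPhaseModel(toy_early_model, toy_late_model)

# A stronger home team that is trailing by 5 points.
p_early = model.predict({"time_fraction": 0.1, "elo_diff": 100.0, "score_diff": -5.0})
p_late = model.predict({"time_fraction": 0.9, "elo_diff": 100.0, "score_diff": -5.0})
```

In this toy setup the stronger team is still favored early despite trailing, while late in the game the scoreboard takes over.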

Why Calibration Matters More Than Accuracy

A model can be highly accurate (picks the winner 75% of the time) but terribly calibrated (when it says 70%, teams actually win 55%). For trading on prediction markets like Polymarket, calibration is everything — because you're not picking winners, you're pricing probabilities.

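Our calibration metric, Expected Calibration Error (ECE), is 0.002. ECE is the average gap between predicted probability and observed win rate across probability bins, weighted by how many predictions fall in each bin, so on average our stated probabilities are off by about 0.2 percentage points: when the model says 70%, teams win close to 70% of the time. We achieve this through: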
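
For reference, ECE is commonly computed by bucketing predictions into equal-width probability bins and averaging the gap between mean predicted probability and observed win rate, weighted by bin size. The sketch below follows that common definition; the function name and the choice of 10 bins are assumptions, not necessarily the exact settings behind our reported number.

```python
from typing import Sequence


def expected_calibration_error(probs: Sequence[float],
                               outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """Bin-size-weighted mean |predicted probability - observed win rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        win_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - win_rate)
    return ece


# Well calibrated: twenty 70% calls, fourteen of which win.
good = expected_calibration_error([0.7] * 20, [1] * 14 + [0] * 6)

# Poorly calibrated: twenty 70% calls, only eleven of which win (55%).
bad = expected_calibration_error([0.7] * 20, [1] * 11 + [0] * 9)
```

The second case mirrors the 70%-versus-55% example above and yields an ECE of about 0.15, while the first is essentially zero.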

Post-hoc isotonic regression on a held-out 2024-25 season calibration set
Live rolling recalibration that auto-corrects every 25 resolved predictions

If the model starts drifting (for example, after a rule change shifts scoring patterns), the live recalibrator catches it within days and corrects automatically. No manual intervention is needed.
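
Both calibration steps can be illustrated with a small pure-Python sketch: a pool-adjacent-violators fit (the standard algorithm behind isotonic regression) and a recalibrator that refits after every 25 resolved predictions. The class and function names, the 500-prediction window, and the linear interpolation between fitted points are assumptions for illustration; a production system would more likely use a library implementation.

```python
from collections import deque


def fit_isotonic(probs, outcomes):
    """Pool-adjacent-violators fit; returns sorted (raw_prob, calibrated_prob) points."""
    totals = {}  # raw probability -> (sum of outcomes, count)
    for x, y in zip(probs, outcomes):
        sy, n = totals.get(x, (0.0, 0))
        totals[x] = (sy + y, n + 1)

    blocks = []  # each block: [sum of x, sum of y, count]
    for x in sorted(totals):
        sy, n = totals[x]
        blocks.append([x * n, sy, n])
        # Merge backwards while win rates decrease from one block to the next.
        while len(blocks) > 1 and (blocks[-2][1] / blocks[-2][2]
                                   > blocks[-1][1] / blocks[-1][2]):
            sx, sy2, n2 = blocks.pop()
            blocks[-1][0] += sx
            blocks[-1][1] += sy2
            blocks[-1][2] += n2
    return [(sx / n, sy / n) for sx, sy, n in blocks]


def apply_isotonic(points, p):
    """Map a raw probability through the fitted points (linear interpolation)."""
    if not points:
        return p  # nothing fitted yet: pass through unchanged
    if p <= points[0][0]:
        return points[0][1]
    if p >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= p <= x1:
            return y0 + (p - x0) / (x1 - x0) * (y1 - y0)
    return p


class RollingRecalibrator:
    """Refit the calibration map after every `refit_every` resolved predictions."""

    def __init__(self, refit_every=25, window=500):  # window size is hypothetical
        self.history = deque(maxlen=window)
        self.refit_every = refit_every
        self.since_refit = 0
        self.points = []

    def record(self, raw_prob, home_won):
        self.history.append((raw_prob, int(home_won)))
        self.since_refit += 1
        if self.since_refit >= self.refit_every:
            probs, outcomes = zip(*self.history)
            self.points = fit_isotonic(probs, outcomes)
            self.since_refit = 0

    def calibrate(self, raw_prob):
        return apply_isotonic(self.points, raw_prob)
```

Because the history is a bounded window, older games age out and the fitted map tracks recent behavior, which is what lets drift get corrected without manual retraining.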

Real-Time Injury Adjustments

We track 58 NBA star players through ESPN's injury API with a 10-minute cache. Each player has a pre-computed impact factor:

Nikola Jokic: 10% impact (MVP-level)
LeBron James: 8%
Stephen Curry: 9%
Luka Doncic: 9%
Role players: 2% default

When a player is listed as OUT, the model subtracts their impact from the team's win probability. When they're QUESTIONABLE, the adjustment is halved. The total adjustment per team is capped at ±15% to prevent extreme swings.
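
The adjustment rule can be sketched in a few lines. The impact values, the halving for QUESTIONABLE, and the 15% cap come from the description above; the function names are hypothetical, and the sketch assumes impacts are percentage points on the home-win probability and that an opponent's injuries raise it by the same amount.

```python
IMPACT = {
    "Nikola Jokic": 0.10,
    "LeBron James": 0.08,
    "Stephen Curry": 0.09,
    "Luka Doncic": 0.09,
}
DEFAULT_IMPACT = 0.02  # role players
STATUS_WEIGHT = {"OUT": 1.0, "QUESTIONABLE": 0.5}
MAX_TEAM_ADJUSTMENT = 0.15  # cap on the total adjustment per team


def team_injury_penalty(injuries):
    """Sum weighted impacts for (player, status) pairs, capped per team."""
    total = sum(
        IMPACT.get(player, DEFAULT_IMPACT) * STATUS_WEIGHT.get(status, 0.0)
        for player, status in injuries
    )
    return min(total, MAX_TEAM_ADJUSTMENT)


def adjust_home_win_prob(home_prob, home_injuries, away_injuries):
    # Assumption: home injuries lower the home probability and
    # away injuries raise it, both in percentage points.
    adjusted = (home_prob
                - team_injury_penalty(home_injuries)
                + team_injury_penalty(away_injuries))
    return min(1.0, max(0.0, adjusted))


# LeBron ruled OUT for a home team priced at 60% before the news.
lebron_out = adjust_home_win_prob(0.60, [("LeBron James", "OUT")], [])

# Same game, but he is only QUESTIONABLE: the adjustment is halved.
lebron_questionable = adjust_home_win_prob(0.60, [("LeBron James", "QUESTIONABLE")], [])
```

In this sketch the OUT case moves 60% to about 52%, the QUESTIONABLE case to about 56%, and two stars out on one team would hit the 15-point cap rather than stacking to 19.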

The Results

Over 175 live moneyline bot trades:

Win rate: 66.3%
Average edge at entry: 8-12 cents
Best performing entry range: 55-70c (slight favorites)
Worst performing range: 45-55c (toss-ups — the model has no edge here)
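
As a rough illustration of what "edge at entry" means here, the sketch below assumes edge is the model's win probability (in cents on a $1 payout) minus the market price of the share. The function names and the 8-cent minimum threshold are hypothetical, not our bot's actual entry logic.

```python
def edge_cents(model_prob: float, market_price_cents: float) -> float:
    """Model fair value minus market price, in cents per $1 share."""
    return model_prob * 100.0 - market_price_cents


def should_enter(model_prob: float, market_price_cents: float,
                 min_edge_cents: float = 8.0) -> bool:
    # Hypothetical filter: only trade when the edge clears a minimum.
    return edge_cents(model_prob, market_price_cents) >= min_edge_cents


# Model says 72%, market sells the share at 62c: roughly a 10-cent edge.
example_edge = edge_cents(0.72, 62.0)
enter = should_enter(0.72, 62.0)

# Model says 52%, market at 50c: too thin to trade under this filter.
skip = should_enter(0.52, 50.0)
```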

The model is genuinely better at predicting NBA games than the Polymarket crowd in certain scenarios: injury-adjusted situations, late-game states with large leads, and games involving teams with extreme offensive/defensive rating differentials.

Try It Yourself

The full prediction API is available at zenhodl.net/docs with a 7-day free trial. You get:

Live win probabilities for every NBA game via /v1/games?sport=NBA
Edge detection signals via /v1/edges
Historical prediction accuracy via /v1/predictions/latest

Or build your own from scratch with our 6-module bot course — Module 3 covers the exact XGBoost training pipeline described here.

See our live trading results for verified, on-chain performance across all 11 sports.