
Build an NBA Win Probability Model in Python: From Box Scores to Live Predictions

2026-04-10 nba python machine-learning tutorial xgboost beginner

Every NBA broadcast shows a "win probability" graphic that updates after each play. Behind it is a machine learning model that takes the current game state (score, time remaining, who has the ball) and outputs the probability that each team wins.

In this tutorial, you'll build one from scratch in Python. By the end, you'll have a model that takes any NBA game state and outputs a calibrated win probability that you can use for analysis, betting, or building a live dashboard.

What We're Building

Input:

predict_wp(score_diff=8, seconds_remaining=720, period=3, elo_diff=150)
# → 0.78 (home team has 78% chance of winning)

Output: a number between 0 and 1 estimating the probability that the home team wins from this game state. Not just "who's ahead" but "how likely is the outcome" — calibrated so that when the model says 70%, the home team actually wins about 70% of the time.

Step 1: Get the Training Data

NBA win probability models are trained on in-game snapshots — thousands of (game_state, did_home_win) pairs from historical games.

Option A: Use ESPN Play-by-Play Data

import requests

def fetch_espn_pbp(game_id: str) -> list:
    """Fetch play-by-play data from ESPN's public API."""
    url = "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/summary"
    resp = requests.get(url, params={"event": game_id}, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    plays = []
    for play in data.get("plays", []):
        plays.append({
            "clock": play.get("clock", {}).get("displayValue", ""),
            "period": play.get("period", {}).get("number", 0),
            "home_score": play.get("homeScore", 0),
            "away_score": play.get("awayScore", 0),
            "text": play.get("text", ""),
        })
    return plays
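
These plays still need to become training rows. Here's a minimal sketch of that step — a hypothetical plays_to_snapshots helper that assumes an MM:SS clock string and 12-minute periods, and labels every snapshot with the game's final outcome:

def plays_to_snapshots(plays: list) -> list:
    """Turn one game's plays into (game_state, home_win) training rows."""
    if not plays:
        return []
    final = plays[-1]
    home_win = 1 if final["home_score"] > final["away_score"] else 0

    rows = []
    for p in plays:
        try:
            mins, secs = p["clock"].split(":")
            clock_secs = int(mins) * 60 + int(secs)
        except ValueError:
            continue  # skip plays with a malformed or missing clock
        if p["period"] <= 4:
            # Seconds left in regulation: remaining full periods plus the current clock
            seconds_remaining = (4 - p["period"]) * 720 + clock_secs
        else:
            seconds_remaining = clock_secs  # overtime: just the OT clock
        rows.append({
            "home_score": p["home_score"],
            "away_score": p["away_score"],
            "period": p["period"],
            "seconds_remaining": seconds_remaining,
            "home_win": home_win,
        })
    return rows

Run it over a few seasons of game IDs and you have the (game_state, did_home_win) pairs described above.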

Option B: Use Pre-Built Training Parquets

If you want to skip the data collection and jump to modeling, ZenHodl publishes training-ready parquet files with 5 seasons of NBA snapshot data (2021-22 through 2025-26, ~2M rows).

import pandas as pd

# Load training data (each row = one game state snapshot)
df = pd.read_parquet("wp_training_NBA_2024-25.parquet")
print(f"Rows: {len(df)}, Columns: {list(df.columns)}")

Step 2: Feature Engineering

The raw play-by-play data needs to be transformed into features the model can learn from:

import numpy as np

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Transform raw game snapshots into model-ready features."""

    # Core game state
    df["score_diff"] = df["home_score"] - df["away_score"]
    df["total_score"] = df["home_score"] + df["away_score"]

    # Time features
    df["time_fraction"] = 1 - (df["seconds_remaining"] / 2880)  # 48 min = 2880 sec
    df["score_diff_x_tf"] = df["score_diff"] * df["time_fraction"]
    df["score_diff_sq"] = df["score_diff"] ** 2

    # Elo interaction (pre-game strength × current lead)
    df["score_diff_x_elo"] = df["score_diff"] * df["elo_diff"]

    return df
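
Applied to the parquet from Step 1 (this assumes the file already carries home_score, away_score, seconds_remaining, and elo_diff columns):

df = engineer_features(df)
print(df[["score_diff", "time_fraction", "score_diff_x_tf"]].head())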

The Most Important Features (by XGBoost importance)

From our production model:

Feature            Importance   Why it matters
score_diff         28%          The lead is the single strongest signal
time_fraction      19%          The same lead means different things in Q1 vs Q4
score_diff_x_tf    15%          Interaction: a 10-point lead at 90% elapsed is near-certain
elo_diff           12%          The better team is more likely to come back from a deficit
score_diff_sq      8%           Nonlinear: a 20-point lead is more than 2x as safe as a 10-point lead
ortg_diff          5%           Team offensive efficiency differential
pace_diff          4%           Faster pace → more possessions → more variance → a deficit is less safe

Step 3: Train the Model

from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss, roc_auc_score

FEATURE_COLS = [
    "score_diff", "seconds_remaining", "period", "time_fraction",
    "elo_diff", "score_diff_x_tf", "score_diff_sq", "total_score",
    "score_diff_x_elo",
]

# Prepare data
X = df[FEATURE_COLS].values
y = df["home_win"].values  # 1 if home team won, 0 if away

# Walk-forward split (train on older seasons, test on the newest).
# DON'T use a random split — snapshots from the same game would land in both
# train and test and leak the outcome. shuffle=False preserves time order,
# assuming the rows are sorted chronologically.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Train XGBoost
model = XGBClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    objective="binary:logistic",
    eval_metric="logloss",
    random_state=42,
)

model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=50,
)

# Evaluate
y_pred = model.predict_proba(X_test)[:, 1]
brier = brier_score_loss(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred)
print(f"Brier score: {brier:.4f}")
print(f"ROC-AUC:     {auc:.4f}")

A good NBA win probability model should achieve:

  - Brier score < 0.15 (ours is 0.124 after calibration)
  - ROC-AUC > 0.85 (ours is 0.897)
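
To reproduce an importance table like the one in Step 2 for your own model, XGBoost's sklearn wrapper exposes normalized scores directly:

# Importance scores in training-column order, sorted high to low
for name, score in sorted(
    zip(FEATURE_COLS, model.feature_importances_),
    key=lambda kv: kv[1],
    reverse=True,
):
    print(f"{name:<20} {score:.1%}")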

Step 4: Calibrate (The Critical Step)

Raw XGBoost outputs are NOT well-calibrated. The model might say "70%" but the team actually wins 75% of the time in that range. This miscalibration costs real money if you're trading on the probabilities.

from sklearn.isotonic import IsotonicRegression

# Carve a calibration set out of the held-out data; the second half stays
# untouched as the final test set
X_cal = X_test[:len(X_test)//2]
y_cal = y_test[:len(y_test)//2]
X_final_test = X_test[len(X_test)//2:]
y_final_test = y_test[len(y_test)//2:]

# Get raw predictions on calibration set
raw_cal_probs = model.predict_proba(X_cal)[:, 1]

# Fit isotonic regression (maps raw probs → calibrated probs)
calibrator = IsotonicRegression(y_min=0.005, y_max=0.995, out_of_bounds="clip")
calibrator.fit(raw_cal_probs, y_cal)

# Apply calibration to test set
raw_test_probs = model.predict_proba(X_final_test)[:, 1]
calibrated_probs = calibrator.transform(raw_test_probs)

# Compare
brier_raw = brier_score_loss(y_final_test, raw_test_probs)
brier_cal = brier_score_loss(y_final_test, calibrated_probs)
print(f"Raw Brier:        {brier_raw:.4f}")
print(f"Calibrated Brier: {brier_cal:.4f}")
print(f"Improvement:      {(brier_raw - brier_cal) / brier_raw * 100:.1f}%")

Measuring Calibration: ECE

Expected Calibration Error (ECE) measures how well your predicted probabilities match reality:

def compute_ece(y_true, y_pred, n_bins=10):
    """Expected Calibration Error — lower is better."""
    bins = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        # Close the upper edge of the last bin so predictions of exactly 1.0 count
        upper = (y_pred <= bins[i + 1]) if i == n_bins - 1 else (y_pred < bins[i + 1])
        mask = (y_pred >= bins[i]) & upper
        if mask.sum() > 0:
            bin_pred = y_pred[mask].mean()
            bin_true = y_true[mask].mean()
            ece += abs(bin_pred - bin_true) * mask.sum() / len(y_true)
    return ece

ece_raw = compute_ece(y_final_test, raw_test_probs)
ece_cal = compute_ece(y_final_test, calibrated_probs)
print(f"Raw ECE:        {ece_raw:.4f}")
print(f"Calibrated ECE: {ece_cal:.4f}")

Target: ECE < 0.01. Our production model achieves 0.002.
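
ECE compresses calibration into a single number; a reliability diagram shows where the errors live. A quick sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Observed win rate vs. mean predicted probability in 10 bins
frac_pos, mean_pred = calibration_curve(y_final_test, calibrated_probs, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="Perfect calibration")
plt.plot(mean_pred, frac_pos, "o-", label="Calibrated model")
plt.xlabel("Predicted win probability")
plt.ylabel("Observed win rate")
plt.legend()
plt.savefig("reliability_curve.png")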

Step 5: Save and Use in Production

import pickle

model_package = {
    "model": model,
    "calibrator": calibrator,
    "feature_names": FEATURE_COLS,
    "metrics": {"brier": brier_cal, "auc": auc, "ece": ece_cal},
}

with open("wp_model_NBA.pkl", "wb") as f:
    pickle.dump(model_package, f)

# Load the package once at import time — reloading the pickle on every call is slow
with open("wp_model_NBA.pkl", "rb") as f:
    PKG = pickle.load(f)

def predict_wp(score_diff, seconds_remaining, period, elo_diff=0):
    """Predict home team win probability from a game state."""

    time_fraction = 1 - (seconds_remaining / 2880)
    features = {
        "score_diff": score_diff,
        "seconds_remaining": seconds_remaining,
        "period": period,
        "time_fraction": time_fraction,
        "elo_diff": elo_diff,
        "score_diff_x_tf": score_diff * time_fraction,
        "score_diff_sq": score_diff ** 2,
        "total_score": 0,  # approximate
        "score_diff_x_elo": score_diff * elo_diff,
    }

    X = np.array([[features[f] for f in PKG["feature_names"]]])
    raw_prob = PKG["model"].predict_proba(X)[0][1]
    calibrated = PKG["calibrator"].transform([raw_prob])[0]
    return round(calibrated, 4)

# Example
wp = predict_wp(score_diff=8, seconds_remaining=720, period=3, elo_diff=150)
print(f"Home team win probability: {wp:.1%}")

Step 6: Add Live Data Overlays

A static model is good. A model with live adjustments is better. Three overlays that matter:

Injury Adjustment

# If a star player is out, adjust the win probability
STAR_IMPACT = {
    "Nikola Jokic": 0.12,      # Jokic out → team loses ~12% win prob
    "Luka Doncic": 0.11,
    "Giannis Antetokounmpo": 0.11,
    "Jayson Tatum": 0.09,
    # ... 58 players tracked in our production system
}
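
How you apply those impacts is up to you. A minimal sketch — a hypothetical adjust_for_injuries helper that shifts the probability additively and clips the result; folding absences into elo_diff before prediction would be a more principled alternative:

def adjust_for_injuries(home_wp: float, home_out: list, away_out: list) -> float:
    """Subtract impact for home absences, add it back for away absences."""
    adj = home_wp
    adj -= sum(STAR_IMPACT.get(p, 0.0) for p in home_out)
    adj += sum(STAR_IMPACT.get(p, 0.0) for p in away_out)
    return min(max(adj, 0.005), 0.995)  # keep the result in a sane range

print(adjust_for_injuries(0.78, home_out=["Nikola Jokic"], away_out=[]))  # → 0.66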

Team Stats (Offensive/Defensive Efficiency)

# ORtg = points per 100 possessions (offense)
# DRtg = points per 100 possessions allowed (defense)
# Pace = possessions per game
team_stats = {
    "BOS": {"ortg": 117.2, "drtg": 110.0, "pace": 97.7},
    "DET": {"ortg": 114.3, "drtg": 106.5, "pace": 102.9},
}
# A team with higher ORtg and lower DRtg is better
# The differential feeds into the model as additional features
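
To make that concrete, here's a small sketch of computing the differentials (home minus away) that features like ortg_diff and pace_diff refer to — the stat_diffs helper is hypothetical:

def stat_diffs(home: str, away: str, stats: dict) -> dict:
    """Home-minus-away differentials; positive ortg_diff favors the home team."""
    h, a = stats[home], stats[away]
    return {
        "ortg_diff": round(h["ortg"] - a["ortg"], 1),
        "drtg_diff": round(h["drtg"] - a["drtg"], 1),  # lower DRtg is better, so negative favors home
        "pace_diff": round(h["pace"] - a["pace"], 1),
    }

print(stat_diffs("BOS", "DET", team_stats))
# {'ortg_diff': 2.9, 'drtg_diff': 3.5, 'pace_diff': -5.2}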

Live Recalibration

# Rolling isotonic refit on the last 500 resolved predictions
# Auto-corrects calibration drift without full retraining
from sklearn.isotonic import IsotonicRegression

class LiveRecalibrator:
    def __init__(self, buffer_size=500, min_samples=50):
        self.buffer_size = buffer_size
        self.min_samples = min_samples
        self.preds = []
        self.outcomes = []
        self.calibrator = None

    def update(self, predicted_prob, actual_outcome):
        self.preds.append(predicted_prob)
        self.outcomes.append(actual_outcome)
        if len(self.preds) > self.buffer_size:
            self.preds.pop(0)
            self.outcomes.pop(0)
        if len(self.preds) >= self.min_samples:
            self.calibrator = IsotonicRegression(out_of_bounds="clip")
            self.calibrator.fit(self.preds, self.outcomes)

    def calibrate(self, raw_prob):
        if self.calibrator:
            return self.calibrator.transform([raw_prob])[0]
        return raw_prob
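
Usage is two calls: feed it resolved predictions as games finish, then pass new raw outputs through calibrate(). The numbers below are illustrative:

recal = LiveRecalibrator(buffer_size=500)

# As games resolve, record (predicted probability, actual outcome) pairs
recal.update(0.72, 1)
recal.update(0.31, 0)

# Raw probabilities pass through unchanged until 50 pairs have accumulated
print(recal.calibrate(0.70))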

What This Model Can Do

Once built, this model powers:

  1. Live dashboards — show real-time win probability curves during games
  2. Trading bots — compare your fair probability to Polymarket/Kalshi prices and trade the gap (sketched after this list)
  3. Research — analyze which game situations are most commonly mispriced
  4. Content — "the model gives X a 73% chance with 4 minutes left" makes great commentary
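
For the trading use case in item 2, the core comparison is a one-liner. The threshold and market price below are illustrative assumptions, not values from our production bot:

MIN_EDGE = 0.05  # illustrative threshold; tune for fees and slippage

model_prob = predict_wp(score_diff=8, seconds_remaining=720, period=3, elo_diff=150)
market_price = 0.70  # hypothetical YES price from Polymarket/Kalshi

edge = model_prob - market_price  # positive → market underprices the home team
if edge > MIN_EDGE:
    print(f"Buy YES: model {model_prob:.0%} vs market {market_price:.0%}")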

Next Steps


Want to build this yourself?

The ZenHodl course teaches you to build a complete prediction market bot in 6 notebooks.

Join the community

Discuss strategies, share results, get help.

Join Discord