How to Build a Sports Prediction API with Python in 2026

If you've ever wanted to predict the outcome of a live NBA game, an MLB matchup, or a CS2 esports match using real data, you're in the right place. In this guide, we'll walk through the core components of a sports prediction API — from data collection to model training to serving live predictions via a REST endpoint.

By the end, you'll understand the architecture behind systems like ZenHodl's prediction API, which serves calibrated win probabilities for 10 sports in real-time.

What a Sports Prediction API Actually Does

A prediction API takes in a game state (score, time remaining, team quality) and returns a probability:

GET /v1/games?sport=NBA

{
  "game_id": "nba_2026041201",
  "home_team": "BOS",
  "away_team": "MIA",
  "home_score": 58,
  "away_score": 45,
  "period": 3,
  "home_win_probability": 0.847,
  "model": "xgboost_v3_calibrated",
  "updated_at": "2026-04-12T02:30:00Z"
}

The key challenge isn't building the API endpoint — it's making the probability estimate accurate and well-calibrated. A calibrated model means: when it says 70%, the team actually wins ~70% of the time.

Step 1: Collect Historical Game Data

You need play-by-play or score-update snapshots with timestamps. The best free sources in 2026:

ESPN API (free, unofficial):

import requests

def get_nba_scoreboard():
    url = "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    data = resp.json()

    games = []
    for event in data.get("events", []):
        competition = event["competitions"][0]
        # Pick sides by the "homeAway" field rather than relying on list order
        sides = {c["homeAway"]: c for c in competition["competitors"]}
        home, away = sides["home"], sides["away"]
        status = competition.get("status", {})

        games.append({
            "game_id": event["id"],
            "home_team": home["team"]["abbreviation"],
            "away_team": away["team"]["abbreviation"],
            "home_score": int(home["score"]),  # scores arrive as strings
            "away_score": int(away["score"]),
            "period": status.get("period", 0),
            "clock": status.get("displayClock", ""),
            "status": status.get("type", {}).get("name", ""),
        })
    return games
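
The scoreboard returns a period and a clock string, while the features in Step 2 expect a single seconds_remaining value. Below is a minimal sketch of that conversion. It assumes regulation NBA timing (four 12-minute quarters) and a clock formatted either as "M:SS" or as bare seconds such as "32.4"; the function name is hypothetical, and overtime is simplified to "only the clock remains".

```python
def clock_to_seconds_remaining(period: int, display_clock: str,
                               period_seconds: int = 720, periods: int = 4) -> float:
    """Regulation seconds left in the game, given the period and clock string."""
    # Parse "M:SS" or bare seconds such as "32.4"; an empty clock counts as 0
    if ":" in display_clock:
        minutes, seconds = display_clock.split(":")
        clock = int(minutes) * 60 + float(seconds)
    else:
        clock = float(display_clock or 0)
    # Full periods still to be played after the current one (none in overtime)
    periods_left = max(periods - period, 0)
    return periods_left * period_seconds + clock
```

Other sports need different period lengths and counts, which is why both are parameters rather than constants.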

For historical data, you'll want 3-5 seasons of game results to train on. Sources include:

- Basketball Reference for NBA
- FanGraphs for MLB
- Jeff Sackmann's GitHub for tennis
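
The training pipeline in Step 3 expects one labeled row per score change. As a rough sketch, one game's score updates can be expanded into such rows as shown here. The input layout (a chronological list of seconds_remaining, home_score, away_score tuples) and the function name are assumptions for illustration, not a format any of the sources above provides directly.

```python
def game_to_snapshots(game_id, score_updates):
    """score_updates: chronological list of (seconds_remaining, home_score, away_score)."""
    if not score_updates:
        return []
    # The last update carries the final score, which determines the label
    _, final_home, final_away = score_updates[-1]
    home_won = int(final_home > final_away)
    return [
        {
            "game_id": game_id,
            "seconds_remaining": secs,
            "home_score": home,
            "away_score": away,
            "home_won": home_won,  # same label on every row of the game
        }
        for secs, home, away in score_updates
    ]
```

Every row of a game shares the same home_won label, which is exactly why rows from one game must never be divided between training and test data.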

Step 2: Engineer Features

Raw scores aren't enough. You need features that capture game context:

def build_features(game_state, team_stats, elo_ratings):
    home = game_state["home_team"]
    away = game_state["away_team"]

    # Compute these first: the interaction features below reuse them
    score_diff = game_state["home_score"] - game_state["away_score"]
    time_fraction = 1 - (game_state["seconds_remaining"] / 2880)  # NBA = 48 min

    return {
        "score_diff": score_diff,
        "seconds_remaining": game_state["seconds_remaining"],
        "period": game_state["period"],
        "time_fraction": time_fraction,
        "elo_diff": elo_ratings.get(home, 1500) - elo_ratings.get(away, 1500),

        # Team quality features
        "ortg_diff": team_stats[home]["ortg"] - team_stats[away]["ortg"],
        "drtg_diff": team_stats[home]["drtg"] - team_stats[away]["drtg"],
        "pace_diff": team_stats[home]["pace"] - team_stats[away]["pace"],

        # Interaction features
        "score_diff_x_tf": score_diff * time_fraction,  # Lead matters MORE late
        "score_diff_sq": score_diff ** 2,                # Blowouts are decisive
    }

The most important features, ranked by typical XGBoost importance:

1. score_diff (~25%): the current lead
2. elo_diff (~15%): pre-game team quality gap
3. time_fraction (~12%): how much game is left
4. score_diff × time_fraction (~10%): interaction, since a 10-point lead matters differently at halftime vs 2 minutes left
5. offensive/defensive ratings (~8% each): team efficiency metrics
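
The elo_diff feature needs a rating per team. Here is a minimal sketch of the standard Elo update, assuming a K-factor of 20 and no home-court or margin-of-victory adjustment (both choices are illustrative, not taken from this guide). The function names are hypothetical; the 1500 default matches the fallback used in build_features.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(ratings: dict, winner: str, loser: str,
               k: float = 20.0, default: float = 1500.0) -> None:
    """Update both teams' ratings in place after one finished game."""
    r_w = ratings.get(winner, default)
    r_l = ratings.get(loser, default)
    # The winner gains more when the win was less expected
    delta = k * (1.0 - elo_expected(r_w, r_l))
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta
```

Replay historical games in chronological order and record each team's rating before the game starts, so the feature never includes the result it is meant to help predict.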

Step 3: Train the Model

XGBoost is a common default for tabular sports prediction. Here's a minimal training pipeline:

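import xgboost as xgb
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.isotonic import IsotonicRegression

# Load your snapshot data (one row per score-change per game)
# Columns: all features + "home_won" (0 or 1)
# df must already be sorted chronologically, oldest game first
X = df[feature_columns].values
y = df["home_won"].values

# Walk-forward split (NEVER random split for time-series!)
# Example proportions: oldest 70% to train, next 15% to calibrate (Step 4),
# newest 15% to test. Place the cut points on game boundaries.
train_end = int(len(df) * 0.70)
cal_end = int(len(df) * 0.85)
X_train, y_train = X[:train_end], y[:train_end]
X_cal, y_cal = X[train_end:cal_end], y[train_end:cal_end]
X_test, y_test = X[cal_end:], y[cal_end:]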

# Train
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="logloss",
)
model.fit(X_train, y_train)

# Evaluate
raw_probs = model.predict_proba(X_test)[:, 1]
print(f"Brier score: {brier_score_loss(y_test, raw_probs):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, raw_probs):.4f}")

Critical: use walk-forward splits, not random splits. Random splits leak future game information into training (a snapshot from Q3 of tonight's game in training, while a Q1 snapshot from the same game is in test). Walk-forward means all training data is older than all test data.
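
One way to guarantee that no game ends up on both sides is to split by game rather than by row. The sketch below assumes each snapshot row is a dict with a "game_id" and an ISO-formatted "game_date" (both field names are hypothetical), and uses only the standard library.

```python
def walk_forward_split(rows, train_frac=0.8):
    """Split snapshot rows by game so the oldest games train and the newest test."""
    # One date per game; ISO dates ("2026-04-12") sort correctly as strings
    game_dates = {}
    for row in rows:
        game_dates[row["game_id"]] = row["game_date"]
    ordered_games = sorted(game_dates, key=lambda g: (game_dates[g], g))

    # Cut on the game list, not the row list, so a game is never divided
    cut = int(len(ordered_games) * train_frac)
    train_games = set(ordered_games[:cut])

    train = [r for r in rows if r["game_id"] in train_games]
    test = [r for r in rows if r["game_id"] not in train_games]
    return train, test
```

With this ordering no training game is dated later than any test game, although games played on the same date can land on either side of the cut.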

Step 4: Calibrate the Probabilities

Raw XGBoost outputs are often overconfident. Isotonic regression fixes this:

# Hold out a calibration set (between train and test chronologically)
cal_probs = model.predict_proba(X_cal)[:, 1]

calibrator = IsotonicRegression(y_min=0.01, y_max=0.99, out_of_bounds="clip")
calibrator.fit(cal_probs, y_cal)

# Now use calibrated predictions
calibrated = calibrator.transform(raw_probs)
print(f"Brier AFTER calibration: {brier_score_loss(y_test, calibrated):.4f}")

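After calibration, when your model says 70%, teams should actually win ~70% of the time. The usual way to measure how far you are from that ideal is Expected Calibration Error (ECE), the average gap between predicted probability and observed win rate across probability bins. You want it under 0.01.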
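
ECE is straightforward to compute by hand. The following is a minimal sketch using ten equal-width bins, which is a common convention but an assumption here; the bin count and the function name are not taken from any production setup.

```python
def expected_calibration_error(y_true, probs, n_bins=10):
    """Weighted mean gap between predicted probability and observed win rate."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        win_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        # Each bin contributes in proportion to how many predictions it holds
        ece += (len(members) / total) * abs(win_rate - mean_prob)
    return ece
```

Calling expected_calibration_error(y_test, calibrated) on the held-out test set gives the number to compare against the 0.01 target. Sparse bins make the estimate noisy, so it is only meaningful with a reasonably large test set.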

For reference, ZenHodl's production models achieve ECE of 0.002 across NBA, NHL, MLB, and LoL after isotonic calibration.
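
The serving code in Step 5 loads a single pickle file and reads the keys "model", "calibrator", and "feature_names" from it. A sketch of writing a file with that layout could look like this; the function name is hypothetical, and the key names are simply the ones the loader expects.

```python
import pickle


def save_model_bundle(path, model, calibrator, feature_names):
    """Write the dict layout that the serving code reads back at startup."""
    bundle = {
        "model": model,
        "calibrator": calibrator,
        # Store the column order so serving builds rows the same way as training
        "feature_names": list(feature_names),
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)
    return bundle
```

For example, save_model_bundle("wp_model_NBA.pkl", model, calibrator, feature_columns) after Step 4. Pickle files can execute code when loaded, so only load bundles you produced yourself.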

Step 5: Serve via FastAPI

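from fastapi import FastAPI, HTTPException
import pickle

import numpy as np

app = FastAPI()

# Load model at startup
with open("wp_model_NBA.pkl", "rb") as f:
    model_data = pickle.load(f)
model = model_data["model"]
calibrator = model_data["calibrator"]
feature_names = model_data["feature_names"]

# build_features (Step 2) also needs team_stats and elo_ratings;
# this sketch assumes both dicts are loaded at startup alongside the model.

@app.get("/v1/predict")
def predict(home_team: str, away_team: str, home_score: int,
            away_score: int, period: int, seconds_remaining: int):

    # Return a 404 for unknown teams instead of failing with a KeyError
    if home_team not in team_stats or away_team not in team_stats:
        raise HTTPException(status_code=404, detail="Unknown team")

    # build_features takes the game state as a dict, as defined in Step 2
    game_state = {
        "home_team": home_team,
        "away_team": away_team,
        "home_score": home_score,
        "away_score": away_score,
        "period": period,
        "seconds_remaining": seconds_remaining,
    }
    features = build_features(game_state, team_stats, elo_ratings)
    X = np.array([[features[f] for f in feature_names]])

    raw = model.predict_proba(X)[0][1]
    calibrated = calibrator.transform([raw])[0]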

    return {
        "home_team": home_team,
        "away_team": away_team,
        "home_win_probability": round(float(calibrated), 4),
        "model_version": "v3_calibrated",
    }
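
To call the endpoint, every argument of predict is passed as a query parameter. The helper below is a small sketch that builds such a URL with the standard library; the function name is hypothetical, and the base URL in the usage note assumes the app is served locally on port 8000.

```python
from urllib.parse import urlencode


def build_predict_url(base_url, home_team, away_team, home_score,
                      away_score, period, seconds_remaining):
    """Build the query string for GET /v1/predict."""
    params = {
        "home_team": home_team,
        "away_team": away_team,
        "home_score": home_score,
        "away_score": away_score,
        "period": period,
        "seconds_remaining": seconds_remaining,
    }
    # urlencode escapes values and keeps the dict's insertion order
    return f"{base_url}/v1/predict?{urlencode(params)}"
```

With the server running, requests.get(build_predict_url("http://127.0.0.1:8000", "BOS", "MIA", 58, 45, 3, 1000), timeout=10).json() should return the JSON body shown above.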

Step 6: Add Real-Time Overlays

A static model isn't enough for live prediction. You need overlays that adjust probabilities based on real-time information:

Injury adjustments — if a star player is ruled out mid-game, shift the probability
Live recalibration — rolling isotonic refit on the last 500 predictions to catch model drift
Momentum features — scoring runs in basketball, power plays in hockey

These overlays stack on top of the base XGBoost prediction and are capped at ±20% total adjustment.
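
The capping step can be sketched as follows. This reads "±20%" as ±0.20 in absolute probability, which is an interpretation rather than something stated above, and it clamps the final value to the same 0.01 to 0.99 range the calibrator in Step 4 uses. The function name is hypothetical.

```python
def apply_overlays(base_prob, adjustments, cap=0.20):
    """Add overlay adjustments to a base probability, capping their combined size."""
    total = sum(adjustments)
    # Limit the combined overlay effect to the interval [-cap, +cap]
    total = max(-cap, min(cap, total))
    # Keep the result inside the calibrator's output range
    return max(0.01, min(0.99, base_prob + total))
```

For instance, an injury overlay of +0.15 and a momentum overlay of +0.10 on a base of 0.60 sum to +0.25, get capped at +0.20, and yield roughly 0.80.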

What You Can Build With This

Once your API is serving calibrated probabilities, you can:

Build a trading bot for prediction markets like Polymarket or Kalshi
Create a live dashboard showing real-time win probabilities as games unfold
Offer an API service to other developers and researchers

If you want to skip the model training and just use calibrated predictions directly, ZenHodl's API provides real-time win probabilities across 11 sports with a 7-day free trial.
