Build an Elo Rating System from Scratch in Python

Elo ratings are the most underrated feature in sports prediction: one number per team, updated after every game, capturing relative strength without box scores or advanced stats. Arpad Elo invented the system for chess, and FiveThirtyEight later adapted it for sports predictions. This tutorial builds a complete Elo system from scratch in Python.

The Math

Given two teams with ratings $R_A$ and $R_B$, team A's expected win probability is:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A) / 400}}$$

After the game, ratings update: $R_A' = R_A + K \times (S_A - E_A)$, where $S_A$ is 1 for a win and 0 for a loss, and $K$ sets the maximum number of points a single game can move a rating.
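
As a worked example (the ratings and $K$ here are illustrative values, not taken from any real league), take $R_A = 1600$, $R_B = 1500$, and $K = 20$:

$$E_A = \frac{1}{1 + 10^{(1500 - 1600) / 400}} = \frac{1}{1 + 10^{-0.25}} \approx 0.640$$

If A wins, $R_A' = 1600 + 20 \times (1 - 0.640) \approx 1607.2$. If A loses, $R_A' = 1600 + 20 \times (0 - 0.640) \approx 1587.2$. The favorite gains less for winning than it gives up for losing.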

Step 1: Core Functions

from collections import defaultdict

def expected_win_prob(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

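def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0) -> tuple[float, float]:
    """Return both teams' new ratings after one game."""
    exp_a = expected_win_prob(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's score and expectation are the complements of A's.
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b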

The system is zero-sum: points gained by the winner equal points lost by the loser.
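
A quick way to see the zero-sum property is to run one game both ways and check that the rating pool is unchanged. This is a minimal sketch: the two functions are restated so the snippet runs on its own, and the 1600 and 1500 ratings are illustrative.

```python
def expected_win_prob(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_ratings(rating_a, rating_b, a_won, k=20.0):
    exp_a = expected_win_prob(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b

# Illustrative ratings: a 1600 favorite against a 1500 underdog.
fav, dog = 1600.0, 1500.0

fav_win = update_ratings(fav, dog, a_won=True)   # expected result
upset = update_ratings(fav, dog, a_won=False)    # underdog wins

# The total number of rating points is the same before and after each game.
assert abs(sum(fav_win) - (fav + dog)) < 1e-9
assert abs(sum(upset) - (fav + dog)) < 1e-9

print(f"favorite wins: {fav_win[0] - fav:+.1f} / {fav_win[1] - dog:+.1f}")
print(f"underdog wins: {upset[0] - fav:+.1f} / {upset[1] - dog:+.1f}")
```

With these inputs the expected result moves about 7.2 points and the upset moves about 12.8, so surprising outcomes shift ratings more than predictable ones.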

Step 2: Home Court Advantage

Home teams win ~58% of NBA games. Add a bonus when computing expected score — but do not alter the stored rating:

def update_with_hca(rating_home, rating_away, home_won, k=20.0, hca=70.0):
    exp_home = expected_win_prob(rating_home + hca, rating_away)
    score = 1.0 if home_won else 0.0
    new_home = rating_home + k * (score - exp_home)
    new_away = rating_away + k * ((1 - score) - (1 - exp_home))
    return new_home, new_away

An hca of 70 points gives equally rated teams a ~60% home win probability; matching a 58% home win rate exactly takes roughly 56 points. Tune per sport: NBA ~70, college basketball ~100, NFL ~50.
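
The link between the bonus and the implied home win probability follows directly from the expected-score formula, and it can be inverted to pick a starting hca from an observed home win rate. The helper names below are hypothetical, introduced only for this sketch.

```python
import math

def hca_to_home_prob(hca: float) -> float:
    """Home win probability for two equally rated teams, given a rating bonus."""
    return 1.0 / (1.0 + 10 ** (-hca / 400))

def home_prob_to_hca(p_home: float) -> float:
    """Rating bonus at which equally rated teams win at rate p_home at home."""
    return 400 * math.log10(p_home / (1.0 - p_home))

for hca in (50, 70, 100):
    print(f"hca={hca:>3} -> home win prob {hca_to_home_prob(hca):.3f}")

print(f"58% home win rate -> hca of about {home_prob_to_hca(0.58):.0f}")
```

A value derived this way is only a starting point. The tutorial's approach of tuning per sport still applies, since the best-fitting bonus depends on the data.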

Step 3: Season Resets

Rosters change between seasons. Without a reset, stale ratings poison predictions:

def season_reset(ratings, regression=0.75):
    return {team: 1500 + regression * (elo - 1500) for team, elo in ratings.items()}

A 1700-rated team becomes 1650 after reset — keeping 75% of its distance from average.
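
The same regression pulls weak teams up as well as strong teams down. A small sketch with made-up team names and ratings (season_reset is restated so the snippet runs on its own):

```python
def season_reset(ratings, regression=0.75):
    return {team: 1500 + regression * (elo - 1500) for team, elo in ratings.items()}

# Hypothetical end-of-season ratings, chosen for illustration.
end_of_season = {"Strong": 1700.0, "Average": 1500.0, "Weak": 1400.0}

next_season = season_reset(end_of_season)
print(next_season)
# {'Strong': 1650.0, 'Average': 1500.0, 'Weak': 1425.0}
```

Every team keeps its position relative to the others; only the spread around 1500 shrinks.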

Step 4: Full Pipeline

import numpy as np

def build_elo(seasons_data, k=20.0, hca=70.0, regression=0.75):
    """
    seasons_data: dict of season_id -> list of (home, away, home_won)
    Returns final ratings and a prediction log for evaluation.
    """
    ratings = defaultdict(lambda: 1500.0)
    log = []

    for season in sorted(seasons_data.keys()):
        for home, away, home_won in seasons_data[season]:
            prob = expected_win_prob(ratings[home] + hca, ratings[away])
            log.append({"home": home, "away": away, "prob": prob, "won": home_won})

            new_h, new_a = update_with_hca(ratings[home], ratings[away], home_won, k, hca)
            ratings[home] = new_h
            ratings[away] = new_a

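        # Regress toward 1500 before the next season. This also runs after the
        # final season, so the returned ratings are already regressed.
        ratings = defaultdict(lambda: 1500.0, season_reset(dict(ratings), regression))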

    return dict(ratings), log

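Games must be sorted chronologically, because each update depends on all previous games. build_elo sorts the season keys but processes each season's games in the order given, so every game list must already be in date order.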
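
One way to guarantee that ordering is to build seasons_data from dated records. This is a sketch under an assumed input layout: the (season_id, game_date, home, away, home_won) tuple format and the to_seasons_data helper are hypothetical, and the teams and results are made up.

```python
from collections import defaultdict
from datetime import date

def to_seasons_data(games):
    """
    games: iterable of (season_id, game_date, home, away, home_won) tuples in
           any order. This record layout is an assumption for illustration.
    Returns season_id -> chronologically ordered list of (home, away, home_won).
    """
    by_season = defaultdict(list)
    for season_id, game_date, home, away, home_won in games:
        by_season[season_id].append((game_date, home, away, home_won))

    seasons_data = {}
    for season_id, rows in by_season.items():
        # Sort on the date only; the sort is stable, so same-day games keep input order.
        rows.sort(key=lambda row: row[0])
        seasons_data[season_id] = [(h, a, won) for _, h, a, won in rows]
    return seasons_data

# Made-up records, deliberately out of order.
raw = [
    (2024, date(2024, 1, 5), "A", "B", True),
    (2023, date(2023, 3, 1), "C", "A", False),
    (2024, date(2024, 1, 2), "B", "C", True),
]
seasons_data = to_seasons_data(raw)
print(seasons_data[2024])
# [('B', 'C', True), ('A', 'B', True)]
```

The result has the shape build_elo expects, with the date dropped once it has done its job.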

Step 5: Evaluate

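def evaluate(log):
    probs = np.array([g["prob"] for g in log])
    outcomes = np.array([float(g["won"]) for g in log])
    # Brier score: mean squared error of the predicted probabilities (lower is better).
    brier = np.mean((probs - outcomes) ** 2)
    # Accuracy: share of games where the side given more than 50% actually won.
    accuracy = np.mean((probs > 0.5) == (outcomes == 1.0))
    print(f"Brier: {brier:.4f}  Accuracy: {accuracy:.1%}")
    return brier, accuracy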

On NBA data, a tuned Elo system hits Brier ~0.22-0.24 and accuracy ~65-67%.
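
To put those Brier numbers in context, it can help to score predictions that know nothing about the teams. The sketch below uses synthetic outcomes (58 home wins in 100 games, mirroring the ~58% home win rate quoted earlier, not real results) and a plain-Python brier helper that is hypothetical and separate from evaluate.

```python
def brier(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

# Synthetic outcomes: exactly 58 home wins in 100 games.
outcomes = [1.0] * 58 + [0.0] * 42

coin_flip = brier([0.5] * 100, outcomes)    # always predict 50%
home_rate = brier([0.58] * 100, outcomes)   # always predict the home win base rate

print(f"coin flip: {coin_flip:.4f}  constant 58%: {home_rate:.4f}")
# coin flip: 0.2500  constant 58%: 0.2436
```

Under this assumption, an Elo model adds information beyond home advantage only to the extent that its Brier score falls below roughly 0.244.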

Step 6: Tune K-Factor

K is the most important hyperparameter. Grid search it:

def tune_k(seasons_data, k_values):
    for k in k_values:
        _, log = build_elo(seasons_data, k=k)
        probs = np.array([g["prob"] for g in log])
        outcomes = np.array([float(g["won"]) for g in log])
        brier = np.mean((probs - outcomes) ** 2)
        print(f"K={k:5.1f}  Brier={brier:.4f}")

Typical optimal values: NBA K=20, NCAAMB K=28-32, NFL K=24-28. Fewer games per season means higher K.
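
The grid search above scores each K on the same games used to choose it, which can flatter the result. One option is to pick K on earlier seasons and report the Brier score on later ones. This is a sketch, not the tutorial's own method: brier_on_seasons and tune_k_holdout are hypothetical names, the Elo loop is restated in compact form so the snippet runs on its own, and the toy data is made up.

```python
from collections import defaultdict

def expected_win_prob(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def brier_on_seasons(seasons_data, k, scored_seasons, hca=70.0, regression=0.75):
    """
    Run Elo over every season in order, but score only games in scored_seasons.
    seasons_data: season_id -> chronologically ordered list of (home, away, home_won)
    scored_seasons must contain at least one season that has games.
    """
    ratings = defaultdict(lambda: 1500.0)
    squared_errors = []
    for season in sorted(seasons_data):
        for home, away, home_won in seasons_data[season]:
            prob = expected_win_prob(ratings[home] + hca, ratings[away])
            score = 1.0 if home_won else 0.0
            if season in scored_seasons:
                squared_errors.append((prob - score) ** 2)
            delta = k * (score - prob)
            ratings[home] += delta
            ratings[away] -= delta
        # Regress toward 1500 between seasons, mirroring season_reset.
        for team in ratings:
            ratings[team] = 1500 + regression * (ratings[team] - 1500)
    return sum(squared_errors) / len(squared_errors)

def tune_k_holdout(seasons_data, k_values, tune_seasons, holdout_seasons):
    """Pick K on tune_seasons, then report its Brier score on holdout_seasons."""
    best_k = min(k_values, key=lambda k: brier_on_seasons(seasons_data, k, tune_seasons))
    return best_k, brier_on_seasons(seasons_data, best_k, holdout_seasons)

# Made-up toy data, only to show the call shape.
toy = {
    2022: [("A", "B", True), ("B", "C", False), ("C", "A", True)],
    2023: [("A", "C", True), ("B", "A", False), ("C", "B", True)],
}
best_k, holdout_brier = tune_k_holdout(
    toy, [10.0, 20.0, 30.0], tune_seasons={2022}, holdout_seasons={2023}
)
print(f"best K={best_k}  holdout Brier={holdout_brier:.4f}")
```

Ratings are still updated on every game, including the held-out seasons. Only the scoring is restricted, so the held-out predictions use ratings built from all earlier games.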

Using Elo for Live Betting

Pre-game Elo gives a baseline. For live prediction, Elo difference becomes a feature alongside score differential, time remaining, and period:

features = {
    "elo_diff": ratings[home] - ratings[away],
    "score_diff": home_score - away_score,
    "time_remaining": seconds_left,
    "period": current_period,
}

In our backtests, adding elo_diff improved trading profit by 2-3 cents per trade. It captures strength-of-schedule and league standing that raw score cannot.

Common Mistakes

Not sorting chronologically. Out-of-order games leak future data into ratings.

Skipping season resets. Three-year-old ratings should not drive today's predictions.

Same K for all sports. NFL has 17 games per season, NBA has 82. Tune K per sport.

This is the foundation of Module 2 in the ZenHodl course. The full module adds margin-of-victory adjustments and conference-aware resets. Module 1 is free — start there.