Elo ratings are the most underrated feature in sports prediction. One number per team, updated after every game, captures relative strength without box scores or advanced stats. The system was invented by Arpad Elo for chess and later adapted by FiveThirtyEight for sports predictions. This tutorial builds a complete Elo system from scratch in Python.
The Math
Given two teams with ratings $R_A$ and $R_B$, team A's expected win probability is:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A) / 400}}$$
After the game, ratings update: $R_A' = R_A + K \times (S_A - E_A)$, where $S_A$ is 1 for a win and 0 for a loss, and $K$ caps how many points a single result can move a rating.
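As a worked example (the ratings and $K = 20$ here are illustrative values, not taken from any real league), suppose $R_A = 1600$ and $R_B = 1500$:

$$E_A = \frac{1}{1 + 10^{(1500 - 1600) / 400}} = \frac{1}{1 + 10^{-0.25}} \approx 0.640$$

If A wins, $R_A' = 1600 + 20 \times (1 - 0.640) \approx 1607.2$. If A loses, $R_A' = 1600 + 20 \times (0 - 0.640) \approx 1587.2$. The favorite gains less for an expected win than it loses for an upset.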
Step 1: Core Functions
from collections import defaultdict

def expected_win_prob(rating_a: float, rating_b: float) -> float:
    # Logistic curve on the rating gap: a 400-point edge means 10:1 odds.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_ratings(rating_a, rating_b, a_won, k=20.0):
    exp_a = expected_win_prob(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b
The system is zero-sum: points gained by the winner equal points lost by the loser.
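The zero-sum property can be checked directly. The sketch below repeats the two core functions so it runs on its own, and writes team B's update as the negative of team A's, which is algebraically the same as $K \times ((1 - S_A) - (1 - E_A))$. The 1600 and 1500 starting ratings are illustrative assumptions.

```python
def expected_win_prob(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, a_won, k=20.0):
    exp_a = expected_win_prob(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # k * ((1 - s) - (1 - e)) simplifies to -k * (s - e), so one delta serves both teams.
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


before = (1600.0, 1500.0)  # illustrative ratings: A is the favorite
after_win = update_ratings(*before, a_won=True)
after_upset = update_ratings(*before, a_won=False)

favorite_gain = after_win[0] - before[0]    # points A gains for the expected win
favorite_loss = before[0] - after_upset[0]  # points A loses for the upset

print(f"total before: {sum(before):.1f}")
print(f"total after win: {sum(after_win):.1f}, after upset: {sum(after_upset):.1f}")
print(f"favorite gains {favorite_gain:.1f} for a win, loses {favorite_loss:.1f} for an upset")
```

The total stays at 3100 in both cases, while the size of the transfer depends on how surprising the result was.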
Step 2: Home Court Advantage
Home teams win ~58% of NBA games. Add a bonus when computing expected score — but do not alter the stored rating:
def update_with_hca(rating_home, rating_away, home_won, k=20.0, hca=70.0):
    # The bonus only shifts the expectation; stored ratings stay venue-neutral.
    exp_home = expected_win_prob(rating_home + hca, rating_away)
    score = 1.0 if home_won else 0.0
    new_home = rating_home + k * (score - exp_home)
    new_away = rating_away + k * ((1 - score) - (1 - exp_home))
    return new_home, new_away
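An hca of 70 points gives equally rated teams a home win probability of about 60%, since $1 / (1 + 10^{-70/400}) \approx 0.60$; a 58% home win rate corresponds to roughly 56 points. Treat these as starting points and tune per sport: NBA ~70, college basketball ~100, NFL ~50.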
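To move between a home win rate and a rating bonus, the expected-score formula can be inverted. The sketch below does that for two equally rated teams; the helper names `hca_from_home_win_rate` and `home_win_rate_from_hca` are made up for this illustration and do not appear elsewhere in the tutorial.

```python
import math


def hca_from_home_win_rate(p_home: float) -> float:
    """Rating bonus that gives two equally rated teams a home win probability of p_home."""
    if not 0.0 < p_home < 1.0:
        raise ValueError("p_home must be strictly between 0 and 1")
    # Solve p = 1 / (1 + 10 ** (-hca / 400)) for hca.
    return 400.0 * math.log10(p_home / (1.0 - p_home))


def home_win_rate_from_hca(hca: float) -> float:
    """Home win probability for two equally rated teams, given a rating bonus."""
    return 1.0 / (1.0 + 10 ** (-hca / 400.0))


for p in (0.55, 0.58, 0.60):
    print(f"home win rate {p:.0%} -> hca {hca_from_home_win_rate(p):.1f}")

for bonus in (50.0, 70.0, 100.0):
    print(f"hca {bonus:.0f} -> home win rate {home_win_rate_from_hca(bonus):.1%}")
```

A conversion like this gives a sensible first guess from a league's historical home record; the grid search approach used for K applies equally to hca.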
Step 3: Season Resets
Rosters change between seasons. Without a reset, stale ratings poison predictions:
def season_reset(ratings, regression=0.75):
    # Pull every rating toward the 1500 league average, keeping `regression` of the gap.
    return {team: 1500 + regression * (elo - 1500) for team, elo in ratings.items()}
A 1700-rated team becomes 1650 after reset — keeping 75% of its distance from average.
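The effect of the regression parameter is easiest to see at its extremes. This sketch repeats `season_reset` (with the 1500 mean exposed as a `mean` argument, which is an addition for illustration) and applies it to three hypothetical teams.

```python
def season_reset(ratings, regression=0.75, mean=1500.0):
    # Keep `regression` of each team's distance from the league mean.
    return {team: mean + regression * (elo - mean) for team, elo in ratings.items()}


# Hypothetical end-of-season ratings, symmetric around 1500.
end_of_season = {"A": 1700.0, "B": 1500.0, "C": 1300.0}

for regression in (1.0, 0.75, 0.5, 0.0):
    print(regression, season_reset(end_of_season, regression=regression))

reset = season_reset(end_of_season, regression=0.75)
```

With regression=1.0 nothing changes, and with regression=0.0 every team restarts at 1500. Values in between shrink the spread while preserving the ranking and, when the league already averages 1500, the league average.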
Step 4: Full Pipeline
import numpy as np

def build_elo(seasons_data, k=20.0, hca=70.0, regression=0.75):
    """
    seasons_data: dict of season_id -> list of (home, away, home_won)
    Returns final ratings and a prediction log for evaluation.
    """
    ratings = defaultdict(lambda: 1500.0)  # unseen teams start at the league average
    log = []
    for season in sorted(seasons_data.keys()):
        for home, away, home_won in seasons_data[season]:
            # Log the forecast before updating, so it never sees the result.
            prob = expected_win_prob(ratings[home] + hca, ratings[away])
            log.append({"home": home, "away": away, "prob": prob, "won": home_won})
            new_h, new_a = update_with_hca(ratings[home], ratings[away], home_won, k, hca)
            ratings[home] = new_h
            ratings[away] = new_a
        # Regress at the end of every season, including the last one, so the
        # returned ratings are already reset for the season that follows.
        ratings = defaultdict(lambda: 1500.0, season_reset(dict(ratings), regression))
    return dict(ratings), log
Games must be sorted chronologically — each update depends on all previous games.
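One way to guarantee that ordering is to sort by date while building the input. The sketch below assumes raw game records are dicts with `season`, `date`, `home`, `away`, and `home_won` keys; that schema and the `to_seasons_data` helper are hypothetical, chosen only to produce the dict-of-lists shape described in the docstring above.

```python
from collections import defaultdict
from datetime import date


def to_seasons_data(games):
    """Group raw game records by season, with each season's games in date order."""
    seasons = defaultdict(list)
    # Sorting once by date keeps every per-season list chronological.
    for g in sorted(games, key=lambda g: g["date"]):
        seasons[g["season"]].append((g["home"], g["away"], g["home_won"]))
    return dict(seasons)


# Hypothetical records, deliberately out of order.
raw_games = [
    {"season": 2024, "date": date(2024, 1, 5), "home": "BOS", "away": "NYK", "home_won": True},
    {"season": 2023, "date": date(2023, 3, 1), "home": "NYK", "away": "BOS", "home_won": False},
    {"season": 2024, "date": date(2024, 1, 2), "home": "NYK", "away": "BOS", "home_won": True},
]

seasons_data = to_seasons_data(raw_games)
for season in sorted(seasons_data):
    print(season, seasons_data[season])
```

Season ids need to sort in chronological order themselves, because the pipeline walks them with `sorted(seasons_data.keys())`.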
Step 5: Evaluate
def evaluate(log):
    probs = np.array([g["prob"] for g in log])
    outcomes = np.array([float(g["won"]) for g in log])
    brier = np.mean((probs - outcomes) ** 2)  # mean squared error of the forecasts
    accuracy = np.mean((probs > 0.5) == outcomes)  # pick the home team when prob > 0.5
    print(f"Brier: {brier:.4f} Accuracy: {accuracy:.1%}")
On NBA data, a tuned Elo system hits Brier ~0.22-0.24 and accuracy ~65-67%.
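Those numbers are easier to read against naive baselines. Forecasting 50% for every game always scores a Brier of exactly 0.25, so 0.22-0.24 is a modest but real improvement. The sketch below computes two such baselines in plain Python; the ten outcomes are made-up data, not NBA results.

```python
def brier(probs, outcomes):
    # Mean squared difference between forecast probability and the 0/1 outcome.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


# Hypothetical results: 1 means the home team won (6 of 10 here).
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# Baseline 1: forecast 50% for every game.
coin_flip = brier([0.5] * len(outcomes), outcomes)

# Baseline 2: forecast the overall home win rate for every game.
home_rate = sum(outcomes) / len(outcomes)
constant_home = brier([home_rate] * len(outcomes), outcomes)

# Accuracy of always picking the home team.
always_home_accuracy = home_rate

print(f"coin flip Brier: {coin_flip:.4f}")
print(f"constant home-rate Brier: {constant_home:.4f}")
print(f"always-home accuracy: {always_home_accuracy:.1%}")
```

A rating system earns its keep only to the extent that it beats the constant home-rate forecast, which already uses home advantage but knows nothing about team strength.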
Step 6: Tune K-Factor
K is the most important hyperparameter. Grid search it:
def tune_k(seasons_data, k_values):
    for k in k_values:
        # Rebuild the ratings from scratch for each candidate K.
        _, log = build_elo(seasons_data, k=k)
        probs = np.array([g["prob"] for g in log])
        outcomes = np.array([float(g["won"]) for g in log])
        brier = np.mean((probs - outcomes) ** 2)
        print(f"K={k:5.1f} Brier={brier:.4f}")
Typical optimal values: NBA K=20, NCAAMB K=28-32, NFL K=24-28. As a rule of thumb, fewer games per season calls for a higher K, because each result has to carry more information.
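To act on a grid search rather than read it off printed output, the search can return the best K. The sketch below is a self-contained toy version: it simulates a hypothetical league with fixed true strengths on neutral courts, so it omits home advantage and season resets, and the K it selects says nothing about real leagues. The names `simulate_games`, `brier_for_k`, and `best_k` are illustrative.

```python
import random


def expected_win_prob(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def simulate_games(n_teams=10, n_games=2000, seed=0):
    """Hypothetical league: fixed true strengths, neutral courts, random matchups."""
    rng = random.Random(seed)
    strength = [rng.gauss(1500, 100) for _ in range(n_teams)]
    games = []
    for _ in range(n_games):
        a, b = rng.sample(range(n_teams), 2)
        a_won = rng.random() < expected_win_prob(strength[a], strength[b])
        games.append((a, b, a_won))
    return games


def brier_for_k(games, k):
    """Run plain Elo over the games in order and score the pre-game forecasts."""
    ratings = {}
    total = 0.0
    for a, b, a_won in games:
        ra, rb = ratings.get(a, 1500.0), ratings.get(b, 1500.0)
        prob = expected_win_prob(ra, rb)  # forecast made before the update
        score = 1.0 if a_won else 0.0
        total += (prob - score) ** 2
        delta = k * (score - prob)
        ratings[a], ratings[b] = ra + delta, rb - delta
    return total / len(games)


def best_k(games, k_values):
    """Return the K with the lowest Brier score, plus all scores for inspection."""
    scores = {k: brier_for_k(games, k) for k in k_values}
    return min(scores, key=scores.get), scores


games = simulate_games()
k_values = [5, 10, 20, 40, 80]
k_star, scores = best_k(games, k_values)
for k in k_values:
    print(f"K={k:5.1f} Brier={scores[k]:.4f}")
print(f"best K: {k_star}")
```

Setting K to 0 freezes every rating at 1500 and reproduces the 0.25 coin-flip Brier, which makes a handy sanity check for any scoring loop.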
Using Elo for Live Betting
Pre-game Elo gives a baseline. For live prediction, Elo difference becomes a feature alongside score differential, time remaining, and period:
features = {
    "elo_diff": ratings[home] - ratings[away],
    "score_diff": home_score - away_score,
    "time_remaining": seconds_left,
    "period": current_period,
}
In our backtests, adding elo_diff improved trading profit by 2-3 cents per trade. It captures strength-of-schedule and league standing that raw score cannot.
Common Mistakes
Not sorting chronologically. Out-of-order games leak future data into ratings.
Skipping season resets. Three-year-old ratings should not drive today's predictions.
Same K for all sports. NFL has 17 games per season, NBA has 82. Tune K per sport.
This is the foundation of Module 2 in the ZenHodl course. The full module adds margin-of-victory adjustments and conference-aware resets. Module 1 is free — start there.