Module 4: Backtesting for Sports Betting
**Build a Polymarket Prediction Bot from Scratch**
---
This is where most people blow up. They build a model that looks incredible in testing, backtest it, see 75% win rates, go live, and proceed to lose money for three straight weeks.
The problem is almost never the model. It's the backtest.
A bad backtest doesn't just give you wrong numbers -- it gives you *confidence* in wrong numbers. You size up, you run it longer, you double down when the losses start because "the backtest said 73% win rate." By the time you realize the backtest was flawed, you've lost real money.
This module covers:
1. How sports betting backtests differ from standard ML evaluation
2. The hold-to-settlement strategy and its economics
3. Building a rigorous backtester from scratch
4. The five mistakes that make every backtest look amazing (and lose money live)
5. Analyzing results the right way
6. Parameter sensitivity -- finding the real sweet spot vs. overfitting
7. Execution cost modeling -- what your backtest forgets
How Sports Betting Backtesting Differs from ML Evaluation
In ML, you care about accuracy, precision, recall, AUC, Brier score. You split train/test, evaluate, done.
In sports betting, **a model with worse Brier score can make more money**. This is not a paradox -- it's because you don't bet on every game. You only bet when your model disagrees with the market by enough to cover costs. The question isn't "how accurate is the model overall?" It's "how accurate is the model *on the subset of games where it disagrees with the market?*"
A model that's slightly miscalibrated but identifies genuine edges will crush a perfectly calibrated model that agrees with the market on everything.
This means:
**Train/test split isn't enough.** You need to simulate the actual trading logic: filters, position limits, edge thresholds.
**Accuracy on the full dataset is irrelevant.** Only accuracy on *triggered trades* matters.
**You must model costs.** A 3c edge is real in a backtest and fake in production after fees + slippage.
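To make the cost point concrete, here is a small sketch of gross vs. net edge. The 2c taker fee and 1c slippage figures are illustrative assumptions for this example, not an actual fee schedule:

```python
# Sketch: gross vs. net edge after execution costs.
# The fee and slippage numbers are ILLUSTRATIVE assumptions,
# not Polymarket's actual fee schedule.

def net_edge_cents(fair_wp, entry_price_c, taker_fee_c=2.0, slippage_c=1.0):
    """Gross edge minus assumed per-share costs, all in cents."""
    gross = fair_wp * 100 - entry_price_c
    return gross - taker_fee_c - slippage_c

# A "3c edge" in a frictionless backtest (model 66%, entry 63c)...
gross = 0.66 * 100 - 63
# ...is ~0c after an assumed 2c fee + 1c slippage
net = net_edge_cents(0.66, 63)
print(f"gross: {gross:+.1f}c, net: {net:+.1f}c")
```

The same filter applied inside a backtest (only count trades whose *net* edge clears the threshold) is what separates backtests that survive contact with production from ones that don't.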
# ── Install required packages (run this cell first!) ──────────────────────────
# Uncomment the line below and run if you haven't installed these yet:
# !pip install pandas numpy scikit-learn xgboost matplotlib
How to use AI with this notebook
**New to Python? No problem.** Every cell in this notebook is designed to work with AI coding assistants.
If you get stuck on any cell:
1. **Copy the cell** into Claude, ChatGPT, or any AI assistant
2. **Ask:** "Explain this code line by line"
3. **To customize:** "Help me modify this for soccer instead of NBA"
4. **To debug:** Paste the error message and ask "How do I fix this?"
5. **To extend:** "Add a feature that tracks home/away win streaks"
Think of the AI as a patient tutor sitting next to you. The notebooks give you working code — the AI helps you understand and extend it.
> **Pro tip:** If a cell is confusing, ask the AI: "Explain this to me like I've never written Python before." It will break down every line.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import SplineTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.isotonic import IsotonicRegression
import warnings
warnings.filterwarnings('ignore')
plt.rcParams['figure.dpi'] = 120
plt.rcParams['font.size'] = 11
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3
print("Module 4: Backtesting for Sports Betting")
print("Imports loaded.")
---
1. The Hold-to-Settlement Strategy
On Polymarket, sports contracts resolve to **$0.00 or $1.00**. There is no partial payout. This is fundamentally different from stock trading where you exit at some intermediate price.
**The economics are simple:**
- You BUY a contract at some price (the "entry"), e.g., **63 cents**
- If the team wins: the contract resolves to $1.00 and you profit **(100 - 63) = 37 cents**
- If the team loses: the contract resolves to $0.00 and you lose **63 cents**
- No exit fees, no slippage on resolution -- settlement is free and exact
**Expected Value:**
$$\text{EV} = p_{\text{fair}} \times (1 - \text{price}) - (1 - p_{\text{fair}}) \times \text{price} = p_{\text{fair}} - \text{price}$$
If your model says the true probability is **72%** and you buy at **63 cents**:
$$\text{EV} = 0.72 - 0.63 = +0.09 = +9\text{c per share}$$
That's the edge. It's additive and it's the only number that matters.
Why Hold-to-Settlement?
You could try to trade in and out -- buy at 63c, sell at 70c when the game swings. But this introduces:
- Exit slippage (the orderbook may be thin)
- A taker fee on exit (~2c)
- Timing risk (when do you sell?)
- The need to model *price movement*, not just *outcome probability*
Hold-to-settlement removes all of these. You only need to answer one question: **"What is the true probability that this team wins?"** If you're right more often than the market, you make money. Period.
# Demonstrate the hold-to-settlement EV math
def calculate_ev(fair_wp, entry_price):
    """Calculate expected value in cents.

    fair_wp: model's estimated probability (0-1)
    entry_price: what we pay in cents (0-100)
    """
    entry_frac = entry_price / 100
    ev = fair_wp - entry_frac
    profit_if_win = 100 - entry_price
    loss_if_lose = -entry_price
    ev_check = fair_wp * profit_if_win + (1 - fair_wp) * loss_if_lose
    return ev * 100, ev_check  # both in cents, should match

# Example scenarios
scenarios = [
    (0.72, 63, "Model says 72%, buy at 63c"),
    (0.72, 72, "Model says 72%, buy at 72c (no edge)"),
    (0.72, 80, "Model says 72%, buy at 80c (negative EV!)"),
    (0.55, 45, "Model says 55%, buy at 45c"),
    (0.85, 75, "Model says 85%, buy at 75c"),
]

print("Hold-to-Settlement Economics")
print("=" * 70)
print(f"{'Scenario':<42} {'EV (c)':>8} {'Win P/L':>8} {'Lose P/L':>9}")
print("-" * 70)
for fair, entry, desc in scenarios:
    ev, _ = calculate_ev(fair, entry)
    win_pl = 100 - entry
    lose_pl = -entry
    marker = " <-- edge" if ev > 0 else (" <-- NO edge" if ev == 0 else " <-- LOSING")
    print(f"{desc:<42} {ev:>+7.1f}c {win_pl:>+7.1f}c {lose_pl:>+8.1f}c{marker}")
# Visualize: How edge scales with fair_wp - market_price
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: EV surface
fair_wps = np.arange(0.50, 0.95, 0.01)
entry_prices = np.arange(40, 85, 1)
FW, EP = np.meshgrid(fair_wps, entry_prices)
EV = (FW - EP / 100) * 100  # in cents

ax = axes[0]
c = ax.contourf(FW * 100, EP, EV, levels=np.arange(-30, 35, 5), cmap='RdYlGn')
plt.colorbar(c, ax=ax, label='EV (cents/share)')
ax.contour(FW * 100, EP, EV, levels=[0], colors='black', linewidths=2)
ax.set_xlabel('Model Fair WP (cents)')
ax.set_ylabel('Entry Price (cents)')
ax.set_title('Expected Value per Share')
ax.annotate('Break-even line\n(fair_wp = entry)', xy=(70, 70), fontsize=9,
            ha='center', bbox=dict(boxstyle='round', fc='white', alpha=0.8))

# Right: Profit distribution for a realistic edge
ax = axes[1]
np.random.seed(42)
n_trades = 500
fair_wp_sim = 0.72
entry_sim = 63
outcomes = np.random.binomial(1, fair_wp_sim, n_trades)
profits = np.where(outcomes == 1, 100 - entry_sim, -entry_sim)

# hist() takes one color per dataset, not per bin -- draw first, then recolor patches
_, _, patches = ax.hist(profits, bins=[-65, -60, 35, 40], edgecolor='white', rwidth=0.6)
patches[0].set_facecolor('#d32f2f')  # losses (-63c) land in the first bin
patches[2].set_facecolor('#388e3c')  # wins (+37c) land in the last bin
ax.set_xlabel('Profit per Trade (cents)')
ax.set_ylabel('Count')
ax.set_title(f'Profit Distribution (fair=72c, entry=63c, n={n_trades})')
ax.axvline(x=np.mean(profits), color='blue', linestyle='--', linewidth=2,
           label=f'Avg: {np.mean(profits):.1f}c')
ax.legend()
plt.tight_layout()
plt.show()

print(f"\nSimulation: {n_trades} trades at 72% fair / 63c entry")
print(f"  Wins: {outcomes.sum()} ({outcomes.mean()*100:.1f}%)")
print(f"  Total PnL: {profits.sum():.0f}c ({profits.mean():.1f}c/trade)")
print(f"  Theoretical EV: {(fair_wp_sim - entry_sim/100)*100:.1f}c/trade")
---
2. Building the Backtester
The backtester simulates exactly what the live bot does:
1. For each in-game snapshot, predict the fair win probability
2. Compare model's estimate to the "market price" (we use ESPN WP as a proxy)
3. If the edge exceeds our threshold AND passes all filters, log a trade
4. The trade resolves to +profit or -loss based on who actually won
**Key design decisions:**
- We check **both sides** (home AND away) on every snapshot. If the model says home is 72%, that also means away is 28%. If the market says home is 80%, we have edge on away (28c fair vs 20c market = 8c edge).
- We limit trades per game (`max_per_game`) to avoid overconcentration
- We only trade from `min_period` onward (Period 1 data is too noisy in most sports)
- We apply `min_fair_wp` to avoid underdog bets (more on why later)
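The loop above can be sketched in a few dozen lines. The column names (`game_id`, `period`, `model_wp`, `market_wp`, `home_won`) and filter defaults here are illustrative assumptions, not the module's exact implementation -- adapt them to your own snapshot data:

```python
# Sketch of the core backtest loop. Column names and defaults are
# ILLUSTRATIVE -- adapt to your own snapshot data.
import pandas as pd

def backtest(snapshots, edge_threshold=5.0, min_period=2,
             min_fair_wp=0.50, max_per_game=2):
    trades = []
    per_game = {}  # trades taken so far, per game_id
    for row in snapshots.itertuples():
        if row.period < min_period:            # skip noisy early periods
            continue
        if per_game.get(row.game_id, 0) >= max_per_game:
            continue                           # avoid overconcentration
        # Check BOTH sides: away fair/price are complements of home's
        for side, fair, price in [
            ("home", row.model_wp, row.market_wp),
            ("away", 1 - row.model_wp, 1 - row.market_wp),
        ]:
            edge_c = (fair - price) * 100      # edge in cents
            if edge_c < edge_threshold or fair < min_fair_wp:
                continue
            won = row.home_won if side == "home" else not row.home_won
            pnl_c = (100 - price * 100) if won else -price * 100
            trades.append({"game_id": row.game_id, "side": side,
                           "edge_c": edge_c, "pnl_c": pnl_c})
            per_game[row.game_id] = per_game.get(row.game_id, 0) + 1
            break  # the two edges are negatives of each other;
                   # at most one side can clear the threshold
    return pd.DataFrame(trades)

# Tiny synthetic example: one game, two snapshots
snaps = pd.DataFrame({
    "game_id": [1, 1],
    "period": [2, 3],
    "model_wp": [0.72, 0.75],   # model's home win probability
    "market_wp": [0.63, 0.80],  # market's home price, as a probability
    "home_won": [True, True],
})
print(backtest(snaps))
```

In this toy data, the first snapshot triggers a home trade (9c edge). The second snapshot has 5c of edge on the away side, but the `min_fair_wp` filter blocks it -- exactly the underdog filter described above.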
Using ESPN WP as a Market Proxy
We don't have historical Polymarket orderbook data at second-level granularity. But ESPN publishes a real-time win probability for every game, and Polymarket prices track ESPN WP closely (correlation ~0.95 for NBA/NCAAMB). It's not perfect, but it's the best available proxy for backtesting.
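If you can collect even a small paired sample of ESPN WP and Polymarket mid prices yourself, it's worth measuring how tightly they track before trusting the proxy. A minimal sketch, using synthetic data purely for illustration:

```python
# Sanity-checking a market proxy. The data here is SYNTHETIC --
# substitute your own paired (espn_wp, poly_mid) observations.
import numpy as np

rng = np.random.default_rng(0)
espn_wp = rng.uniform(0.3, 0.9, 200)
# Assume the market tracks ESPN WP with small noise (illustrative)
poly_mid = np.clip(espn_wp + rng.normal(0, 0.02, 200), 0.01, 0.99)

corr = np.corrcoef(espn_wp, poly_mid)[0, 1]
mae_c = np.mean(np.abs(espn_wp - poly_mid)) * 100
print(f"correlation: {corr:.3f}, mean abs gap: {mae_c:.1f}c")
```

A large mean gap is a warning sign: "edges" your backtest finds against ESPN WP may not exist against the actual market.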
This preview covers 8 of the module's 45 cells. The full module continues with hands-on exercises and working code.
Get All 6 Modules — $49