11 Sports, One API: Designing a Unified Schema for Multi-Sport Win Probabilities

Building a sports prediction API for one sport is a focused engineering problem. Building one that covers eleven sports is a schema design problem. The difference is not size — it is that every sport has its own state model, its own clock semantics, its own outcome space, and its own edge cases.

This post walks through how we designed our schema to handle NBA, NHL, MLB, NCAAMB, NCAAWB, CFB, NFL, soccer, tennis, CS2, and LoL through a single API surface, without forcing each sport into a Procrustean lowest common denominator. Every other part of the API, from endpoints to SDK types, is built on that schema.

The Naive Approach Falls Apart at Sport Three

The first version of our API was NBA-only. The natural response object looked like this:

{
  "game_id": "401705412",
  "home_team": "LAL",
  "away_team": "BOS",
  "score_home": 67,
  "score_away": 71,
  "quarter": 3,
  "time_remaining_sec": 442,
  "fair_prob_home": 0.418
}

Clean, intuitive, easy to consume. Adding NHL was straightforward — change "quarter" to "period" — and the response stayed clean.

Then came soccer. Soccer does not have periods or quarters. It has elapsed minutes, plus stoppage time displayed in a non-numeric format ("45'+3'"). It has three outcomes (home / draw / away), not two. Substitutions and red cards matter as state. The cardinality of the outcome space is wrong, the clock semantics are wrong, and the state model needs new fields.
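To make the clock problem concrete, here is a minimal sketch of how a consumer might turn a display string like "45'+3'" into numbers. The function name parse_soccer_clock, the SoccerClock type, and the exact accepted formats are assumptions for illustration, not part of our API.

```python
import re
from typing import NamedTuple


class SoccerClock(NamedTuple):
    minute: int    # minute shown before the "+", e.g. 45
    stoppage: int  # added minutes after the "+", 0 if absent


# Assumed format: digits, optional quote mark, then an optional "+digits" part.
# Accepts "48'", "45'+3'", and "45+3".
_CLOCK_RE = re.compile(r"^\s*(\d{1,3})'?(?:\s*\+\s*(\d{1,2})'?)?\s*$")


def parse_soccer_clock(text: str) -> SoccerClock:
    """Parse a soccer display clock into (minute, stoppage)."""
    match = _CLOCK_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized soccer clock: {text!r}")
    minute = int(match.group(1))
    stoppage = int(match.group(2)) if match.group(2) else 0
    return SoccerClock(minute, stoppage)
```

Nothing like this is needed for a basketball clock, which is already a plain integer of seconds. That asymmetry is the point: the clock field cannot share one type across sports.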

The natural temptation is to add union types. quarter becomes period_or_quarter_or_minute_or_set. fair_prob_home becomes fair_prob keyed by something. The response object grows tentacles. Three sports in, the schema is unrecognizable.

The clean solution is a layered design.

The Unified Schema

We split the response into three layers:

Common metadata: fields that exist for every sport: game_id, sport, home_team, away_team, start_time, as_of, model_version, and ece (the model's expected calibration error). These are the fields any consumer can rely on without knowing the sport.

Outcome probabilities: a list of named outcomes with probabilities. Two-outcome sports return two entries (home / away). Soccer returns three (home / draw / away). Tennis returns two (player_a / player_b). The structure is identical across sports; the cardinality varies.

Sport-specific state: a nested object whose schema is sport-dependent. Basketball state has quarter, score_home, score_away, time_remaining_sec, possession. Soccer state has minute, score_home, score_away, red_cards_home, red_cards_away. Tennis state has set, game, point, server, tiebreak_active. Each sport defines its own state schema, documented in the API docs.
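The three layers can be sketched as plain Python dataclasses. This is an illustrative model rather than our SDK code: the class names Outcome and Prediction and the from_dict helper are assumptions, and state is left as an untyped dict to show that only that layer varies by sport.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Outcome:
    name: str               # "home", "draw", "away", "player_a", ...
    prob: float             # raw model output
    prob_calibrated: float  # calibrated probability


@dataclass
class Prediction:
    # Layer 1: common metadata, present for every sport.
    sport: str
    game_id: str
    home_team: str
    away_team: str
    start_time: str
    as_of: str
    model_version: str
    ece: float
    # Layer 2: same structure for every sport, variable length.
    outcomes: List[Outcome] = field(default_factory=list)
    # Layer 3: sport-dependent, so it stays a free-form mapping here.
    state: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "Prediction":
        """Build a Prediction from a decoded JSON response."""
        data = dict(payload)  # copy so the caller's dict is not mutated
        outcomes = [Outcome(**o) for o in data.pop("outcomes", [])]
        state = dict(data.pop("state", {}))
        return cls(outcomes=outcomes, state=state, **data)
```

With this sketch, a new sport changes only what goes inside state; the Prediction and Outcome definitions stay untouched.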

The response object now looks like:

{
  "sport": "soccer",
  "game_id": "ENG-EPL-1234567",
  "home_team": "Arsenal",
  "away_team": "Manchester City",
  "start_time": "2026-05-11T14:30:00Z",
  "as_of": "2026-05-11T15:18:42Z",
  "model_version": "soccer_wp_v4.2",
  "ece": 0.041,
  "outcomes": [
    {"name": "home", "prob": 0.412, "prob_calibrated": 0.408},
    {"name": "draw", "prob": 0.298, "prob_calibrated": 0.301},
    {"name": "away", "prob": 0.290, "prob_calibrated": 0.291}
  ],
  "state": {
    "minute": "48'+2'",
    "score_home": 1,
    "score_away": 1,
    "red_cards_home": 0,
    "red_cards_away": 1
  }
}

For NBA the same structure produces:

{
  "sport": "NBA",
  "game_id": "401705412",
  "home_team": "LAL",
  "away_team": "BOS",
  "start_time": "2026-05-11T02:30:00Z",
  "as_of": "2026-05-11T03:14:33Z",
  "model_version": "wp_v3.4",
  "ece": 0.029,
  "outcomes": [
    {"name": "home", "prob": 0.617, "prob_calibrated": 0.604},
    {"name": "away", "prob": 0.383, "prob_calibrated": 0.396}
  ],
  "state": {
    "quarter": 3,
    "score_home": 67,
    "score_away": 71,
    "time_remaining_sec": 442,
    "possession": "away"
  }
}

A consumer that only cares about win probabilities reads the outcomes array and ignores state. A consumer that wants to display the live game can render state based on the sport. The outcomes shape is uniform; the state shape is sport-aware. Both consumers get what they need without paying the complexity cost of the other.
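A probability-only consumer of the kind described above might look like the following sketch. The helper names outcome_probs and favorite are hypothetical; the field names come from the example responses.

```python
from typing import Any, Dict


def outcome_probs(response: Dict[str, Any], calibrated: bool = True) -> Dict[str, float]:
    """Map outcome name to probability, ignoring the state layer entirely."""
    key = "prob_calibrated" if calibrated else "prob"
    return {o["name"]: o[key] for o in response["outcomes"]}


def favorite(response: Dict[str, Any]) -> str:
    """Return the name of the outcome with the highest calibrated probability."""
    probs = outcome_probs(response)
    return max(probs, key=probs.get)
```

The same two functions work for the two-outcome NBA response and the three-outcome soccer response, because neither function reads sport or state.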

Lessons Learned

Several design choices paid off, and a few we got wrong the first time.

Make outcomes a list, not a fixed pair. The single biggest design improvement was switching from fair_prob_home / fair_prob_away to a list of named outcomes. It accommodates two-way moneylines, three-way soccer, multi-class futures, and prop markets without schema changes.
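One benefit of the list form is that a single sanity check covers every cardinality. The sketch below is an assumption about what a client-side check could look like; the function name check_outcomes and the 0.005 tolerance (chosen to absorb three-decimal rounding) are illustrative, not part of the API contract.

```python
import math
from typing import Any, Dict, List


def check_outcomes(outcomes: List[Dict[str, Any]],
                   key: str = "prob_calibrated",
                   tol: float = 0.005) -> None:
    """Raise ValueError unless the outcomes form a valid probability distribution."""
    names = [o["name"] for o in outcomes]
    if len(names) < 2:
        raise ValueError("expected at least two outcomes")
    if len(set(names)) != len(names):
        raise ValueError("outcome names must be unique")
    probs = [o[key] for o in outcomes]
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {sum(probs):.4f}, expected 1")
```

With a fixed fair_prob_home / fair_prob_away pair, the same check would need a separate code path for the soccer draw.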

Always return both raw and calibrated probabilities. Some consumers (research) want the raw model output. Most (trading) want the calibrated version. Returning both costs a few bytes and avoids endless support questions about which one to use.

Document state schemas separately per sport. We tried to write one combined "state object reference" page; it was unreadable. Per-sport pages are clearer, and they signal honestly to consumers that state is sport-dependent.

Use ISO 8601 timestamps everywhere. Mixing Unix seconds, milliseconds, ISO strings, and "X minutes ago" caused parsing bugs across consumers. Pick one (we picked ISO 8601 in UTC) and use it for every timestamp field.
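As a sketch of what "pick one" means in practice, the helper below normalizes Unix epoch values into the ISO 8601 UTC form used in the example responses. The function name to_iso_utc and the unit parameter are assumptions for illustration.

```python
from datetime import datetime, timezone


def to_iso_utc(value: float, unit: str = "s") -> str:
    """Convert Unix epoch seconds ("s") or milliseconds ("ms") to ISO 8601 in UTC."""
    if unit == "ms":
        value = value / 1000.0
    elif unit != "s":
        raise ValueError(f"unknown unit: {unit!r}")
    moment = datetime.fromtimestamp(value, tz=timezone.utc)
    # Whole-second precision with a trailing "Z", matching the sample payloads.
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Doing this conversion once at the producer means no consumer has to guess whether a number is in seconds or milliseconds.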

Surface as_of even when "live" implies fresh. We learned the hard way that consumers cache responses. An as_of field lets them know how stale the cached response is. Without it, they treat a 30-second-old prediction the same as a 2-hour-old one.
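A caching consumer can use as_of roughly as follows. This is a minimal sketch: the helper names and the 30-second default threshold are assumptions, and it expects as_of to carry a "Z" or an explicit UTC offset, as in the example responses.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _parse_utc(text: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset for older interpreters.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def staleness_seconds(response: Dict[str, Any], now: Optional[datetime] = None) -> float:
    """Seconds elapsed between the response's as_of and now."""
    now = now or datetime.now(timezone.utc)
    return (now - _parse_utc(response["as_of"])).total_seconds()


def is_fresh(response: Dict[str, Any], max_age_sec: float = 30.0,
             now: Optional[datetime] = None) -> bool:
    """True if the cached response is no older than max_age_sec."""
    return staleness_seconds(response, now) <= max_age_sec
```

Passing now explicitly keeps the check deterministic in tests; in production it defaults to the current UTC time.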

Things we got wrong the first time: abbreviating sport codes inconsistently (we now use NBA, NHL, MLB, NCAAMB, NCAAWB, CFB, NFL, soccer, tennis, CS2, LoL, keeping mixed case where the original name is mixed-case), trying to embed orderbook prices into the prediction response (now separated into a different endpoint), and using minute-of-day for tip-off times instead of full ISO datetimes (this broke around midnight UTC).

Endpoint Design Mirrors the Schema

The endpoint structure mirrors the schema structure. /v1/predict/{sport}/live returns unified responses for all live games of a sport. /v1/predict/{sport}/{game_id} returns the unified response for one game. /v1/predictions/{date} returns a list of unified responses for a date.

A consumer can write a single response handler and call any of the three endpoints. Each prediction has the same shape, whether it arrives alone or as an element of a list.
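A single handler for all three endpoints could be sketched like this. Several things here are assumptions: BASE_URL is a placeholder host, the multi-game endpoints are assumed to return a bare JSON array, and the helper names are hypothetical. The code builds URLs and processes decoded payloads but does not perform any network calls.

```python
from typing import Any, Dict, List, Union

BASE_URL = "https://api.example.com"  # placeholder, not the real host

Payload = Union[Dict[str, Any], List[Dict[str, Any]]]


def live_url(sport: str) -> str:
    return f"{BASE_URL}/v1/predict/{sport}/live"


def game_url(sport: str, game_id: str) -> str:
    return f"{BASE_URL}/v1/predict/{sport}/{game_id}"


def date_url(date: str) -> str:
    return f"{BASE_URL}/v1/predictions/{date}"


def as_predictions(payload: Payload) -> List[Dict[str, Any]]:
    """Normalize a single response or a list of responses into a list."""
    if isinstance(payload, dict):
        return [payload]
    return list(payload)


def summarize(payload: Payload) -> List[str]:
    """One line per game: sport, game id, and the most likely outcome."""
    lines = []
    for pred in as_predictions(payload):
        best = max(pred["outcomes"], key=lambda o: o["prob_calibrated"])
        lines.append(
            f'{pred["sport"]} {pred["game_id"]}: '
            f'{best["name"]} {best["prob_calibrated"]:.3f}'
        )
    return lines
```

Because summarize never branches on which endpoint produced the payload, adding a fourth endpoint with the same response shape would require no handler changes.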

For sport-specific consumers, we offer typed Pydantic models in the Python SDK so basketball-only callers get strong types on the basketball state object. Each sport has its own model. The SDK chooses the right one based on the response's sport field.
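The dispatch-on-sport idea can be illustrated without the SDK. The sketch below uses standard-library dataclasses as a stand-in for the Pydantic models, so it shows the selection logic only, not validation. The class names, the STATE_MODELS registry, and the typed_state function are all hypothetical, and treating possession as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class BasketballState:
    quarter: int
    score_home: int
    score_away: int
    time_remaining_sec: int
    possession: str = ""  # assumed optional for this sketch


@dataclass(frozen=True)
class SoccerState:
    minute: str  # display string such as "48'+2'"
    score_home: int
    score_away: int
    red_cards_home: int
    red_cards_away: int


# Registry keyed by the response's sport field; only two sports shown.
STATE_MODELS = {
    "NBA": BasketballState,
    "soccer": SoccerState,
}


def typed_state(response: Dict[str, Any]) -> Any:
    """Return a typed state object if the sport is registered, else a plain dict."""
    model = STATE_MODELS.get(response["sport"])
    if model is None:
        return dict(response["state"])  # unregistered sport: leave untyped
    return model(**response["state"])
```

A basketball-only caller then gets attribute access such as state.time_remaining_sec, while code that handles every sport can keep working with the common layers.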

The Bottom Line

A multi-sport prediction API is a schema design problem first and a modeling problem second. Get the schema right and you can add sport number twelve in a week. Get it wrong and adding sport number four breaks every existing client.

The pattern is general: common metadata, polymorphic outcomes, sport-specific state. It works for sports, and it works for any other domain where you serve multiple structurally similar but not identical entities through a single API.

Full unified schema documented at zenhodl.net/docs. Live across 11 sports at zenhodl.net/v1/try. Free seven-day API trial at zenhodl.net/pricing.