
We Backtested Our NFL Model on the 2026 Playoffs. It Called Super Bowl LX Correctly.

2026-04-22 super-bowl nfl playoffs backtest calibration ml

Super Bowl LX is over. Seattle beat New England 29-13 on February 8, 2026. A lot of futures tickets turned into trash.

I wanted to know what our NFL win-probability model would have produced if we had run it on every 2025-26 playoff game before kickoff, using only data available prior to Wild Card Weekend. No peeking. No retroactive feature work. Just the model we shipped, applied honestly to a full 13-game postseason.

This post is the backtest. Every pick, every miss, the per-round accuracy, and a clear read on what the model is good at and where it gets beaten.

The Headline

9 correct out of 13 games. 69.2% accuracy.

That includes the Super Bowl itself (Seattle over New England, called at 64.7% confidence) and the postseason's most lopsided result (Seattle at 69.8% over San Francisco, final 41-6).

For reference, here are public benchmarks for NFL playoff accuracy:

Source | Typical playoff accuracy
FiveThirtyEight Elo | 62-66%
ESPN FPI playoff picks | 60-68%
"Chalk" (always pick higher seed) | 63-68%
Public expert average | 55-60%
Pinnacle closing-line favorites | 65-70%

Our 69.2% is in the Pinnacle-closing-line zone. That's a competitive number, especially on a 13-game postseason where variance is high and home-field advantage is weaker than in the regular season.

The Super Bowl Call

Before Super Bowl LX at SoFi Stadium (a neutral site), the model gave New England only a 35.3% chance to win against Seattle. Public perception had the game closer: the closing moneyline made Seattle a short favorite, and most expert polls were split.

The model was confident and specific: Seattle was clearly the stronger team by ELO going in, and neutral-site games don't give either team the home advantage bonus. That stripped the "Patriots at SoFi" narrative and let the pure team-strength numbers talk.
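The neutral-site logic is easy to sketch. Below is a minimal illustration of the standard logistic Elo win-probability curve with an optional home-field bonus; the +55-point HFA matches this post's footnote, but the function name and the example ratings are ours for illustration, not values from the deployed wp_model_NFL.pkl.

```python
def win_prob(elo_home, elo_away, hfa=55.0, neutral=False):
    """Standard logistic Elo win probability for the home-listed team.

    hfa: home-field advantage in Elo points (+55, per this post's footnote).
    neutral=True drops the bonus, as for the Super Bowl at SoFi Stadium.
    """
    diff = elo_home - elo_away + (0.0 if neutral else hfa)
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# Illustrative ratings only: a gap of roughly -105 Elo on a neutral field
# reproduces the ~35.3% the model gave New England under this curve.
print(round(win_prob(1500, 1605, neutral=True), 3))
```

With neutral=False, the same ratings would hand the home-listed side the 55-point bump; stripping it is exactly what pushes the weaker team's number down at a neutral site.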

Seattle won 29-13, a 16-point margin. The model had it right.

The Full 13-Game Breakdown

Every pick, every outcome, cleanly:

Round | Matchup (home vs away) | Model P(home) | Actual (home-away) | Correct?
NFC Wild Card | CAR vs LAR | 32.5% | 31-34 | ✓ (picked LAR)
NFC Wild Card | CHI vs GB | 40.3% | 31-27 | ✗ (picked GB)
AFC Wild Card | JAX vs BUF | 38.1% | 24-27 | ✓ (picked BUF)
NFC Wild Card | PHI vs SF | 65.0% | 19-23 | ✗ (picked PHI)
AFC Wild Card | NE vs LAC | 54.4% | 16-3 | ✓ (picked NE)
AFC Wild Card | PIT vs HOU | 43.6% | 6-30 | ✓ (picked HOU)
AFC Divisional | DEN vs BUF | 51.6% | 33-30 | ✓ (picked DEN)
NFC Divisional | SEA vs SF | 69.8% | 41-6 | ✓ (picked SEA)
AFC Divisional | NE vs HOU | 46.0% | 28-16 | ✗ (picked HOU)
NFC Divisional | CHI vs LAR | 38.1% | 17-20 | ✓ (picked LAR)
AFC Championship | DEN vs NE | 70.6% | 7-10 | ✗ (picked DEN)
NFC Championship | SEA vs LAR | 61.7% | 31-27 | ✓ (picked SEA)
Super Bowl LX (neutral) | NE vs SEA | 35.3% | 13-29 | ✓ (picked SEA)
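A quick sanity check on the headline number: if the table's probabilities are taken at face value, the model's expected pick accuracy can be summed directly. The snippet below uses the thirteen home probabilities from the table; the "expected correct" framing assumes the model is calibrated and the games are independent.

```python
# Model P(home) for the 13 games, in table order
p_home = [0.325, 0.403, 0.381, 0.650, 0.544, 0.436, 0.516,
          0.698, 0.460, 0.381, 0.706, 0.617, 0.353]

# The model picks whichever side it favors; pick confidence is max(p, 1 - p)
confidence = [max(p, 1 - p) for p in p_home]

expected_correct = sum(confidence)
print(f"expected correct: {expected_correct:.2f} / 13")  # about 8.0
```

In other words, a perfectly calibrated version of these numbers expects roughly 8 of 13. The actual 9 of 13 runs a game better than self-expectation, which is good news for accuracy, though it says nothing about calibration on its own.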

The Biggest Misses

Four wrong picks. Here's what the model got wrong and why it mattered:

AFC Championship: DEN vs NE (model had DEN 70.6%)

The model's worst call. Denver hosted New England with a ~200-point ELO edge, and at home with HFA the model liked the Broncos at over 70% to win. New England won 10-7 in a defensive slog. The game turned on one red-zone stop and one missed field goal — neither of which any pregame model would have predicted.

This is the classic "the model was right on the long run and wrong on this one sample" problem. A team that wins 70% of the time still loses 30% of the time. Over one Conference Championship, you got the 30%.
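To put a number on that intuition: the table has five picks above 60% confidence, and even if each probability were exactly right, a clean sweep would be unlikely. A rough sketch, assuming independent games:

```python
# Confidence of the five >60% picks, from the table
# (PHI 65.0%, SEA 69.8%, DEN 70.6%, SEA 61.7%, SEA 64.7% in the Super Bowl)
conf = [0.650, 0.698, 0.706, 0.617, 0.647]

# Probability every high-confidence favorite wins, assuming independence
p_sweep = 1.0
for p in conf:
    p_sweep *= p

print(f"P(all five favorites win): {p_sweep:.3f}")      # about 0.13
print(f"P(at least one upset):     {1 - p_sweep:.3f}")  # about 0.87
```

So losing at least one high-confidence pick was the expected outcome, not an indictment; DEN-NE just happened to be one of the games that drew it.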

NFC Wild Card: PHI vs SF (model had PHI 65.0%)

Philadelphia hosted San Francisco and was favored by pretty much everyone, including the model. SF won 23-19. This was closer to a coin flip than the model realized: the 49ers' starting QB had returned from a mid-season injury, an upgrade that hadn't fully filtered into the ELO yet, because our base ELO doesn't currently incorporate QB-specific adjustments.

AFC Divisional: NE vs HOU (model had NE only 46.0%)

Houston was the road team but strong on ELO. The model leaned Houston. NE won 28-16. The miss wasn't catastrophic, since the call was close to a coin flip, but it still lands in the "leaned the wrong way by a small margin" bucket.

NFC Wild Card: CHI vs GB (model had CHI 40.3%)

Chicago hosted Green Bay as the ELO underdog, and the model leaned Green Bay. Chicago won 31-27 in a shootout. These divisional rivalry games are notoriously hard to predict: familiarity compresses edges.

Where the Model Was Most Confident And Right

The strongest call of the postseason: Seattle 69.8% over San Francisco in the NFC Divisional round. Seattle won 41-6. A 35-point blowout in a game the model flagged as a clear edge.

The model was also in its sweet spot on the Seattle NFC Championship call (61.7% over the Rams) and the Super Bowl call (64.7% for Seattle, the complement of New England's listed 35.3%).

If you had blindly bet the moneyline on every game where the model's confidence exceeded 60%, you would have gone 3-2 across the postseason: cashing on SEA-SF, SEA-LAR, and SEA-NE, losing on PHI-SF and DEN-NE.

What the Model Got Right Structurally

Two deeper wins worth flagging:

  1. Neutral-site adjustment on the Super Bowl. Most casual predictors forget to strip home-field advantage in the Super Bowl because the game is played at a specific stadium. For NE's SB showdown, the model correctly gave them no HFA bump, which pushed their probability down to a fair 35% rather than an inflated 45%. That's the difference between the right answer and a miscalibrated one.

  2. The road-team picks. Of the 7 cases where the model picked the listed away team over the home team, it was right 5 times (LAR over CAR, BUF over JAX, HOU over PIT, LAR over CHI, and SEA over NE in the neutral-site Super Bowl). That's a strong signal that the ELO-based pre-game features are doing real work, not just rubber-stamping the home team.

What the Model Needs to Improve

Three things this backtest tells us need work for next season:

  1. Calibration. At 9.35% ECE, NFL is our weakest sport by calibration; the probabilities need to sit closer to observed win rates before they can be trusted at face value.
  2. QB-specific adjustments. The PHI-SF miss traces to a quarterback situation the base ELO can't see; player-level QB adjustments would address it.
  3. Rivalry damping. Divisional rematches like CHI-GB play closer than raw ELO gaps suggest; the model should shrink its stated edge in those games.

The Takeaway

We hit the Super Bowl. We went 9-of-13 on the postseason. We were on the right side of every game where we had > 60% confidence except two. The model earned a pass this cycle.

But the NFL model's calibration needs work, and we're open about that. Our sports coverage is only as useful as the probabilities are honest — and at 9.35% ECE, NFL is our weakest sport by calibration. Next season's model will incorporate player-level QB adjustments and rivalry damping, and we'll publish the updated ECE before kickoff.
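For readers who want to reproduce an ECE figure, expected calibration error is typically computed with a binned estimator like the sketch below. The bin count and equal-width binning here are assumptions on our part; we're not claiming this is the exact estimator behind the 9.35%.

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE: |observed win rate - mean predicted prob| per bin,
    weighted by the fraction of predictions falling in that bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bin_ids = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(outcomes[mask].mean() - probs[mask].mean())
    return ece
```

Note that 13 playoff games are far too few for a stable estimate; a figure like 9.35% only means something over a season-scale sample.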

Next up: Super Bowl LXI (2027) preseason futures. We'll publish team-by-team championship probabilities in July once we've finalized offseason adjustments. If you want to backtest your own playoff strategies against the same snapshot data, you can pull live NFL edges via the API — 7-day free trial, no credit card required.


Data sources: ESPN NFL game data (public); ELO computed from game results using basketball-style MoV; home-field adjustment = +55 ELO (neutral for Super Bowl). All 13 playoff games were held out of the ELO training set. Pre-game predictions use the deployed wp_model_NFL.pkl. The full prediction table is reproducible from the /v1/backtest endpoint.
