We Audited 21 Sports Prediction Sources. Only 1 Publishes Their Calibration Error.

We audited 21 sports prediction sources on the one number a probability product should never hide: calibration. Only one publishes a numeric Expected Calibration Error.

That source is us. We don't think that's a flex. We think it's an indictment of an industry that sells "70% AI-powered picks" without telling you what 70% actually means in their data.

So we built the Transparency Index. It's a live scorecard of every notable sports forecaster, scored on five publicly verifiable dimensions, with one new column we added today: their self-reported ECE.

The headline number

Of 14 sources that claim to be sports forecasters (excluding 7 venues, aggregators, and closing-line providers):

1 publishes a numeric ECE — ZenHodl, on a 5,345-game NCAAMB holdout, with the script you can re-run
1 makes ECE derivable — FiveThirtyEight, by publishing every raw forecast and outcome on GitHub before they archived in 2023
5 publish partial metrics — Bart Torvik (reliability tables), MoneyPuck (log loss), BetQL (accuracy claims), Manifold (leaderboard Brier), Massey (per-sport accuracy)
7 publish nothing at all — KenPom, Action Network, Covers, SportsLine, Sportradar, Stats Perform, SportsDataIO

Read that list carefully. Sportradar serves hedge funds. Stats Perform sells institutional probability feeds. KenPom is the gold standard for college basketball ratings. None of them will tell you their calibration in a number you can verify.

Why this is the only metric that matters

Accuracy is the wrong question. A model that predicts the home team wins 51% of the time and gets it right 51% of the time is "accurate." So is one that predicts the home team wins 99% of the time and gets it right 51% of the time. They produce identical bet records and very different P&Ls.

Expected Calibration Error answers the only question that actually matters for someone who wants to use the predictions: when you say 70%, do teams win 70% of the time? When you say 85%, do they hit 85%?

If a provider's "70% confidence" picks actually win 60% of the time, every trade you make at that confidence level is a loser. You can't size positions, you can't time your bankroll, you can't compound. The dollar value of a probability product is bounded above by its calibration. Without ECE, you're flying blind.

We wrote a longer post on why this killed an NBA model of ours that was 65% accurate and losing money. The fix wasn't a better model. It was admitting we had a calibration bug.

What "publishing ECE" actually requires

A score of "Numeric" on our index requires four things:

A real number — not "high accuracy" or "industry-leading" or a star rating
A defined holdout — which games, which seasons, which sport
A verifiable methodology — how were predictions made, how was ECE computed
A reproducible artifact — code, data, or both

ZenHodl is at 4.39% ECE on 5,345 NCAAMB games from the 2025-26 regular season. The data is in our season report, the model weights are deployed in production, the script that computes ECE is in our course. You can re-run the math.

FiveThirtyEight didn't publish a single ECE number, but they did publish every raw forecast plus every outcome to GitHub. We score that as "Derivable" — the work to compute ECE is left to the reader, but the input data is honest and complete. That's still better than 19 of the 21 sources we tracked.

The other 19 either ignore the question, replace it with accuracy claims, or hide behind enterprise NDAs.

What hiding ECE actually communicates

Every "AI-powered sports prediction" company has computed their ECE internally. They have to — you can't tune a model without measuring it. The question isn't whether they know. It's whether they'll tell you.

When a provider won't publish:

If their ECE is good, they'd publish it as a marketing weapon. Nobody hides good numbers.
If their ECE is bad, they have a strong incentive to keep it dark and lean on accuracy theater.
If they don't know, they shouldn't be selling a probability product.

There's no fourth option that flatters them.

Where we don't score ourselves a perfect 5

We're 4/5 on methodology, not 5/5. The reason: our model training code isn't open-source. The pipeline is documented in our blog and course, the live model weights ship in pickled form, but a third party couldn't fork our repo and reproduce our weights from scratch today. That's a real gap and we list it on our own row.

We commit to publishing per-sport ECE every season, including the seasons where it gets worse. NCAAMB went from 2.2% in our 2024-25 holdout to 4.39% in our live 2025-26 deployment — that gap is real and we're publishing it not despite our preferences but because of them.

What would change the index

If you run a sports prediction service and we have you misclassified, tell us. The criteria for moving from "Silent" to "Numeric" are listed above and they're the same criteria for everyone, including us. We re-verify the index monthly.

If you want to compare us model-to-model, we're game. We'll commit to a pre-announced sport, a pre-announced metric set, a pre-announced timeframe, and we'll publish the loss alongside the win. The first provider to do that with us in writing gets the spotlight.

Until then: transparency-index ↗.