Statistical Model Validation — Bootstrap CIs, Calibration

KEY RESULTS

At a Glance

Backtested Trades	361
Avg c/Trade	+3.9c
Win Rate	59.3%
Statistically Significant	1/7
Profitable Sports	8/7

All results reflect the current exported validation snapshot for the deployed strategy configuration, including any pregame filters baked into the backtest.

This snapshot covers 7 sports (ATP, CS2, LOL, MLB, NBA, NHL, SOCCER) and 361 trades from model v4. For current production outcomes, see live results. For access to the live edge feed, see pricing.

Companion: 78-point CLV gap whitepaper

Backtests show what the model would have done. The CLV (closing line value) gap shows whether it's beating the market now. Across 950 trades in our public gap-analysis subset, CLV-positive entries win 89.9% and CLV-negative entries win 11.2%, a gap of 78.7 percentage points. Two-proportion Z = 24.27, p ≈ 10⁻¹³⁰. Reproducible from the public dataset.
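For readers who want to check a two-proportion Z statistic of this kind, here is a minimal sketch of the standard pooled test. The page does not give the per-group trade counts, so the counts in the usage line are hypothetical and will not reproduce the published Z value; the function name is also an assumption.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Pooled win rate under H0: both groups share one true win rate.
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only, NOT the published dataset.
z, p = two_proportion_z(wins_a=90, n_a=100, wins_b=10, n_b=100)
print(f"z = {z:.2f}, p = {p:.3g}")
```

To reproduce the published figure, substitute the CLV-positive and CLV-negative win and trade counts from the public dataset.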

What does each page show?

/validation — exported backtest snapshot. What the model would have done on a defined sample with current filters baked in. Numbers shown above.
/results — live ledger. Every trade fired in production, including trades that wouldn't pass current filters. The honest aggregate.
/clv-evidence — empirical skill test. The 78-pp CLV gap; a direct test of whether the model carries information beyond luck on the trades where we captured the closing line.

ATP

Performance Summary

Trades	30
Win Rate	60.0%
Avg c/Trade	+11.3c
Total P&L	$3.38
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+11.3c
95% CI	[-5.9c, +27.5c]
99% CI	[-11.2c, +32.5c]
p-value	0.0941
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	3.81
Max Drawdown	-134.2c
Profit Factor	1.69
Best Streak	8
Worst Streak	4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	10	90.0%	+34.4c
10-15c	9	22.2%	-19.4c
15-20c	3	66.7%	+14.7c
20+c	8	62.5%	+15.6c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+8.9c	WIN	+38.0c
? vs ?	BUY	?-?	0.0c	0.0c	+9.1c	WIN	+50.0c
? vs ?	BUY	?-?	0.0c	0.0c	+9.1c	WIN	+40.0c
? vs ?	BUY	?-?	0.0c	0.0c	+9.7c	WIN	+63.0c
? vs ?	BUY	?-?	0.0c	0.0c	+18.1c	WIN	+49.0c

CS2

Performance Summary

Trades	82
Win Rate	42.7%
Avg c/Trade	-6.2c
Total P&L	$-5.07
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	-6.2c
95% CI	[-16.0c, +3.7c]
99% CI	[-18.9c, +7.1c]
p-value	0.8893
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	-2.12
Max Drawdown	-821.0c
Profit Factor	0.74
Best Streak	4
Worst Streak	9

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	16	50.0%	-11.8c
10-15c	15	46.7%	-12.5c
15-20c	22	50.0%	-4.0c
20+c	29	31.0%	-1.5c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+13.5c	WIN	+36.0c
? vs ?	BUY	?-?	0.0c	0.0c	+14.0c	WIN	+30.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.3c	WIN	+0.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.5c	WIN	+47.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.6c	WIN	+60.0c

LOL

Performance Summary

Trades	38
Win Rate	52.6%
Avg c/Trade	+7.6c
Total P&L	$2.90
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+7.6c
95% CI	[-6.0c, +21.1c]
99% CI	[-10.7c, +25.1c]
p-value	0.1368
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	2.79
Max Drawdown	-134.0c
Profit Factor	1.48
Best Streak	4
Worst Streak	3

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	7	57.1%	+3.4c
10-15c	9	55.6%	+14.6c
15-20c	8	75.0%	+24.4c
20+c	14	35.7%	-4.3c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+13.6c	WIN	+28.0c
? vs ?	BUY	?-?	0.0c	0.0c	+13.8c	WIN	+44.0c
? vs ?	BUY	?-?	0.0c	0.0c	+13.8c	WIN	+53.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.3c	WIN	+40.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.9c	WIN	+32.0c

MLB

Performance Summary

Trades	105
Win Rate	68.6%
Avg c/Trade	+6.9c
Total P&L	$7.23
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+6.9c
95% CI	[-2.6c, +15.6c]
99% CI	[-5.3c, +18.5c]
p-value	0.0724
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	2.28
Max Drawdown	-327.9c
Profit Factor	1.36
Best Streak	11
Worst Streak	4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	54	72.2%	+7.0c
10-15c	23	69.6%	+6.3c
15-20c	5	20.0%	-41.4c
20+c	23	69.6%	+17.7c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+11.5c	WIN	+25.0c
? vs ?	BUY	?-?	0.0c	0.0c	+12.1c	WIN	+42.0c
? vs ?	BUY	?-?	0.0c	0.0c	+12.1c	WIN	+47.0c
? vs ?	BUY	?-?	0.0c	0.0c	+12.4c	WIN	+35.0c
? vs ?	BUY	?-?	0.0c	0.0c	+12.8c	WIN	+55.0c

NBA

Performance Summary

Trades	15
Win Rate	46.7%
Avg c/Trade	-14.7c
Total P&L	$-2.21
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	-14.7c
95% CI	[-37.9c, +8.4c]
99% CI	[-43.9c, +14.3c]
p-value	0.8970
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	-4.97
Max Drawdown	-336.0c
Profit Factor	0.51
Best Streak	2
Worst Streak	4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	5	60.0%	-7.2c
10-15c	4	25.0%	-40.0c
15-20c	4	50.0%	-7.2c
20+c	2	50.0%	+2.0c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+8.2c	WIN	+45.0c
? vs ?	BUY	?-?	0.0c	0.0c	+10.1c	WIN	+22.0c
? vs ?	BUY	?-?	0.0c	0.0c	+17.2c	WIN	+31.0c
? vs ?	BUY	?-?	0.0c	0.0c	+17.4c	WIN	+33.0c
? vs ?	BUY	?-?	0.0c	0.0c	+24.3c	WIN	+39.0c

Model Architecture

Architecture	Split-Phase XGBoost (early-game + clutch-time models)
Features	14 engineered features
Calibration	Isotonic regression
Training Data	5,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff
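As an illustration of how the derived features in this list might be built from raw game state, here is a minimal sketch. The feature names come from the list above, but every formula (for example, `score_diff_x_tf` as a plain product, or differences taken as home minus away) is an assumption, as is the `build_features` helper itself.

```python
def build_features(home_score, away_score, time_fraction, home_elo, away_elo,
                   pregame_wp, pace_diff, ortg_diff, drtg_diff, home_court=1):
    """Assemble the 14 named features; all formulas here are assumed."""
    score_diff = home_score - away_score   # assumed: home minus away
    elo_diff = home_elo - away_elo         # assumed: home minus away
    return {
        "score_diff": score_diff,
        "time_fraction": time_fraction,    # assumed: share of game elapsed, 0..1
        "home_elo": home_elo,
        "away_elo": away_elo,
        "elo_diff": elo_diff,
        "home_court": home_court,
        "pregame_wp": pregame_wp,
        "score_diff_x_tf": score_diff * time_fraction,  # assumed interaction term
        "score_diff_sq": score_diff ** 2,
        "total_score": home_score + away_score,
        "score_diff_x_elo": score_diff * elo_diff,      # assumed interaction term
        "pace_diff": pace_diff,
        "ortg_diff": ortg_diff,
        "drtg_diff": drtg_diff,
    }

# Hypothetical mid-game snapshot, for illustration only.
row = build_features(60, 52, 0.5, 1600, 1500, 0.62, 1.0, 2.0, -1.0)
print(len(row), row["score_diff_x_tf"])
```

Interaction terms such as `score_diff_x_tf` let a tree model see directly that a given lead is worth more late in a game than early.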

NHL

Performance Summary

Trades	66
Win Rate	66.7%
Avg c/Trade	+5.8c
Total P&L	$3.84
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+5.8c
95% CI	[-5.3c, +16.5c]
99% CI	[-8.6c, +19.6c]
p-value	0.1529
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	2.03
Max Drawdown	-302.0c
Profit Factor	1.31
Best Streak	9
Worst Streak	4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	15	66.7%	+4.0c
10-15c	20	65.0%	+0.9c
15-20c	17	82.4%	+20.5c
20+c	14	50.0%	-3.1c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+12.5c	WIN	+36.0c
? vs ?	BUY	?-?	0.0c	0.0c	+13.5c	WIN	+39.0c
? vs ?	BUY	?-?	0.0c	0.0c	+13.8c	WIN	+28.0c
? vs ?	BUY	?-?	0.0c	0.0c	+14.6c	WIN	+25.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.5c	WIN	+0.0c

Model Architecture

Architecture	XGBoost + Isotonic calibration
Features	12 engineered features
Calibration	Isotonic regression
Training Data	4,225 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_ice, pregame_wp, score_diff_x_tf, score_diff_sq, pace_diff, ortg_diff, drtg_diff

SOCCER

Performance Summary

Trades	11
Win Rate	45.5%
Avg c/Trade	+0.6c
Total P&L	$0.07
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+0.6c
95% CI	[-25.6c, +28.6c]
99% CI	[-32.7c, +38.0c]
p-value	0.5028
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	0.20
Max Drawdown	-123.0c
Profit Factor	1.03
Best Streak	2
Worst Streak	3

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	4	50.0%	+6.8c
10-15c	3	33.3%	-2.3c
15-20c	2	50.0%	+15.0c
20+c	2	50.0%	-21.5c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
? vs ?	BUY	?-?	0.0c	0.0c	+8.2c	WIN	+58.0c
? vs ?	BUY	?-?	0.0c	0.0c	+8.5c	WIN	+62.0c
? vs ?	BUY	?-?	0.0c	0.0c	+12.8c	WIN	+66.0c
? vs ?	BUY	?-?	0.0c	0.0c	+15.9c	WIN	+58.0c
? vs ?	BUY	?-?	0.0c	0.0c	+22.5c	WIN	+0.0c

Benchmark Comparisons

How our model performs vs naive strategies. A model that can't beat simple baselines isn't worth using.

ATP

Strategy	Win Rate	c/Trade
Our Model	60.0%	+11.3c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

CS2

Strategy	Win Rate	c/Trade
Our Model	42.7%	-6.2c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

LOL

Strategy	Win Rate	c/Trade
Our Model	52.6%	+7.6c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

MLB

Strategy	Win Rate	c/Trade
Our Model	68.6%	+6.9c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

NBA

Strategy	Win Rate	c/Trade
Our Model	46.7%	-14.7c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

NCAAMB

Strategy	Win Rate	c/Trade
Our Model	100.0%	+27.2c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

NCAAWB

Strategy	Win Rate	c/Trade
Our Model	75.0%	+20.0c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

NHL

Strategy	Win Rate	c/Trade
Our Model	66.7%	+5.8c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

SOCCER

Strategy	Win Rate	c/Trade
Our Model	45.5%	+0.6c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

TENNIS

Strategy	Win Rate	c/Trade
Our Model	100.0%	+35.0c
Random (50/50)	50.0%	-2.0c
Market-Efficient	0.0%	+0.0c

Academic Foundation

Our approach is grounded in peer-reviewed research on sports prediction markets and probabilistic forecasting.

Beating the bookies with their own numbers - and how the online sports betting market is rigged

Kaunitz, Zhong, Kreiner (2017)

CLV validation - demonstrates that a positive closing line value strategy yields positive long-term returns

Verification of forecasts expressed in terms of probability

Brier, Glenn W. (1950)

Foundation for calibration analysis - Brier score measures probabilistic prediction accuracy

Using random forests to estimate win probability before each play of an NFL game

Lock, Dennis; Nettleton, Dan (2014)

In-game WP modeling methodology - random forests on game state features for real-time prediction

Why are gambling markets organised so differently from financial markets?

Levitt, Steven D. (2004)

Market efficiency analysis - sports markets exhibit inefficiencies exploitable by informed bettors

Optimal betting odds against insider traders

Shin, Hyun Song (1991)

Theoretical foundation for bookmaker pricing models and adverse selection in betting markets

A Brownian motion model for the progress of sports scores

Stern, Hal (1994)

Score-diff as Brownian motion - theoretical underpinning for WP models based on score differential and time

Methodology & Anti-Overfitting Safeguards

Training / test split — Models are trained on historical ESPN game-state snapshots (multiple seasons), then tested on held-out recent-season data using real Polymarket prices the model never saw during training. No future data leaks into features.

Realistic backtesting — Poly-price backtests use actual market snapshots and modeled execution costs rather than idealized fills. Entry prices reflect live market conditions at the time of the backtest snapshot.

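Bootstrap confidence intervals — 10,000 resamples with replacement. The p-value is the fraction of bootstrap means ≤ 0, a one-sided test of H0: "the model has no edge." A p < 0.05 means fewer than 5% of resampled means fell at or below zero; that is evidence against H0, not a 95% probability that the edge is real. Because the test is one-sided and the intervals are two-sided, a 95% CI can still include zero when p is between 0.025 and 0.05.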
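The bootstrap procedure described above can be sketched in a few lines. This is a minimal illustration rather than the production code: the function name, the fixed seed, and the percentile-style interval are assumptions, and the sample P&L list is made up.

```python
import random
import statistics

def bootstrap_mean_ci(pnl_cents, n_resamples=10_000, seed=0):
    """Percentile bootstrap for mean per-trade P&L (in cents)."""
    rng = random.Random(seed)
    n = len(pnl_cents)
    # Resample n trades with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(pnl_cents, k=n))
        for _ in range(n_resamples)
    )

    def percentile(q):
        return means[min(n_resamples - 1, int(q * n_resamples))]

    return {
        "mean": statistics.fmean(means),
        "ci95": (percentile(0.025), percentile(0.975)),
        "ci99": (percentile(0.005), percentile(0.995)),
        # One-sided p-value: share of bootstrap means at or below zero.
        "p_value": sum(m <= 0 for m in means) / n_resamples,
    }

# Hypothetical trades: 18 wins of +40c and 12 losses of -32c.
sample = [40.0] * 18 + [-32.0] * 12
print(bootstrap_mean_ci(sample))
```

With only a few dozen trades per sport, the resampled means spread widely, which is why the intervals in the per-sport tables span tens of cents.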

Calibration — Predictions are bucketed into 5%-wide bins (min 5 trades each). A well-calibrated model's dots land on the diagonal; points below the line indicate overconfidence.
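A minimal sketch of the binning and the Brier score follows, assuming predictions are probabilities in [0, 1] and outcomes are 0/1. The function names and the returned row layout are assumptions. One fact worth keeping in mind when reading the per-sport Brier values: a forecast of 0.5 on every trade scores exactly 0.25.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def calibration_bins(probs, outcomes, width=0.05, min_count=5):
    """Group predictions into fixed-width bins; drop bins with too few trades."""
    n_bins = round(1 / width)
    buckets = {}
    for p, y in zip(probs, outcomes):
        idx = min(int(p / width), n_bins - 1)  # p == 1.0 goes in the top bin
        buckets.setdefault(idx, []).append((p, y))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        if len(pairs) < min_count:
            continue
        rows.append({
            "bin": idx,
            "n": len(pairs),
            "predicted": sum(p for p, _ in pairs) / len(pairs),
            "actual": sum(y for _, y in pairs) / len(pairs),
        })
    return rows

# Hypothetical predictions: five trades near 62%, two near 91%.
probs = [0.62] * 5 + [0.91] * 2
outcomes = [1, 1, 1, 0, 0, 1, 1]
print(calibration_bins(probs, outcomes))
print(brier_score([0.5] * 4, [1, 0, 1, 1]))
```

A bin whose "actual" value sits below its "predicted" value corresponds to a dot under the diagonal, that is, overconfidence in that probability range.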

Sharpe ratio — Mean per-trade P&L divided by its standard deviation, annualized with sqrt(252) scaling, which implicitly treats each trade as one daily return. Values above 1.0 indicate strong risk-adjusted returns; above 3.0 is exceptional.
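The Sharpe calculation can be sketched as below. Whether the export uses the sample or population standard deviation is not stated, so the sample version here is an assumption, as are the function name and the example P&L values.

```python
import math
import statistics

def annualized_sharpe(pnl_cents, periods_per_year=252):
    """Mean / sample std of per-trade P&L, scaled by sqrt(periods_per_year)."""
    sd = statistics.stdev(pnl_cents)  # sample std (assumed; needs >= 2 trades)
    if sd == 0:
        return float("nan")           # undefined when every trade is identical
    return statistics.fmean(pnl_cents) / sd * math.sqrt(periods_per_year)

# Hypothetical per-trade P&L in cents.
print(round(annualized_sharpe([10, -10, 10, -10, 30]), 2))
```

Because sqrt(252) is roughly 15.9, even a modest per-trade mean-to-volatility ratio turns into a large annualized figure, so the value is best read alongside the trade count.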

Profit factor — Gross wins / gross losses. Any value above 1.0 means wins outweigh losses. Above 1.25 = comfortably profitable. Above 1.5 = strong. Above 2.0 = excellent.
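Profit factor, along with the drawdown and streak figures shown in the Risk Metrics tables, can be computed from the per-trade P&L sequence. The sketch below uses common definitions, which are assumptions here: drawdown is measured from the running peak of cumulative P&L, and a zero-P&L trade breaks both winning and losing streaks.

```python
def profit_factor(pnl):
    """Gross wins divided by gross losses (inf if there are no losses)."""
    gross_wins = sum(x for x in pnl if x > 0)
    gross_losses = -sum(x for x in pnl if x < 0)
    return float("inf") if gross_losses == 0 else gross_wins / gross_losses

def max_drawdown(pnl):
    """Largest peak-to-trough drop in cumulative P&L, as a value <= 0."""
    cum = peak = worst = 0.0
    for x in pnl:
        cum += x
        peak = max(peak, cum)
        worst = min(worst, cum - peak)
    return worst

def longest_streak(pnl, wins=True):
    """Longest run of consecutive winning (or losing) trades."""
    best = run = 0
    for x in pnl:
        hit = x > 0 if wins else x < 0
        run = run + 1 if hit else 0
        best = max(best, run)
    return best

# Hypothetical per-trade P&L in cents, in chronological order.
trades = [40, -30, -20, 50, 10, -60]
print(profit_factor(trades), max_drawdown(trades),
      longest_streak(trades), longest_streak(trades, wins=False))
```

Unlike profit factor, drawdown and streaks depend on the order of trades, so they need the chronological ledger rather than summary totals.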

Fee assumptions — Results are shown net of the execution-cost assumptions used when this validation export was generated. Live exchange fees and market microstructure can change over time.

Pregame filter — For NBA and NHL, the deployed strategy requires the pregame market price to agree with the model's bet side at ≥55c. This filters out trades where the model disagrees with market consensus, reducing adverse selection.
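One way to read the pregame filter is as a simple gate applied before a trade is accepted. The sketch below assumes the rule means "the pregame market price of the side the model wants to buy must be at least 55c"; the function name, argument names, and that interpretation are all assumptions.

```python
def passes_pregame_filter(sport, side_pregame_price_cents,
                          threshold_cents=55.0,
                          filtered_sports=("NBA", "NHL")):
    """Return True if a candidate trade survives the pregame filter.

    side_pregame_price_cents: pregame market price (in cents) of the side
    the model wants to buy (assumed interpretation of the rule).
    """
    if sport not in filtered_sports:
        return True  # the filter is only described for NBA and NHL
    return side_pregame_price_cents >= threshold_cents

# Hypothetical candidates: (sport, pregame price of the model's side).
candidates = [("NBA", 61.0), ("NBA", 48.0), ("NHL", 55.0), ("MLB", 40.0)]
kept = [c for c in candidates if passes_pregame_filter(*c)]
print(kept)
```

Under this reading, the filter only lets the model buy pregame favorites in NBA and NHL, so it never bets against the pregame consensus in those sports.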