Model Validation

Statistical evidence that our edge is real, not luck: bootstrap confidence intervals, calibration analysis, and risk metrics across 7 sports.

Generated 2026-04-13 | Model v4

KEY RESULTS

At a Glance

Backtested Trades: 361
Avg c/Trade: +3.9c
Win Rate: 59.3%
Statistically Significant: 1/7
Profitable Sports: 8/7

All results reflect the current exported validation snapshot for the deployed strategy configuration, including any pregame filters baked into the backtest.

This snapshot covers 7 sports (ATP, CS2, LOL, MLB, NBA, NHL, SOCCER) and 361 trades from model v4. For current production outcomes, see live results. For access to the live edge feed, see pricing.

ATP

Performance Summary

Trades: 30
Win Rate: 60.0%
Avg c/Trade: +11.3c
Total P&L: $3.38
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +11.3c
95% CI: [-5.9c, +27.5c]
99% CI: [-11.2c, +32.5c]
p-value: 0.0941
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 3.81
Max Drawdown: -134.2c
Profit Factor: 1.69
Best Streak: 8
Worst Streak: 4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 10 90.0% +34.4c
10-15c 9 22.2% -19.4c
15-20c 3 66.7% +14.7c
20+c 8 62.5% +15.6c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +8.9c WIN +38.0c
? vs ? BUY ?-? 0.0c 0.0c +9.1c WIN +50.0c
? vs ? BUY ?-? 0.0c 0.0c +9.1c WIN +40.0c
? vs ? BUY ?-? 0.0c 0.0c +9.7c WIN +63.0c
? vs ? BUY ?-? 0.0c 0.0c +18.1c WIN +49.0c

CS2

Performance Summary

Trades: 82
Win Rate: 42.7%
Avg c/Trade: -6.2c
Total P&L: $-5.07
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: -6.2c
95% CI: [-16.0c, +3.7c]
99% CI: [-18.9c, +7.1c]
p-value: 0.8893
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: -2.12
Max Drawdown: -821.0c
Profit Factor: 0.74
Best Streak: 4
Worst Streak: 9

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 16 50.0% -11.8c
10-15c 15 46.7% -12.5c
15-20c 22 50.0% -4.0c
20+c 29 31.0% -1.5c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +13.5c WIN +36.0c
? vs ? BUY ?-? 0.0c 0.0c +14.0c WIN +30.0c
? vs ? BUY ?-? 0.0c 0.0c +15.3c WIN +0.0c
? vs ? BUY ?-? 0.0c 0.0c +15.5c WIN +47.0c
? vs ? BUY ?-? 0.0c 0.0c +15.6c WIN +60.0c

LOL

Performance Summary

Trades: 38
Win Rate: 52.6%
Avg c/Trade: +7.6c
Total P&L: $2.90
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +7.6c
95% CI: [-6.0c, +21.1c]
99% CI: [-10.7c, +25.1c]
p-value: 0.1368
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 2.79
Max Drawdown: -134.0c
Profit Factor: 1.48
Best Streak: 4
Worst Streak: 3

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 7 57.1% +3.4c
10-15c 9 55.6% +14.6c
15-20c 8 75.0% +24.4c
20+c 14 35.7% -4.3c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +13.6c WIN +28.0c
? vs ? BUY ?-? 0.0c 0.0c +13.8c WIN +44.0c
? vs ? BUY ?-? 0.0c 0.0c +13.8c WIN +53.0c
? vs ? BUY ?-? 0.0c 0.0c +15.3c WIN +40.0c
? vs ? BUY ?-? 0.0c 0.0c +15.9c WIN +32.0c

MLB

Performance Summary

Trades: 105
Win Rate: 68.6%
Avg c/Trade: +6.9c
Total P&L: $7.23
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +6.9c
95% CI: [-2.6c, +15.6c]
99% CI: [-5.3c, +18.5c]
p-value: 0.0724
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 2.28
Max Drawdown: -327.9c
Profit Factor: 1.36
Best Streak: 11
Worst Streak: 4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 54 72.2% +7.0c
10-15c 23 69.6% +6.3c
15-20c 5 20.0% -41.4c
20+c 23 69.6% +17.7c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +11.5c WIN +25.0c
? vs ? BUY ?-? 0.0c 0.0c +12.1c WIN +42.0c
? vs ? BUY ?-? 0.0c 0.0c +12.1c WIN +47.0c
? vs ? BUY ?-? 0.0c 0.0c +12.4c WIN +35.0c
? vs ? BUY ?-? 0.0c 0.0c +12.8c WIN +55.0c

NBA

Performance Summary

Trades: 15
Win Rate: 46.7%
Avg c/Trade: -14.7c
Total P&L: $-2.21
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: -14.7c
95% CI: [-37.9c, +8.4c]
99% CI: [-43.9c, +14.3c]
p-value: 0.8970
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: -4.97
Max Drawdown: -336.0c
Profit Factor: 0.51
Best Streak: 2
Worst Streak: 4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 5 60.0% -7.2c
10-15c 4 25.0% -40.0c
15-20c 4 50.0% -7.2c
20+c 2 50.0% +2.0c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +8.2c WIN +45.0c
? vs ? BUY ?-? 0.0c 0.0c +10.1c WIN +22.0c
? vs ? BUY ?-? 0.0c 0.0c +17.2c WIN +31.0c
? vs ? BUY ?-? 0.0c 0.0c +17.4c WIN +33.0c
? vs ? BUY ?-? 0.0c 0.0c +24.3c WIN +39.0c

Model Architecture

Architecture: Split-Phase XGBoost (early-game + clutch-time models)
Features: 14 engineered features
Calibration: Isotonic regression
Training Data: 5,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff
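Several names in the feature list read like simple transforms of the raw game state. The sketch below shows one plausible way to derive them; the formulas are assumptions inferred from the feature names (for example, that `score_diff_x_tf` is a plain product), and `derive_features` is a hypothetical helper, not the production pipeline.

```python
def derive_features(score_diff, time_fraction, home_elo, away_elo):
    """Build interaction features from raw game state.

    All formulas here are guesses based on the feature names alone.
    """
    elo_diff = home_elo - away_elo
    return {
        "score_diff": score_diff,
        "time_fraction": time_fraction,
        "home_elo": home_elo,
        "away_elo": away_elo,
        "elo_diff": elo_diff,
        # Assumed: lead weighted by how much of the game has elapsed.
        "score_diff_x_tf": score_diff * time_fraction,
        # Assumed: plain square (a signed square is another possibility).
        "score_diff_sq": score_diff ** 2,
        # Assumed: lead interacted with the rating gap.
        "score_diff_x_elo": score_diff * elo_diff,
    }


# Hypothetical game state: home team up 6 with 75% of the game played.
features = derive_features(score_diff=6, time_fraction=0.75,
                           home_elo=1580, away_elo=1520)
```

The remaining listed inputs (pregame_wp, total_score, pace_diff, ortg_diff, drtg_diff, home_court) would come from upstream data and are left out of this sketch.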

NHL

Performance Summary

Trades: 66
Win Rate: 66.7%
Avg c/Trade: +5.8c
Total P&L: $3.84
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +5.8c
95% CI: [-5.3c, +16.5c]
99% CI: [-8.6c, +19.6c]
p-value: 0.1529
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 2.03
Max Drawdown: -302.0c
Profit Factor: 1.31
Best Streak: 9
Worst Streak: 4

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 15 66.7% +4.0c
10-15c 20 65.0% +0.9c
15-20c 17 82.4% +20.5c
20+c 14 50.0% -3.1c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +12.5c WIN +36.0c
? vs ? BUY ?-? 0.0c 0.0c +13.5c WIN +39.0c
? vs ? BUY ?-? 0.0c 0.0c +13.8c WIN +28.0c
? vs ? BUY ?-? 0.0c 0.0c +14.6c WIN +25.0c
? vs ? BUY ?-? 0.0c 0.0c +15.5c WIN +0.0c

Model Architecture

Architecture: XGBoost + Isotonic calibration
Features: 12 engineered features
Calibration: Isotonic regression
Training Data: 4,225 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_ice, pregame_wp, score_diff_x_tf, score_diff_sq, pace_diff, ortg_diff, drtg_diff

SOCCER

Performance Summary

Trades: 11
Win Rate: 45.5%
Avg c/Trade: +0.6c
Total P&L: $0.07
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +0.6c
95% CI: [-25.6c, +28.6c]
99% CI: [-32.7c, +38.0c]
p-value: 0.5028
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 0.20
Max Drawdown: -123.0c
Profit Factor: 1.03
Best Streak: 2
Worst Streak: 3

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2500

Performance by Edge Size

Edge Trades Win Rate Avg c/Trade
5-10c 4 50.0% +6.8c
10-15c 3 33.3% -2.3c
15-20c 2 50.0% +15.0c
20+c 2 50.0% -21.5c

Example Trades

Matchup Side Score Market Model Fair Edge Result P&L
? vs ? BUY ?-? 0.0c 0.0c +8.2c WIN +58.0c
? vs ? BUY ?-? 0.0c 0.0c +8.5c WIN +62.0c
? vs ? BUY ?-? 0.0c 0.0c +12.8c WIN +66.0c
? vs ? BUY ?-? 0.0c 0.0c +15.9c WIN +58.0c
? vs ? BUY ?-? 0.0c 0.0c +22.5c WIN +0.0c

Benchmark Comparisons

How our model performs vs naive strategies. A model that can't beat simple baselines isn't worth using.

ATP

Strategy Win Rate c/Trade
Our Model 60.0% +11.3c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

CS2

Strategy Win Rate c/Trade
Our Model 42.7% -6.2c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

LOL

Strategy Win Rate c/Trade
Our Model 52.6% +7.6c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

MLB

Strategy Win Rate c/Trade
Our Model 68.6% +6.9c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

NBA

Strategy Win Rate c/Trade
Our Model 46.7% -14.7c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

NCAAMB

Strategy Win Rate c/Trade
Our Model 100.0% +27.2c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

NCAAWB

Strategy Win Rate c/Trade
Our Model 75.0% +20.0c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

NHL

Strategy Win Rate c/Trade
Our Model 66.7% +5.8c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

SOCCER

Strategy Win Rate c/Trade
Our Model 45.5% +0.6c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

TENNIS

Strategy Win Rate c/Trade
Our Model 100.0% +35.0c
Random (50/50) 50.0% -2.0c
Market-Efficient 0.0% +0.0c

Academic Foundation

Our approach is grounded in peer-reviewed research on sports prediction markets and probabilistic forecasting.

Beating the bookies with their own numbers - and how the online sports betting market is rigged
Kaunitz, Zhong, Kreiner (2017)
CLV validation - demonstrates that a positive closing line value strategy yields positive long-term returns

Verification of forecasts expressed in terms of probability
Brier, Glenn W. (1950)
Foundation for calibration analysis - Brier score measures probabilistic prediction accuracy

Using random forests to estimate win probability before each play of an NFL game
Lock, Dennis; Nettleton, Dan (2014)
In-game WP modeling methodology - random forests on game state features for real-time prediction

Why are gambling markets organised so differently from financial markets?
Levitt, Steven D. (2004)
Market efficiency analysis - sports markets exhibit inefficiencies exploitable by informed bettors

Optimal betting odds against insider traders
Shin, Hyun Song (1991)
Theoretical foundation for bookmaker pricing models and adverse selection in betting markets

A Brownian motion model for the progress of sports scores
Stern, Hal (1994)
Score-diff as Brownian motion - theoretical underpinning for WP models based on score differential and time

Methodology & Anti-Overfitting Safeguards

Training / test split — Models are trained on historical ESPN game-state snapshots (multiple seasons), then tested on held-out recent-season data using real Polymarket prices the model never saw during training. No future data leaks into features.

Realistic backtesting — Poly-price backtests use actual market snapshots and modeled execution costs rather than idealized fills. Entry prices reflect live market conditions at the time of the backtest snapshot.

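Bootstrap confidence intervals - 10,000 resamples with replacement. The p-value is the fraction of bootstrap means ≤ 0, a one-sided test of H0: "the model has no edge." A p-value below 0.05 means fewer than 5% of resampled means were at or below zero; that is evidence against H0 at the 5% level, not a 95% probability that the edge is real.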
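The bootstrap procedure described above can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name, the percentile method for the intervals, and the sample P&L values are assumptions, not the exported validation code or real trades.

```python
import random
import statistics


def bootstrap_mean(pnl_cents, n_resamples=10_000, seed=0):
    """Percentile bootstrap of the mean per-trade P&L (in cents)."""
    rng = random.Random(seed)
    n = len(pnl_cents)
    # Resample n trades with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(pnl_cents, k=n))
        for _ in range(n_resamples)
    )

    def pct(q):
        # Simple percentile lookup on the sorted bootstrap means.
        return means[min(n_resamples - 1, int(q * n_resamples))]

    return {
        "boot_mean": statistics.fmean(means),
        "ci95": (pct(0.025), pct(0.975)),
        "ci99": (pct(0.005), pct(0.995)),
        # One-sided p-value: share of bootstrap means at or below zero.
        "p_value": sum(m <= 0 for m in means) / n_resamples,
    }


# Hypothetical per-trade P&L in cents, for illustration only.
sample_pnl = [38.0, 50.0, -62.0, 40.0, -45.0, 63.0, -55.0, 49.0, -60.0, 41.0]
result = bootstrap_mean(sample_pnl)
```

With only a handful of trades the intervals are wide, which is consistent with the small-sample sports above reporting intervals that straddle zero.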

Calibration — Predictions are bucketed into 5%-wide bins (min 5 trades each). A well-calibrated model's dots land on the diagonal; points below the line indicate overconfidence.
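As a rough sketch of the binning described above, the code below groups predictions into 5%-wide bins, drops bins with fewer than 5 trades, and computes a Brier score. The function names and inputs are hypothetical; probabilities are assumed to lie in [0, 1] and outcomes to be 0 or 1.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_bins(probs, outcomes, width=0.05, min_trades=5):
    """Return (mean predicted, actual win rate, count) per populated bin."""
    n_bins = round(1 / width)
    buckets = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p / width), n_bins - 1)
        buckets[idx].append((p, y))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        if len(pairs) < min_trades:
            continue  # too few trades for a stable win-rate estimate
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_pred, win_rate, len(pairs)))
    return rows


# Hypothetical predictions: six in the 60-65% bin, two in the 80-85% bin.
probs = [0.62, 0.63, 0.61, 0.64, 0.62, 0.63, 0.81, 0.82]
outcomes = [1, 1, 0, 1, 0, 1, 1, 0]
rows = calibration_bins(probs, outcomes)
```

A bin is overconfident when its win rate is below its mean prediction. As a reference point, a forecast of 0.5 on every trade has a Brier score of exactly 0.25.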

Sharpe ratio — Annualized (sqrt(252) scaling) on per-trade P&L. Values above 1.0 indicate strong risk-adjusted returns; above 3.0 is exceptional.
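One way to read the Sharpe definition above is mean per-trade P&L divided by its standard deviation, scaled by sqrt(252). The sketch below assumes the sample standard deviation and a zero risk-free rate; neither choice is stated in the source, and `per_trade_sharpe` is a hypothetical name.

```python
import math
import statistics


def per_trade_sharpe(pnl_cents, periods=252):
    """Mean / sample stdev of per-trade P&L, scaled by sqrt(periods)."""
    mean = statistics.fmean(pnl_cents)
    stdev = statistics.stdev(pnl_cents)  # assumed: sample (n - 1) stdev
    if stdev == 0:
        return float("nan")  # undefined when every trade has the same P&L
    return mean / stdev * math.sqrt(periods)


# Hypothetical per-trade P&L in cents.
sharpe = per_trade_sharpe([10.0, -10.0, 30.0, -10.0])
```

Applying sqrt(252) to per-trade figures implicitly treats each trade as one trading day, so values are most comparable between sports with similar trade frequency.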

Profit factor - Gross wins / gross losses. Any value above 1.0 is net profitable. Above 1.25 = solidly profitable. Above 1.5 = strong. Above 2.0 = excellent.
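The profit factor ratio is straightforward to compute from per-trade P&L. A minimal sketch, with a hypothetical function name and made-up inputs; the handling of the no-losses case is an assumption.

```python
def profit_factor(pnl_cents):
    """Gross wins divided by gross losses (both taken as positive sums)."""
    gross_wins = sum(p for p in pnl_cents if p > 0)
    gross_losses = -sum(p for p in pnl_cents if p < 0)
    if gross_losses == 0:
        # Assumed convention: infinite with wins and no losses, else undefined.
        return float("inf") if gross_wins > 0 else float("nan")
    return gross_wins / gross_losses


# Hypothetical trades: 88c of gross wins against 50c of gross losses.
pf = profit_factor([38.0, -20.0, 50.0, -30.0])
```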

Fee assumptions — Results are shown net of the execution-cost assumptions used when this validation export was generated. Live exchange fees and market microstructure can change over time.

Pregame filter — For NBA and NHL, the deployed strategy requires the pregame market price to agree with the model's bet side at ≥55c. This filters out trades where the model disagrees with market consensus, reducing adverse selection.
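The pregame rule can be expressed as a small predicate. This sketch assumes a two-outcome market where the away price is 100c minus the home price, and that "agree at ≥55c" means the pregame price of the side being bet is at least 55c; both are interpretations, and the function and parameter names are hypothetical.

```python
FILTERED_SPORTS = {"NBA", "NHL"}


def passes_pregame_filter(sport, side, pregame_home_price_cents,
                          threshold=55.0):
    """Return True if the trade survives the pregame-agreement filter."""
    if sport not in FILTERED_SPORTS:
        return True  # the filter only applies to NBA and NHL
    if side == "home":
        side_price = pregame_home_price_cents
    else:
        # Assumed: binary market, so the away side is the complement.
        side_price = 100.0 - pregame_home_price_cents
    return side_price >= threshold


# Home team priced at 60c pregame: a home bet passes, an away bet does not.
home_ok = passes_pregame_filter("NBA", "home", 60.0)
away_ok = passes_pregame_filter("NBA", "away", 60.0)
```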
