Backtesting Polymarket Strategies: Tools, Datasets, and the Depth Data Problem

The single biggest reason most Polymarket backtests overstate live P&L: Polymarket does not expose historical orderbook depth. The free CLOB API gives you mid-prices, which are enough to backtest strategy logic but not enough to backtest execution. Without real depth data you cannot know how a given order would have actually filled — you can only estimate, and bad estimates produce backtests that look great until the strategy goes live and bleeds money on every fill.

This post walks through what you can do with the free price-history data, how to approximate slippage when you cannot measure it, and when the approximation is no longer enough — at which point you either capture depth data yourself going forward, or buy a dataset that already captured it.

What You Need for a Useful Polymarket Backtest

A real backtest has four ingredients:

A strategy. The signal logic, the entry/exit rules, the sizing.
Historical prices for the markets the strategy would have traded.
A model of transaction costs — fees, slippage, latency penalties.
A ground-truth outcome for every market traded (did YES or NO win?).

For Polymarket, ingredient 1 is entirely yours, and ingredient 4 can be read from resolved-market metadata. Ingredient 2 is partially free via the CLOB API. Ingredient 3 is the part most amateur backtests get wrong, and it is the part that produces the wide gap between backtest results and live results.

The Free Option: CLOB Price-History API

Polymarket's CLOB exposes a historical price endpoint:

GET https://clob.polymarket.com/prices-history?market={token_id}&interval=max&fidelity={N}

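{token_id} is the specific YES or NO token you want to price. interval controls how far back the series goes; max returns the entire history of the market. {N}, the fidelity value, sets the resampling granularity. Polymarket's API documentation describes fidelity as a resolution in minutes, so fidelity=5 should be read as 5-minute bars and fidelity=60 as hourly bars; check the spacing of the returned timestamps to confirm what you actually received.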

What you get back is a time series of mid-prices: timestamps and the corresponding midpoint between bid and ask at each sample. No volume, no orderbook depth, no fill records.
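A minimal sketch of pulling and parsing this series with the Python standard library. The URL and query parameters are the ones shown above; the response shape (a history list of objects with t as a Unix timestamp and p as the price) and the helper names build_url, parse_history, and fetch_history are assumptions to verify against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://clob.polymarket.com/prices-history"


def build_url(token_id, interval="max", fidelity=5):
    """Build the price-history URL from the three query parameters shown above."""
    query = urlencode({"market": token_id, "interval": interval, "fidelity": fidelity})
    return f"{BASE_URL}?{query}"


def parse_history(payload):
    """Turn a decoded response into (timestamp, price) tuples sorted by time.

    Assumed body shape: {"history": [{"t": <unix seconds>, "p": <price>}, ...]}.
    """
    points = [(int(row["t"]), float(row["p"])) for row in payload.get("history", [])]
    return sorted(points)


def fetch_history(token_id, fidelity=5, timeout=10):
    """Download and parse one token's history (requires network access)."""
    with urlopen(build_url(token_id, fidelity=fidelity), timeout=timeout) as resp:
        return parse_history(json.load(resp))
```

Comparing consecutive timestamps in the parsed result is a quick way to confirm the bar spacing a given fidelity value actually produces.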

This is enough to do basic strategy backtesting. You walk the time series forward, evaluate your signal at each timestamp, and record what your strategy would have decided. For each "trade" your strategy generates, you compare your entry price to the market's eventual resolution and compute P&L.

For our own bots — backtest_moneyline_wp.py, backtest_cs2_combined.py, backtest_soccer_wp.py, backtest_tennis_wp.py, backtest_spread_total_wp.py — this is the primary data source. Each pulls price history with fidelity=5 and replays strategy logic against it.

What the Free Data Cannot Tell You

The fundamental limit of mid-price-only backtesting: you do not know how your order would have executed.

A mid-price of 55c does not mean you could have bought at 55c. The orderbook at that moment might have been 53c bid / 57c ask — a 4-cent spread, half of which you would have eaten as a taker. Or it might have been 54.5c / 55.5c — a 1-cent spread costing you essentially nothing. The midpoint is the same in both cases; your actual fill cost is very different.
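The two books above can be compared directly. This is a small illustrative helper (the name taker_half_spread_c is ours, not part of any API) that computes what a taker pays relative to the midpoint when crossing the inside spread.

```python
def taker_half_spread_c(bid_c, ask_c):
    """Cents a taker pays relative to the midpoint when crossing the inside spread."""
    if ask_c < bid_c:
        raise ValueError("ask must not be below bid")
    return (ask_c - bid_c) / 2


# Both books have a 55c midpoint, but the taker cost differs by a factor of four.
wide_cost_c = taker_half_spread_c(53.0, 57.0)    # 4-cent spread
tight_cost_c = taker_half_spread_c(54.5, 55.5)   # 1-cent spread
print(wide_cost_c, tight_cost_c)
```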

If your strategy depends on small edges (anything under 5 cents of expected value per trade), the spread-cost uncertainty can flip your backtested P&L from profitable to unprofitable. This is the single biggest source of backtest-vs-live divergence for serious quant strategies.

A second problem: book depth. Even if the inside spread is tight, depth might be thin. Your hypothetical 500-share order at 55c might fill 50 shares at 55c and then move the price to 58c for the next 450 shares. A backtest that assumes 500 shares at 55c is overstating your edge by a meaningful amount.
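To put a number on that example, here is a sketch of walking a buy order up the ask side of a book. The (price, size) level format and the function name walk_asks are assumptions for illustration; real depth data may be shaped differently.

```python
def walk_asks(ask_levels, shares):
    """Fill a buy order against (price_c, size) ask levels, cheapest level first.

    Returns (average fill price in cents, shares actually filled).
    """
    remaining = shares
    cost_c = 0.0
    for price_c, size in sorted(ask_levels):
        take = min(remaining, size)
        cost_c += take * price_c
        remaining -= take
        if remaining == 0:
            break
    filled = shares - remaining
    if filled == 0:
        raise ValueError("nothing filled: empty book or zero-size order")
    return cost_c / filled, filled


# The 500-share order from the paragraph above: 50 shares at 55c, then 450 at 58c.
avg_fill_c, filled = walk_asks([(55, 50), (58, 450)], 500)
overstated_edge_c = avg_fill_c - 55  # what a "500 shares at 55c" backtest hides
```

Under these assumptions the average fill is 57.7c, so a backtest that books the whole order at 55c overstates the edge by about 2.7c per share.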

There are three ways to handle this gap, in increasing order of fidelity.

Approach 1: Assume Conservative Slippage

The cheapest approach: add a fixed slippage cost to every trade in your backtest. Common values:

1c per trade on liquid markets (top-volume NFL/NBA games, major political markets)
2-3c per trade on mid-volume markets (most regular-season MLB, NHL, NCAAMB games)
4-6c per trade on thin markets (esports, less-watched soccer matches, niche politics)

Our backtest scripts use a SLIPPAGE_C constant (typically 1c, with sport-specific overrides where book thinness justifies more) and add a latency penalty (0.5-2c) on top. The total transaction cost per trade is taker_fee_c + slippage_c + latency_penalty_c. We covered this formula in our Polymarket fees explained post.
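As a sketch, the cost formula above can be written as a small lookup. The tier names and the per-tier values are illustrative picks from the ranges listed earlier, and the taker fee is left as an input rather than assumed.

```python
# Illustrative slippage per tier, in cents, chosen from the ranges above.
SLIPPAGE_C = {"liquid": 1.0, "mid": 2.5, "thin": 5.0}


def total_cost_c(taker_fee_c, tier, latency_penalty_c=1.0):
    """Per-trade cost in cents: taker_fee_c + slippage_c + latency_penalty_c."""
    return taker_fee_c + SLIPPAGE_C[tier] + latency_penalty_c


# Example with a placeholder fee of 0c; substitute the fee that applies to your market.
thin_cost_c = total_cost_c(taker_fee_c=0.0, tier="thin", latency_penalty_c=2.0)
liquid_cost_c = total_cost_c(taker_fee_c=0.0, tier="liquid", latency_penalty_c=0.5)
```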

Pros: Simple, computationally cheap, no extra data required.

Cons: It is an average, not a fill-level truth. Some trades would have paid 0 cents of slippage (you got filled at the bid); others would have paid 10 cents (you crossed a thin book). Even if the average is right in aggregate, it is wrong on individual trades. For strategies where individual trade sizing matters, this is too coarse.

This is the right approach for early-stage strategy research where you are answering "does this idea have any edge at all?" — the slippage approximation will not change the directional answer.

Approach 2: Estimate Slippage From Price-Move Patterns

A more sophisticated approach: use the price history itself to estimate what slippage would have been. The intuition is that if a market moves a lot in the seconds after a trade, the book was probably thin; if it stays steady, the book was probably deep.

You can build a per-market slippage estimator by looking at standard deviation of mid-price changes over short windows. Markets with high local volatility get higher assumed slippage; markets with low local volatility get lower. This is approximate but better than a flat constant.
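One way such an estimator could look, as a rough sketch: take the standard deviation of recent bar-to-bar mid-price changes and map it onto an assumed slippage. The linear mapping and the base_c, scale, cap_c, and window parameters are our own illustrative choices and would need tuning against whatever fill data you have.

```python
from statistics import pstdev


def local_volatility_c(prices_c, window=12):
    """Std dev of the most recent `window` bar-to-bar mid-price changes, in cents."""
    changes = [b - a for a, b in zip(prices_c, prices_c[1:])]
    recent = changes[-window:]
    return pstdev(recent) if len(recent) >= 2 else 0.0


def estimated_slippage_c(prices_c, base_c=1.0, scale=1.0, cap_c=6.0, window=12):
    """Assumed slippage: a base cost plus a volatility term, capped at cap_c."""
    return min(cap_c, base_c + scale * local_volatility_c(prices_c, window))


steady = [55.0] * 20          # mid-price never moves
choppy = [50.0, 54.0] * 10    # mid-price flips 4c every bar
print(estimated_slippage_c(steady), estimated_slippage_c(choppy))
```

The steady series falls back to the base cost, while the choppy one is assigned a higher assumed slippage.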

Pros: No extra data cost. Captures some per-market variation.

Cons: Still an estimator. You are inferring book depth from price stability, which is correlated but not equivalent. It misclassifies in both directions: a mid-price can sit still over a thin book simply because nobody is trading, and a deep book can still reprice quickly when news arrives.

This is the right approach for mid-stage strategy refinement — you have an idea you believe has edge and you want a better estimate of how that edge survives execution.

Approach 3: Use Actual Historical Orderbook Depth

The honest answer for serious quant work: you need historical bid/ask depth, not just mid-prices. Polymarket does not expose this through any free API. If you want it, you have to either capture it yourself going forward (months of WebSocket logging) or buy a dataset that captured it for you.

We captured one. The Polymarket & Kalshi Orderbook Archive contains 136M tick-level snapshots (25.7M from Polymarket and 110.5M from Kalshi) across 30+ continuous days, with bid/ask/spread/volume captured at each tick and time-synced with live game state. Because Polymarket has no historical orderbook API, depth data like this only exists where someone recorded it live, and to our knowledge this is the only commercial source of backtest-ready depth data across both venues. Launch price is $150 (first 50 buyers), reverting to $499.

We use this internally for the questions free price-history data cannot answer:

What did the order book actually look like at the moment a strategy generated a signal?
How would a 500-share order have filled — at one price, or walked across multiple levels?
What was realized slippage in the trade window relative to what mid-price alone would have estimated?
How does the answer differ between liquid markets (NFL games) and thin ones (niche soccer leagues)?

If you are still at the "does this idea have any edge" stage, approach 1 or 2 is enough. If the strategy is past that and you are about to commit real capital, the depth data is the missing piece that closes the gap between backtest P&L and live P&L.

Choosing Your Strategy's Ground Truth

The fourth ingredient — outcome data — is straightforward for Polymarket: every resolved market has a definitive YES or NO outcome from the UMA oracle, accessible through the market metadata.

The harder version of this question for sports strategies: what is the fair probability the outcome happens, independent of what the market thinks? This is what our calibrated win probability API is built to provide — across 11 sports and tens of thousands of historical games, we publish post-calibration fair probabilities that you can use as the truth column in a backtest.

If your strategy is "buy when fair prob > market price + N cents," you need both: the market price (from Polymarket data) and a fair probability source (from a model). Backtests of trading strategies without a fair-probability anchor are really just market-following strategies, which is a very different thing.

A Worked Backtest Outline

A skeleton of how our Polymarket backtests are structured:

Pull resolved markets for the sport and date range you want to test. Get token IDs, settlement outcomes, game start/end times.
For each market, pull /prices-history?fidelity=5 for the mid-price series from game start to settlement, and confirm the bar spacing from the returned timestamps.
Pull the corresponding fair probability series from your model (or our API) — one estimate per timestep, aligned to the price series.
Walk the joint series forward. At each timestep, compute edge = fair_prob - market_price. If edge crosses your threshold and other entry criteria are met, record a trade with timestamp, entry price, fair probability, and edge.
Apply transaction costs to each trade entry: effective_entry = market_price + slippage_c + taker_fee_c + latency_penalty_c. Hold to settlement.
Compute P&L at settlement: pnl = ($1 if won else $0) - effective_entry. Aggregate across all trades for win rate, ROI, edge realized.
Compare to expected: was your average realized edge close to your average expected edge? If not, either the fair-probability input or the cost model is off (or the sample is too small to tell), and your live P&L will diverge from your backtest accordingly.
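The steps above can be sketched in a few dozen lines. This is not our production code: the Trade record, the one-trade-per-market rule, the 5c threshold, and the flat 2c cost are simplifying assumptions, and prices are kept in cents so a winning share pays 100.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    ts: int
    entry_c: float          # effective entry after costs, in cents
    expected_edge_c: float  # fair value minus effective entry
    pnl_c: float            # settlement payout minus effective entry


def backtest_market(series, yes_won, threshold_c=5.0, cost_c=2.0):
    """Walk (ts, market_price_c, fair_prob) rows for one YES token in time order.

    Buys once, the first time edge = fair - market reaches the threshold,
    then holds to settlement. Returns a Trade, or None if no entry fired.
    """
    for ts, price_c, fair_prob in series:
        fair_c = fair_prob * 100.0
        if fair_c - price_c >= threshold_c:
            entry_c = price_c + cost_c  # slippage + fee + latency, lumped together
            payout_c = 100.0 if yes_won else 0.0
            return Trade(ts, entry_c, fair_c - entry_c, payout_c - entry_c)
    return None


def summarize(trades):
    """Aggregate win rate, average P&L, and average expected edge across trades."""
    n = len(trades)
    return {
        "trades": n,
        "win_rate": sum(t.pnl_c > 0 for t in trades) / n,
        "avg_pnl_c": sum(t.pnl_c for t in trades) / n,
        "avg_expected_edge_c": sum(t.expected_edge_c for t in trades) / n,
    }


# Two toy markets: one winner, one loser.
market_a = [(0, 50.0, 0.52), (5, 50.0, 0.60)]
market_b = [(0, 40.0, 0.50)]
trades = [backtest_market(market_a, True), backtest_market(market_b, False)]
report = summarize([t for t in trades if t is not None])
```

With only two toy markets the realized and expected averages differ widely, which is exactly why the comparison in the last step needs a large sample before it says anything about the models.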

A backtest that reports +20c per trade with no slippage model is not the same as a backtest that reports +8c per trade with realistic slippage. The second is closer to live reality and is the number to make deployment decisions against.

What to Watch For in Your Own Backtest

Common pitfalls when backtesting Polymarket strategies:

Survivorship bias. Only including markets that resolved cleanly skips ones that got disputed or rescheduled, which often have specific characteristics.
Look-ahead leakage. Make sure your "fair probability" at time T was actually computable using only data available at time T. It is easy to accidentally use post-game stats to score a pre-game trade.
Ignoring trading hours. Some markets have thin overnight liquidity. A backtest that lets your strategy "trade" at 3am may not be executable in practice.
Single-market overfit. If your strategy works only on Lakers-Celtics games it does not work, it has just curve-fit to that pairing.
Backtest-to-live edge degradation. Realized live edge is typically smaller than backtested edge once you factor in execution costs your model did not see — for our own deployments, a backtested 8c edge has often translated into needing a 15-20c live edge gate to actually realize the strategy in production. Strategies that look great in backtest and terrible live are almost always slippage-model failures, not strategy-logic failures.
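For the look-ahead pitfall in particular, one simple guard is an as-of lookup: at each price timestamp, use only the latest model estimate published at or before that time. A minimal sketch, assuming the model series is stored as two parallel lists sorted by timestamp (the function name is ours):

```python
from bisect import bisect_right


def fair_prob_as_of(model_ts, model_probs, t):
    """Latest model estimate with timestamp <= t, or None if none exists yet.

    model_ts must be sorted ascending and aligned index-for-index with model_probs.
    """
    i = bisect_right(model_ts, t)
    return model_probs[i - 1] if i > 0 else None


model_ts = [100, 200, 300]
model_probs = [0.50, 0.60, 0.90]

# At t=250 only the estimate from t=200 is usable; the t=300 value is in the future.
print(fair_prob_as_of(model_ts, model_probs, 250))
```

Returning None before the first estimate forces the backtest to skip those bars instead of silently borrowing a later value.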

Bottom Line

For research-stage Polymarket strategies, the free /prices-history API plus a conservative slippage model is enough to know whether your idea has edge in principle. For production-stage strategies where the slippage assumption matters to the deployment decision, you need actual orderbook depth, which Polymarket does not expose historically — you capture it yourself going forward or buy a dataset that did.

The most common failure mode we see in other people's Polymarket backtests is over-confident slippage assumptions: assuming 1c per trade when the actual market would have charged 3-5c on thin books. The backtest looks great, the live deployment loses money, and the trader blames the strategy when the real problem was the cost model.

If you want to skip that failure mode, the Polymarket & Kalshi Orderbook Archive is what we use internally and what we sell publicly. 136M snapshots, score-synced, 30+ days, $150 founding-buyer launch price.

Related deeper reads:
- Polymarket Fees Explained: the cost side of every trade.
- Polymarket Paper Trading: when to backtest vs shadow trade vs live test.
- Polymarket API Documentation: the API surfaces that produce backtest data.
- How We Profit on Polymarket: backtest-to-live realization on real trades.