How to Get Historical Polymarket Order Book Data

If you're building or backtesting a Polymarket strategy, you eventually hit the same wall: you can get the current order book, and you can get trade history, but you cannot get the historical order book — the full depth, over time, as each market moved.

This guide explains exactly why that data is missing from the public API, why it's harder to get than it looks, and what your actual options are.

What the Polymarket API actually gives you

Polymarket's CLOB API is genuinely good for live trading. What it returns:

The current order book: GET /book gives you the live bid/ask ladder right now.
Trade history: the prints that executed, with price and size.
Price history: a time series of last/mid price points at a chosen resolution. Like OHLC candles, it shows where the price was, not how much size sat behind it.

What it does not return:

The order book as it stood at an arbitrary past moment, say 7:14 PM in the third quarter: the depth, the sizes at each level, the spread.

Kalshi has the same gap: its API exposes current depth and trades, not a historical depth archive. (We cover the broader landscape in our complete guide to prediction market APIs and the practical Polymarket API guide.)
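To make the "current book only" point concrete, here is a minimal Python sketch of what a single GET /book response lets you compute: top of book, spread, and the average price of walking the asks for a given size. The payload shape (lists of bids and asks whose levels carry string "price" and "size" fields) is an assumption to check against the CLOB documentation, and SAMPLE_BOOK is made-up data, not a real market.

```python
from decimal import Decimal

# Made-up payload shaped like a /book response. Field names are an assumption.
SAMPLE_BOOK = {
    "bids": [{"price": "0.77", "size": "400"}, {"price": "0.78", "size": "150"}],
    "asks": [{"price": "0.82", "size": "300"}, {"price": "0.80", "size": "100"}],
}


def parse_side(levels, descending):
    """Return [(price, size), ...] sorted so the best price comes first."""
    parsed = [(Decimal(lv["price"]), Decimal(lv["size"])) for lv in levels]
    return sorted(parsed, key=lambda lv: lv[0], reverse=descending)


def top_of_book(book):
    """Return (best_bid, best_ask, spread) for a non-empty book."""
    best_bid = parse_side(book["bids"], descending=True)[0][0]
    best_ask = parse_side(book["asks"], descending=False)[0][0]
    return best_bid, best_ask, best_ask - best_bid


def average_buy_price(book, quantity):
    """Walk the asks from best to worst; None if the book is too thin."""
    remaining = Decimal(quantity)
    cost = Decimal(0)
    for price, size in parse_side(book["asks"], descending=False):
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        if remaining == 0:
            return cost / Decimal(quantity)
    return None
```

Every one of these numbers depends on the full ladder at that instant, which is why none of them can be recomputed for a past moment unless that moment's book was saved.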

Why you can't just backfill it

Here's the part that trips people up: historical order book data is not reconstructable retroactively. You cannot write a script today that fetches what the book looked like last month, because the book is ephemeral — it only exists in the moment, and neither venue stores and serves the full historical depth.

A trade tape can sometimes be reassembled after the fact. A full-depth order book over time cannot — if nobody recorded the snapshots as they happened, that information is simply gone. So the only way to have January's order book is to have been capturing it, snapshot by snapshot, in January.

That single fact defines your options.

Your three options

1. Record it yourself, going forward. Stand up a websocket client against both venues, snapshot the book on every update, and store it. This works — it's exactly how the data gets created — but it has two costs: you get zero history before the day you start, and capturing clean, gap-free, multi-venue depth at scale (with reconnection handling and clock discipline) is more engineering than it first appears.

2. Scrape last price / OHLC. Easy, but it throws away the depth. If you're modeling fills, slippage, or liquidity, last-price candles will quietly lie to you — you'll assume you got filled at a price that had no size behind it.

3. Buy a recorded archive. If someone already captured the depth, you can skip the year-long wait and the capture engineering. The thing to check is what kind of archive it is.
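Option 1 is how every archive starts, so it helps to see what "snapshot and store" means in practice. The following is a rough sketch of the storage side only, assuming some websocket client (not shown) hands you each book update as a dict. The table layout and the function names open_store, record_snapshot, and find_gaps are hypothetical, and the time-based gap check is a heuristic: a quiet market also produces long silences, so a flagged gap is something to investigate, not proof of a dropped connection.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a SQLite file holding one row per book snapshot."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               recv_ts_ns INTEGER NOT NULL,  -- local receive time, nanoseconds
               venue      TEXT    NOT NULL,
               market_id  TEXT    NOT NULL,
               book_json  TEXT    NOT NULL   -- the full ladder, as received
           )"""
    )
    return conn


def record_snapshot(conn, venue, market_id, book, recv_ts_ns=None):
    """Store one snapshot, stamped with the local clock unless a time is given."""
    if recv_ts_ns is None:
        recv_ts_ns = time.time_ns()
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        (recv_ts_ns, venue, market_id, json.dumps(book, sort_keys=True)),
    )
    conn.commit()


def find_gaps(conn, venue, market_id, max_gap_ns):
    """Return (prev_ts, next_ts) pairs spaced further apart than max_gap_ns."""
    rows = conn.execute(
        "SELECT recv_ts_ns FROM snapshots "
        "WHERE venue = ? AND market_id = ? ORDER BY recv_ts_ns",
        (venue, market_id),
    ).fetchall()
    ts = [r[0] for r in rows]
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap_ns]
```

Committing on every insert keeps the sketch simple. A real recorder would likely batch writes, and would need the reconnection handling and clock discipline mentioned above.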

What separates a good archive from a price dump

Most "Polymarket tick data" products sell you prices. For research that holds up, you want three properties most of them lack:

Full depth, not OHLC. Bid/ask sizes across levels, so you can model real fills and slippage instead of assuming you hit the last print.
Both venues on one timeline. Polymarket and Kalshi side by side let you study cross-venue price discovery: which book moves first after new information.
Score-synced (for sports). Every snapshot joined to the live game state — score, period, game clock — at the moment of the tick. This is the difference between "the price was 0.78" and "the price was 0.78 with 4:10 left in the second quarter, up 13–7." Only the second lets you measure where the market lagged your model.

That last one matters more than it sounds. If you run a win-probability model, the entire value of historical data is comparing your fair value to the market's price at the same instant of game state. Without the score-join, you're guessing at alignment.
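Mechanically, a score-join is an as-of join: each book snapshot is paired with the most recent game event at or before its timestamp. Here is a minimal sketch, assuming both inputs are lists of dicts sorted ascending by a shared "ts" key. The key name and the asof_join helper are illustrative, not part of any venue's API or of the archive's schema.

```python
from bisect import bisect_right


def asof_join(snapshots, game_events):
    """Attach to each snapshot the latest game event with ts <= snapshot ts.

    Both lists must be sorted ascending by 'ts'. Snapshots that precede the
    first event get game_state=None.
    """
    event_ts = [e["ts"] for e in game_events]
    joined = []
    for snap in snapshots:
        i = bisect_right(event_ts, snap["ts"]) - 1
        state = game_events[i] if i >= 0 else None
        joined.append({**snap, "game_state": state})
    return joined
```

The join is only as good as the two clocks behind it. If the score feed lags the book feed by a few seconds, the market will appear to move before the score that caused it.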

A concrete example

Here's one row from a real captured NFL game (Cardinals at Bengals), straight from the sample data:

datetime_utc               home  away  home_score  away_score  period  time_remaining  yes_bid  yes_ask  mid
2025-12-28 18:51:29+00:00  CIN   ARI   13          7           2       10:05           0.78     0.80     0.79

Every row carries the order book and the game state. Load the sample in DuckDB and watch the book reprice as the score changes:

SELECT datetime_utc, period, time_remaining, home_score, away_score, yes_bid, yes_ask, yes_ask - yes_bid AS spread, mid
FROM 'orderbook_archive_sample.csv'
ORDER BY datetime_utc;

You can see the spread tighten and the mid jump in the seconds after a score — exactly the window where a fast model finds edge.
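The same check can be scripted instead of eyeballed. This sketch flags each row where the score differs from the previous row and reports the mid on either side. It assumes rows arrive sorted by datetime_utc and that the column names match the query above; score_change_moves and load_rows are hypothetical helper names.

```python
import csv


def load_rows(path):
    """Yield each CSV row as a dict keyed by column name."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def score_change_moves(rows):
    """Yield one dict per score change, with the mid before and after it."""
    prev = None
    for row in rows:
        score = (int(row["home_score"]), int(row["away_score"]))
        mid = float(row["mid"])
        if prev is not None and score != prev["score"]:
            yield {
                "datetime_utc": row["datetime_utc"],
                "score_before": prev["score"],
                "score_after": score,
                "mid_before": prev["mid"],
                "mid_after": mid,
            }
        prev = {"score": score, "mid": mid}


# Usage, with the sample file from the query above:
# for move in score_change_moves(load_rows("orderbook_archive_sample.csv")):
#     print(move)
```

Note that mid_after is the mid on the first row carrying the new score. The book may keep repricing over the following rows, so a fuller analysis would look at a window after each change.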

Where to get it

We capture tick-by-tick order book depth from both Polymarket and Kalshi, score-synced, and sell it as ready-to-query Parquet:

Polymarket + Kalshi order book archive — full L2 depth, both venues, score-synced. The flagship.
Kalshi historical order book & tick data — depth, open interest and burst snapshots for Kalshi sports markets.
Cross-venue top-of-book archive — best bid/ask across both venues on one timeline.
Prediction-market microstructure dataset — tick-level depth for price-discovery and liquidity research.

Before you buy anything, look at the real sample rows — load them, check the schema, confirm the score-join is there. Or browse all the data archives.

The data was recorded live, as the games happened. It is the one thing you can't go back and get — which is exactly why it's worth having.