Open notebook · Options research · In progress

IV Crush, Quantified.

A working paper on the implied-volatility crush trade. The setup is simple. The conventional wisdom is loud. Whether the trade actually pays after costs is a real question, and I'm not the first person to find that the loud version isn't quite right.

Open notebook
This is a working paper, not a finished study. The methodology section is firm. The full multi-cycle results section is not. I'm publishing the design and partial findings now because the design itself is the part most retail IV-crush write-ups skip. The complete multi-cycle table will land here as cycles finish. Treat measured numbers as preliminary until the full sample is in.
Why this work

Most public IV-crush write-ups quote a P&L number that excludes the things that kill the trade.

The pitch is clean: implied vol spikes into earnings, drops the morning after, sell the spike, collect the difference. The pitch is also incomplete. It almost never accounts for the move in the underlying eating the credit, transaction costs on multi-leg structures, or the asymmetry between the days IV crushes and the days the stock prints a 12% surprise. This notebook is the version of that backtest where I refuse to skip those parts.

Craft on display
  • Vectorized backtesting
  • Options chain parsing
  • Variance risk premium
  • Transaction cost modeling
  • Position sizing
  • Bias auditing
  • Scientific writing
  • Pre-registration
Working summary
  • The trade exists. Implied vol does drop sharply after earnings. The variance risk premium around earnings is real and well-documented in the academic literature.
  • The retail version of the trade is overstated. Most public write-ups quote gross P&L on a single ticker over a handful of cycles. That isn't a strategy, it's a story.
  • What I'm measuring. Five earnings cycles across the S&P 100, defined-risk structures only, transaction costs separated as a line item, with the underlying move accounted for honestly.
  • What's measured so far. Cycles 1 and 2 (Q3 and Q4 2025). Edge after costs is positive but smaller than the gross IV drop suggests. The headline win rate hides a meaningful left tail.
  • What's not measured yet. Cycles 3 through 5. Position-sizing sensitivity. Comparison to a naive short-straddle baseline. The full results post will replace this section once those land.

01 / The trade I'm trying to falsify
The pitch, and what it leaves out

The IV-crush trade in its retail form is one of the most repeated setups on options Twitter. The structure is usually some flavor of short premium going into an earnings announcement: short straddle, short strangle, iron condor, or a calendar spread. The thesis is that implied volatility is overpriced into the event because everyone wants to hedge, and then collapses the next morning once uncertainty resolves. Sell the overpriced premium, buy it back cheaper, pocket the difference.

The thesis is not wrong. The variance risk premium around scheduled earnings has been documented in finance literature since the late nineties. The problem with the retail version is everything around the thesis. Three things go missing in almost every public backtest I've seen:

One. The move in the underlying. IV can crush 40% and you can still lose money on a short straddle if the stock moves more than the breakeven implied by the premium you sold. The whole point of the IV being elevated is that the market is pricing in a real possibility of a big move. Sometimes that move shows up.
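Point one is easy to put numbers on. A toy sketch with invented prices, settling at intrinsic value only (which ignores the time value still left at the next-day exit, and so slightly flatters the short side):

```python
# Toy numbers: a short straddle's P&L per share at intrinsic-value settlement.
# Ignores remaining extrinsic value at exit, which flatters the short side.
def short_straddle_pnl(strike, credit, post_event_price):
    intrinsic = abs(post_event_price - strike)
    return credit - intrinsic

# $200 stock, $12 straddle credit -> breakevens at $188 and $212 (a 6% move)
print(short_straddle_pnl(200.0, 12.0, 206.0))  # 6.0: a 3% move keeps half the credit
print(short_straddle_pnl(200.0, 12.0, 224.0))  # -12.0: a 12% surprise loses the credit twice over
```

The IV crush happened in both cases; only the realized move decided the sign.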

Two. Transaction costs on multi-leg structures. A short iron condor on a $200 stock typically pays a credit of $0.80 to $1.40 per share. Spread cost on each leg can eat 15 to 30% of that gross credit on a single round-trip. If your backtest doesn't separate that out, you're publishing a number that won't replicate.

Three. The asymmetry of the loss distribution. Most cycles make a small amount of money. A small number of cycles make a large amount of money. A very small number of cycles lose multiples of the average win. The headline average obscures this. Sharpe and percentile-based summary stats are required, not optional.
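Point three shows up even in a made-up distribution. A quick sketch (all P&L values invented) of how the summary stats disagree with each other:

```python
# Illustrative only: many small wins, a few larger wins, rare max losses.
import statistics

pnl = [36] * 70 + [120] * 8 + [-450] * 3
print(statistics.mean(pnl))    # ~26: the headline average looks fine
print(statistics.median(pnl))  # 36: the typical trade looks even better
print(min(pnl))                # -450: the tail is what actually sizes the risk
```

A positive mean, a better-looking median, and a worst case an order of magnitude larger than either. Quoting only the first number is the cheat.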

"Edge after costs" and "edge before costs" are different trades. Most retail backtests describe the second and label it the first.

02 / The setup
Universe, structure, and what's in scope

The backtest is intentionally narrow so the result means something. Wider universes are cheap to add later. Sloppy methodology is expensive to clean up.

Universe

S&P 100 constituents that reported earnings during the test window, excluding any with material corporate actions (M&A, splits, special dividends) inside the holding window. This screen is mechanical, not discretionary.

Window

Five consecutive earnings cycles starting Q3 2025. One position per ticker per cycle. Entry at the close of the trading day before earnings. Exit at the close of the first trading day after earnings.

Structure

Short iron condor, defined risk. Strikes selected by delta, not by dollar width: short legs at the 20-delta call and 20-delta put, long legs 5 strikes further out. This standardizes the trade across underlyings of very different prices and vols. The structure is intentionally not a short straddle. Undefined risk on a five-cycle, hundred-name backtest is the kind of mistake that turns a paper trade into a real account event.
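The delta-based selection can be sketched against a parsed chain. The chain format here is an assumption (rows of strike, call delta, put delta); the real version depends on the data vendor:

```python
# Hypothetical sketch: choose condor strikes by delta from a parsed chain.
# `chain` is assumed to be rows of (strike, call_delta, put_delta).
def pick_condor_strikes(chain, target_delta=0.20, wing_strikes=5):
    strikes = sorted(row[0] for row in chain)
    # short legs: strikes whose deltas sit nearest the 20-delta target
    short_call = min(chain, key=lambda r: abs(r[1] - target_delta))[0]
    short_put = min(chain, key=lambda r: abs(abs(r[2]) - target_delta))[0]  # put deltas are negative
    # long legs: a fixed number of listed strikes further out, clamped to the chain
    long_call = strikes[min(strikes.index(short_call) + wing_strikes, len(strikes) - 1)]
    long_put = strikes[max(strikes.index(short_put) - wing_strikes, 0)]
    return short_call, long_call, short_put, long_put
```

Selecting by delta rather than dollar width is what lets a $60 name and a $600 name carry comparable risk per position.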

Position sizing

Equal-risk, not equal-notional. Each position is sized so that the maximum loss (width of the wing minus the credit received) is the same fixed dollar amount across every position. This is the part most retail backtests get wrong: they size by contracts or by notional, then complain that one bad day on a high-vol name wiped out the whole month.

Cost model

Per-leg commission of $0.50, plus a slippage assumption of half the bid-ask spread on each leg, applied at both entry and exit. That's eight legs of friction per trade (four legs in, four legs out). The cost model is held constant across all cycles so the headline edge number is comparable cycle to cycle.

03 / The math, in code
What the backtest actually does

The core of the backtest is small. The complexity is in the data plumbing, not the strategy logic. Here's the position sizing function in skeleton form:

# size each position so max loss == TARGET_RISK_USD
def size_iron_condor(short_call_strike, long_call_strike,
                     short_put_strike, long_put_strike,
                     credit, target_risk_usd):
    # widths are equal by construction (delta-symmetric wings, same strike
    # spacing); the long call sits above the short call, so subtract that way
    width = long_call_strike - short_call_strike
    max_loss_per_contract = (width - credit) * 100
    if max_loss_per_contract <= 0:
        raise ValueError("credit exceeds width — bad fill or bad data")
    contracts = target_risk_usd // max_loss_per_contract
    return int(contracts)

The cost layer is similarly mechanical. The thing worth pausing on is the realized P&L calculation, because this is where retail backtests usually cheat:

# realized P&L per contract, post-event close
def realized_pnl_per_contract(credit, exit_value,
                              spread_in, spread_out,
                              commission_per_leg=0.50):
    # spread_in / spread_out: average per-leg bid-ask spread at entry / exit
    gross = (credit - exit_value) * 100
    slippage = (spread_in + spread_out) * 0.5 * 4 * 100  # half-spread per leg, 4 legs each way
    commission = commission_per_leg * 8  # 4 legs in + 4 legs out
    return gross - slippage - commission

Two things to call out. First, exit_value is the actual mid-quote of the iron condor at the post-event close, not a model price. This is the line that demands real options-chain data, not just an IV time series. Second, the slippage and commission are separated in the output, so the cycle-level table can show gross edge and net edge side by side.
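To make the cost drag concrete, here is the same arithmetic inlined on one hypothetical fill (all fill numbers invented): a $5-wide condor sold for a $1.10 credit and bought back at $0.35 after the crush.

```python
# One invented trade through the cost arithmetic above, per contract
credit, exit_value = 1.10, 0.35
spread_in, spread_out = 0.06, 0.04  # average per-leg bid-ask at entry / exit

gross = (credit - exit_value) * 100                   # $75 gross edge
slippage = (spread_in + spread_out) * 0.5 * 4 * 100   # $20: half-spread on each of 4 legs, in and out
commission = 0.50 * 8                                 # $4: eight legs of commission
net = gross - slippage - commission                   # $51 net
print(round(gross), round(slippage + commission), round(net))  # 75 24 51
```

Roughly a third of the gross edge gone to friction on a perfectly ordinary fill, which is why the gross and net columns have to be reported side by side.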

04 / What's measured so far
Cycles 1 and 2, preliminary

Two cycles is a small sample. I'm publishing the table anyway because the shape of the result is more useful than the headline average at this stage, and waiting on it would mean publishing nothing for another two quarters. Read these as preliminary.

Cycle      Names traded   Win rate   Gross edge / trade   Cost / trade   Net edge / trade
Q3 2025    71             73%        $58                  $22            $36
Q4 2025    68             66%        $41                  $23            $18
Q1 2026    in progress
Q2 2026    queued
Q3 2026    queued

The shape: net edge is positive and meaningfully smaller than gross edge. Costs are eating between 38% and 56% of the gross number. That's the part the retail story doesn't tell, and it's one of the reasons the trade has the reputation of being either obvious money or obviously broken depending on whose post you read.

The other thing the table hides: in Q4 2025 there were three names where the realized move dwarfed the implied move and the position took the full max loss. The 66% headline win rate is technically correct and meaningfully misleading. The full results post will lead with the loss distribution, not the win rate.

Why I'm publishing partials

If I waited for all five cycles to publish anything, the methodology section above would sit in a private notebook for another six months. The methodology is the part recruiters and other operators actually want to read. The numbers will follow.

05 / What I haven't measured yet
The roadmap

These are next, in order:

Cycles 3 through 5. Q1, Q2, Q3 2026. The full sample is what makes the headline number defensible. Q1 is currently being processed.

Position-sizing sensitivity. The current backtest sizes by max loss. A version that sizes by Kelly fraction (using cycle-1-and-2 win rate and average win/loss) is the obvious comparison. Likely outcome: Kelly is too aggressive given the left tail, but I want the number on paper before saying that.
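The Kelly comparison reduces to the standard binary-bet formula, f* = p - (1 - p)/b, with b the win/loss payoff ratio. A minimal sketch with invented win/loss numbers (not the backtest's), just to show how hard the left tail drags the fraction:

```python
# Kelly fraction for a binary win/lose bet: f* = p - (1 - p) / b,
# where b = avg_win / avg_loss. All numbers below are illustrative.
def kelly_fraction(win_rate, avg_win, avg_loss):
    b = avg_win / avg_loss
    return win_rate - (1 - win_rate) / b

# same 70% win rate, two different loss assumptions
print(f"{kelly_fraction(0.70, 50, 80):.2f}")   # 0.22: modest losses, a real allocation
print(f"{kelly_fraction(0.70, 50, 300):.2f}")  # -1.10: tail-sized losses, Kelly says walk away
```

The same win rate flips from "size it" to "don't trade it" purely on the average-loss assumption, which is why the left tail has to be measured before Kelly means anything here.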

Naive baseline. Short straddle, equal-risk, same universe and window. The condor is more conservative; I expect lower gross edge and lower variance. The right framing is condor edge as a fraction of straddle edge after risk adjustment.

Tail-event audit. Pull every trade that took max loss and write a one-paragraph note on what happened. If three of five tail events are the same setup (post-AI-bubble guidance miss, say), the strategy has a structural exposure I should price in.

06 / Caveats I want stated
Things that could be wrong

Where this could break
  • Sample size. Five cycles is the floor for a useful read, not the ceiling. A defensible production version of this would want 20+.
  • Slippage assumption. Half the bid-ask is a reasonable mid-market fill in normal liquidity. Names with thin chains will fill worse. The cost line is probably understated for the bottom-quartile-volume names in the universe.
  • Survivorship and event filtering. The exclusion screen removes names with corporate actions in the window, which is methodologically correct but introduces a small look-ahead bias on the inclusion side.
  • Single structure. 20-delta condor is one trade among many. The result doesn't generalize to short straddles, calendars, or wider/narrower wings without re-running the test.
  • Regime. The five-cycle window is a single vol regime. A real production strategy would want to see how the edge behaves in a 2020-style or a 2022-style environment.
  • This is research, not a recommendation. Personal capital only. No one should size a position off a five-cycle preliminary read.
Pre-registered open questions

What I committed to test before I started looking at the data, written down so future-me can't move the goalposts.

  1. Is the average net edge positive across all five cycles after costs? Cycles 1 and 2 say yes. Pre-registered fail condition: net edge per trade negative on the full sample.
  2. Is the median trade positive, not just the mean? Mean can be dragged positive by a few outliers. Median is the honest number for a strategy you'd run repeatedly.
  3. What fraction of total P&L comes from the top 10% of trades? If the answer is >80%, this is a tail-driven edge dressed up as a process edge. That's a different strategy with different risk.
  4. How does the edge survive a 2x cost assumption? If the trade only works at retail-friendly cost levels, that's a meaningful caveat for anyone who'd run it at a less-friendly broker.
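Questions 3 and 4 are mechanical once the per-trade rows exist. A sketch of both checks, with the 2x cost stress run against the preliminary cycle averages from the table above:

```python
# Sketch of the checks behind pre-registered questions 3 and 4.
def top_decile_share(pnls):
    """Fraction of total P&L contributed by the best 10% of trades."""
    ranked = sorted(pnls, reverse=True)
    k = max(1, len(ranked) // 10)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else float("nan")

def stressed_net_edge(gross_per_trade, cost_per_trade, cost_multiplier=2.0):
    """Net edge per trade if costs come in at a multiple of the modeled level."""
    return gross_per_trade - cost_per_trade * cost_multiplier

# 2x cost stress on the preliminary per-trade averages from cycles 1 and 2
print(stressed_net_edge(58, 22))  # 14: Q3 2025 survives doubled costs
print(stressed_net_edge(41, 23))  # -5: Q4 2025 does not
```

On the preliminary numbers, question 4 already splits the two measured cycles, which is exactly the kind of fragility the full five-cycle sample needs to resolve.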