Option Fanatic
Options, stock, futures, and system trading, backtesting, money management, and much more!

Time Spread Backtesting 2022 Q1 (Part 1)

Ironically, while developing a Python backtester over the last few months (e.g. here, here, and here), I have completely gotten away from time spread backtesting. Today, I will revisit the manual backtesting realm by looking at time spreads in the first three months of 2022.

As seen in previous posts on the subject (e.g. here, here, and here), time spreads may be approached in a variety of ways. In the current mini-series, I will address a number of different details and tweaks. Rather than get confused, distracted, and drawn off course by manually backtesting one at a time, my ultimate hope for the Python backtester is to be able to algorithmically run through a large sample size of each variant and compare pros versus cons.

For now, my base strategy is as follows:

With SPX at 4799, the first trade begins on 1/4/22 at the 4800 strike for $6,688: TD 20, IV 10.6%, horizontal skew -1.1%, NPV 291, and theta 15.6.

The very next trading day, a 2.57 SD move down brings us to the first adjustment point with PnL -7%:

time spread backtesting 2022 Q1 image 1 (6-20-22)

A 2.10 SD move down eight days later brings us to the second adjustment point with PnL -15%:

time spread backtesting 2022 Q1 image 2 (6-20-22)

Max loss is hit five days later on a 2.44 SD move lower:

time spread backtesting 2022 Q1 image 3 (6-20-22)

SPX cratering 2.31 SD in 14 days has resulted in a loss of 20.4%. Tough to overcome that! Although horizontal skew increases, it still remains negative while IV spikes ~90%. This suggests IV increase as a partial hedge in this trade.

I will continue next time.

Time Spread Backtesting Concepts (Part 1)

I previously introduced my recent manual backtesting of time spreads that I hope to automate going forward in Python. Let’s back up a bit and discuss some general concepts pertaining to time spreads themselves.

A time spread is composed of a short option in one expiration and a long option at the same strike price in a more distant expiration. I proclaim the primary profit mechanism is differential time decay, with the short option decaying faster than the long.

Time spreads have delta risk, vega risk, and term-structure risk.

Delta risk refers to the positional tendency to maximize profit over time as the underlying market trades toward the strike price. The [roughly] triangular risk graph illustrates this. At expiration, profit is greatest at the strike price. Above and below, expiration profit wanes until it becomes zero or negative with the market trading sufficiently far beyond the strike.

For a directional time spread placed with current market significantly above or below the strike price, delta risk refers to loss accumulation as the underlying moves farther from the strike or as the underlying remains sufficiently far from the strike as short-option expiration approaches.

Time spreads placed ATM are sometimes referred to as nondirectional. The position can [eventually] profit regardless of whether the underlying moves up or down as long as it remains within a limited range. The ATM time spread still has delta risk because enough market movement away from the strike price either today or by expiration will drive PnL negative.

Vega risk, the tendency of a position to lose money due to IV change, exists because the long option, which is farther from expiration, has more extrinsic value than the short. Time spreads are therefore vega positive: as IV increases, the long leg gains more value than the short leg gives up. As IV decreases, the time spread loses value and the expiration curve, which always indicates PnL at short-option expiration, moves down. This is true assuming all else remains the same.
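To make the vega-positive claim concrete, here is a minimal sketch that prices both legs with Black-Scholes and compares spread value at two IV levels. Everything in it is an assumption for illustration: the zero interest rate, no dividends, the same IV on both legs (flat term structure), and the hypothetical 60/90 DTE spread at a 4800 strike.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot: float, strike: float, years: float, iv: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European call with no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * iv ** 2) * years) / (iv * sqrt(years))
    d2 = d1 - iv * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)


def time_spread_value(spot: float, strike: float, short_dte: int, long_dte: int, iv: float) -> float:
    """Long the far option and short the near option at the same strike and IV."""
    long_leg = bs_call(spot, strike, long_dte / 365, iv)
    short_leg = bs_call(spot, strike, short_dte / 365, iv)
    return long_leg - short_leg


# Hypothetical ATM spread: spot and strike both 4800, 60/90 DTE.
low_iv = time_spread_value(4800, 4800, 60, 90, 0.15)
high_iv = time_spread_value(4800, 4800, 60, 90, 0.20)
print(f"spread value at 15% IV: {low_iv:.2f}")
print(f"spread value at 20% IV: {high_iv:.2f}")
```

With all else held equal, the higher-IV spread is worth more. The sketch says nothing about what happens when the two legs change IV by different amounts.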

Term structure—the difference in IV between expiration months—often does not remain the same.

Term-structure risk for time spreads refers to an increase in horizontal skew. Horizontal skew is short-option IV minus long-option IV. Horizontal skew is generally negative, which makes positive horizontal skew advantageous for entry.* Should skew revert lower, the short option can lose more extrinsic value than the long, which biases the spread toward profitability. Once a time spread has already been opened, however, increased horizontal skew biases the spread toward loss because short-option extrinsic value increases more than long.
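The definition of horizontal skew is simple enough to write down as a one-line function. This is a minimal sketch; the function name is my own, and the sample inputs are the Oct/Nov 2020 put IVs (24.09% and 26.24%) quoted in the COVID-19 crash mini-series.

```python
def horizontal_skew(short_iv_pct: float, long_iv_pct: float) -> float:
    """Horizontal skew as defined above: short-option IV minus long-option IV."""
    return short_iv_pct - long_iv_pct


# Short Oct put at 24.09% IV, long Nov put at 26.24% IV.
oct_nov = horizontal_skew(24.09, 26.24)
print(f"{oct_nov:.2f}")  # -2.15
```

A negative result means the long option carries the higher IV, which is the usual case.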

I will continue next time.

*—Always question why horizontal skew is positive as part of your trade assessment.

Time Spread Backtesting in Python (Part 1)

Recently, I did a blog mini-series manually backtesting the COVID-19 crash with time spreads. I left off suggesting backtesting of a slightly rather than extremely bullish time spread as more representative of the long-term market.

One benefit to manual backtesting is a closer look at the day-by-day PnL of each and every trade. Seeing this is a closer approximation to live trading than only seeing the trade result.

Ultimately though, Python is where I want to be for automation, for efficiency, and for all of my backtesting needs. Manual backtesting takes much longer. It also requires me to go back and do many other calculations whenever I want a slightly different look at the data, whenever I want to calculate additional trade statistics, etc.

For me as a beginner, Python has a big temporal cost. That won’t change unless I work consistently to improve my skills.

I want to try and organize my thoughts about this backtesting and maybe come up with a flow chart before I actually try writing any code. I’ve been advised this can help prevent me from getting stranded in the weeds spending lots of time ironing out bugs that aren’t all that important.

This might be a decent time to start transitioning to Python since I’ve been backtesting in ONE and have a fresh sense of what the process entails.

Part of the process I won’t have with Python includes risk-graph management, which is prominently displayed in ONE. Without programming Black-Scholes (way too difficult for my current proficiency), I won’t have any ability to model the trade in Python. I therefore won’t see a profit tent, day steps, or PnL breakevens. I can’t reject the possibility this affects my [manually backtested] Practice Trades, but no risk graph information is called for by the trade guidelines, so hopefully this isn’t an issue.

The .csv data file includes greeks, which will allow for some common techniques of trade management.

To start, I need to think about what I want the program to output and what it will take as input. Here is a snippet of the option data file that gets purchased as .csv archives:

Option data file snippet (12-30-21)

The actual data file has more columns. For time spreads, I will need all greeks but gamma along with IV.
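As a first step, pulling only the needed columns out of a .csv could look something like the sketch below. The column names and the two sample rows are invented for illustration; the real file's headers and values will differ, so treat every name here as a placeholder.

```python
import csv
import io

# Invented two-row sample. The purchased data file's real headers may differ.
SAMPLE = """date,expiration,strike,type,iv,delta,gamma,theta,vega
2021-12-30,2022-02-18,4800,P,0.156,-0.48,0.0012,-0.92,7.4
2021-12-30,2022-03-18,4800,P,0.167,-0.47,0.0009,-0.71,9.1
"""

# Hypothetical names for the fields a time spread needs: IV and all greeks but gamma.
NEEDED = ["date", "expiration", "strike", "type", "iv", "delta", "theta", "vega"]


def load_needed_columns(handle):
    """Read a .csv and keep only the NEEDED columns from each row."""
    return [{name: row[name] for name in NEEDED} for row in csv.DictReader(handle)]


rows = load_needed_columns(io.StringIO(SAMPLE))
print(rows[0])
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` stand-in.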

The spreadsheet column headers I have been using in recent manual backtesting provide a starting point for the kind of data I’m looking to collect. Reading down:

Date
Trade #
ONE Trade #
DTE
SPX
MR
Theta
TD
IV
Horiz Skew %
NPD
NPV
MDD %
DTE
Max MR
# Adj
Exit DTE
PnL
ROI
SD Chg
IV Chg
DIT
Horiz Skew %
TD
Comments

I will continue next time.

Calendar Backtesting the COVID-19 Crash (Part 7)

I have been discussing impressions from calendar trade backtesting the 2020 COVID-19 crash.

One loose end from Part 6 was directional bias in my second backtest (top graph):

Directional bias check (12-27-21)

I calculated strike price minus SPX price (“Diff”) for 46 cases of trade entry or adjustments. In the top row, positive (negative) numbers indicate a bullish (bearish) bias. 3.41 points represents a mean bullish bias of 0.1% based on the mean SPX price 3654: hardly anything at all. Standard deviation for positive/negative differences ranges from 7-12 points, which suggests 3.41 will not be statistically significant.

My backtesting guidelines for the bullish single calendar are as follows:

Here are the ONE results:

Single bullish calendar ONE results (12-27-21)

Profit factor is 3.9 with 88% winning trades. This is really too good to be true, and you know what that means…

Directional bias here is ridiculous. The market won’t always be rallying like it did from Mar 2020 through Oct 2021:

Directional bias check bullish single only (12-27-21)

Mean bias is 3% bullish with the average DIT 32. This implies an average annual SPX return of 34% when over decades this is said to be closer to 9-10%. For the longer term, I can see leaning 1% bullish. Perhaps that is my next backtest to run.
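The 34% figure appears to come from simple (non-compounded) scaling of the mean bias over a 365-day year. A minimal sketch under that assumption, with a function name of my own choosing:

```python
def annualized_simple(bias_fraction: float, days_in_trade: float, days_per_year: int = 365) -> float:
    """Scale a per-trade move to a yearly figure without compounding."""
    return bias_fraction * days_per_year / days_in_trade


# 3% mean bullish bias over an average 32 days in trade.
implied = annualized_simple(0.03, 32)
print(f"{implied:.0%}")  # 34%
```

Compounding the same 3% over each 32-day period would give an even larger number, so the conclusion does not depend on the method.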

This strategy is dangerous because it lacks downside management. Trades beginning on 7/26/19 and 2/21/20 have maximum drawdowns of 90% and 84%, respectively. Being so close to -100%, this does not leave much room for a stop and it can destroy account equity (based on three losing trades, which is a very small sample size). The risk graph shows when leaning this bullish, losses can be hit even with the market slightly down:

Single calendar risk graph (12-27-21)

I’m also including this graph to get additional perspective on why DCs are not as advertised. Note how this tent peaks around +191%. With a DC in place, even if one of the two calendars suffers huge losses (e.g. 50-100%), the potential for +191% is enough to offset that. Unfortunately, +191% is not available 32 days into the trade. At that point the highlighted line, which peaks at +27%, is a more realistic estimate (assuming no IV changes). If one calendar is up 27% and the other is down 54% then we will still have major losses to deal with. Holding until 14 DTE gets the highlighted line closer to +68%, which still falls short if the second calendar is facing drawdowns akin to those mentioned two paragraphs above.

As option traders, we would probably benefit by hiding the expiration curve entirely since it’s usually too good to be true.

Calendar Backtesting the COVID-19 Crash (Part 6)

Today I continue discussing my impressions from calendar trade backtesting the 2020 COVID-19 crash.

I left off showing results for a single-only calendar strategy, which is a contrast to guidelines from Part 1. When I said these results are not encouraging, I was comparing with backtest #2 (summarized in Part 3) whose ONE results look like this:

2020 COVID-19 crash Cal backtest No 2 (12-21-21)

PF is 2.2 and average win is 2.5x average loss. Net profit is much better. I would certainly consider trading this. The DC allows me to hold a losing calendar longer because it also includes one that is winning.
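PF and the average win/loss ratio together pin down the ratio of winners to losers. The sketch below uses the common definition of profit factor (gross profit divided by gross loss), which I am assuming is what ONE reports; it also assumes no breakeven trades. The sample PnL list is made up.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss (common definition, assumed here)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss


def implied_win_rate(pf: float, avg_win_over_avg_loss: float) -> float:
    """Fraction of winning trades implied by PF and the average win/loss size ratio.

    PF = (wins * avg_win) / (losses * avg_loss), so wins / losses = PF / (avg_win / avg_loss).
    """
    wins_per_loss = pf / avg_win_over_avg_loss
    return wins_per_loss / (1 + wins_per_loss)


# Made-up trade list: 550 gross profit against 250 gross loss.
print(round(profit_factor([300, -100, 250, -150]), 2))  # 2.2

# PF 2.2 with average win 2.5x average loss.
rate = implied_win_rate(2.2, 2.5)
print(f"{rate:.1%}")  # 46.8%
```

A PF of 2.2 with wins 2.5x the size of losses implies roughly 0.88 winners per loser, or a win rate just under 47%.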

I’m still not totally sold on DCs. Perhaps single calendar results can be improved if I hold a losing trade longer.

This is really suggesting a directional bet. Holding a losing calendar longer means dropping the TD > 4 criterion. This usually happens when the market is above or below the calendar strike, which implies a bearish or bullish bet, respectively.

Given directional choice, I would certainly bet bullish. I mentioned the upward bias to equities in the fourth paragraph here, in the last paragraph here, and in the tenth paragraph here.

Before I decide how to implement this, I want to consider how even the ONE results shown above might be improved. Number of losers still exceeds number of winners, which is not ideal. Also, profitability is concentrated from Apr 2020 through Feb 2021 (10 months). The rest is equity curve chop, which makes me question consistency. A larger sample size of years leading up to 2020 would broaden the context.

I should check to see what directional bias I already have in the trade. I did not intend any, but starting each trade at the nearest 25-point strike implies a tiny bit. Number of times I leaned bearish and bullish should roughly balance out.

I will continue next time with backtesting a bullish calendar trade.

Incidentally, I clicked [the lower box] to see the following ONE results:

Single cal 2020-1 backtest results without TF (12-21-21)

Compare this to the graph shown above. This also has a PF of 2.2 and number of winners exceeds number of losers. This appears to be a more consistent strategy.

Unfortunately, these results are a mistake. I accidentally turned off commissions, which eliminates transaction fees. This is fantasyland and we have known this for a long time (see third-to-last paragraph here and fourth-to-last paragraph here).

If I significantly overestimated slippage, then the single-only calendar strategy with no directional bias might be worth doing. This would then be another instance like that described in the second paragraph here. Any backtest where I stack the odds against and performance still comes shining undeniably through is a strategy with which I really want to go live.

Calendar Backtesting the COVID-19 Crash (Part 5)

Today I continue discussing my impressions from calendar trade backtesting the 2020 COVID-19 crash.

The bigger point with which I left off is that a cursory check (at least) to authenticate a backtest should always be done. No backtest is Truth and live trading is often not exactly per guidelines. Some traders are more discretionary than others and some couldn’t replicate a set of backtested trades even if they really wanted to. Nevertheless, in a backtest like this or this with a limited number of trades and a short time horizon, one mistake can materially affect results.

I’m not sold on double calendars (DCs) because overall, they don’t seem to grow profit as quickly as singles. This is subjective and anecdotal, but I’m going to try and put some reasoning behind it.

DCs are more risky than single calendars for two reasons. First, as mentioned in Part 4, being established through adjustment implies less time to recoup transaction fees. Second, being established later means more gamma risk where substantial market moves can result in big losses. DCs need not be riskier from a margin standpoint, but the greater gamma risk is pervasive and insidious. This matters in the face of large market moves, which are a reality in trading.

To control gamma risk, I could implement a minimum DTE limit beyond which only closing contracts and re-entering farther out in time would be permitted. Theoretical determination of this optimal drop-dead date would be very challenging.

Another possibility would be to allow just single calendars. When an adjustment point is hit, exit and enter anew with at least 56 DTE. Here are the OptionNet Explorer (ONE) results of such a single calendar backtest:

Single cal 2020-1 backtest results with TF (12-21-21)

This is not encouraging. Profit factor (PF) is only 1.2 and number of losers exceeds number of winners.

In running this backtest, I felt the market often moves around too much for a single calendar to contain it. This led to more frequent whipsaws than backtest #2 (Part 3) where DCs are permitted.

In running this backtest, I also became very cognizant of how quickly a single can lose money when the market moves away from the strike. DCs look appealing with the wide profit tent, but one calendar is always losing money as the market trends. This may explain why profit seems to develop slowly. Day steps would do a better job at revealing the extent to which the expiration curve is mere mirage. This is usually the case with my option trading, but perhaps it’s more pronounced here.

I will continue next time.

Calendar Backtesting the COVID-19 Crash (Part 4)

I left off asking about a subtlety that would probably be identified by only serious option traders. Submitted for your approval is a question about why one backtested calendar is ~30% more expensive than all the others. Were you able to explain it?

The nuance here is that calendars generally get more expensive when purchased closer to expiration. To lower the cost, I would generally go farther from expiration, but that is not the case with the Sep/Oct calendar I mentioned in the second-to-last paragraph of Part 3. The calendar I backtested was an Oct 3225 put priced at 24.09% IV and a Nov 3225 put priced at 26.24% IV. Is that normal contango? I’d have to survey a large sample size to find out but my guess is no.

I suspect two factors have contributed to make this calendar atypically expensive. First, it is 35 days wide. This happens four times per year [(4 * 5 weeks in between mos) + (8 * 4 weeks in between mos) = 20 + 32 = 52 weeks for the full 12 mos] and it means paying for one extra week of extrinsic value in the long leg.
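The four-versus-eight split can be checked by counting days between consecutive monthly expirations. This is a minimal sketch assuming monthly expiration falls on the third Friday of each month; the helper names are my own.

```python
from datetime import date, timedelta


def third_friday(year: int, month: int) -> date:
    """Third Friday of the given month (assumed monthly expiration date)."""
    first = date(year, month, 1)
    # weekday(): Monday is 0 and Friday is 4.
    days_to_first_friday = (4 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_friday + 14)


def monthly_gaps(start_year: int):
    """Days between consecutive expirations, Jan of start_year through Jan of the next year."""
    months = [(start_year, m) for m in range(1, 13)] + [(start_year + 1, 1)]
    expirations = [third_friday(y, m) for y, m in months]
    return [(later - earlier).days for earlier, later in zip(expirations, expirations[1:])]


gaps = monthly_gaps(2020)
print(gaps.count(35), "five-week gaps and", gaps.count(28), "four-week gaps")
print((third_friday(2020, 11) - third_friday(2020, 10)).days, "days between Oct and Nov 2020")
```

For 2020 this gives four 35-day gaps and eight 28-day gaps, with Oct/Nov being one of the 35-day pairs.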

Second, and what I believe to be of greater importance, is the 2020 presidential election. I think the Nov options are priced rich due to election uncertainty.* On 7/31/20, IV for Nov and Dec are 24.55% and 24.66%, respectively (beyond which term structure decreases). Purchasing the Nov/Dec 2020 112/140 (DTE) calendar would cost $4,650: much less than the $8,840 seen here. The horizontal skew on the Nov/Dec would be -0.19, which is much lower in magnitude than -2.15 for the backtested Oct/Nov. Term structure rises steeply through November and then starts to trail off. This is unusual.

Incidentally, I’m puzzled why I exited this adjusted DC at 43 DTE when a third adjustment point was hit on 9/3/20 to open a single Nov (78 DTE)/Dec calendar. I could have rolled to ATM, as the presenter’s guidelines suggest (green):

Proposed roll for 11940 TD 7 (1) (12-16-21)

I could have also gone an extra 100 points OTM (green):

Proposed roll for 11730 TD 17 (2) (12-16-21)

Theta is comparable for both and cost is slightly less for the latter. TD is even higher for the latter (17 vs. 7). Either of them looks doable. Every time I adjust a calendar, PnL gets dragged down a bit by transaction fees. Since it usually takes a few days of positive theta to recoup that, maybe I should never adjust too close to my drop-dead 21 DTE. That is no excuse here with 43 DTE and plenty of time left.

I will continue next time.

*—As an exercise for another post, I should plot this term structure in Python. An additional
     detail would be to scale the x-axis properly since data points start one month apart and
     later increase to points that are three months apart.

Calendar Backtesting the COVID-19 Crash (Part 3)

Today I continue discussing calendar trades, which I recently revisited after viewing an online presentation on performance during the COVID-19 2020 crash.

I left off explaining that my second backtest was an attempt to be well-defined compared to the first, especially with regard to margin by implementing a constant number of contracts. No more starting out with one contract, adjusting by adding another, taking it away, etc. Also, no more double calendars with different expiration months, which may be regarded as different numbers of contracts from a theta perspective (e.g. on Jan 3, a Feb/Mar calendar will generally generate more positive theta than an Apr/May).

The second backtest returned $7,681, which seems significantly higher than the +$5,072 from the first backtest. Looking closer, though, $7,681 on a maximum margin of $10,074 is 76.2% ROI, which is only slightly better than 65% for the first backtest. The latter ROI is based on a max margin of $7,775.
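The comparison is easier to see side by side. A minimal sketch using only the figures quoted above, with ROI taken as net profit divided by maximum margin:

```python
def roi_pct(net_profit: float, max_margin: float) -> float:
    """Return on max margin, in percent."""
    return 100 * net_profit / max_margin


first = roi_pct(5072, 7775)
second = roi_pct(7681, 10074)
margin_increase = 100 * (10074 / 7775 - 1)

print(f"backtest #1 ROI: {first:.1f}%")          # 65.2%
print(f"backtest #2 ROI: {second:.1f}%")         # 76.2%
print(f"max margin is {margin_increase:.0f}% higher in #2")  # 30%
```

Most of the dollar difference between the two backtests is explained by the larger denominator.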

With all that effort given to maintaining a constant number of contracts, why is max margin ~30% higher in backtest #2?

Here is the trade list from the second backtest:

Variance in max risk (12-7-21)

Looking down the “MaxAbsRisk” column, the first thing I noticed was the highlighted $10,074. This intuitively seemed to be about 25% more than the rest, but a look across the entire distribution suggests otherwise.* Note the low values marked with red. Going one step farther, the standard deviation of these numbers is 1756. For the first backtest, the standard deviation of max risk (recorded after every trade) is 1734. Evidently, consistency across number of contracts (check!) is not the same as consistency across nominal risk (miss!). I could normalize the latter by selecting a position size (e.g. $10,000 or arguably X% of account value) and always trading the number of contracts coming closest to matching it.
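Normalizing by nominal risk could be as simple as the sketch below. The function name and the per-contract costs are hypothetical and chosen only for illustration; only the $10,000 target comes from the text.

```python
def contracts_for_target(target_risk: float, risk_per_contract: float) -> int:
    """Whole number of contracts whose total risk lands closest to the target (minimum 1)."""
    return max(1, round(target_risk / risk_per_contract))


TARGET = 10_000
# Hypothetical per-contract costs for three different entries.
for cost in (3_595, 4_420, 2_325):
    n = contracts_for_target(TARGET, cost)
    print(f"cost {cost}: trade {n} contracts for {n * cost} total risk")
```

Total risk still varies from trade to trade because contracts come in whole numbers, but the spread should be narrower than with a fixed contract count.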

The $10,074 trade bothered me enough to take a closer look for understanding. I opened that PCal 28 points OTM on 7/31/20. This appears to be a choice (maybe an attempt to bias the backtest negatively as mentioned in the second paragraph here) to lean bearish because I could have placed it 3 points OTM or 22 points ITM (bullish). Either way, all three of these have similar cost and therefore would not explain the anomaly.

Instead of doing the 77/112 (DTE) Oct/Nov 2020 calendar for $8,840 (initial margin), I could have [violated the guidelines and] opened a 49/77 Sep/Oct 2020 calendar for $7,190. This would have replaced a winning trade with a losing one and instead of making $7,681 on $10,074 (max risk ever seen on any trade) for +76.2% ROI, the strategy would have returned $6,099 on $8,204 for +74.3% ROI: pretty much the same.

Did you catch that? Something strange is afoot. If it rattled your bones the first time you read it, then you certainly deserve a free pizza for prowess!

*—I should plot this distribution in Python as an exercise for another post.

Calendar Backtesting the COVID-19 Crash (Part 2)

I recently viewed an online presentation about a monthly calendar, which motivated me to revisit the strategy.

As mentioned in Part 1, I backtested a second time because my results fell short of the advertised +100% return. As also discussed, I wasn’t overly concerned about this because he did not offer details about his trades or his calculations.

Although I didn’t consider it after completing the first backtest, I probably overestimated slippage with $21/contract. Factoring in too much slippage can result in backtested returns falling short of live-trading counterparts.* I would be curious to know what slippage the presenter realized on his trades. I have a feeling very few actually track this. I do it by noting the difference between my limit and the midprice. Once my order is executed, I look immediately at the mark to see how far off I got filled.
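Tracking slippage this way amounts to one subtraction per fill. A minimal sketch; the function name, the 100 multiplier, and the sample prices are assumptions for illustration.

```python
def slippage_per_contract(fill_price: float, mid_price: float, is_buy: bool, multiplier: int = 100) -> float:
    """Dollars given up relative to the midprice; positive means a worse-than-mid fill."""
    diff = fill_price - mid_price if is_buy else mid_price - fill_price
    return diff * multiplier


# Hypothetical debit spread bought at 44.30 when the mark showed 44.20.
cost = slippage_per_contract(44.30, 44.20, is_buy=True)
print(f"${cost:.2f} per contract")  # $10.00 per contract
```

Logging this for every live fill would show whether $21/contract is realistic.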

I did give serious consideration to the fluctuating number of contracts that is typical of rolling calendar trading strategies [beyond the current scope is an entire debate waiting to be had over this approach]. If I start with one calendar and adjust by adding a second, then I have roughly doubled margin. Trading strategies generally need consistency to be successful. Cruising along winning 1, 1, 1 on single calendars only to suddenly hit turbulence and lose 4 on a[n] [adjusted] calendar with double the margin can be very painful (see third-to-last paragraph here).

In my initial backtest, much of the profitability took place with one contract in place rather than two. Calendars make consistent profits when the market trades sideways. A cursory scan through the backtest shows 288 trading days with one contract and 163 trading days with two. As a weighted average, I was trading 1.36 contracts per day, which is 68% of the full 2-contract position size. I would expect a lower return with a smaller position size because maximum risk (the denominator) is equal. Maximum risk relates to the largest position size ever seen whether it happens once, 36% of the time, or always.
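The weighted average works out as follows. A minimal sketch using the day counts quoted above:

```python
# Trading days spent at each position size in the first backtest.
days_by_contracts = {1: 288, 2: 163}

total_days = sum(days_by_contracts.values())
avg_contracts = sum(n * days for n, days in days_by_contracts.items()) / total_days
share_of_full_size = avg_contracts / max(days_by_contracts)
share_of_days_at_two = days_by_contracts[2] / total_days

print(f"{avg_contracts:.2f} contracts on average")       # 1.36
print(f"{share_of_full_size:.0%} of full position size")  # 68%
print(f"{share_of_days_at_two:.0%} of days at two contracts")  # 36%
```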

In the subsequent backtest, I implemented two contracts at all times. I started with two and rolled one of two when I had to adjust. I paid much closer attention to when trades started and when they ended. Rather than ever having calendars with different expiration months on at the same time (e.g. Jan/Feb and Feb/Mar), I closed the entire position and started a new trade with the short leg around 60 DTE.

I will continue next time.

*—So many other things can also affect the comparison including strike prices, DTE of
     individual trades, timing of adjustments, etc. It’s also worth noting that difference in
     the headline return may not be statistically significant (see bottom of this post).

Calendar Backtesting the COVID-19 Crash (Part 1)

I recently saw a presentation extolling the virtues of a monthly calendar trade. I have backtested calendars in the past without convincing results. After seeing this presentation, I decided to give it another look.

The presenter claimed to be a full-time trader for several years. He said he has now traded calendars for at least two years. He traded through the COVID-19 crash and did well. This gives him the confidence to trade calendars in any environment because March 2020 was “the worst thing to happen to the market since 9/11.”

These are his official guidelines on trade setup and management:

My initial backtest guidelines were as follows:

From 12/23/2019 – 10/29/2021, this returned $5,072 on maximum risk $7,775: +65.2%. Over the period, SPX ↑ 42.4% from 3225 to 4594. Max drawdown was -47.4% (-$3,684) on 3/23/20 compared to -31.7% (2203) for SPX on the same date.
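These percentages can be reproduced from the dollar and index figures. The sketch below assumes the strategy numbers are measured against maximum risk and the SPX drawdown is measured from the period's starting level of 3225; both assumptions happen to match the quoted values.

```python
def pct_change(start: float, end: float) -> float:
    """Percent change from start to end."""
    return 100 * (end - start) / start


MAX_RISK = 7775

strategy_return = 100 * 5072 / MAX_RISK
strategy_drawdown = 100 * -3684 / MAX_RISK
spx_return = pct_change(3225, 4594)
spx_drawdown = pct_change(3225, 2203)

print(f"strategy: {strategy_return:+.1f}% return, {strategy_drawdown:.1f}% max drawdown")
print(f"SPX:      {spx_return:+.1f}% return, {spx_drawdown:.1f}% drawdown on 3/23/20")
```

Dividing return by the size of the drawdown gives about 1.4 for the strategy and 1.3 for SPX.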

The calendar strategy performed comparably (risk/return) to the underlying index, but the presenter said he returned +100% in 2020 alone. Did I do something wrong?

I should never get overly concerned about discrepant performance statistics when I do not know exact details about presenter calculations or accuracy. He may have rounded up to 100%: to the nearest 1%, 10%, or 100%!

While I don’t know these details about the presenter, a more detailed look at my methodology reveals:

  1. I managed only by TD—not minding whether trade remained under the tent.
  2. I usually closed at 21 DTE and adjusted as late as 24 DTE.
  3. I did not implement a profit target.
  4. I opened trades near 60 DTE going as short as 56 DTE.
  5. I used only 25-point strikes.
  6. I conducted one long campaign rather than grouping entries and exits into separate trades.

When others present results, I seldom find them forthcoming with a comprehensive, accurate account of methodology enabling me to replicate/verify the numbers. Changing any of these details can affect performance calculations. Live trading results are even worse because some degree of discretion is almost always present. A list of trades can provide opportunity for verification, but this was omitted as well.

I will continue next time.