Option Fanatic
Options, stock, futures, and system trading, backtesting, money management, and much more!

Put Credit Spread Study 1 (Part 4)

Today I will present data obtained from the methodology discussed here.

I started by adding $40 to each trade to represent the lower transaction fee. Going from $0.26/contract to $0.16 represents $10 per leg and the trade has two legs each to open and to close: 4 * $10 = $40.
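The fee arithmetic can be sketched in a few lines of Python. The function name, the parameter names, and the 100x contract multiplier are assumptions for illustration; the multiplier is what the $10-per-leg figure implies.

```python
# Sketch of the fee adjustment. Names and the 100x multiplier are assumptions.
MULTIPLIER = 100  # assumed: quoted TF is scaled by 100 to get dollars per leg


def tf_savings_per_trade(old_tf, new_tf, legs=2, transactions=2):
    """Dollar improvement per spread when TF drops from old_tf to new_tf.

    legs: options in the spread; transactions: one to open, one to close.
    """
    per_leg = (old_tf - new_tf) * MULTIPLIER
    return per_leg * legs * transactions


print(round(tf_savings_per_trade(0.26, 0.16), 2))  # 40.0
```

The same function gives the adjustment for any other pair of TF values by changing the two arguments.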

I then recalculated and identified trades whose loss no longer reached the -25% SL. I found 188 trades.

I then identified the original SL dates and looked at the chart to determine if these were bottoms. If so then I was probably looking at a flip. If not then I still had a loser and I would have to retest to see how big the loser would be.

This is when I realized that regardless of proposed alternatives, I would have to retest the 188 trades anyway. The previous step identified 40 trades as flip candidates. While that seemed encouraging, I only had part of the picture.

I proceeded to replace the original values of Exp ROI w/25% SL with 188 retested values. I then recalculated trade statistics.

Here are the results:

RUT PCS 30delta, 40pts width, recalculating results from TF 0.26 to TF 0.16 (8-14-17)

The third column is an approximation. While accounting for the lesser TF, it neither takes into account flips nor new PnL values for trades evading SL the original day only to trigger SL on a subsequent day.

To see the impact of lowering TF, therefore, the second and fourth columns should be compared. Doing so reveals an improvement in most of the statistics. I don’t see any surprises here. Simply adding $40 per trade is $40 / ($4000 – $40) = 1.01% on net margin. The average trade improved by 1.27%, which seems reasonable when flips are taken into account. Average loss remained about the same and 39 fewer trades actually lost with the lowered TF.

I think the moral of the story is that once again, execution makes a big difference. I am tempted to repeat the process for TF $0.06 but I think there may be cases where options priced $5.00 to $15.00 may incur more than nickel slippage. $0.16/contract may therefore be painting a realistic picture.

Another recurring theme is the temptation to take only those trades that have gone against me by the slippage amount to improve the effective price. Trades that are profitable from inception throughout would go untaken. Would this missed opportunity more than offset the benefit of improved entry price on all the others? That is the critical question.

Backtesting Frustration (Part 8)

Recall that my impetus for resurrecting this “Backtesting Frustration” blog series was the realization that I cannot use quick spreadsheet manipulations and calculations to reprocess 188 backtrades with lower transaction fees (TF). Today I want to go through a sampling of chart action showing different cases of false and real bottoms.

The highlighted candle below is a false bottom from 9/18/2001:

RUT Chart 9-18-01 False Bottom (8-7-17)

The SL would be triggered in subsequent days even if it was not triggered here due to lower TF.

The highlighted candle below is a false bottom from 7/11/2002:

RUT Chart 7-11-02 False Bottom (8-7-17)

The highlighted candle below is a real bottom from 3/24/2004:

RUT Chart 3-24-04 Real Bottom (8-7-17)

Because SL was not triggered two days earlier, this was the last downside move capable of taking the trade out at a loss. Smaller TF (slippage) would allow the position to evade SL and proceed to full profit.

Here is another real bottom from 7/21/2006:

RUT Chart 7-21-06 Real Bottom (8-7-17)

This is a false bottom from 3/1/2007:

RUT Chart 3-1-07 False Bottom (8-7-17)

Because I backtest [once] daily, long wicks (as shown here) represent price extremes that may or may not force trade exit depending on what time intraday (see 5/6/2010 candle, below) they occur.

Here is a real bottom from 11/21/2008:

RUT Chart 11-21-08 Real Bottom (8-7-17)

These are real bottoms from 2/4/2010 and 2/8/2010:

RUT Chart 2-4-10 2-8-10 Real Bottoms (8-7-17)

While the market went a few points lower on 2/8/2010, being close to February expiration allowed accelerated time decay to offset the move. Were this a March position, 2/4/2010 probably would have been a false bottom.

Here is a false bottom from 5/7/2010:

RUT Chart 5-7-10 False Bottom (8-7-17)

This would have been a real bottom for a May position but with the additional month to expiration, the market had time to recover and then fall once again.

Here is a false bottom from 6/10/2011:

RUT Chart 6-10-11 False Bottom (8-7-17)

Here is a false bottom from 11/14/2012:

RUT Chart 11-14-12 False Bottom (8-7-17)

Real bottom from 8/30/2013:

RUT Chart 8-30-13 Real Bottom (8-7-17)

False bottom from 10/9/2014:

RUT Chart 10-9-14 False Bottom (8-7-17)

False bottom from 8/21/2015:

RUT Chart 8-21-15 False Bottom (8-7-17)

Here is a false bottom from 1/20/2016:

RUT Chart 1-20-16 False Bottom (8-7-17)

This is another big wick but a lot can happen with several weeks to expiration.

Finally, here is a real bottom from 6/27/2016:

RUT Chart 6-27-16 Real Bottom (8-7-17)

While spreadsheets are great at managing large volumes of data and allowing us to do computational operations quite efficiently, we also have to be cognizant of what information they do not reveal. Besides outright fraud, I believe oversights like these are a major contributor to falsely optimistic backtesting results. This is a good reason why even advanced traders are best advised to undertake system development with others capable of proofreading the work.

Backtesting Frustration (Part 7)

Today I resume my series on backtesting frustrations by talking about the frustration of “flips.”

I mentioned this in Part 1 with regard to recalculating results with different TF values. A trade that is down 25.5% with $0.26/contract may only be down 24.5% at TF $0.16/contract thereby evading the trigger of SL and ending up profitable. Simply recalculating the results, I thought, would overlook these flips (from loser to winner) altogether.

In taking a closer look at the put credit spread results I see that 188 out of 1,093 trades originally hitting the -25% SL show a loss smaller than -25% with $40/trade added.
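A screen of this kind might look like the following Python sketch. The record layout, the field names, and the sample numbers are invented for illustration and are not taken from the backtest. It also assumes net margin stays fixed when the fee changes and that a breach means ROI strictly below the SL.

```python
# Hypothetical sketch: flag stopped-out trades whose loss falls short of the
# stop-loss once the fee saving is added back. Sample data are invented.
SL = -0.25          # stop-loss as a fraction of net margin
FEE_SAVING = 40.0   # dollars added back per trade (TF $0.26 -> $0.16)


def flip_candidates(trades, sl=SL, fee_saving=FEE_SAVING):
    """Return trades that breached the SL originally but not after adjustment."""
    out = []
    for t in trades:
        old_roi = t["pnl"] / t["net_margin"]
        new_roi = (t["pnl"] + fee_saving) / t["net_margin"]
        if old_roi < sl <= new_roi:
            out.append(t)
    return out


sample = [
    {"id": 1, "pnl": -1000.0, "net_margin": 3900.0},  # about -25.6% -> -24.6%
    {"id": 2, "pnl": -1200.0, "net_margin": 3900.0},  # about -30.8% -> -29.7%
]
print([t["id"] for t in flip_candidates(sample)])  # [1]
```

Passing this screen only marks a trade as a candidate; it says nothing about whether the SL is hit on a later day.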

I can now proceed in a few different ways: 1. Redo the 188 trades with TF $0.16 ($0.06)/contract; 2. Assume the SL was not hit and use either 7 DTE or Exp PnL values instead; 3. Assume these 188 trades closed for zero gain/loss.

Without doubt, the first option would be most accurate and also the most time-consuming.

I therefore started working with option #2 until I realized a major problem with the assumption. A trade that evades SL on one day may still hit SL on a subsequent day. Being so close to SL, another down day would probably trigger the unprofitable exit. With the market showing recent bearishness this seems quite feasible.

Furthermore, if the subsequent down day is big then the loss might end up being much greater than initially recorded.

Option #3 was intended to be a more conservative form of #2. In [falsely] thinking most of these trades avoiding SL would flip, it occurred to me that the market may not recover enough for full profit to be realized. To be conservative I could just call those zeros. Even a zero is much better than -25% for overall performance.

Hopefully I have made it clear that I can’t assume enough to go with option #2 or #3.

I then considered a fourth option: look at the chart. If the market is bottoming on the day the SL is hit then I can proceed per option #2 or #3 depending on where the market is at 7 DTE or Exp.

Still though, if the “Furthermore” (look up four paragraphs) happens then I may be looking at a much larger loss; just leaving it as before plus $40 would be inaccurate. This would be an argument for redoing all 188 trades. While it may not seem like lower TFs could translate to larger losses, there is a lack of granularity when testing on an EOD basis.
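To make the two outcomes concrete, here is a small Python sketch with invented end-of-day ROI paths. The 1% bump standing in for the fee saving, the strict breach rule, and the paths themselves are assumptions for illustration, not backtest data.

```python
# Hypothetical illustration: the same fee saving can turn one stopped-out
# trade into a winner and another into a larger loser under EOD testing.
SL = -0.25
BUMP = 0.01  # assumed ROI improvement from the lower fee


def exit_roi(daily_roi, sl=SL):
    """First end-of-day ROI strictly below sl, else the final value."""
    for roi in daily_roi:
        if roi < sl:
            return roi
    return daily_roi[-1]


def with_lower_fee(daily_roi, bump=BUMP):
    """Shift every daily ROI up by the fee saving."""
    return [r + bump for r in daily_roi]


flip = [-0.10, -0.255, -0.15, 0.05]     # market bottoms on the SL day
bigger = [-0.10, -0.255, -0.40, -0.45]  # market keeps falling

for name, path in (("flip", flip), ("bigger loss", bigger)):
    before = exit_roi(path)
    after = exit_roi(with_lower_fee(path))
    print(f"{name}: {before:.3f} -> {after:.3f}")
```

In the first path the trade evades the SL and finishes positive. In the second it evades the SL by a hair and is then stopped out the next day at a much worse level, because a once-daily test only sees the closing value.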

In stepping back and considering the wider perspective, it seems like a chance occurrence whether a flip or larger loss will occur. Unfortunately, I feel I must retest in order to have any possibility of knowing for sure.

Put Credit Spread Study 1 (Part 3)

Last time I presented initial results for the put credit spread (PCS) backtest. Rarely does a trade actually average TF of $0.26/contract, though, so today I will look at smaller values.

Calculated on net MR, here are the results for TF of $0.16/contract:

RUT PCS 30delta, 40pts width, TF 0.16, net MR raw data (7-27-17)

Following are the results for $0.06/contract TF based on net MR:

RUT PCS 30delta, 40pts width, TF 0.06, net MR raw data (7-27-17)

I generally find these numbers more encouraging than the bullish iron butterfly because the latter is not profitable with TF greater than $0.06/contract. The PCS is marginally profitable even with TF $0.26/contract. Reducing TF to $0.06 increases the average PCS trade to 3-5% profit, which is 24-40% annualized.

Unlike a butterfly, the PCS has risk in one direction only. This dramatically increases the probability of profit.

Like a butterfly, magnitude of losses is a problem, with the average PCS loss being 2-4x the average win. I thought the 7-DTE exit would cut out the worst losses but it reduces profit as well. The best-performing trade seems to be holding to expiration with a 50% SL although I would also seriously consider the 25% SL for risk-adjusted reasons.

Put Credit Spread Study 1 (Part 2)

Today I will start presenting results for my first put credit spread study.

The global disclaimer is to say no winner really exists in the “best performing trade” competition. What is most meaningful to me may be less meaningful to you. This is why alignment between a trade strategy and individual personality is so important. All I can do is explain my interpretation of these numbers. You will have to do the same.

Here are the results for TF = $0.26 using net MR (see last post for explanation of table contents):

RUT PCS 30delta, 40pts width, TF 0.26, net MR raw data (7-27-17)

The first thing I look at is PF followed by risk-adjusted return. Exp barely edges out Exp w/50% SL for PF and vice versa for Avg Trade/SD. I would therefore trade Exp w/50% SL because this gives me a better chance of avoiding the biggest losses. Looking at the SD data gives me pause because I really like seeing drawdowns minimized. Perhaps a better comparison to put this in proper context would be to compare against $4,000 of long shares daily (as done in the last link).

Given the choice between exiting trades at 7 DTE or Exp, the latter seems to outperform. Avg Trade, PF, and Avg Trade/SD all reflect this in the comparisons between rows 2 vs. 3, 4 vs. 5, and 6 vs. 7.

Two things are missing from this rationale, though, with the first being max loss. 7 DTE has smaller max losses than Exp each time. That makes sense with the rapid option decay of the final week. Max loss can significantly limit position size. In thinking about strategies with max loss -100% vs. -50%, I would trade the former much smaller. Do remember, though, that in trading the way I backtest my position size would be relatively small simply by virtue of putting a new trade on every single day. This not only gives me a large sample size but it also dilutes drawdowns.

Especially in thinking about this “perpetual scaling” approach, the risk-adjusted return is more important to me than max loss. As mentioned, expiration outperforms 7 DTE every time.

The second missing piece from the performance comparison is trade duration. The expiration trade is seven days longer. Annualized ROI would be one way of factoring this in because the same average trade would have a lower ROI per year if held to expiration than 7 DTE. PnL per day would be even more direct.
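Both normalizations can be sketched as below. The holding periods in the example are hypothetical (the post only says the expiration trade runs seven days longer), and the annualization is simple scaling with no compounding, which is an assumption.

```python
# Sketch: put two exits with the same average trade on a per-time basis.
# Holding periods below are hypothetical placeholders.
def annualized_roi(avg_roi, days_held, days_per_year=365):
    """Simple (non-compounded) annualization of an average trade ROI."""
    return avg_roi * days_per_year / days_held


def roi_per_day(avg_roi, days_held):
    """Average ROI earned per calendar day in the trade."""
    return avg_roi / days_held


AVG_ROI = 0.02           # same average trade for both exits (illustrative)
DAYS_TO_7DTE = 38        # hypothetical holding period for the 7-DTE exit
DAYS_TO_EXP = DAYS_TO_7DTE + 7

for label, days in (("7 DTE", DAYS_TO_7DTE), ("Exp", DAYS_TO_EXP)):
    print(label, f"{annualized_roi(AVG_ROI, days):.2%}",
          f"{roi_per_day(AVG_ROI, days):.4%}")
```

With the average trade held equal, the shorter trade always scores higher on both measures, so the expiration exit has to earn enough extra per trade to make up for the additional week.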

Next time I will study the impact of TFs.

Put Credit Spread Study 1 (Part 1)

After less than two months (personal record!) I now have initial data to present for put credit spreads.

The arbitrary parameters are as follows:

     –Sell first strike < 0.30 delta (less fudge factor for OV inconsistency)
     –40-point spread width
     –Exit at 7 DTE or 1 DTE
     –Stop-loss (SL) levels -25% or -50% based on net margin
     –Transaction fee (TF): $0.26/contract

This is a daily backtest using 3:30 PM ET data from 1/2/2001 – 6/21/2017 (4,136 trades). When the OV database was incomplete I went to 3:00 PM or 4:00 PM and/or filled in with theoretical values.

Margin requirements (MR) for credit spreads may be presented as gross or net. Net MR subtracts initial credit received from spread width multiplied by 100. This makes for larger winners and losers on a percentage (ROI) basis compared to gross MR and therefore increases standard deviation. I evaluated trades based on net MR.
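A minimal sketch of the two margin conventions follows. The function names and the example credit and profit are hypothetical, and the credit is assumed to be expressed in dollars.

```python
# Sketch of gross vs. net margin requirement for a credit spread.
# Names and sample numbers are illustrative assumptions.
def gross_mr(width_points, multiplier=100):
    """Spread width multiplied by 100."""
    return width_points * multiplier


def net_mr(width_points, credit_dollars, multiplier=100):
    """Gross MR less the initial credit received (credit in dollars)."""
    return gross_mr(width_points, multiplier) - credit_dollars


def roi(pnl_dollars, margin):
    return pnl_dollars / margin


width, credit, pnl = 40, 500.0, 350.0   # hypothetical 40-point spread
print(roi(pnl, gross_mr(width)))        # 0.0875
print(roi(pnl, net_mr(width, credit)))  # 0.1
```

The same dollar result is a larger percentage on net MR because the denominator is smaller, which is why net MR widens both winners and losers.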

Remembering my previous discussion about TFs, I recalculated results for $0.16/contract and $0.06/contract. One further consideration is that some losers near SL cutoffs might become winners (e.g. decreasing TF by $0.10 equates to an improvement of 1% in ROI on gross margin). I did not include the flips in the performance calculations.

Results of the backtest will be presented in forthcoming tables with bold type reflecting the best values (most positive for winners and least negative for losers) for each performance metric (row). Average Trade is mean ROI across all trades. SD is standard deviation. PF is profit factor. SD in the penultimate row of each table is calculated across all trades. The last row (Avg Trade/SD) is a risk-adjusted return.
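These metrics can be sketched with the Python standard library. Two assumptions are made here: profit factor is taken as the sum of winning ROIs divided by the absolute sum of losing ROIs, and SD is the sample standard deviation. The sample ROIs are invented.

```python
# Sketch of the table metrics computed from a list of per-trade ROIs.
# PF on ROI values and sample SD are assumptions; sample data are invented.
import statistics


def summarize(rois):
    wins = [r for r in rois if r > 0]
    losses = [r for r in rois if r < 0]
    avg = statistics.mean(rois)
    sd = statistics.stdev(rois)
    pf = sum(wins) / abs(sum(losses)) if losses else float("inf")
    return {
        "avg_trade": avg,
        "avg_win": statistics.mean(wins) if wins else 0.0,
        "avg_loss": statistics.mean(losses) if losses else 0.0,
        "pf": pf,
        "sd": sd,
        "avg_trade_over_sd": avg / sd,
    }


summary = summarize([0.10, 0.10, 0.10, -0.25])
print(round(summary["avg_trade"], 4))  # 0.0125
print(round(summary["pf"], 2))         # 1.2
```

Dividing average trade by SD gives the risk-adjusted figure in the last row of each table, so a column can win on average trade and still lose on this ratio if its results are more dispersed.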

The performance metrics were calculated for six (columns) exit combinations. 7-DTE ROI reflects trades closed with seven days to expiration. Exp ROI includes trades closed on expiration Thursday (1 DTE). 7-DTE ROI w/25% SL tabulates the first value of MAE to exceed 25% or ROI at 7 DTE if the threshold is never breached. Exp ROI w/25% SL uses Exp ROI if that 25% threshold is never breached. 7-DTE ROI w/50% SL tabulates the first value of MAE to exceed 50% or ROI at 7 DTE if the threshold is never breached. Exp ROI w/50% SL uses Exp ROI if that 50% threshold is never breached.
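The six columns can be derived from one trade's daily ROI path, as in this sketch. The function names, the idea of passing the 7-DTE index explicitly, and the sample path are assumptions for illustration. A breach is treated as ROI strictly below the threshold.

```python
# Sketch: derive the six exit columns from a single end-of-day ROI path.
# Names, the breach rule, and the sample path are illustrative assumptions.
def first_breach(daily_roi, stop_loss):
    """First end-of-day ROI strictly below stop_loss, or None if never hit."""
    for roi in daily_roi:
        if roi < stop_loss:
            return roi
    return None


def six_exits(daily_roi, idx_7dte):
    """daily_roi runs from entry through the expiration exit (1 DTE);
    idx_7dte is the index of the 7-DTE observation in that list."""
    to_7 = daily_roi[: idx_7dte + 1]
    result = {"7-DTE ROI": to_7[-1], "Exp ROI": daily_roi[-1]}
    for sl in (-0.25, -0.50):
        label = f"{int(-sl * 100)}% SL"
        for name, path in (("7-DTE ROI", to_7), ("Exp ROI", daily_roi)):
            hit = first_breach(path, sl)
            result[f"{name} w/{label}"] = path[-1] if hit is None else hit
    return result


path = [-0.02, -0.10, 0.04, -0.30, -0.12, 0.09]  # invented ROI path
for column, value in six_exits(path, idx_7dte=2).items():
    print(column, value)
```

In this invented path the drop to -30% happens after 7 DTE, so only the Exp ROI w/25% SL column records a loss. Note that the recorded loss is -30%, not -25%, because the test only sees one value per day.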

I will present the tables next time.

Bullish Iron Butterflies (Part 8)

Trading system development is, for me, a learning process and backtesting butterflies has been no different. This post is good background. What I found out last time was a real problem with the concept of width-adjusted MAE.

To be more specific, I do not believe width-adjusted MAE allows for an apples-to-apples comparison across trades. I came up with the “width-adjusted” concept here to correct for the fact that narrow [breakeven] trades seem to hit max loss more often. With regard to MAE, which is related to stop-loss, normalizing for width means the narrowest trades are most diluted in terms of ROI. That is to say the narrower the trade, the less likely it is to be stopped out.

To quantify this, I will study the percentage of 20-point BIBFs across both width-adjusted and non-adjusted MAE categories. I previously calculated that 28.6% of all trades were 20-point spreads. In the following table, cells colored red include a proportion of 20-point BIBFs that exceeds 28.6%:

BIBF Proportion of 20-point spreads stratified by width-adjusted MAE and MAE (6-8-17)

Indeed, narrow butterflies are more prevalent in the higher percentages of the non-adjusted MAE distribution while being more prevalent in the lower percentages of the width-adjusted MAE distribution. A stop-loss triggering on width-adjusted PnL would therefore be less likely to stop out narrow BIBFs than if based on non-adjusted PnL. Unfortunately the narrow BIBFs are most in need of a stop-loss.

Part of me thinks this is an absolute mess.

Another part of me thinks this is just a reflection of the level of complexity I’m dealing with here.

From a backtesting perspective, this might be an argument for using constant spread width regardless of underlying price. That would eliminate the need to normalize for width altogether. The question would then be what width to use. Perhaps backtesting 20-, 40-, 60-, and 80-point wide spreads would be sufficient for comparison.

Selecting a constant spread width would once again introduce a new degree of freedom into the equation. This variable would be in addition to exit day (introduced in last post), stop-loss (not yet identified), profit target (arbitrarily selected as 10%), and short strike selection (arbitrarily selected as 2-3% above the money).

Bullish Iron Butterflies (Part 7)

In need of a cure for these beautiful, sunny late-spring mornings? How about looking at maximum adverse excursion (MAE) distributions! Today I will proceed using the approach I described at the end of my last post.

What follows is a histogram of width-adjusted MAE. Total number of trades is plotted for every integer along the x-axis. Zero corresponds to the number of trades with width-adjusted MAE of zero. -1 on the x-axis corresponds to trades with width-adjusted MAE between 0 and -1%, -2 corresponds to trades with width-adjusted MAE between -1% and -2%, -15 corresponds to trades with width-adjusted MAE between -14% and -15%, etc.
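The binning can be sketched with a floor operation. Which bin an exact boundary value (such as -1.0%) falls into is not stated, so the choice below is an assumption, and the sample values are invented.

```python
# Sketch of the histogram binning: each MAE (in percent) maps to the integer
# at or below it. Boundary handling and sample values are assumptions.
import math
from collections import Counter


def mae_bin(mae_pct):
    """0 stays in bin 0; anything in [-1, 0) goes to -1; [-2, -1) to -2; etc."""
    return math.floor(mae_pct)


def mae_histogram(maes):
    return Counter(mae_bin(m) for m in maes)


sample = [0.0, -0.4, -1.0, -1.5, -14.2]
print(sorted(mae_histogram(sample).items()))
# [(-15, 1), (-2, 1), (-1, 2), (0, 1)]
```

Cumulative percentages follow by summing the counts from bin 0 downward and dividing by the total number of trades.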

BIBF Histogram of width-adjusted MAE (6-7-17)

Some of the cumulative percentage numbers are worth noting here. 7.80% of all trades have zero MAE. 54.4% of all trades have MAE no worse than -3%. 88.8% and 99.2% of all trades have MAEs no worse than -10% and -20%, respectively.

The percentage of winning trades in each group can help determine whether MAE distribution may be effectively used to define a stop-loss. A clear argument for a stop-loss threshold would be a PnL value having all winning trades on one side and all losing trades on the other:

BIBF % winning trades by width-adjusted MAE (6-7-17)

What surprised me was the presence of losing trades having such small MAEs (see yellow highlighting). Out of the 319 trades with zero MAE, 319 trades won: no surprise there. Out of the 1062, 495, and 351 trades in the -1%, -2%, and -3% MAE bins, however, I had eight, six, and 14 losers, respectively. To be down so little during the lifetime of the trade yet not end up hitting the profit target is extremely unusual with the time-decay acceleration taking place into expiration.

A big market move on expiration Thursday could help to explain this. MAE includes PnL numbers from trade inception through 2 DTE while “expiration PnL” is tracked in another column. One reason I backtested this way was to identify big moves occurring late. I have a strong suspicion of such a move wherever I have a maximum favorable excursion (MFE) occurring with < 7 DTE followed by a losing trade at expiration.

All of this is important because large losing trades in the face of small MAEs diminish the potential benefit of a stop-loss. One way to prevent this might be to exit all trades at 7 DTE and avoid expiration week altogether. This introduces “exit day” as another degree of freedom, though, which puts me at greater risk for the curse of dimensionality.

Before studying MFE and a date distribution for the losing trades described above, I see a bigger problem potentially lurking that should be addressed first.

I will talk about this next time.

Bullish Iron Butterflies (Part 6)

So far I have done several things with the BIBF analysis: considered the impact of transaction fees (TF), looked at width-adjusted ROI, identified a relationship between spread width and underlying price, and looked at performance stratified by implied volatility. Today I want to talk about maximum adverse excursion (MAE).

I have two issues to address before looking at MAE distribution: TF and width normalization. I like to remain as plain vanilla as possible in my analysis to minimize chances of curve-fitting. This means not implementing one condition then overlaying another on top of that then a third on top of the first two, etc. Adhering to the “plain vanilla” guideline could mean leaving the $26/contract TF and not normalizing for spread width.

I would be more willing to conduct the analysis this way if it didn’t differentially affect trades. At $26/contract, the total TF is $208/trade. Given $735 as the average cost for a 20-point butterfly, starting down $208 means the minimum MAE is -28.2% (and -52% for the cheapest trade of $400!). The wider butterflies are affected less due to the larger denominator.

Aside from this TF-induced-apples-to-oranges MAE comparison, the whole concept of being in loss at trade inception seems questionable. Yes, slippage is a reality of trading and this is a logical way of accounting for it. Intuitively, though, I feel MAE should be zero when the trade is placed.

Reducing TF to $6/contract would cost me $48/trade, which is a 77% reduction. For the average 20-point butterfly this is -6.5% (-12% for the cheapest 20-point butterfly). This feels small enough to be tolerable while still acknowledging the reality of slippage. Unfortunately this still affects narrow butterflies more than wider ones. In the true spirit of MAE, I think I must normalize for TF by adding back the $208 for each trade.
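The fee drag at inception can be worked out as a fraction of trade cost, as in this sketch. The function name is hypothetical, and the count of eight contract transactions per trade is inferred from $208 / $26.

```python
# Sketch: starting loss from transaction fees as a fraction of trade cost.
# Eight contract transactions per butterfly is inferred from $208 / $26.
def tf_drag(tf_per_contract, trade_cost, contracts=8):
    """Negative ROI a trade shows at inception purely from fees."""
    return -(tf_per_contract * contracts) / trade_cost


for tf in (26, 6):
    for cost in (735, 400):  # average and cheapest 20-point butterfly
        print(f"TF ${tf}, cost ${cost}: {tf_drag(tf, cost):.1%}")

# Fractional reduction in total fees going from $26 to $6 per contract
print(f"{1 - (6 * 8) / (26 * 8):.0%}")  # 77%
```

Because the fee is a fixed dollar amount, the drag shrinks as trade cost grows, which is why wider (more expensive) butterflies are affected less.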

The discussion is similar with regard to spread width. Narrow-butterfly PnL seems to be skewed toward the loss side while normalizing for spread width mitigates this effect. To some degree this is a position sizing issue (how many contracts per $10,000?), which I would prefer to leave out of the system development process altogether. Because of the large effect, though, I think I have no choice but to normalize.

Next time I will study the distribution of width-adjusted MAE without transaction fees.

Bullish Iron Butterflies (Part 5)

Today I want to focus on implied volatility (IV) to better understand whether high IV offers any edge to trading the BIBF.

I sorted the spreadsheet by Avg IV and tabulated counts and trade results:

BIBF Distribution of trades by Avg IV (6-1-17)

As expected, high IV does not occur very often: 71.46% of all trades occurred with IV under 25.

Higher IV does not seem to offer much of an edge. You may recall that the average width-adjusted ROI across all trades is -4.22%. The green cells correspond to ROI numbers that are better than this and they appear scattered across IV categories.

The four exceptions are the profitable trades placed with Avg IV between 60-85.

Two things give me pause about drawing meaningful conclusions from these highest of IV levels. First, IV of 60 or greater encompasses only 0.95% of the total trades. Second, all these trades occurred between October 6 and December 12, 2008, which is a mere sliver of the 16+ years covered by the entire backtest. This short time interval also corresponds to just one market condition: the worst crash we have seen this century. I would not generalize based on such a limited sample size.

This illustrates one of the dangers of doing spreadsheet research. I put in formulas and whipped up these numbers but I still need to look over the computations and scrutinize whether they make practical sense. In this case, they appear meaningful even though they may be due solely to chance.

Besides comparing trades in different IV groupings, another approach is to take trades only when Avg IV equals an n-day high. This is similar to the metric of IV Rank, which is frequently discussed in trading circles. Here is the breakdown of trade performance when Avg IV hits an n (ranging from 5 to 90)-day high:
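An n-day-high filter can be sketched as follows. Whether the window includes the current day is not stated, so including it is an assumption, and the IV series is invented.

```python
# Sketch of an n-day-high filter on a daily IV series.
# The window is assumed to include the current day; sample data are invented.
def is_n_day_high(series, i, n):
    """True if series[i] equals the highest of the last n observations."""
    if i + 1 < n:
        return False  # not enough history yet
    window = series[i - n + 1 : i + 1]
    return series[i] == max(window)


def n_day_high_days(series, n):
    """Indices of days on which a trade would be taken under the filter."""
    return [i for i in range(len(series)) if is_n_day_high(series, i, n)]


avg_iv = [18, 20, 19, 22, 21, 25]
print(n_day_high_days(avg_iv, 3))  # [3, 5]
print(n_day_high_days(avg_iv, 5))  # [5]
```

Larger values of n admit fewer days, so the sample behind each average shrinks as n grows.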

BIBF Distribution of trades by Avg IV n-day high (6-1-17)

No groups show profitable average trades. I thought longer-term highs would correspond to higher IV levels, which would be more susceptible to mean-reversion thereby benefiting the BIBF. This may be happening along with big market moves at highest IV that offset the IV contraction (I have seen this before). I can tell that longer-term highs are selecting conditions with higher IV (including the most volatile IV spikes, which are probably included for most values of n) because average IV increases with n.

To see such a strong inverse relationship between average trade and n, though, is quite shocking to this investigator.

Just in case you’re wondering why I’m bothering to analyze these data at all with them clearly amounting to a losing strategy, I remind you that the $26/contract transaction fee is having a significant negative effect on the results.