Automated Backtester Research Plan (Part 9)
Posted by Mark on January 24, 2019 at 06:27 | Last modified: November 23, 2018 09:40
With digressions on position sizing for spreads and deceptive butterfly trading plans complete, I will now resume with the automated backtester research plan.
We can study [iron, perhaps, for better execution] butterfly trades entered daily from 10-90 days to expiration (DTE). We can center the trade 0% (ATM) to 5% OTM (bullish or bearish) by increments of 1% [perhaps using caution to stick to the most liquid (10- or 25-point) strikes especially when open interest is low*]. We can vary wing width from 1-5% of the underlying price by increments of 1%. We can vary contract size to keep notional risk as consistent as possible (given granularity constraints of the most liquid strikes).
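As a rough sketch of how this entry scan could be parameterized — the function names, the 25-point rounding rule, and the 2700 underlying price are illustrative assumptions, not a finished specification:

```python
from itertools import product

DTE_RANGE = range(10, 91)                        # 10-90 days to expiration
CENTER_OFFSETS = [i / 100 for i in range(0, 6)]  # 0% (ATM) to 5% OTM by 1%
WING_WIDTHS = [i / 100 for i in range(1, 6)]     # 1-5% of underlying by 1%

def nearest_liquid_strike(price, increment=25):
    """Round to the most liquid strike grid (here, 25-point strikes)."""
    return round(price / increment) * increment

def butterfly_legs(underlying, offset, width, increment=25):
    """Return (lower long, short body, upper long) strikes for one candidate."""
    body = nearest_liquid_strike(underlying * (1 + offset), increment)
    wing = nearest_liquid_strike(underlying * width, increment) or increment
    return body - wing, body, body + wing

# One candidate trade per (DTE, center, width) combination entered daily
candidates = [(dte, butterfly_legs(2700, off, w))
              for dte, off, w in product(DTE_RANGE, CENTER_OFFSETS, WING_WIDTHS)]
```

Contract size would then be scaled per candidate to hold notional risk as constant as the strike granularity allows.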
An alternative approach to wing selection would be to buy longs at particular delta values (e.g. 2-4 potential delta values for each such as 16-delta put and 25-delta call). This could be especially useful to backtest asymmetrical structures, which are a combination of symmetrical butterflies and vertical spreads (as mentioned in the second-to-last paragraph here).
With trades held to expiration, I’d like to track and plot maximum adverse (favorable) excursion for the winners (losers) along with final PnL and total number of trades to determine whether a logical stop-loss (profit target) may exist. We can also analyze differences between holding to expiration, managing winners at 5-25% profit by increments of 5%, or exiting at 1-3x profit target by increments of 0.25x. We can also study exiting at 7-28 (proportionally less on the upper end for short-term trades) DTE by increments of seven.
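A minimal sketch of the excursion tracking described above, assuming a daily mark-to-market PnL series is available for each trade (the data layout is an assumption):

```python
def excursions(pnl_path):
    """Return (MAE, MFE, final PnL) for one trade's daily PnL series."""
    mae = min(0, min(pnl_path))  # worst open loss seen during the trade
    mfe = max(0, max(pnl_path))  # best open profit seen during the trade
    return mae, mfe, pnl_path[-1]

# Partition by final result: large MAE among eventual winners argues against a
# tight stop-loss; large MFE among eventual losers argues for a profit target.
winner_path = [-50, -120, 30, 180]
loser_path = [-40, 90, -220, -410]
```

Plotting MAE against final PnL across all trades would then show whether a stop-loss level exists that cuts losers without clipping too many winners.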
As an alternative not previously mentioned, we can use DIT as an exit criterion. This could be 20-40 days by increments of five. Longer-dated trades have greater profit (and loss) potential than shorter-dated trades given a fixed DIT, though. To keep things proportional, we could instead backtest exiting at 20-80% of the original DTE by increments of 15%.
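The proportional exit is easy to express; a small sketch (the rounding choice is an assumption):

```python
def dte_fraction_exit_day(original_dte, fraction):
    """Day in trade on which to exit when targeting a fraction of original DTE."""
    return round(original_dte * fraction)

# 20-80% of original DTE by increments of 15%, as proposed above
FRACTIONS = [0.20, 0.35, 0.50, 0.65, 0.80]
exit_days_66dte = [dte_fraction_exit_day(66, f) for f in FRACTIONS]
```

For a 66 DTE entry this yields exits near days 13, 23, 33, 43, and 53, so long- and short-dated trades are compared at the same proportional point in their lives.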
Trade statistics to track include winning percentage, average (avg.) win, avg. loss, largest loss, largest win, profit factor, avg. trade (avg. PnL), PnL per day, standard deviation of winning trades, standard deviation of losing trades, avg. days in trade (DIT), avg. DIT for winning trades, and avg. DIT for losing trades. Reg T margin should be calculated and will remain constant throughout the trade. Initial PMR should be calculated along with the maximum value of the subsequent/initial PMR ratio.
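A sketch of the statistics pass, assuming each closed trade has been reduced to a (PnL, days-in-trade) pair; the dictionary keys are illustrative:

```python
from statistics import mean, stdev

def trade_stats(trades):
    """trades: list of (pnl, days_in_trade) tuples. Returns the summary
    statistics listed above."""
    wins = [t for t in trades if t[0] > 0]
    losses = [t for t in trades if t[0] <= 0]
    gross_win = sum(p for p, _ in wins)
    gross_loss = -sum(p for p, _ in losses)
    return {
        "win_pct": len(wins) / len(trades),
        "avg_win": mean(p for p, _ in wins) if wins else 0.0,
        "avg_loss": mean(p for p, _ in losses) if losses else 0.0,
        "largest_win": max((p for p, _ in wins), default=0.0),
        "largest_loss": min((p for p, _ in losses), default=0.0),
        "profit_factor": gross_win / gross_loss if gross_loss else float("inf"),
        "avg_trade": mean(p for p, _ in trades),
        "avg_dit": mean(d for _, d in trades),
        "avg_dit_win": mean(d for _, d in wins) if wins else 0.0,
        "avg_dit_loss": mean(d for _, d in losses) if losses else 0.0,
        "sd_win": stdev(p for p, _ in wins) if len(wins) > 1 else 0.0,
        "sd_loss": stdev(p for p, _ in losses) if len(losses) > 1 else 0.0,
    }
```

PnL per day and the PMR ratios would come from per-day marks and margin data not shown here.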
We can later consider relatively simple adjustment criteria. I may spend some time later brainstorming some ideas on this, but I am most interested at this point in seeing raw statistics for the butterfly trades.
I will continue next time.
* This would be a liquidity filter coded into the backtester. A separate study to see how open interest for
different strikes varies across a range of DTE might also be useful.
Butterfly Skepticism (Part 4)
Posted by Mark on January 21, 2019 at 06:37 | Last modified: November 23, 2018 10:14
Today I want to complete discussion of the protective put (PP) butterfly adjustment.
I might be able to come up with some workaround (as done in this second paragraph) for PP backtesting. I could look at EOD [OHLC] data and determine when the low was more than 1.6 SD below the previous day’s close. In this case, I could purchase the put at the close. This would bias the backtest against (not a bad thing) the adjustment in cases where the close was more than 1.6 SD below because the put would be more expensive.
Unfortunately, I am not sure this particular workaround would work. If the close is less than 1.6 SD below then the backtested PP would be less expensive than actual. Furthermore, if I waited until EOD then the NPD and corresponding PP(s) to purchase would be different. This would distort the study in an unknown direction. I could track error (difference) between -1.6 SD and closing market price. Positive and negative error might cancel out over time. If I had a large sample size then this might or might not be meaningful.
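For what it's worth, the error-tracking idea can be sketched as follows; deriving the 1-day SD from IV as IV / sqrt(252), and all names here, are assumptions:

```python
import math

def breached_trigger(prev_close, low, close, iv, n_sd=1.6):
    """Flag days whose low breached -n_sd SD vs. the prior close, and measure
    the error from buying the protective put at the close instead."""
    one_day_sd = prev_close * iv / math.sqrt(252)
    trigger = prev_close - n_sd * one_day_sd
    breached = low <= trigger
    # Positive error: the close was below the trigger, so the EOD put costs
    # more than an intraday purchase would have (biasing against the adjustment).
    error = trigger - close if breached else None
    return breached, trigger, error
```

Accumulating `error` across a large sample would show whether the positive and negative cases really do cancel out.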
At best, this workaround seems like a questionable approximation of an adjustment strategy that is precisely defined.
Before dismissing the PP out of frustration, let’s step back for a moment and piece together some assumptions.
First, I believe the butterfly can be a trade with somewhat consistent profits and occasionally larger losses. Overall, I’m uncertain whether this has a positive or negative expectancy (hopefully to be determined as I begin to describe here).
Second, I believe profitability decreases as butterflies are held longer. I have seen some anecdotal research (methodology incompletely defined) suggesting butterflies are more profitable when they avoid the periods of greatest negative gamma.
Third, I have seen anecdotal research (methodology incompletely defined) suggesting PPs are unprofitable whether:
- purchased in high or low IV
- purchased at specified deltas
- purchased for specified amounts
- held to expiration, managed at specified profit targets, or managed at 50% loss
Fourth, this adjustment will require any butterfly to be held longer on average because additional time will be needed to recoup the PP loss. The result, as described in the second assumption above, will be decreased average profitability.
In my mind, combining the first and fourth assumptions does not bode well.
The big unknown involves the magnitude of the largest losses and in what percentage of trades those largest losses occur.
Interestingly, the trader who explained this to me said PP will lose money in most cases. What it can prevent is a massive windfall loss. Being forthright [about the obvious?] may give the teacher more credibility. Without backtesting, though, I think it leaves us with more than reasonable doubt over whether this approach tends toward profit or loss.
Categories: Option Trading | Comments (0) | Permalink
Butterfly Skepticism (Part 3)
Posted by Mark on January 17, 2019 at 07:13 | Last modified: November 23, 2018 09:58
On my mind this morning is skepticism regarding the protective put (PP).
I have seen the PP lauded by many traders as a lifesaving arrow to have in the quiver.
One trader described this to me with regard to a butterfly trading plan. Part of the plan provides for the following adjustment:
- If NPD is at least 10 with market at least -1.6 SD intraday then record NPD.
- Buy PP(s) to cut NPD by 75%.
- On a subsequent big move, if NPD again reaches value recorded in Step 1 then repeat Step 2.
- If market reverses to the high of purchase day, then close PP(s) from Step 2.
Upon further questioning, I got some additional information. He learned it from a guy who claims to have “mentored” many traders. The mentor (teacher) claims to have seen many lose significant money in big moves and therefore recommends this to avoid windfall losses. The teacher has shown numerous historical examples where this adjustment would keep people in the trade (not stopped out at maximum loss) and often wind up profitable. Further prodding revealed uncertainty over whether these “numerous” examples amount to more than a handful of instances. Of the several times the adjustment has been presented, he acknowledged the possibility that many could have been repetition of the same [handful of] instance[s]. He is uncertain whether anyone has presented big losing trades more than once.
Much of this casts doubt over the sample size behind this adjustment. We certainly wouldn’t want to fall prey to the pitfall described in this second-to-last paragraph.
As described in this third paragraph, the PP is simply an overlay added later in the trade. This strategy also has its own catchy name and is marketed. In order to backtest, we could study the profitability of long puts purchased on days the market is down at least 1.6 SD (also explore the surrounding parameter space as discussed in the fourth complete paragraph here).
From a backtesting perspective, intraday is a huge wrinkle. Technically, I’d need intraday data to identify exactly when the market was down 1.6 SD in order to purchase the PP at the correct time. As mentioned in the second-to-last paragraph here, this is arguably another reason why certain trading plans cannot be backtested: data not available.
I will continue next time.
Categories: Option Trading | Comments (0) | Permalink
Constant Position Sizing of Spreads Revisited (Part 4)
Posted by Mark on January 8, 2019 at 07:23 | Last modified: November 8, 2018 10:54
Today I will conclude this blog digression by deciding how to define constant position size, which I believe is important for a homogeneous backtest.
The leading candidates—all mentioned in Part 3—are notional risk, leverage ratio, and contract size.
Possible means to achieve—both mentioned in Part 2—are fixed credit and fixed delta.
I thought it might be the case that fixed delta results in a fixed leverage ratio. I suggested this in the last paragraph of Part 1 where I asked whether fixed delta would lead to a constant SWUP percentage. For naked puts under Reg T margining, gross requirement is notional risk. For spreads under Reg T margining, notional risk is spread width x # contracts and while notional risk may be fixed, the SWUP percentage varies.
Speaking of margining, we also have Reg T versus portfolio margining (PM) to complicate things. Both focus on a fixed percentage down (e.g. -100% for Reg T vs. -12% for PM) on the underlying. However, PnL at -12% can vary significantly with underlying price movement. PnL for spreads at -100% will not change as the underlying moves around because the long strike—at which point the expiration risk curve goes horizontal to the downside—is so far above.
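To make the flat-below-the-long-strike point concrete, here is a toy expiration PnL function for a put credit spread (the strikes and credit are hypothetical):

```python
def pcs_expiration_pnl(short_k, long_k, credit, underlying):
    """Expiration PnL per contract of a put credit spread (short_k > long_k);
    credit quoted per share, PnL returned in dollars."""
    intrinsic_short = max(short_k - underlying, 0)
    intrinsic_long = max(long_k - underlying, 0)
    return (credit - intrinsic_short + intrinsic_long) * 100

# Below the long strike the curve is horizontal: the loss is identical whether
# the underlying falls partway or all the way to zero.
loss_deep = pcs_expiration_pnl(2600, 2570, 2.50, 2300)
loss_zero = pcs_expiration_pnl(2600, 2570, 2.50, 0)
```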
Implied volatility (IV) also needs to be teased out since it will affect some of these parameters but not others. Given fixed strike price, IV is directly (inversely) proportional to delta (relative moneyness). For naked puts assuming constant contracts and fixed delta, IV is inversely proportional to notional risk and to leverage ratio. IV does not relate to leverage ratio for spreads, which is net liquidation value (NLV) divided by notional risk as defined two paragraphs above in the last sentence.
After spending extensive time immersed in all this wildly theoretical stuff, I seem to keep coming back to notional risk, leverage ratio, and fixed delta. The first two vary with NLV* and with # contracts due to proportional slope of the risk graph. Number of contracts can vary to keep notional risk relatively constant as strike price changes but this applies more to naked puts and less to spreads where spread width is of equal importance.
I want to say that for naked puts, the answer is fixed notional risk (strike price x # contracts), but we also need to keep delta fixed to maintain moneyness. With fixed credit, changing the latter would affect slope and leverage ratio. This is how I described the research plan originally and we will see whether an optimal delta exists or whether results are similar across the range. In the midst of all this mental wheel spinning, I seem to have gotten this right for naked puts without realizing it.
I guess I have also lost sight of the fact that this post is not even supposed to be about nakeds (see title)!
Getting back to constant position sizing of spreads, I think we can focus on notional risk and moneyness but we should also factor in SWUP. As the underlying price increases (decreases), spread width can increase (decrease) and we will normalize notional risk by varying contract size. Short strikes at fixed delta will be implemented and compared across a delta range.
Which is what I had settled on before (for spreads)…
[To reaffirm] Which is what I had settled on before (for naked puts)…
As I unleash a gigantic SIGH, I question whether any of this extensive deliberation was ever necessary in the first place.
I think at some level, this mental wheel spinning is what I missed as a pharmacist. The complexity fires my intellectual juices and is great enough to require peer review/collaboration to sort through. Once that is done, selling the strategy is an entirely separate domain suited to different talents, perhaps.
I left a job of the people (co-workers/customers) for a job that begs for people, which I have really yet to find. Oh the irony!
* By association, this is why I stressed magnitude of drawdown as a % of initial account value (NLV) in previous posts.
Constant Position Sizing of Spreads Revisited (Part 3)
Posted by Mark on January 3, 2019 at 06:45 | Last modified: November 7, 2018 05:17
Happy New Year, everyone!
The current blog mini-series has been a tangent from the automated backtester research plan. Today I will discuss whether fixed notional risk—with regard to naked puts and spreads—is even important.
This issue is significant because it seems like fixed notional risk is the “last man standing” since I initially mentioned it in Part 1. I have reassessed the importance of so many concepts and parameters in this research plan. The fact that they get misunderstood and reinterpreted is testament to how theoretical and highly complex they are. Especially from the perspective of avoiding confirmation bias, I believe this is all debate that must be had, and a main reason why system development is best done in groups as a means to check each other.
The reason fixed notional risk may not matter is because leverage ratio can vary. I also mentioned this in the third-to-last paragraph here. Leverage ratio is notional risk divided by portfolio margin requirement (PMR). Keeping PMR under net liquidation value and meeting the concentration criterion are essential to satisfy the brokerage. Leverage ratio can be lowered by selling the same total premium NTM. This will affect the expiration curve by decreasing margin of safety as it lifts T+x lines. Analyzing this, somehow, might be worth doing if backtesting over a delta range does not provide sufficient comparison.
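A toy illustration of that definition and the near-the-money effect; all dollar figures are hypothetical:

```python
def leverage_ratio(notional_risk, pmr):
    """Leverage ratio = notional risk / portfolio margin requirement (PMR)."""
    return notional_risk / pmr

# Selling roughly the same total premium nearer the money means fewer contracts
# and less notional risk per dollar of requirement, lowering the ratio.
far_otm = leverage_ratio(notional_risk=5_000_000, pmr=400_000)
near_the_money = leverage_ratio(notional_risk=1_500_000, pmr=300_000)
```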
Whether “homogeneous backtest” should mean constant leverage ratio throughout is another highly theoretical question that is subject to debate. Keeping allocation constant, which I aim to do in the serial, non-overlapping backtests, is one thing, but leverage can vary in the face of fixed allocation. I discussed this here in the final four paragraphs. In that example, buying the long option for cheap halves Reg T risk but dramatically increases the chance of blowing up (complete loss) since the market only needs to drop to 500 rather than zero. While the chance of a drop even to 500 is infinitesimal, it could theoretically happen, and it is far more likely than a drop all the way to zero.
Portfolio margin (PM) provides leverage because the requirement is capped at the T+0 loss seen 12% down on the underlying. In the previous example, 500 represents a 50% drop. Even under PM, though, leverage ratio can vary because of what I said in the second-to-last sentence of paragraph #4 (above).
When talking just about naked puts, much of this question about leverage seems to relate to how far down the expiration curve extends at a market drop of 12%, 25%, or 100%. This brings contract size back into the picture because contract size is proportional to downside slope of that curve.
With verticals, though, number of contracts is less meaningful because width of the spread is also important. The downside slope will be proportional to number of contracts. The max potential loss of the vertical depends not only on the downside slope, but for how long that slope persists because the graph only slopes down between the short and long strikes.
Either way, you can see how number of contracts gets brought back into the discussion and could, itself, be mistaken as being sufficient for “constant position size.”
I certainly was not wrong with my prediction from the second paragraph of Part 1.
Categories: Option Trading | Comments (0) | Permalink
Constant Position Sizing of Spreads Revisited (Part 2)
Posted by Mark on December 31, 2018 at 06:22 | Last modified: November 6, 2018 11:00
I’m doing a Part 2 because early this morning, I had another flash of confusion about the meaning of “homogeneous backtest.”
The confusion originated from my current trading approach. Despite my backtesting, I still trade with a fixed credit. If I used a fixed delta then 2x-5x initial credit (stop loss) would be larger at higher underlying prices. Gross drawdown as a percentage of the initial account value would consequently be higher. This means drawdown percentage could not be compared on an apples-to-apples basis across the entire backtesting interval.
Read the “with regard to backtesting” paragraph under the graph shown here. Constant position size (e.g. number of contracts or notional value?), apples-to-apples comparison of PnL changes (e.g. gross or percentage of initial/current account value?) throughout, and evaluating any drawdown (e.g. gross or as a percentage of initial/current account value?) as if it happened from Day 1 are all nebulous and potentially contradictory references (as described).
In this post, I argue:
> Sticking with the conservative theme, I should also calculate
> DD as a percentage of initial equity because this will give a
> larger DD value and a smaller position size. For a backtest
> from 2001-2015, 2008 was horrific but as a percentage of
> total equity it might not look so bad if the system had
> doubled initial equity up to that point.
If I trade fixed credit then I am less likely to incur drawdown altogether at higher underlying price, which makes for a heterogeneous backtest when looking at the entire sample of daily trades. If I trade fixed delta then see the last sentence of (above) paragraph #2.
I focused the discussion on position size in this 2016 post where I stressed constant number of contracts. Recent discussion has neither focused on fixed contracts nor fixed credit.
“Things” seem to “get screwed up” (intentionally nebulous) if I attempt to normalize to allow for an apples-to-apples comparison of any drawdown as if it occurred from Day 1.
If I allow spread width [if backtesting a spread] to vary with underlying price and I sell a fixed delta—as discussed in Part 1—then a better solution may be to calculate gross drawdowns as a percentage of the highwater account value to date. I will leave this to simmer until my next blogging session for review.
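That candidate normalization is simple to state in code (a sketch; the equity curve is assumed to be a sequence of NLV marks):

```python
def drawdowns_pct_of_highwater(equity_curve):
    """Drawdown at each mark as a fraction of the high-water NLV to date."""
    high_water = float("-inf")
    drawdowns = []
    for nlv in equity_curve:
        high_water = max(high_water, nlv)
        drawdowns.append((high_water - nlv) / high_water)
    return drawdowns
```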
I was going to end with one further point but I think this post has been sufficiently thick to leave it here. I will conclude with Part 3 of this blogging detour next year!
Categories: Backtesting | Comments (0) | Permalink
Constant Position Sizing of Spreads Revisited (Part 1)
Posted by Mark on December 28, 2018 at 07:37 | Last modified: November 6, 2018 05:06
In Part 7, I said constant position size is easy to do with vertical spreads by maintaining a fixed spread width. I now question whether a fixed spread width is sufficient to achieve my goal of a homogeneous backtest throughout.
I enter this deliberation with reason to believe it will be a real mess. I have addressed this point before without a successful resolution. This post provides additional background.
The most recent episode of my thinking on the matter began with the next part of the research plan on butterflies. I want to backtest ATM structures and perhaps one strike OTM/ITM, two strikes OTM/ITM, etc. Rather than number of strikes, which would not be consistent by percentage of underlying price, a better approach may be to specify % OTM/ITM.
I then started thinking about my previous backtesting along with reports of backtests from others suggesting spread width to be inversely proportional to ROI (%). It makes sense to think the wider the spread, the more moderate the losses because it’s more likely for a 30-point (for example) spread to go ITM than it is a 50-point spread with the underlying having to move an additional 20 points in the latter case. This begs the question about whether an optimal spread width exists because while wider spreads will incur fewer max losses, wider spreads also carry proportionally higher margin requirements.
Also realize that a 30-point spread at a low underlying value is relatively wide compared to a 30-point spread at a high underlying price. I mentioned graphing this spread-width-to-underlying-price (SWUP) percentage in Part 7. We could look to maintain a constant SWUP percentage if granularity is sufficient; with the 10- and 25-point strikes most liquid, having to round to the nearest liquid strike could force SWUP percentage to vary significantly (especially at lower underlying prices).
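The granularity problem can be seen with a couple of lines of arithmetic (the 25-point increment and the 1.5% target SWUP are assumptions):

```python
def swup_pct(spread_width, underlying):
    """Spread-width-to-underlying-price (SWUP) percentage."""
    return spread_width / underlying

def liquid_width(target_pct, underlying, increment=25):
    """Nearest achievable spread width on the liquid strike grid (never zero)."""
    return max(increment, round(underlying * target_pct / increment) * increment)

# Targeting 1.5% SWUP: at 800 the grid forces a 25-point spread (3.125% SWUP),
# while at 2800 it allows 50 points (~1.79% SWUP) -- far closer to target.
width_low = liquid_width(0.015, 800)
width_high = liquid_width(0.015, 2800)
```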
All of this is to suggest that spread width should be left to fluctuate with underlying price, which contradicts what I said about fixed spread width and constant capital. We can attempt to normalize total capital by varying the number of contracts as discussed earlier with regard to naked puts. From the archives, similar considerations about normalizing capital and granularity were discussed here and here.
Aside from notional value, I think the other essential factor to hold constant for a homogeneous backtest is moneyness. As mentioned above, spreads should probably not be sold X strikes OTM/ITM. We should look to sell spreads at fixed delta values (e.g. “short strike nearest to Y delta”) since delta takes into account days to expiration, implied volatility, and underlying price.
An interesting empirical question is how well “long strike nearest to Z delta” does to maintain a constant SWUP percentage.
Categories: Backtesting | Comments (0) | Permalink
Automated Backtester Research Plan (Part 8)
Posted by Mark on December 24, 2018 at 06:54 | Last modified: November 9, 2018 13:49
After studying put credit spreads (PCS) as daily trades, the next step is to study them as non-overlapping trades.
As discussed earlier, I would like to tabulate several statistics for the serial approach. These include number (and temporal distribution) of trades, winning percentage, compound annualized growth rate (CAGR), maximum drawdown, average days in trade, PnL per day, risk-adjusted return (RAR), and profit factor (PF). Equity curves will represent just one potential sequence of trades and some consideration could be given to Monte Carlo simulation. We can plot equity curves for different account allocations such as 10% to 70% initial account value by increments of 5% or 10% for a $50M account. A 30% allocation (for example) would then be $15M per trade. By holding spread width constant, drawdowns throughout the backtesting interval may be considered normalized.
As an example of the serial approach, I would like to backtest “The Bull” with the following guidelines:
- Entry: at 65 days to expiration (DTE), sell closest 30-point PCS to $2.50/contract
- Exit: when next month’s trade is at the entry date (65 DTE) or position down $250/contract
- Max profit: $250/contract (at expiration; $200-$225 would be more reasonable to manage early)
- Max loss: -$250/contract
- Max risk: $2,750/contract (assuming at least $2.50 credit obtained)
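A minimal sketch of the exit logic for “The Bull” as specified above; the price feed and the calendar of next-month 65 DTE entry dates are assumed to exist elsewhere:

```python
def bull_exit(open_pnl_per_contract, next_entry_today, max_loss=-250):
    """Exit when the next monthly trade reaches its 65 DTE entry date, or when
    the open position is down $250/contract, whichever comes first."""
    if open_pnl_per_contract <= max_loss:
        return "stop_loss"
    if next_entry_today:
        return "roll_to_next_month"
    return None  # stay in the trade
```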
I will not detail a research plan for call credit spreads. If we see encouraging results from looking at naked calls then this can be done as described for PCS.
I also am not interested in backtesting rolling adjustments for spreads due to potential execution difficulty.
Thus far, the automated backtester research plan has two major components: study of daily trades to maximize sample size and study of non-overlapping trades. I alluded to a third possibility when discussing filters and the concentration criterion: multiple open trades not to exceed or match one per day.
This is suggestive of traditional backtesting I have seen over the years where trades are opened at a specified DTE. For trades lasting longer than 28 (or 35 every three months) days, overlapping positions will result. As discussed here, I am not a proponent of this approach. Nevertheless, for completeness I think it would be interesting to do this analysis from 30-64 DTE and compare results between groups, which I hypothesize would be similar. To avoid future leaks, position sizing should be done assuming two overlapping trades at all times. ROI should also be calculated based on double the capital.
Another aspect of traditional backtesting I have eschewed in this trading plan is the use of standard deviation (SD) units. I have discussed backtesting many trades from (for example) 0.10-0.40 delta by units of 0.10. More commonly used are 1 SD (0.16 delta corresponding to 68%), 1.5 SD (0.07 delta corresponding to 86.6%), and 2 SD (0.025 delta corresponding to 95%). Although not necessary, we could run additional backtests based on these unit measures for completeness.
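The SD units and the quoted deltas/percentages are connected by the normal distribution: the delta is roughly the one-sided tail probability, while the quoted percentage is the two-sided containment probability. A sketch (ignoring drift and skew, which is a real simplification):

```python
import math

def put_delta_for_sd(n_sd):
    """Approximate short-put delta at an n-SD OTM strike: 1 - Phi(n_sd)."""
    phi = 0.5 * (1 + math.erf(n_sd / math.sqrt(2)))
    return 1 - phi

def containment(n_sd):
    """Two-sided probability of finishing within +/- n_sd standard deviations."""
    return math.erf(n_sd / math.sqrt(2))

# 1 SD -> ~0.16 delta / 68%; 1.5 SD -> ~0.07 delta / 86.6%; 2 SD -> ~0.023 / 95%
```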
Categories: Backtesting | Comments (0) | Permalink