Option Fanatic

Options, stock, futures, and system trading, backtesting, money management, and much more!

Constant Position Sizing of Spreads Revisited (Part 2)

I’m doing a Part 2 because early this morning, I had another flash of confusion about the meaning of “homogeneous backtest.”

The confusion originated from my current trading approach. Despite my backtesting, I still trade with a fixed credit. If I used a fixed delta then 2x-5x initial credit (stop loss) would be larger at higher underlying prices. Gross drawdown as a percentage of the initial account value would consequently be higher. This means drawdown percentage could not be compared on an apples-to-apples basis across the entire backtesting interval.

Read the “with regard to backtesting” paragraph under the graph shown here. Constant position size (e.g. number of contracts or notional value?), apples-to-apples comparison of PnL changes (e.g. gross or percentage of initial/current account value?) throughout, and evaluating any drawdown (e.g. gross or as a percentage of initial/current account value?) as if it happened from Day 1 are all nebulous and potentially contradictory references (as described).

In this post, I argue:

     > Sticking with the conservative theme, I should also calculate
     > DD as a percentage of initial equity because this will give a
     > larger DD value and a smaller position size. For a backtest
     > from 2001-2015, 2008 was horrific but as a percentage of
     > total equity it might not look so bad if the system had
     > doubled initial equity up to that point.

If I trade fixed credit then I am less likely to incur drawdown altogether at higher underlying price, which makes for a heterogeneous backtest when looking at the entire sample of daily trades. If I trade fixed delta then see the last sentence of (above) paragraph #2.

I focused the discussion on position size in this 2016 post where I stressed constant number of contracts. Recent discussion has neither focused on fixed contracts nor fixed credit.

“Things” seem to “get screwed up” (intentionally nebulous) if I attempt to normalize to allow for an apples-to-apples comparison of any drawdown as if it occurred from Day 1.

If I allow spread width [if backtesting a spread] to vary with underlying price and I sell a fixed delta—as discussed in Part 1—then a better solution may be to calculate gross drawdowns as a percentage of the highwater account value to date. I will leave this to simmer until my next blogging session for review.
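To make the two drawdown conventions concrete, here is a minimal sketch (my own illustration, not output from the backtester). The function name `max_drawdown`, the `basis` argument, and the sample equity curve are all hypothetical.

```python
def max_drawdown(equity, basis="initial"):
    """Largest peak-to-trough dollar drop, expressed as a fraction of a basis.

    basis="initial":   divide by the first equity value (the conservative view).
    basis="highwater": divide by the running peak at the time of the drop.
    """
    if basis not in ("initial", "highwater"):
        raise ValueError("basis must be 'initial' or 'highwater'")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        denominator = equity[0] if basis == "initial" else peak
        worst = max(worst, (peak - value) / denominator)
    return worst


# Hypothetical curve: the account doubles, then gives back $60,000.
curve = [100_000, 150_000, 200_000, 140_000, 180_000]
dd_initial = max_drawdown(curve, "initial")      # 60,000 / 100,000
dd_highwater = max_drawdown(curve, "highwater")  # 60,000 / 200,000
```

The same $60,000 loss reads as 60% of initial equity but only 30% of the high-water mark, which is the gap the quoted passage describes.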

I was going to end with one further point but I think this post has been sufficiently thick to leave it here. I will conclude with Part 3 of this blogging detour next year!

Constant Position Sizing of Spreads Revisited (Part 1)

In Part 7, I said constant position size is easy to do with vertical spreads by maintaining a fixed spread width. I now question whether a fixed spread width is sufficient to achieve my goal of a homogeneous backtest throughout.

I enter this deliberation with reason to believe it will be a real mess. I have addressed this point before without a successful resolution. This post provides additional background.

The most recent episode of my thinking on the matter began with the next part of the research plan on butterflies. I want to backtest ATM structures and perhaps one strike OTM/ITM, two strikes OTM/ITM, etc. Rather than number of strikes, which would not be consistent by percentage of underlying price, a better approach may be to specify % OTM/ITM.

I then started thinking about my previous backtesting along with reports of backtests from others suggesting spread width to be inversely proportional to ROI (%). It makes sense to think the wider the spread, the more moderate the losses because it’s more likely for a 30-point (for example) spread to go fully ITM than it is a 50-point spread with the underlying having to move an additional 20 points in the latter case. This raises the question of whether an optimal spread width exists because while wider spreads will incur fewer max losses, wider spreads also carry proportionally higher margin requirements.

Also realize that a 30-point spread at a low underlying value is relatively wide compared to a 30-point spread at a high underlying price. I mentioned graphing this spread-width-to-underlying-price (SWUP) percentage in Part 7. We could look to maintain a constant SWUP percentage if granularity is sufficient; with the 10- and 25-point strikes most liquid, having to round to the nearest liquid strike could force SWUP percentage to vary significantly (especially at lower underlying prices).
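A rough sketch of the granularity problem, assuming strikes only exist on a fixed increment. The helper names `swup_pct` and `rounded_width`, the 3% target, and the 25-point increment are hypothetical choices for illustration.

```python
def swup_pct(width, underlying):
    """Spread width as a percentage of underlying price (SWUP)."""
    return 100.0 * width / underlying


def rounded_width(target_pct, underlying, increment):
    """Width closest to target_pct of the underlying that lands on the strike grid.

    Assumes strikes are spaced `increment` points apart and that the narrowest
    tradable spread is one increment wide.
    """
    raw = underlying * target_pct / 100.0
    return max(increment, round(raw / increment) * increment)


# Hypothetical target of 3% SWUP with 25-point strikes.
for price in (800, 1200, 2800):
    width = rounded_width(3, price, 25)
    print(price, width, round(swup_pct(width, price), 2))
```

With these made-up inputs the realized SWUP lands above 3% at 800 and well below 3% at 1200, so the deviation from target is largely a rounding effect.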

All of this is to suggest that spread width should be left to fluctuate with underlying price, which contradicts what I said about fixed spread width and constant capital. We can attempt to normalize total capital by varying the number of contracts as discussed earlier with regard to naked puts. From the archives, similar considerations about normalizing capital and granularity were discussed here and here.

Aside from notional value, I think the other essential factor to hold constant for a homogeneous backtest is moneyness. As mentioned above, spreads should probably not be sold X strikes OTM/ITM. We should look to sell spreads at fixed delta values (e.g. “short strike nearest to Y delta”) since delta takes into account days to expiration, implied volatility, and underlying price.

An interesting empirical question is how well “long strike nearest to Z delta” does to maintain a constant SWUP percentage.
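One way to frame that empirical question, sketched under the assumption that each day's chain is available as (strike, delta) pairs. The function `strike_nearest_delta` and the sample put chain are hypothetical.

```python
def strike_nearest_delta(chain, target_delta):
    """Return the strike whose delta is closest to target_delta.

    chain: list of (strike, delta) pairs; put deltas are negative.
    """
    return min(chain, key=lambda row: abs(row[1] - target_delta))[0]


# Hypothetical put chain with the underlying at 2650.
underlying = 2650
puts = [(2400, -0.05), (2450, -0.09), (2500, -0.16), (2550, -0.27), (2600, -0.41)]

short_strike = strike_nearest_delta(puts, -0.30)  # short leg near Y = -0.30
long_strike = strike_nearest_delta(puts, -0.10)   # long leg near Z = -0.10
width = short_strike - long_strike
swup = 100.0 * width / underlying  # log this per trade date and graph it
```

Recording `swup` for every entry date would show how stable the percentage stays when both legs are chosen by delta.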

Automated Backtester Research Plan (Part 8)

After studying put credit spreads (PCS) as daily trades, the next step is to study them as non-overlapping trades.

As discussed earlier, I would like to tabulate several statistics for the serial approach. These include number (and temporal distribution) of trades, winning percentage, compound annualized growth rate (CAGR), maximum drawdown, average days in trade, PnL per day, risk-adjusted return (RAR), and profit factor (PF). Equity curves will represent just one potential sequence of trades and some consideration could be given to Monte Carlo simulation. We can plot equity curves for different account allocations such as 10% to 70% initial account value by increments of 5% or 10% for a $50M account. A 30% allocation (for example) would then be $15M per trade. By holding spread width constant, drawdowns throughout the backtesting interval may be considered normalized.
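Several of these statistics are simple to compute from a list of per-trade PnL values. The sketch below uses the standard definitions of winning percentage, profit factor (gross profit divided by gross loss), and CAGR. The function names and sample numbers are hypothetical.

```python
def winning_percentage(pnls):
    """Percentage of trades with positive PnL."""
    return 100.0 * sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls):
    """Gross profit divided by gross loss (infinite if there are no losers)."""
    gains = sum(p for p in pnls if p > 0)
    losses = -sum(p for p in pnls if p < 0)
    return float("inf") if losses == 0 else gains / losses


def cagr(start_equity, end_equity, years):
    """Compound annualized growth rate as a fraction."""
    return (end_equity / start_equity) ** (1.0 / years) - 1.0


# Hypothetical serial trade results in dollars.
trades = [100, -50, 200, -100]
print(winning_percentage(trades), profit_factor(trades))
print(round(cagr(100_000, 121_000, 2), 4))
```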

As an example of the serial approach, I would like to backtest “The Bull” with the following guidelines:


I will not detail a research plan for call credit spreads. If we see encouraging results from looking at naked calls then this can be done as described for PCS.

I also am not interested in backtesting rolling adjustments for spreads due to potential execution difficulty.

Thus far, the automated backtester research plan has two major components: study of daily trades to maximize sample size and study of non-overlapping trades. I alluded to a third possibility when discussing filters and the concentration criterion: multiple open trades not to exceed or match one per day.

This is suggestive of traditional backtesting I have seen over the years where trades are opened at a specified DTE. For trades lasting longer than 28 (or 35 every three months) days, overlapping positions will result. As discussed here, I am not a proponent of this approach. Nevertheless, for completeness I think it would be interesting to do this analysis from 30-64 DTE and compare results between groups, which I hypothesize would be similar. To avoid future leaks, position sizing should be done assuming two overlapping trades at all times. ROI should also be calculated based on double the capital.

Another aspect of traditional backtesting I have eschewed in this trading plan is the use of standard deviation (SD) units. I have discussed backtesting many trades from (for example) 0.10-0.40 delta by units of 0.10. More commonly used are 1 SD (0.16 delta corresponding to 68%), 1.5 SD (0.07 delta corresponding to 86.6%), and 2 SD (0.025 delta corresponding to 95%). Although not necessary, we could run additional backtests based on these unit measures for completeness.
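The SD-to-delta figures can be checked against the standard normal distribution, treating delta as a rough proxy for the one-tail probability (a common approximation, and an assumption here). The function names are my own.

```python
import math


def two_sided_coverage(sd_units):
    """Probability that a standard normal falls within +/- sd_units."""
    return math.erf(sd_units / math.sqrt(2))


def one_tail_probability(sd_units):
    """Probability that a standard normal falls beyond +sd_units."""
    return 0.5 * math.erfc(sd_units / math.sqrt(2))


for sd in (1.0, 1.5, 2.0):
    print(sd, round(two_sided_coverage(sd), 4), round(one_tail_probability(sd), 4))
```

The 1 SD and 1.5 SD figures match. For 2 SD the one-tail value comes out closer to 0.023 (coverage about 95.4%), and the rounded 0.025 figure corresponds to roughly 1.96 SD.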

Automated Backtester Research Plan (Part 7)

Once done with straddles and strangles, put credit spreads (PCS) are next in the automated backtester research plan.

The methodology is much the same for PCS as for naked puts, which I detailed here and here.

We can first study PCS placed every trading day to maximize sample size. Trades can be entered between 30-64 days to expiration (DTE). The short leg can be placed at the first strike under -0.10 to -0.50 delta by increments of -0.10. We can hold to expiration, manage winners at 25% (ATM options only?) or 50%, or exit at 7-21 DTE by increments of seven. We can also exit at 20-80% of the original DTE by increments of 15%. We can manage losers at 2x, 3x, 4x, and 5x initial credit. I’d like to track and plot maximum adverse (favorable) excursion (no management) for the winners (losers) along with final PnL and total number of trades. I want to monitor winning percentage, average win, average loss, largest loss, profit factor, average trade (average PnL), PnL per day, standard deviation of winning trades, standard deviation of losing trades, average days in trade (DIT), average DIT for winning trades, and average DIT for losing trades.

As always, I think maintenance of a constant position size is important. This is easier to do with vertical spreads because the width of the spread—to be held constant for each backtest—defines the risk. We can vary the width between 10-50 points by increments of 10 or 25-100 points by increments of 25 depending on underlying.

My gut says that we do not want long legs acting as unreactive units (standard options) at lower (higher) prices of the underlying. Rather than an apples-to-apples backtest throughout, this would actually be two different backtests with the long leg serving only as margin control at lower underlying prices and as an actual hedge otherwise. Unreactive units may result when the spread width is too large as a percentage of the underlying price: this percentage should be graphed over time. An alternative way of analyzing this is hedge ratio, which can also be graphed over time. Hedge ratio equals decay rate (theta divided by mark) of the short option divided by decay rate of the long. A hedge ratio less than 0.80 is suggestive of long option decay that is too rapid for the short. This may leave the short option unprotected.
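The hedge ratio described above is a two-step division. This sketch follows the definition in the paragraph; the function names and sample greeks are hypothetical.

```python
def decay_rate(theta, mark):
    """Theta divided by the option's mark price."""
    return theta / mark


def hedge_ratio(short_theta, short_mark, long_theta, long_mark):
    """Decay rate of the short option divided by decay rate of the long."""
    return decay_rate(short_theta, short_mark) / decay_rate(long_theta, long_mark)


# Hypothetical put credit spread: short put marked 10.00, long put marked 4.00.
ratio = hedge_ratio(short_theta=-0.30, short_mark=10.00,
                    long_theta=-0.20, long_mark=4.00)
too_fast = ratio < 0.80  # long leg decaying too rapidly relative to the short
```

Here the short decays at 3% of its mark per day and the long at 5%, giving a ratio near 0.60, which falls below the 0.80 threshold.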

The importance of this last paragraph is subject to debate. I alluded to the subject earlier where I cursorily addressed the feasibility of naked call backtesting altogether.

Shorter dated trades, which have not been discussed thus far in the research plan, may also be studied. I would be interested in studying trades placed at 4-30 DTE with all the permutations given above. We can also use weekly options [when available] subject to a liquidity filter. This filter can check for a minimum open interest requirement or a bid-ask spread below a specified percentage of the mark.
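A liquidity filter along these lines might look like the following. The thresholds (100 contracts of open interest, 10% of mark) and the function names are placeholders, not values from the research plan.

```python
def spread_pct_of_mark(bid, ask):
    """Bid-ask spread as a percentage of the midpoint (mark)."""
    mark = (bid + ask) / 2.0
    if mark <= 0:
        return float("inf")  # treat unpriced options as illiquid
    return 100.0 * (ask - bid) / mark


def passes_open_interest(open_interest, minimum=100):
    """Hypothetical minimum open interest requirement."""
    return open_interest >= minimum


def passes_spread(bid, ask, max_pct=10.0):
    """Hypothetical cap on spread width relative to mark."""
    return spread_pct_of_mark(bid, ask) < max_pct


print(passes_spread(1.00, 1.10))  # spread is about 9.5% of mark
print(passes_spread(0.10, 0.30))  # spread is 100% of mark
```

Whether to require one check or both is left to the caller, since the text allows either.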

General filters can also be studied as discussed in Part 2 (linked in paragraph #2 above).

I will continue next time.

Automated Backtester Research Plan (Part 6)

In the last three posts, I detailed portfolio margin (PM) considerations with the automated backtester. After backtesting naked puts and naked calls separately, the next thing I want to do is add a naked call overlay to the naked puts.

This is not the previously-mentioned ATM call adjustment but rather a study of 0.10-0.40 delta strangles (by increments of 0.10). Strangles can be left to expiration, managed at 50%, or closed for loss at 2-5x initial credit. I want to track total number of trades, winning percentage, average PnL, average loss, largest loss, standard deviation of returns, days in trade, PnL per day, PF, RAR, and maximum adverse excursion (MAE). Strangles should be normalized for notional risk and with implementation of PM logic, notional risk can be replaced by theoretical loss from walking the chain up 15% (or up 12% plus a 10% vega adjustment). With this done, return on capital can then be calculated as return on PM (perhaps calculated as return on the largest PM ever seen in the trade since PMR varies from one day to the next). Maximum subsequent:initial PM ratio should be tracked. We can also study the effect of time stops.

If deemed useful then maximum favorable excursion (MFE) can also be studied for unmanaged trades. This could be studied and plotted in histogram format before looking at ranges of management levels (not mentioned in previous paragraph). With MFE and MAE, some thought may need to be given about whether to analyze in terms of dollars or percentages. If notional risk is somehow kept constant, though, then either may be acceptable.

Incorporating naked calls with filters can also be studied. Naked calls may or may not be part of the overall position at any given time. I am interested to study MAE by looking at underlying price change of different future periods given specified filter criteria. Any stable edges we identify could be higher probability entries for a naked call overlay. I approach this with some skepticism since it does imply market timing. As discussed in Part 3, this type of analysis lends itself more to spreadsheet research than to the automated backtester, which would run simulated trades. We would primarily be studying histograms and running statistical analysis on distributions.

Backtesting of undefined risk strategies will conclude with naked straddles. Like strangles, straddles can be left to expiration, managed at 10-50% by increments of 5%, or closed for loss at 2-5x initial credit. I would want to monitor total number of trades, winning percentage, average PnL, average loss, largest loss, standard deviation of returns, days in trade, PnL per day, PF, RAR, and MAE (MFE?). The same comments given above for strangles regarding PM logic, return on PM, and PM ratios also apply here. We can also study the effect of time stops (managing early from 7-21 DTE by increments of seven).

As discussed with naked puts and calls, I would like to study rolling as a trade management tool. We can reposition strangles in the same (subject to a minimum DTE, perhaps) or following month back to the original delta values when a short strike gets tested or when down 2x-5x initial credit. We can do the same for straddles when an expiration breakeven (easily calculated) is tested as well as rolling just the profitable side to ATM.

Aside from studying straddles and strangles as daily trades, serial [non-overlapping] backtesting can be done in order to generate equity curves and study relevant system metrics as discussed previously with regard to naked puts and naked calls.

Portfolio Margin Considerations with the Automated Backtester (Part 3)

Today I continue discussion of portfolio margin (PM) [requirement (PMR)] and the automated backtester.

Please recall that I have described two research approaches. The first analyzes trades opened daily to collect statistics on the largest sample size possible. The second approach studies serial backtesting of non-overlapping trades to generate an account equity curve and to study things like maximum drawdown and risk-adjusted return. The latter lends itself to one sequence of trades out of an infinite number of potential permutations, which is suggestive of a Monte Carlo simulation.

I can definitely see a use for PMR calculations in the daily trades category. For each trade, the automated backtester could approximate PMR at trade inception and for each [subsequent] day in trade. To get a sense of how much capital would be required to maintain the position, we would want to track the maximum value of the subsequent/initial PMR ratio. The amount of capital needed to implement a trading strategy is at least [possibly much more if done conservatively as discussed here and here] the maximum value of the subsequent/initial PMR ratio observed across all backtested trades. In addition to this single number, I would be interested in seeing the ratio distribution of all trades plotted as a histogram and perhaps also as a graph with date on the x-axis, ratio on the left y-axis (scatter plot), and underlying price on the right y-axis (line graph).
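A sketch of the ratio tracking, assuming each trade comes with a list of daily PMR estimates whose first entry is the value at inception. The function name and sample data are hypothetical.

```python
def max_pmr_ratio(daily_pmr):
    """Largest PMR seen during the trade divided by PMR at inception.

    daily_pmr[0] is the inception value, so the result is never below 1.0.
    """
    initial = daily_pmr[0]
    return max(daily_pmr) / initial


# Hypothetical daily PMR estimates (dollars) for two backtested trades.
trades = {
    "trade_A": [10_000, 12_000, 25_000, 9_000],
    "trade_B": [8_000, 8_800, 7_000],
}
ratios = {name: max_pmr_ratio(series) for name, series in trades.items()}
worst_ratio = max(ratios.values())  # lower bound on the capital multiple needed
```

The `ratios` values are what would feed the histogram, and `worst_ratio` is the single-number summary.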

PMR calculations might have a place in the serial trades category as well. Plotting equity curves of different allocation percentages is different from whether those portfolios could be maintained depending on max PMR relative to account value. If PMR exceeds account value, then at least some positions would have to be closed. Since it’s impossible to know which positions this would involve or even whether the broker would do it automatically (at random), I might assume a worst-case scenario where the account would be completely liquidated. On the graph, the equity curve would go horizontal at this point. With a consequence this drastic, I think PMR monitoring is worth doing.

In addition to PM, some brokerages have a concentration requirement. One brokerage, for example, looks at the account PnL with the underlying down 25%. The projected loss must be less than 3x the net liquidation value of the account. Violation of this criterion will result in a “concentration call,” which is akin to a margin call. An account can be in the former but not the latter if it holds DOTM positions that would (not) significantly change in value with the underlying down 25% (12%). Closing these options (typically for $0.30 or less) will often resolve the concentration call.

Building concentration logic might be useful for backtesting with filters. A large enough account could actually be traded by opening daily positions. Otherwise, implementation of filters could result in multiple open positions (albeit less than one new position per day). Stressing the whole portfolio by walking the chain up 25% would be useful because a strategy that looks good in backtesting but violates the concentration criterion is not viable. Put another way, I cannot project a 20% annual return on capital when the capital actually needed to maintain a strategy is quadruple (quite possible with PM) that projected. In this case, 5% annualized would be a more accurate estimate.
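The concentration test and the return adjustment can both be written in a line or two. This sketch uses the 3x multiple from the example brokerage above; the function names and dollar figures are hypothetical.

```python
def concentration_call(projected_loss, net_liq, multiple=3.0):
    """True if the stressed loss is not below multiple x net liquidation value."""
    return projected_loss >= multiple * net_liq


def return_on_required_capital(projected_return, capital_multiple):
    """Scale a projected return by the capital actually needed to hold the strategy."""
    return projected_return / capital_multiple


# Hypothetical account: $100,000 net liq, $350,000 loss with underlying down 25%.
print(concentration_call(350_000, 100_000))
print(return_on_required_capital(0.20, 4))  # 20% projected, 4x capital needed
```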

Portfolio Margin Considerations with the Automated Backtester (Part 2)

Last time I started to explain portfolio margin (PM) and why a model is needed to calculate it.

I previously thought the automated backtester incapable of calculating/tracking PM requirement (PMR) without modeling equations [differential, such as Black-Scholes] and dedicated code, but this is not entirely correct. The database will have historical price data for all options. The automated backtester can simulate position value at different underlying price changes by “walking the chain.” In order to know the price of a particular option if the underlying were to instantaneously move X% higher (lower), I can look to the strike price that is X% lower (higher) than the current market price. Some rounding will have to be involved because +/- 3%, 6%, 9%, and 12% will not fall square on 10- or 25-point increment strikes that may be available in the data set (in highest liquidity and therefore most reliable prices).
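Here is one possible reading of "walking the chain," sketched with a made-up put chain. I am assuming the shift is applied to the option's own strike and that the proxy is the nearest strike available in the data. The function name and prices are hypothetical, and this ignores any change in IV.

```python
def walk_the_chain(chain, strike, move_pct):
    """Approximate an option's price after an instantaneous underlying move.

    chain:    dict mapping strike -> current option price (same expiration).
    move_pct: +3 means underlying up 3%, -3 means underlying down 3%.
    A move of X% higher is proxied by the strike roughly X% lower, and vice
    versa, rounded to the nearest strike present in the chain.
    """
    target = strike * (1 - move_pct / 100.0)
    proxy = min(chain, key=lambda k: abs(k - target))
    return proxy, chain[proxy]


# Hypothetical put prices on 25-point strikes.
put_chain = {2300: 4.10, 2325: 5.00, 2350: 6.20, 2375: 7.70, 2400: 9.60,
             2425: 12.00, 2450: 15.10, 2475: 19.00, 2500: 24.00}

down_3 = walk_the_chain(put_chain, 2400, -3)  # target 2472, nearest strike 2475
up_3 = walk_the_chain(put_chain, 2400, +3)    # target 2328, nearest strike 2325
```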

The automated backtester would not be able to perfectly calculate PMR. In order to be perfect, the backtester would need to model the risk graph continuously on today’s date, which would require implementation of differential calculus. Rounding of the sort that I described above is not entirely precise. Also in order to be perfect, we would have to match the PM algorithm used by the brokerage(s). These are kept proprietary.

Another reason the automated backtester would not be able to perfectly calculate PMR is because walking the chain does not take into account implied volatility (IV) changes. [Some] brokerages also stress portfolios with increased IV changes to the downside when calculating PMR.

We can approximate the additional IV stress a couple different ways. First, instead of stressing up and down X% we could stress more to the downside. Second, we could use vega values from the data set in addition to walking the chain. Vega is the change in option price per 1% change in IV. If we want to simulate a 10% IV increase, then, we could add vega * 10 to short positions. This would probably not be exact because vega does not remain constant as IV changes. Vomma, a second-order greek that will not be included in the data set, is the change in vega per 1% increase in IV. The change in option price is actually the sum of X unequal terms in a series as defined by vega and vomma (along with third-order greeks and beyond to be absolutely precise).
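The vega adjustment is the first term of a Taylor expansion in IV, and vomma supplies the second term. The sketch below shows both. Since vomma will not be in the data set, the second function is illustrative only, and all names and values are hypothetical.

```python
def iv_shock_first_order(vega, iv_change_pts):
    """Price change from vega alone: vega x IV change in percentage points."""
    return vega * iv_change_pts


def iv_shock_second_order(vega, vomma, iv_change_pts):
    """Adds the vomma (second-order) term of the Taylor expansion."""
    return vega * iv_change_pts + 0.5 * vomma * iv_change_pts ** 2


# Hypothetical option: vega 0.50 per IV point, vomma 0.01 per IV point.
first = iv_shock_first_order(0.50, 10)          # the "vega * 10" estimate
second = iv_shock_second_order(0.50, 0.01, 10)  # includes vega drift
```

With positive vomma the vega-only figure understates the price increase, which argues for padding the estimate when the goal is to overestimate PMR.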

Regardless of the imprecision, I think some PM estimate given by logic built into the automated backtester would be better than nothing. And my preference would always be to overestimate rather than underestimate PMR.

Portfolio Margin Considerations with the Automated Backtester (Part 1)

I want to revisit something mentioned in Part 1 about portfolio margin (PM).

Allocation and margin are two separate things with regard to short premium trades and I have only been taking into account the former. I have mentioned allocation with regard to serial backtesting of [non-overlapping] trades. After further consideration, I think margin should be monitored because while we may be able to place a trade, whether we can maintain the position when the market goes sharply against us is a different story.

At some brokerages, accounts of sufficient size can qualify for portfolio (also termed “risk-based”) margin (PM). Reg T margin [which applies to cash, not margin, accounts] reduces buying power by the maximum potential loss at expiration for a given trade. PM uses an algorithm that analyzes profit and loss of the whole portfolio when stressed X% up and Y% down. In other words, if the underlying security were to increase (decrease) by X% (Y%) today (not at expiration), then the algorithm calculates the worst change in value across that range. Specifics vary by brokerage but as an example, the algorithm may calculate from -12% to +12% by increments of 3%. The maximum loss at any increment is the portfolio margin requirement (PMR). I will not incur a margin call provided PMR is less than the net liquidation value of my account.
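In code, the stress test reduces to taking the worst PnL across the grid of moves. This is a simplified sketch using the example range above; actual brokerage algorithms are proprietary. The function names and the stressed PnL figures are hypothetical.

```python
def stress_moves(down=12, up=12, step=3):
    """Percentage moves to evaluate, e.g. -12% through +12% by 3%."""
    return list(range(-down, up + 1, step))


def portfolio_margin_requirement(pnl_at_move):
    """PMR as the largest loss across all stressed moves (zero if none lose).

    pnl_at_move: dict mapping % move -> whole-portfolio PnL at that move today.
    """
    worst = min(pnl_at_move.values())
    return max(0.0, -worst)


def margin_call(pmr, net_liq):
    """No margin call as long as PMR stays below net liquidation value."""
    return pmr >= net_liq


# Hypothetical stressed PnL for a short put position.
pnl = {-12: -18_000, -9: -11_000, -6: -6_000, -3: -2_500, 0: 0,
       3: 1_200, 6: 1_800, 9: 2_000, 12: 2_100}
pmr = portfolio_margin_requirement(pnl)
```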

Calculating PMR requires modeling of the cumulative position. A permanent component of the option pricing equation is implied volatility (IV). IV may be understood as the relative supply/demand for an option. This is inherently unknown, which is why a model is necessary.

As an example to explain this price uncertainty, suppose I am an institutional option trader looking to allocate $50 billion to a specific short premium position. The sooner I get this done, the sooner I have the opportunity to start making daily profit. Once the funds clear, I want to be in regardless of whether the market is up, down, a little or a lot.* You can be sure my $50 billion is going to move some markets by making purchased (sold) options more (less) expensive along with a coincident IV increase (decrease). This is the principle of supply and demand that, in this case, has nothing to do with underlying market move: simply when the bank/brokerage clears my funds for trading. For this and countless other reasons having nothing to do with market movement, unpredictable purchases/sales regularly occur—perhaps in smaller dollar amounts but the aggregate effects can be imagined to be similar.

I will continue next time.

* I may avoid “a lot” if liquidity dries up or bid/ask spreads become large.

Automated Backtester Research Plan (Part 5)

Today I will finish up the automated backtester research plan for naked calls.

Once done studying daily [overlapping] trade statistics, we can repeat the naked put analysis with a serial trading strategy for naked calls. This involves one open trade at a time. We can look at number of trades, winning percentage, compound annualized growth rate (CAGR), maximum drawdown, risk-adjusted return, and profit factor. Again, equity curves will represent just one potential sequence of trades and some consideration can be given to Monte Carlo simulation. We can plot equity curves for different account allocations such as 10% to 70% of initial account value by increments of 10%.

With both overlapping (daily) and non-overlapping (serial) trades, position size should be held constant to allow for an apples-to-apples comparison of drawdowns throughout the backtesting interval. With naked puts, position size is notional risk. Naked calls, though, have unlimited notional risk. Maybe we deduct 0.05-0.20 from the naked call premium under the assumption that we always purchase the lowest strike call available for minimal price to limit margin.

This would result in a vertical spread, though, and the width would be different depending on underlying price.

Does this compromise the feasibility of naked call backtesting altogether? If calls must be done as vertical spreads then buying the long leg for minimal premium will be different from most call credit spread studies to be done for widths 10 (25) to 50 (100) points wide by increments of 10 (25)—except for very low underlying prices where the larger widths may result in the same minimally-priced long being purchased. The naked call study has then become a call credit spread study, which overlaps with the vertical spread backtesting to be detailed later. This deserves further deliberation.

We can apply the same rolling ideas to naked calls as we did to naked puts. We can roll naked calls [up and] out to the next month when a short strike is tested or when the trade is down 2-5x initial credit. We can also roll naked calls up to the same original delta in the same or next month if the strike gets tested.

When studying filters, it will be important to look at total number (and distribution) of trades along with equity curve slopes to determine consistency of profit. Risk-adjusted return and profit factor should also be monitored.

Naked call filters for study are similar to those for naked puts. We can look at trades taken at 5-20-day highs (lows) by increments of five. Trades can be taken only when a short-term MA is above (below) a longer-term MA. As mentioned in the Part 2 footnote, my preference would be to avoid getting overly concerned with period optimization, but this may be unavoidable. Implied volatility (IV) filters may include trades taken with IV at an X-day high (low), on the first down day for IV after being up for two consecutive days, or with IVR above 25/50.

I am curious to find out if naked calls can add to total return and/or lower standard deviation of returns.

Next time I will revisit margin considerations.