Option Fanatic: Options, stock, futures, and system trading, backtesting, money management, and much more!

Truth in Backtesting (Part 3)

In http://www.optionfanatic.com/2012/11/29/truth-in-backtesting-part-2/, I described how market-on-close (MOC) orders will likely not be executed at the closing price. This presents a challenge to the “trade like you backtest” (or “truth in backtesting”) mandate.

As an alternative to using MOC orders, I might be better off manually submitting a market order just before the close. I believe only live trading experience, placing orders x seconds before the close for different values of x, can give me a true idea of the resulting slippage off the closing print. In theory, the closer to 4:00 PM ET I place an order, the smaller the likely deviation between my fill and the closing print, whereas the MOC closing range remains constant. However, the risk of trying to execute closer to 4:00 PM ET is a greater chance of not being filled at all due to Internet blips/delays or phone tie-ups. Missing a trade altogether (in trading jargon, an “unable”) and having to execute at the next market open could result in a much different trade price, especially if the market gaps.
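One way to organize that experiment is sketched below, assuming a hand-kept log of live fills. The record fields (`seconds_before_close`, `side`, `fill_price`, `closing_price`, `filled`) and both function names are hypothetical; the sketch only tabulates average slippage off the closing print, plus the number of unables, for each lead time x.

```python
from collections import defaultdict
from statistics import mean


def slippage_per_share(side, fill_price, closing_price):
    """Signed slippage off the closing print; positive means the fill was worse than the close."""
    if side == "buy":
        return fill_price - closing_price
    if side == "sell":
        return closing_price - fill_price
    raise ValueError(f"unknown side: {side!r}")


def summarize_by_lead_time(fills):
    """Group hypothetical live fills by x, the seconds before 4:00 PM ET the order was sent.

    Each fill is a dict with keys: seconds_before_close, side, fill_price,
    closing_price, and filled (False marks an "unable").
    """
    slips = defaultdict(list)
    unables = defaultdict(int)
    for f in fills:
        x = f["seconds_before_close"]
        if f["filled"]:
            slips[x].append(slippage_per_share(f["side"], f["fill_price"], f["closing_price"]))
        else:
            unables[x] += 1
    summary = {}
    # Largest lead time first, so the table reads from "early" to "right at the bell".
    for x in sorted(set(slips) | set(unables), reverse=True):
        summary[x] = {
            "mean_slippage": mean(slips[x]) if slips[x] else None,
            "fills": len(slips[x]),
            "unables": unables[x],
        }
    return summary
```

Reading the summary from top to bottom would show whether any reduction in slippage near the bell is worth the added unables.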

Application of supply/demand logic corroborates the proposition that execution at the actual closing price is uncertain. If a large number of people were trading my system and submitted buy (sell) orders in heavy volume near the close, then the stock price would move higher (lower) quickly. This would make execution at the closing price even less likely unless orders were submitted correspondingly closer to 4:00 PM ET. Because the increased risk of unables is very difficult, if not impossible, to quantify in monetary terms, I might have to settle on a latest practical order submission time, accept the minimum slippage that goes with it, and incorporate that slippage into the backtesting model.
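A minimal sketch of folding a fixed slippage assumption into EOD backtest fills follows, assuming slippage is a constant dollar amount per share charged against the closing print on both entry and exit. The function names and the constant-slippage model are illustrative assumptions, not a description of how any particular backtesting platform applies slippage.

```python
def trade_return(entry_close, exit_close, side="long", slippage=0.0):
    """Return of one round trip filled `slippage` dollars per share worse than the close on both legs."""
    if side == "long":
        entry = entry_close + slippage  # buy above the closing print
        exit_ = exit_close - slippage   # sell below the closing print
        return (exit_ - entry) / entry
    if side == "short":
        entry = entry_close - slippage  # short below the closing print
        exit_ = exit_close + slippage   # cover above the closing print
        return (entry - exit_) / entry
    raise ValueError(f"unknown side: {side!r}")


def compounded_return(trades, slippage=0.0):
    """Total compounded return of (entry_close, exit_close, side) trades taken one at a time."""
    equity = 1.0
    for entry_close, exit_close, side in trades:
        equity *= 1.0 + trade_return(entry_close, exit_close, side, slippage)
    return equity - 1.0
```

Comparing `compounded_return(trades, 0.0)` with `compounded_return(trades, s)` for a slippage `s` measured in live trading shows how much of the backtested edge survives.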

I will continue the truth in backtesting discussion in my next post.

Truth in Backtesting (Part 2)

Backtesting must be done the same way that live trades are executed to prevent results from being contaminated with immeasurable error due to differences in trading method. The question at hand concerns the implicit assumption in end-of-day (EOD) backtesting that transactions take place at the closing print. A detailed look makes it evident that getting the closing price in live trading is at best more complicated than one might think and at worst not even realistic.

Market-on-close (MOC) orders would seem like the obvious choice for trading a system built on EOD backtesting. As explained on the Interactive Brokers web site:

> [Suppose] You want to buy 100 shares… and decide that the closing price…
> has… proven to be the best price of the day. Create a BUY order, then select
> MOC… to specify a market-on-close order… You must submit the MOC order
> at least 15 minutes prior to the close. The order will be submitted at the close
> as a market order.

A significant caveat to MOC orders is explained on the Orion Futures web site:

> …this order will be executed on the market close. The fill price will be within
> the closing range which may in some markets be substantially
> different from the settlement price. [emphasis mine]

With an MOC order, I don’t know how far off the closing price I will be filled! Only through experimentation with different securities and different lot sizes can I really get a feel for how big the “closing range” is. For all I know, getting filled at the closing price is a rarity. As a prudent system developer, I should ultimately incorporate the average deviation between the closing price and the actual fill into my backtesting as slippage, because slippage can substantially affect the equity curve.

In my next post, I will discuss an alternative to MOC order placement.

Truth in Backtesting (Part 1)

I came across an advertisement for a trading service that makes a good educational point about backtesting. The lesson is that I must always attempt to trade like I backtest. This is the “truth in backtesting” mantra; if I cannot follow it, then I can expect live trading performance to be much worse than the backtested results.

The first thing to understand is that I should always expect live trading performance to be somewhat worse than backtesting would suggest. This can occur for many reasons: transaction costs (commissions and slippage), curve-fitting, loss of system effectiveness over time, etc. Even if I steer clear of any unintended factors that might exaggerate backtesting results, I should still expect live trading results to disappoint.

This also suggests I should avoid any system that is only marginally profitable in backtesting, since such a system may not be profitable at all in live trading.

One common way to generate exaggerated backtesting results is to couple trade signals with trade execution on the same bar. The advertisement mentioned above is for a trading system package that allegedly performed as follows over the last three years:

Each row represents performance of a different trading system (10 total).  Trades are triggered by end-of-day signals.  The first set of numbers shows performance when buy/sell signals are taken at the very same close.  The second set of numbers shows performance when buy/sell signals are taken at the following open.

Is it realistic to couple trading signals with immediate trade execution at the closing price as shown in the first set of data, above?

I will continue this discussion in the next post.

Trading System #2–Consecutive Directional Close (Part 10)

I am currently in the process of backtesting other broad-based indices with the CDC trading system.  In http://www.optionfanatic.com/2012/11/23/trading-system-2-consecutive-directional-close-part-9, I backtested QQQ (Nasdaq 100).  Today I will backtest IWM (Russell 2000 small caps).

Once again I will backtest x = 3, 4, 5, 6, 7 and N = 3, 4, 5, 6, 7 with a minimum of 55 total trades (see http://www.optionfanatic.com/2012/11/16/trading-system-2-consecutive-directional-close-part-5). I will include trade delays for buy and short trades with the understanding that any decent results I get may well be improved upon in live trading. Here are the results as sorted by subjective function (RAR/MDD):
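The parameter sweep and trade-count filter can be sketched as follows. `run_backtest` is a hypothetical callable standing in for the real backtest (run in AmiBroker here); it is assumed to return the trade count and RAR/MDD for one (x, N) pair.

```python
from itertools import product


def rank_systems(results, min_trades=55):
    """Keep (x, N) combos with at least `min_trades` trades, best RAR/MDD first."""
    eligible = [(params, stats) for params, stats in results.items()
                if stats["trades"] >= min_trades]
    return sorted(eligible, key=lambda item: item[1]["rar_mdd"], reverse=True)


def sweep(run_backtest, xs=range(3, 8), ns=range(3, 8), min_trades=55):
    """Backtest every x in 3..7 and N in 3..7, then filter and rank.

    run_backtest(x, n) is a hypothetical stand-in for the real backtest and must
    return a dict with 'trades' and 'rar_mdd' keys.
    """
    results = {(x, n): run_backtest(x, n) for x, n in product(xs, ns)}
    return rank_systems(results, min_trades)
```

Filtering before sorting keeps a combination with a high RAR/MDD but too few trades from ever reaching the top of the list.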

These numbers look lackluster: a couple of systems gained nothing and most have RAR/MDD < 5. As done with SPY and QQQ, I will next backtest long trades only:

Unlike with SPY and QQQ, these long-only numbers are actually worse.

When I see patterns in system development, I really want them to be robust. Why should long-only trades outperform for S&P 500 and Nasdaq stocks but not for small caps? Imaginative types could surely come up with explanations, but since all three are broad-based indices, the discrepancy makes me skeptical of the pattern. If it is not a real pattern, then perhaps I should go back to studying long and short trades together. If those results are not satisfactory either, then perhaps I should waste no more time and move on to the next trading system concept.

This backtesting result has raised many conflicting points worthy of future discussion.  I will sleep on it with hopes of a clearer head upon awakening!

Trading System #2–Consecutive Directional Close (Part 9)

Back in http://www.optionfanatic.com/2012/11/16/trading-system-2-consecutive-directional-close-part-5, when facing an apparent sample size problem, I suggested including other broad-based indices like QQQ and IWM in this trading strategy. It now appears that the problem may have been not only small sample size but also a difference between short and long trade performance. Today I will backtest QQQ.

I will start by backtesting x = 3, 4, 5, 6, 7 and N = 3, 4, 5, 6, 7 with a minimum of 55 total trades (see http://www.optionfanatic.com/2012/11/16/trading-system-2-consecutive-directional-close-part-5). In light of my last post, I did include trade delays here for buy and short trades. Here are the results as sorted by subjective function (RAR/MDD):

All systems with x = 6 or x = 7 were eliminated due to too few trades.

Compared to other backtesting done so far, these numbers are weak.  First, not all systems backtested here are profitable.  Second, all RAR/MDD numbers are in the single digits.  Third, no PF exceeds 1.60.

Certainly the results would be better without trade delays, but trading without them is not necessarily realistic.

In http://www.optionfanatic.com/2012/11/19/trading-system-2-consecutive-directional-close-part-6, I concluded by suggesting development of the CDC system to continue with long trades only.  If I eliminate all short trades:

These numbers are an improvement.  As a check for consistency/plateau region:

I would trade x = 4 since the red curve is above the blue curve for 80% of the data points (4 out of 5).  Furthermore, with a 5-bar stop, I am in a somewhat stable area should performance be a bit better or worse than the backtested curve indicates.
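The two checks above can be made concrete roughly as sketched below: the share of N values at which one curve beats the other, and the spread of the metric across a chosen N and its immediate neighbors as a rough flatness (plateau) measure. Both functions and the neighbor-spread measure are illustrative assumptions, not the method used to produce the graph.

```python
def fraction_above(curve_a, curve_b):
    """Share of common N values at which curve_a's metric exceeds curve_b's."""
    common = sorted(set(curve_a) & set(curve_b))
    wins = sum(1 for n in common if curve_a[n] > curve_b[n])
    return wins / len(common)


def neighbor_spread(curve, n, width=1):
    """(max - min) of the metric over N in [n - width, n + width], relative to the value at n.

    Smaller values suggest a flatter, more stable (plateau) region around n.
    """
    window = [curve[k] for k in range(n - width, n + width + 1) if k in curve]
    return (max(window) - min(window)) / abs(curve[n])
```

Each curve is a dict mapping N to RAR/MDD; for example, `fraction_above(x4_curve, x3_curve)` would return 0.8 when the x = 4 curve is higher at four of five N values.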

In my next post, I’ll take a look at IWM.

Trading System #2–Consecutive Directional Close (Part 8)

In http://www.optionfanatic.com/2012/11/20/trading-system-2-consecutive-directional-close-part-7, I found the CDC trading system to perform better without trade delays.  Today I want to put this conclusion into perspective.

Even with trade delays, the CDC system is worth pursuing. A mean RAR/MDD of 16.50 and a mean profit factor (PF) of 1.90 are respectable statistics in their own right.
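For reference, profit factor is gross profit divided by gross loss, and maximum drawdown is the largest peak-to-trough decline of the equity curve. The sketch below computes both from a list of trade P&Ls and an equity series; it does not attempt to reproduce the RAR figure from AmiBroker's report.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss; infinite if there are no losing trades."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown_pct(equity):
    """Largest peak-to-trough decline of an equity series, in percent of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst * 100.0
```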

Furthermore, while it may be necessary in some cases to trade at the next open, in other cases no trade delay will be needed. As an example, consider the CDC system with x = 4, where four consecutive up or down closes trigger a trade. If SPY has closed higher on each of the last three days and is currently up 3% with just minutes left in the trading session, then I can reasonably proceed with a trade. SPY would have to tank catastrophically in the remaining minutes to close lower, and from my experience watching this market, such an occurrence would truly be a Black Swan. On the other hand, if the market has been choppy all day, moving back and forth between positive and negative territory, then heading into the close I may truly have no clue whether the final print will be up or down. In this case I will be forced to wait until the following open to trade.
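The streak count and the "what if the close held here" check can be sketched as below. Assumptions, all hypothetical: an unchanged close breaks a streak, a streak of at least x closes counts as a signal, and the current intraday price is treated as a provisional close. `cushion_pct` reports the percent move from the prior close, i.e., roughly how far price would have to reverse before today's close flips direction.

```python
def consecutive_closes(closes):
    """Return (count, direction) for the streak ending at the last close.

    direction is +1 for higher closes, -1 for lower closes, 0 if the last close
    was unchanged (assumption: an unchanged close breaks the streak).
    """
    count, direction = 0, 0
    for prev, cur in zip(closes, closes[1:]):
        step = (cur > prev) - (cur < prev)
        if step != 0 and step == direction:
            count += 1
        elif step != 0:
            count, direction = 1, step
        else:
            count, direction = 0, 0
    return count, direction


def provisional_signal(closes, current_price, x=4):
    """Direction of an x-close streak if today's session closed at current_price, else 0.

    Assumption: a streak of at least x closes counts as a signal.
    """
    count, direction = consecutive_closes(list(closes) + [current_price])
    return direction if count >= x else 0


def cushion_pct(prior_close, current_price):
    """Percent move from yesterday's close; for an up streak, how far price must fall to close lower."""
    return (current_price - prior_close) / prior_close * 100.0
```

In the SPY example, three higher closes plus a price 3% above yesterday's close would give a provisional signal of +1 with a cushion of about 3%; in the choppy case the cushion hovers near zero and the signal can flip before the bell.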

As mentioned above, the CDC system is worth pursuing even with trade delays. At least sometimes, if not often, however, trade signals will be clear heading into the close and no delay will be needed. I can therefore expect achievable performance to be better than the minimal, yet acceptable, numbers mentioned above (i.e., RAR/MDD > 16.50 and PF > 1.90).

Trading System #2–Consecutive Directional Close (Part 7)

Is it realistic to think the current day’s trade signals can be known before the close and executed at that day’s closing price? I will discuss this in some detail at a later date. Today, I want to incorporate trade delays into the backtesting from http://www.optionfanatic.com/2012/11/19/trading-system-2-consecutive-directional-close-part-6 and see how the results compare.

“No trade delays” means buy, sell, short, and cover signals generated at the close are immediately coupled with trades executed at that closing price. Incorporating trade delays means opening trades (buy and short) are instead taken at the market open following the signal-generating close.
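The difference between the two entry rules can be sketched as follows, assuming each signal is identified by the index of the bar whose close generated it. In AmiBroker the delayed version is typically configured with `SetTradeDelays` together with `BuyPrice`/`ShortPrice` set to `Open`; the exact AFL used for these tests is not shown here, so the Python below is only an illustration, and `entry_fills` is a hypothetical name.

```python
def entry_fills(signal_bars, opens, closes, delay=False):
    """Map each signal bar index to its opening-trade fill price.

    delay=False: fill at the same close that generated the signal (no trade delay).
    delay=True:  fill at the next bar's open; a signal on the final bar has no fill yet.
    """
    fills = {}
    for i in signal_bars:
        if not delay:
            fills[i] = closes[i]
        elif i + 1 < len(opens):
            fills[i] = opens[i + 1]
    return fills
```

Only opening trades are shifted here, matching the description above; exits (sell and cover) would still be filled at the signal-generating close.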

Here are the results of the same backtesting shown in http://www.optionfanatic.com/2012/11/19/trading-system-2-consecutive-directional-close-part-6 with buy and short trade delays included:

In bold are statistics that are better than with no trade delays. For the t test rows (bottom three), bold indicates differences that are statistically significant for a one-tailed test at the 0.05 level.

These results show persistence of the tendencies seen in http://www.optionfanatic.com/2012/11/19/trading-system-2-consecutive-directional-close-part-6. Performance is significantly better for x = 4 than for x = 3, although this difference is less pronounced with trade delays. Long trades perform significantly better than short trades regardless of trade delays.

Finally, performance without trade delays is significantly better than performance with trade delays at the 0.05 level (one-tailed).

My next post will discuss these results in more detail.

Trading System #2–Consecutive Directional Close (Part 6)

In http://www.optionfanatic.com/2012/11/16/trading-system-2-consecutive-directional-close-part-5, I settled on x = 4 and n = 5 as a potentially viable combination with which to trade the Consecutive Directional Close (CDC) system.  The next step is to study long vs. short trade performance.

To do this, I used AmiBroker to conduct a series of backtests on the CDC system.  I set x = 3, 4 and n = 3, 4, 5, 6, 7, which generated 10 sets of performance statistics:


Note the trends in the data. As the conditions get more extreme (higher values of x and n), the total number of trades decreases and profitability generally declines.

To compare long trades and short trades directly, I ran a Student’s t test for independent samples. The results are as conclusive as the table suggests:
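For anyone reproducing the comparison, the pooled-variance Student's t statistic for two independent samples is sketched below in pure Python. It returns only the statistic and degrees of freedom; `scipy.stats.ttest_ind(a, b)` computes the same statistic along with a p-value, and `alternative="greater"` (SciPy 1.6 or later) gives the one-tailed version. Which per-system statistics were fed into the test is not specified here, so the inputs are left generic.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic (pooled variance) and degrees of freedom for independent samples a and b."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```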

These minuscule p-values indicate statistically significant differences between long and short trade performance.

I will continue future CDC system development with long trades only.

Trading System #2–Consecutive Directional Close (Part 5)

As discussed in http://www.optionfanatic.com/2012/11/15/backtesting-conundrum-with-sp-500-stocks, I do not subscribe to delisted data, nor does my database tag index membership with the corresponding time intervals. I therefore must alter course away from backtesting S&P 500 stocks individually.

I can immediately think of three further directions for the Consecutive Directional Close (CDC) trading system.  First, I can explore elimination of the more extreme trading criteria that do not generate sufficient sample sizes.  Second, I can explore using long trades only since those seemed to perform better in Table 1 of http://www.optionfanatic.com/2012/11/01/trading-system-2-consecutive-directional-close-part-2.  Third, I can incorporate other broad-based indices like QQQ and IWM.  I will study these in order.

Eliminating the more extreme trading criteria upholds the old adage “don’t throw the baby out with the bath water.”  With an inconclusive graph like Figure 1 in http://www.optionfanatic.com/2012/11/07/trading-system-2-consecutive-directional-close-part-4, my academic background suggests scrapping the hypothesis (system) altogether to research elsewhere.  The difference here is that the data do not necessarily fail to fit the hypothesis.  Rather, due to the insufficient sample size I am unable to determine whether the data fit the hypothesis.

To draw a line in the sand, I will require at least 55 trades as a minimal sample size (I accepted 57 trades for the SPY VIX system).  Revisiting Table 1 from http://www.optionfanatic.com/2012/11/06/trading-system-2-consecutive-directional-close-part-3 then leaves me with:

These numbers look pretty good:  all 10 systems profitable with profit factors over 1.60, total number of trades in the triple digits, and Sharpe Ratios over 1.00.  Graphically, the results look like this:

The x = 4 curve is above the x = 3 curve, which corresponds to better results with more CDCs. Furthermore, the curves are relatively flat as viewed in this logarithmic graph. I would choose n = 5, the middle of the tested range.

In the next post, I will continue to explore other directions for the CDC system as described above.

Backtesting Conundrum with S&P 500 Stocks

I concluded http://www.optionfanatic.com/2012/11/12/position-sizing-implications-of-multiple-open-positions-part-3 by affirming the validity of backtesting S&P 500 member stocks as a proxy for results obtained with SPY.  Today I want to address two data challenges with backtesting S&P 500 member stocks.

The first challenge that must be overcome is survivorship bias.  Wikipedia explains:

> In finance, survivorship bias is the tendency for failed companies to be excluded
> from performance studies because they no longer exist. It often causes the results
> of studies to skew higher because only companies which were successful enough to
> survive until the end of the period are included.

Bankrupt companies are the quintessential example of survivorship bias.  Stocks of bankrupt companies get delisted from the exchange on their way to hitting $0.00/share.  Many bankrupt companies were once members of the S&P 500.  If I run a backtest on “S&P 500 stocks,” then bankrupt stocks are not going to be included since they are no longer in the database.

The second challenge of backtesting S&P 500 stocks is accurately managing changes in index composition. Stocks are added to and deleted from the S&P 500 on an irregular but not infrequent basis. If I run a backtest on “S&P 500 stocks,” then my database will look at the stocks currently in the S&P 500 folder. That folder is current as of right now but not accurate for historical dates.

Individual S&P 500 stocks can therefore truly serve as a proxy for backtesting SPY only if: 1. the database includes delisted stocks to avoid survivorship bias; and 2. the database tags stocks with both index membership and the defined time interval(s) during which that membership took place.
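The two requirements can be sketched as a point-in-time universe lookup. The `Security` record, its fields, and the inclusive end date for a membership interval are hypothetical choices for illustration; delisted securities stay in the list rather than being dropped, which is what avoids survivorship bias.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Security:
    """Hypothetical database record for one stock."""
    symbol: str
    delisted: bool = False  # delisted names stay in the database instead of vanishing
    # S&P 500 membership intervals as (start, end); end=None means still a member.
    sp500_intervals: List[Tuple[date, Optional[date]]] = field(default_factory=list)

    def in_index_on(self, day: date) -> bool:
        # Assumption: both start and end dates count as membership days.
        return any(start <= day and (end is None or day <= end)
                   for start, end in self.sp500_intervals)


def universe_on(securities: List[Security], day: date) -> List[str]:
    """Symbols that were actually S&P 500 members on `day`, including since-delisted stocks."""
    return sorted(s.symbol for s in securities if s.in_index_on(day))
```

Building each day's tradable list with `universe_on(all_securities, day)`, rather than reading today's S&P 500 folder, would address both challenges, provided the underlying data exist.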