Category Archives: accuracy of forecasts

More on the Predictability of Stock and Bond Markets

Research by Lin, Wu, and Zhou in Predictability of Corporate Bond Returns: A Comprehensive Study suggests a radical change in perspective, based on new forecasting methods. The research seems to me to be of a piece with many developments in Big Data and the data mining movement generally. Gains in predictability are associated with more extensive databases and new techniques.

The abstract to their white paper, presented at various conferences and colloquia, is straightforward –

Using a comprehensive data set, we find that corporate bond returns not only remain predictable by traditional predictors (dividend yields, default, term spreads and issuer quality) but also strongly predictable by a new predictor formed by an array of 26 macroeconomic, stock and bond predictors. Results strongly suggest that macroeconomic and stock market variables contain important information for expected corporate bond returns. The predictability of returns is of both statistical and economic significance, and is robust to different ratings and maturities.

Now, in a way, the basic message of the predictability of corporate bond returns is not news, since Fama and French made this claim back in 1989 – namely that default and term spreads can predict corporate bond returns both in and out of sample.

What is new is the data employed in the Lin, Wu, and Zhou (LWZ) research. According to the authors, it involves 780,985 monthly observations spanning from January 1973 to June 2012 from combined data sources, including Lehman Brothers Fixed Income (LBFI), Datastream, National Association of Insurance Commissioners (NAIC), Trade Reporting and Compliance Engine (TRACE) and Mergent Fixed Investment Securities Database (FISD).

There is also a new predictor which LWZ characterize as a type of partial least squares (PLS) formulation, but which is none other than the three-pass regression filter discussed in a post here in March.
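For readers who want to see the mechanics, here is a minimal sketch in Python of a one-factor three-pass regression filter that uses the return itself as the proxy, which is the PLS-style setup described here. It is not LWZ's code. The function name, the timing convention (predictor row t known before the return in row t+1 is realized), and the assumption that the predictors are already on comparable scales are my own illustrative choices.

```python
import numpy as np

def three_pass_filter_forecast(X, r):
    """Hypothetical helper: one-factor three-pass regression filter, target as proxy.

    X : (T, N) predictors; row t is known at the end of period t.
    r : (T,) returns; r[t + 1] is the return to be forecast from X[t].
    Returns the forecast of the next return from the last row of X.
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    T, N = X.shape
    # Pass 1: time-series regression of each x_{i,t-1} on r_t gives loading phi_i.
    R1 = np.column_stack([np.ones(T - 1), r[1:]])
    loadings = np.linalg.lstsq(R1, X[:-1], rcond=None)[0][1]      # shape (N,)
    # Pass 2: for each t, cross-sectional regression of x_{i,t} on the loadings;
    # the slope is the estimated target-relevant factor F_t.
    P2 = np.column_stack([np.ones(N), loadings])
    F = np.linalg.lstsq(P2, X.T, rcond=None)[0][1]                # shape (T,)
    # Pass 3: predictive regression of r_{t+1} on F_t, then forecast from F_T.
    R3 = np.column_stack([np.ones(T - 1), F[:-1]])
    beta = np.linalg.lstsq(R3, r[1:], rcond=None)[0]
    return beta[0] + beta[1] * F[-1]
```

In a genuine out-of-sample exercise, all three passes would be re-estimated at each forecast date using only the data available up to that date.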

The power of this PLS formulation is evident in a table showing the out-of-sample R2 of the various modeling setups. As in the research discussed in a recent post, the out-of-sample (OS) R2 equals one minus the ratio of the mean square prediction error (MSPE) of the predictive regression model to the MSPE of the historical average forecast, so it measures the proportional improvement over that benchmark. A negative OS R2 thus means that the MSPE of the benchmark forecast is less than the MSPE of the forecast by the designated predictor formulation.

[Table: out-of-sample R2 of the corporate bond return prediction models (Lin, Wu, and Zhou)]

Again, this research finds predictability varies with economic conditions – and is higher during economic downturns.

There are cross-cutting and linked studies here, often with Goyal’s data and fourteen financial/macroeconomic variables figuring within the estimations. There also is significant linkage with researchers at regional Federal Reserve Banks.

My purpose in this and probably the next one or two posts is to just get this information out, so we can see the larger outlines of what is being done and suggested.

My guess is that the sum total of this research is going to essentially re-write financial economics and has huge implications for forecasting operations within large companies and especially financial institutions.

Stock Market Predictability – Controversy

In the previous post, I drew from papers by Christopher Neely, who is Vice President of the Federal Reserve Bank of St. Louis, David Rapach at St. Louis University, and Guofu Zhou at Washington University in St. Louis.

These authors contribute two papers on the predictability of equity returns.

The earlier one – Forecasting the Equity Risk Premium: The Role of Technical Indicators – is coming out in Management Science. The survey article, Forecasting Stock Returns, is a chapter in the recent volume 2 of the Handbook of Economic Forecasting.

I go through this rather laborious set of citations because it turns out that there is an underlying paper which provides the data for the research of these authors, but which comes to precisely the opposite conclusion –

The goal of our own article is to comprehensively re-examine the empirical evidence as of early 2006, evaluating each variable using the same methods (mostly, but not only, in linear models), time-periods, and estimation frequencies. The evidence suggests that most models are unstable or even spurious. Most models are no longer significant even in-sample (IS), and the few models that still are usually fail simple regression diagnostics. Most models have performed poorly for over 30 years IS. For many models, any earlier apparent statistical significance was often based exclusively on years up to and especially on the years of the Oil Shock of 1973–1975. Most models have poor out-of-sample (OOS) performance, but not in a way that merely suggests lower power than IS tests. They predict poorly late in the sample, not early in the sample. (For many variables, we have difficulty finding robust statistical significance even when they are examined only during their most favorable contiguous OOS sub-period.) Finally, the OOS performance is not only a useful model diagnostic for the IS regressions but also interesting in itself for an investor who had sought to use these models for market-timing. Our evidence suggests that the models would not have helped such an investor. Therefore, although it is possible to search for, to occasionally stumble upon, and then to defend some seemingly statistically significant models, we interpret our results to suggest that a healthy skepticism is appropriate when it comes to predicting the equity premium, at least as of early 2006. The models do not seem robust.

This is from Ivo Welch and Amit Goyal’s 2008 article A Comprehensive Look at The Empirical Performance of Equity Premium Prediction in the Review of Financial Studies which apparently won an award from that journal as the best paper for the year.

And, very importantly, the data for this whole discussion is available, with updates, from Amit Goyal’s site now at the University of Lausanne.

[Photo: Amit Goyal]

Where This Is Going

Currently, for me, this seems like a genuine controversy in the forecasting literature. And, as an aside, in writing this blog I’ve entertained the notion that maybe I am on the edge of a new form of or focus in journalism – namely stories about forecasting controversies. It’s kind of wonkish, but the issues can be really, really important.

I also have a “hands-on” philosophy when it comes to this sort of information. I would much rather explore actual data and run my own estimates than pick through theoretical arguments.

So anyway, given that Goyal generously provides updated versions of the data series he and Welch originally used in their Review of Financial Studies article, there should be some opportunity to check this whole matter. After all, the estimation issues are not very difficult, insofar as the first level of argument relates primarily to the efficacy of simple bivariate regressions.

By the way, it’s really cool data.

Here is the book-to-market ratio, dating back to 1926.

[Chart: book-to-market ratio, 1926 onward]

But beyond these simple regressions that form a large part of the argument, there is another claim made by Neely, Rapach, and Zhou which I take very seriously. This is that, while a “kitchen sink” model with all, say, fourteen so-called macroeconomic variables does not outperform the benchmark, a principal components regression does.

This sounds really plausible.
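Here is a minimal sketch of the contrast, assuming a T-by-N array X of predictors and a return vector r aligned so that row t of X forecasts r[t+1]. The function names are hypothetical, and the predictors are standardized over the full sample purely for simplicity. With k equal to the number of predictors, principal components regression reproduces the kitchen sink forecast exactly; any gain comes from keeping only the first one or few components.

```python
import numpy as np

def ols_forecast(X, r):
    """'Kitchen sink': regress r[t+1] on all predictors in X[t], forecast the next return."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(X) - 1), X[:-1]])
    beta = np.linalg.lstsq(Z, np.asarray(r, dtype=float)[1:], rcond=None)[0]
    return beta[0] + X[-1] @ beta[1:]

def pcr_forecast(X, r, k):
    """Principal components regression on the first k components of standardized X."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    pcs = Xs @ Vt[:k].T                          # (T, k) component scores
    Z = np.column_stack([np.ones(len(X) - 1), pcs[:-1]])
    beta = np.linalg.lstsq(Z, np.asarray(r, dtype=float)[1:], rcond=None)[0]
    return beta[0] + pcs[-1] @ beta[1:]
```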

Anyway, if readers have flagged updates to this controversy about the predictability of stock market returns, let me know. In addition to grubbing around with the data, I am searching for additional analysis of this point.

Evidence of Stock Market Predictability

In business forecast applications, I often have been asked, “why don’t you forecast the stock market?” It’s almost a variant of “if you’re so smart, why aren’t you rich?” I usually respond with something about stock prices being largely random walks.

But, stock market predictability is really the nut kernel of forecasting, isn’t it?

Earlier this year, I looked at the S&P 500 index and the SPY ETF numbers, and found I could beat a buy and hold strategy with a regression forecasting model. This was an autoregressive model with many lagged values of daily S&P returns. In some variants, it included lagged values of returns on the Chicago Board Options Exchange (CBOE) VIX volatility index. My portfolio gains were compiled over an out-of-sample (OS) period. This means, of course, that I estimated the predictive regression on historical data that preceded and did not include the OS or test data.

Well, today I’m here to report to you that it looks like it is officially possible to achieve some predictability of stock market returns in out-of-sample data.

One authoritative source is Forecasting Stock Returns, an outstanding review by Rapach and Zhou in the recent second volume of the Handbook of Economic Forecasting.

The story is fascinating.

For one thing, most of the successful models achieve their best performance – in terms of beating market averages or other common benchmarks – during recessions.

And it appears that technical market indicators, such as the oscillators, momentum, and volume metrics so common in stock trading sites, have predictive value. So do a range of macroeconomic indicators.

But these two classes of predictors – technical market and macroeconomic indicators – are roughly complementary in their performance through the business cycle. As Christopher Neely et al. detail in Forecasting the Equity Risk Premium: The Role of Technical Indicators,

Macroeconomic variables typically fail to detect the decline in the actual equity risk premium early in recessions, but generally do detect the increase in the actual equity risk premium late in recessions. Technical indicators exhibit the opposite pattern: they pick up the decline in the actual premium early in recessions, but fail to match the unusually high premium late in recessions.

Stock Market Predictors – Macroeconomic and Technical Indicators

Rapach and Zhou highlight fourteen macroeconomic predictors popular in the finance literature.

1. Log dividend-price ratio (DP): log of a 12-month moving sum of dividends paid on the S&P 500 index minus the log of stock prices (S&P 500 index).

2. Log dividend yield (DY): log of a 12-month moving sum of dividends minus the log of lagged stock prices.

3. Log earnings-price ratio (EP): log of a 12-month moving sum of earnings on the S&P 500 index minus the log of stock prices.

4. Log dividend-payout ratio (DE): log of a 12-month moving sum of dividends minus the log of a 12-month moving sum of earnings.

5. Stock variance (SVAR): monthly sum of squared daily returns on the S&P 500 index.

6. Book-to-market ratio (BM): book-to-market value ratio for the DJIA.

7. Net equity expansion (NTIS): ratio of a 12-month moving sum of net equity issues by NYSE-listed stocks to the total end-of-year market capitalization of NYSE stocks.

8. Treasury bill rate (TBL): interest rate on a three-month Treasury bill (secondary market).

9. Long-term yield (LTY): long-term government bond yield.

10. Long-term return (LTR): return on long-term government bonds.

11. Term spread (TMS): long-term yield minus the Treasury bill rate.

12. Default yield spread (DFY): difference between BAA- and AAA-rated corporate bond yields.

13. Default return spread (DFR): long-term corporate bond return minus the long-term government bond return.

14. Inflation (INFL): calculated from the CPI (all urban consumers).
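Several of these predictors are simple transformations of monthly price, dividend, earnings, and yield series. A minimal sketch, assuming numpy arrays of monthly index prices, of dividends and earnings paid in each month, and of the relevant yields; the helper names are mine, not the layout of Goyal's data files.

```python
import numpy as np

def moving_sum(x, window=12):
    """Trailing moving sum; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    c = np.cumsum(x)
    out[window - 1:] = c[window - 1:] - np.concatenate(([0.0], c[:-window]))
    return out

def macro_predictors(price, dividends, earnings, tbill, lt_yield, baa, aaa):
    """A few of the listed predictors from monthly inputs (hypothetical helper)."""
    d12, e12 = moving_sum(dividends), moving_sum(earnings)
    log_p = np.log(np.asarray(price, dtype=float))
    dy = np.full(log_p.shape, np.nan)
    dy[1:] = np.log(d12[1:]) - log_p[:-1]            # DY uses the lagged price
    return {
        "DP": np.log(d12) - log_p,
        "DY": dy,
        "EP": np.log(e12) - log_p,
        "DE": np.log(d12) - np.log(e12),
        "TMS": np.asarray(lt_yield, dtype=float) - np.asarray(tbill, dtype=float),
        "DFY": np.asarray(baa, dtype=float) - np.asarray(aaa, dtype=float),
    }
```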

In addition, there are technical indicators, which are generally moving average, momentum, or volume-based.

The moving average indicators typically provide a buy or sell signal based on comparing two moving averages – a short and a long period MA.
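A minimal sketch of such a rule: buy (signal 1) when the short moving average is at or above the long one, otherwise sell (signal 0). The 1-period and 12-period defaults are illustrative assumptions, not values taken from the papers' tables.

```python
import numpy as np

def trailing_mean(p, n):
    """Mean of the last n prices at each t; NaN until n prices are available."""
    p = np.asarray(p, dtype=float)
    out = np.full(p.shape, np.nan)
    c = np.cumsum(np.insert(p, 0, 0.0))
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def ma_signal(p, short=1, long=12):
    """1 (buy) when MA_short >= MA_long, 0 (sell) otherwise or before enough history."""
    s, l = trailing_mean(p, short), trailing_mean(p, long)
    sig = np.zeros(len(l), dtype=int)
    ok = ~np.isnan(l)
    sig[ok] = (s[ok] >= l[ok]).astype(int)
    return sig
```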

Momentum based rules are based on the time trajectory of prices. A current stock price higher than its level some number of periods ago indicates “positive” momentum and expected excess returns, and generates a buy signal.
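The momentum rule is even simpler. A sketch, with the lookback length m as an illustrative parameter:

```python
import numpy as np

def momentum_signal(p, m=12):
    """1 (buy) when the price is at or above its level m periods ago, else 0."""
    p = np.asarray(p, dtype=float)
    sig = np.zeros(len(p), dtype=int)
    sig[m:] = (p[m:] >= p[:-m]).astype(int)   # no signal for the first m periods
    return sig

print(momentum_signal([5, 4, 6, 3, 7], m=2))   # [0 0 1 0 1]
```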

Momentum rules can be combined with information about the volume of stock purchases, such as Granville’s on-balance volume.
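On-balance volume accumulates trading volume with a plus sign when the price rises (or is unchanged) and a minus sign when it falls; a volume-based signal can then apply the moving average comparison to OBV instead of to prices. A sketch, with hypothetical function names and illustrative window lengths:

```python
import numpy as np

def trailing_mean(v, n):
    """Mean of the last n values at each t; NaN until n values are available."""
    v = np.asarray(v, dtype=float)
    out = np.full(v.shape, np.nan)
    c = np.cumsum(np.insert(v, 0, 0.0))
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def on_balance_volume(p, vol):
    """OBV_t = sum over k <= t of vol_k * D_k, with D_k = +1 if p_k >= p_{k-1}, else -1."""
    d = np.where(np.diff(np.asarray(p, dtype=float)) >= 0, 1.0, -1.0)
    return np.concatenate(([0.0], np.cumsum(np.asarray(vol, dtype=float)[1:] * d)))

def volume_signal(p, vol, short=1, long=12):
    """1 (buy) when the short moving average of OBV is at or above the long one."""
    obv = on_balance_volume(p, vol)
    s, l = trailing_mean(obv, short), trailing_mean(obv, long)
    sig = np.zeros(len(obv), dtype=int)
    ok = ~np.isnan(l)
    sig[ok] = (s[ok] >= l[ok]).astype(int)
    return sig
```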

Each of these predictors can be mapped onto the equity premium, that is, the rate of return on the S&P 500 index net of the return on a risk-free asset. This mapping is a simple bivariate regression with the equity premium for period t on the left side of the equation and the economic predictor lagged by one period on the right side. Monthly data are used from 1927 to 2008. The out-of-sample (OS) period is extensive, dating from the 1950s, and includes most of the post-war recessions.
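A minimal sketch of this setup with an expanding estimation window, assuming arrays x (the predictor) and r (the equity premium) aligned so that x[t] is known before r[t+1] is realized. At each forecast date only data available up to that date is used, and the average of past returns serves as the historical average benchmark. The function name is mine.

```python
import numpy as np

def recursive_forecasts(x, r, start):
    """Expanding-window OS forecasts of r[t+1] for t = start, ..., len(r) - 2.

    Returns (model forecasts, historical-average forecasts, realized values).
    """
    x, r = np.asarray(x, dtype=float), np.asarray(r, dtype=float)
    model_fc, hist_fc = [], []
    for t in range(start, len(r) - 1):
        # Estimate r[s+1] = a + b * x[s] using only pairs realized by time t.
        Z = np.column_stack([np.ones(t), x[:t]])
        a, b = np.linalg.lstsq(Z, r[1:t + 1], rcond=None)[0]
        model_fc.append(a + b * x[t])
        hist_fc.append(r[:t + 1].mean())    # benchmark: historical average return
    return np.array(model_fc), np.array(hist_fc), r[start + 1:]
```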

The following table shows what the authors call out-of-sample (OS) R2 for the 14 so-called macroeconomic variables, based on a table in the Handbook of Economic Forecasting chapter. The OS R2 is equal to 1 minus a ratio. This ratio has the mean square forecast error (MSFE) of the predictor forecast in the numerator and the MSFE of the forecast based on historic average equity returns in the denominator. So if the economic indicator functions to improve the OS forecast of equity returns, the OS R2 is positive. If, on the other hand, the historic average trumps the economic indicator forecast, the OS R2 is negative.

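[Table: out-of-sample R2 statistics for the 14 macroeconomic predictors, overall, expansions, and recessions (Rapach and Zhou)]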


Overall, most of the macro predictors in this list don’t make it.  Thus, 12 of the 14 OS R2 statistics are negative in the second column of the Table, indicating that the predictive regression forecast has a higher MSFE than the historical average.
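The OS R2 statistic takes only a few lines to compute. A sketch, assuming aligned arrays of realized returns, predictive regression forecasts, and historical-average forecasts; it does not produce the p-values in the table, which come from a separate test of equal predictive accuracy.

```python
import numpy as np

def oos_r2(actual, model_fc, bench_fc):
    """Out-of-sample R2: 1 - MSFE(model) / MSFE(historical average benchmark)."""
    actual = np.asarray(actual, dtype=float)
    msfe_model = np.mean((actual - np.asarray(model_fc, dtype=float)) ** 2)
    msfe_bench = np.mean((actual - np.asarray(bench_fc, dtype=float)) ** 2)
    return 1.0 - msfe_model / msfe_bench

print(oos_r2([1, 2, 3], [1, 2, 4], [2, 2, 2]))   # 0.5
```

A negative value simply means the model's MSFE exceeds the benchmark's, which is the situation for 12 of the 14 predictors here.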

For two of the predictors with a positive out-of-sample R2, the p-values reported in the brackets are greater than 0.10, so that these predictors do not display statistically significant out-of-sample performance at conventional levels.

Thus, the first two columns in this table, under “Overall”, support a skeptical view of the predictability of equity returns.

However, during recessions, the situation is different.

For several of the predictors, the OS R2 statistics move from being negative (and typically below -1%) during expansions to 1% or above during recessions. Furthermore, some of these OS R2 statistics are significant at conventional levels during recessions according to the p-values, despite the smaller number of available observations.
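Splitting the OS R2 by business-cycle phase only requires a recession indicator. A sketch, assuming a boolean recession dummy aligned with the forecast dates (NBER dating is the usual choice); the function name is hypothetical.

```python
import numpy as np

def oos_r2_by_regime(actual, model_fc, bench_fc, recession):
    """OS R2 overall and separately for expansion and recession periods."""
    actual = np.asarray(actual, dtype=float)
    err_m = (actual - np.asarray(model_fc, dtype=float)) ** 2
    err_b = (actual - np.asarray(bench_fc, dtype=float)) ** 2
    rec = np.asarray(recession, dtype=bool)

    def r2(mask):
        return 1.0 - err_m[mask].mean() / err_b[mask].mean()

    return {"overall": r2(np.ones_like(rec)), "expansion": r2(~rec), "recession": r2(rec)}
```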

Imposing restrictions on the regression coefficients substantially improves this forecast performance, as the lower panel of this table (not shown) indicates.
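The restrictions discussed in this literature include, as I understand it, those proposed by Campbell and Thompson (2008): if the estimated slope has the sign theory does not predict, fall back to the historical average, and never forecast a negative equity premium. A minimal sketch under that reading; the function and argument names are hypothetical.

```python
def ct_restricted_forecast(a, b, x_t, hist_mean, expected_sign=1.0):
    """Sign restriction on the slope, then a non-negativity restriction on the forecast."""
    if b * expected_sign < 0:       # slope has the 'wrong' sign
        fc = hist_mean              # use the historical average instead
    else:
        fc = a + b * x_t
    return max(fc, 0.0)             # a negative equity premium forecast is set to zero
```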

Rapach and Zhou were coauthors, with Neely, of the study published earlier as a working paper by the Federal Reserve Bank of St. Louis.

This working paper is where we get the interesting report about how technical factors add to the predictability of equity returns.

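[Table: out-of-sample R2 statistics for technical indicators, overall, expansions, and recessions (Neely, Rapach, and Zhou)]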

This table has the same headings for the columns as Table 3 above.

It shows out-of-sample forecasting results for several technical indicators, using basically the same dataset, for the overall OS period and for expansions and recessions within this period, which runs from the 1950s to 2008.

In fact, these technical indicators generally seem to do better than the 14 macroeconomic indicators.

Low OS R2

Even when these models perform their best, their reduction in mean square forecast error (MSFE) relative to the benchmark historic average return forecast is small.

Even this modest improvement, however, can produce portfolio gains for investors, based on various trading rules, and, as both papers point out, investors can use the information in these forecasts to balance their portfolios, even when the underlying forecast equations are not statistically significant by conventional standards. Interesting argument, and I need to review it further to fully understand it.
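Portfolio gains in this literature are usually measured for a mean-variance investor. A minimal sketch, assuming the commonly used setup of relative risk aversion 3 and an equity weight kept between 0 and 1.5 (my recollection of the usual choices, so treat them as assumptions): the investor holds w = forecast / (gamma * variance forecast) in stocks, and the certainty equivalent return (CER) of the resulting portfolio is compared across forecasts. Function names are hypothetical.

```python
import numpy as np

def mv_weights(premium_fc, var_fc, gamma=3.0, lo=0.0, hi=1.5):
    """Mean-variance equity weight, clipped to [lo, hi]."""
    w = np.asarray(premium_fc, dtype=float) / (gamma * np.asarray(var_fc, dtype=float))
    return np.clip(w, lo, hi)

def cer(weights, excess_ret, rf, gamma=3.0):
    """Certainty equivalent return: mean minus (gamma / 2) times variance of portfolio returns."""
    rp = np.asarray(rf, dtype=float) + np.asarray(weights, dtype=float) * np.asarray(excess_ret, dtype=float)
    return rp.mean() - 0.5 * gamma * rp.var()

def utility_gain(w_model, w_bench, excess_ret, rf, gamma=3.0):
    """CER of the model-based portfolio minus CER of the benchmark-based portfolio."""
    return cer(w_model, excess_ret, rf, gamma) - cer(w_bench, excess_ret, rf, gamma)
```

A positive utility gain is roughly the fee an investor would pay to switch from the historical average forecast to the model forecast, which is how a forecast with a tiny OS R2 can still matter.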

In any case, my experience with an autoregressive model for the S&P 500 is that trading rules can be devised which produce portfolio gains over a buy and hold strategy, even when the R2 is on the order of 1 or a few percent. All you have to do is correctly predict the sign of the return on the following trading day, for instance, and doing this a little more than 50 percent of the time can produce profits.
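To see why getting the sign right matters, here is a sketch of a simple sign-based rule, assuming one goes long the index when the forecast return is positive and short otherwise. This is my illustrative rule, not necessarily the one I used, and it ignores transaction costs.

```python
import numpy as np

def sign_rule_returns(forecasts, realized):
    """Daily returns of a rule that is long when the forecast is positive, short otherwise."""
    position = np.where(np.asarray(forecasts, dtype=float) > 0, 1.0, -1.0)
    return position * np.asarray(realized, dtype=float)

def hit_rate(forecasts, realized):
    """Share of days on which the forecast sign matches the realized sign."""
    return np.mean(np.sign(forecasts) == np.sign(realized))

def total_return(daily):
    """Compounded return over the whole period."""
    return np.prod(1.0 + np.asarray(daily, dtype=float)) - 1.0
```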

Rapach and Zhou, in fact, develop insights into how predictability of stock returns can be consistent with rational expectations, provided the relevant improvements in predictability are bounded to be low enough.

Some Thoughts

There is lots more to say about this, naturally. And I hope to have further comments here soon.

But, for the time being, I have one question.

It is this: why do econometricians of the caliber of Rapach, Zhou, and Neely persist in relying on tests of statistical significance which are predicated, in a strict sense, on the normality of the residuals of these financial return regressions?

I’ve looked at this some, and it seems the t-statistic is somewhat robust to violations of normality of the underlying error distribution of the regression. However, residuals of a regression on equity rates of return can be very non-normal with fat tails and generally some skewness. I keep wondering whether anyone has really looked at how this translates into tests of statistical significance, or whether what we see on this topic is mostly arm-waving.

For my money, OS predictive performance is the key criterion.

Bootstrapping

I’ve been reading about the bootstrap. I’m interested in bagging or bootstrap aggregation.

The primary task of a statistician is to summarize a sample based study and generalize the finding to the parent population in a scientific manner.

The purpose of a sample study is to gather information cheaply in a timely fashion. The idea behind bootstrap is to use the data of a sample study at hand as a “surrogate population”, for the purpose of approximating the sampling distribution of a statistic; i.e. to resample (with replacement) from the sample data at hand and create a large number of “phantom samples” known as bootstrap samples. The sample summary is then computed on each of the bootstrap samples (usually a few thousand). A histogram of the set of these computed values is referred to as the bootstrap distribution of the statistic.

These well-phrased quotes come from Bootstrap: A Statistical Method by Singh and Xie.

OK, so let’s do a simple example.

Suppose we generate ten random numbers, drawn independently from a Gaussian or normal distribution with a mean of 10 and standard deviation of 1.

[Table: the ten random draws]

This sample has an average of 9.7684. We would like to somehow project a 95 percent confidence interval around this sample mean, to understand how close it is to the population average.

So we bootstrap this sample, drawing 10,000 samples of ten numbers with replacement.

Here is the distribution of bootstrapped means of these samples.

[Histogram: bootstrap distribution of the 10,000 resampled means]

The mean is 9.7713.

Based on the percentile method, the 95 percent confidence interval for the population mean runs from 9.32 to 10.23, which, as you can see, includes the true population mean of 10.
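The example is easy to reproduce. A minimal sketch with numpy; the random seed is arbitrary, so the draws, the sample mean, and the interval will differ from the numbers quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)                  # arbitrary seed
sample = rng.normal(loc=10.0, scale=1.0, size=10)

# 10,000 bootstrap samples of size 10, drawn with replacement from the sample.
B = 10_000
idx = rng.integers(0, sample.size, size=(B, sample.size))
boot_means = sample[idx].mean(axis=1)

# Percentile method: the 2.5th and 97.5th percentiles of the bootstrap means.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {sample.mean():.4f}, mean of bootstrap means {boot_means.mean():.4f}")
print(f"95% percentile interval: [{lo:.2f}, {hi:.2f}]")
```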

Bias-correction is another primary use of the bootstrap. For techies, there is a great paper from the old Bell Labs called A Real Example That Illustrates Properties of Bootstrap Bias Correction. Unfortunately, you have to pay a fee to the American Statistical Association to read it – I have not found a free copy on the Web.
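The basic idea of bootstrap bias correction fits in a few lines: estimate the bias as the average of the bootstrap statistics minus the original estimate, then subtract it. A sketch, using the plug-in variance (which divides by n and so is biased downward) as the example statistic; the function name is hypothetical.

```python
import numpy as np

def bootstrap_bias_correct(x, stat, B=5000, seed=0):
    """Return (bias-corrected estimate, estimated bias) for the statistic `stat`."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    theta_hat = stat(x)
    idx = rng.integers(0, x.size, size=(B, x.size))
    boot = np.array([stat(x[i]) for i in idx])
    bias = boot.mean() - theta_hat
    return theta_hat - bias, bias

sample = np.random.default_rng(1).normal(size=10)
plug_in = sample.var()                  # divides by n, biased downward
corrected, bias = bootstrap_bias_correct(sample, np.var)
```

For this particular statistic the bootstrap bias estimate should come out close to minus the plug-in variance divided by n, nudging the estimate toward the usual n-1 version.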

In any case, all this is interesting and a little amazing, but what we really want to do is look at the bootstrap in developing forecasting models.

Bootstrapping Regressions

There are several methods for using bootstrapping in connection with regressions.

One is illustrated in a blog post from earlier this year. I treated the explanatory variables as having a degree of randomness in them and resampled whole observations (each value of the dependent variable together with its explanatory variables) 200 times, finding that doing so “brought up” the coefficient estimates, moving them closer to the underlying actuals used in constructing or simulating them.

This method works nicely with heteroskedastic errors, as long as there is no autocorrelation.
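A minimal sketch of this "pairs" bootstrap, assuming a single regressor x and response y as numpy arrays; the function name and the 200 replications are illustrative.

```python
import numpy as np

def pairs_bootstrap(x, y, B=200, seed=0):
    """Resample whole (y, x) observations and refit OLS; returns a (B, 2) array of [intercept, slope]."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), x])
    n = len(y)
    coefs = np.empty((B, Z.shape[1]))
    for b in range(B):
        i = rng.integers(0, n, size=n)   # row indices drawn with replacement
        coefs[b] = np.linalg.lstsq(Z[i], y[i], rcond=None)[0]
    return coefs
```

The spread of each column of `coefs` gives a bootstrap standard error that does not assume constant error variance.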

Another method takes the explanatory variables as fixed, and resamples only the residuals of the regression.
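A sketch of the residual bootstrap under the same illustrative assumptions: fit once, keep the fitted values, shuffle the residuals with replacement, and refit. Because residuals are shuffled across observations, this version is usually recommended when the errors are roughly homoskedastic.

```python
import numpy as np

def residual_bootstrap(x, y, B=200, seed=0):
    """Hold the regressors fixed; resample residuals around the fitted values."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), x])
    beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
    fitted = Z @ beta_hat
    resid = y - fitted
    coefs = np.empty((B, Z.shape[1]))
    for b in range(B):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        coefs[b] = np.linalg.lstsq(Z, y_star, rcond=None)[0]
    return beta_hat, coefs
```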

Bootstrapping Time Series Models

The underlying assumptions for the standard bootstrap include independent and random draws.

This can be violated in time series when there are time dependencies.

Of course, it is necessary to transform a nonstationary time series to a stationary series to even consider bootstrapping.

But even with a time series that fluctuates around a constant mean, there can be autocorrelation.

So here is where the block bootstrap can come into play. Let me cite this study – conducted under the auspices of the Cowles Foundation – which discusses the asymptotic properties of the block bootstrap and provides key references.

There are many variants, but the basic idea is to sample blocks of a time series, often overlapping blocks. So if a time series yt has n elements, y1, ..., yn, and the block length is m, there are n-m+1 overlapping blocks, and roughly n/m of these blocks, drawn with replacement, are joined to construct another time series of length n. Issues arise when m is not a perfect divisor of n, and it is necessary to develop special rules for handling the final values of the simulated series in that case.
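A minimal sketch of the moving block bootstrap, assuming a stationary series in a numpy array; the block length m and the function name are illustrative. It draws ceil(n/m) overlapping blocks with replacement, joins them, and trims the result to length n, which is one simple rule for the case where m does not divide n.

```python
import numpy as np

def moving_block_bootstrap(y, m, rng):
    """One bootstrap replicate of y built from overlapping blocks of length m."""
    y = np.asarray(y)
    n = len(y)
    n_blocks = n - m + 1                    # blocks y[s:s+m] for s = 0, ..., n-m
    k = -(-n // m)                          # ceil(n / m) blocks are needed
    starts = rng.integers(0, n_blocks, size=k)
    series = np.concatenate([y[s:s + m] for s in starts])
    return series[:n]                       # trim the overshoot when m does not divide n
```

Within each block the original ordering, and hence the short-range autocorrelation, is preserved.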

Block bootstrapping is used by Bergmeir, Hyndman, and Benítez in bagging exponential smoothing forecasts.

How Good Are Bootstrapped Estimates?

Consistency in statistics or econometrics concerns whether an estimator converges (in probability) to the true value of the parameter as the sample size increases without bound.

This is a huge question with bootstrapped statistics, and there are new findings all the time.

Interestingly, for some statistics the bootstrap approximation converges to the true sampling distribution faster, as sample size grows, than the conventional normal approximation does.

And some metrics really do not lend themselves to bootstrapping.

Also, some samples are inappropriate for bootstrapping. Gelman, for example, writes about the problem of “separation” in a sample:

[In] ...an example of a poll from the 1964 U.S. presidential election campaign, … none of the black respondents in the sample supported the Republican candidate, Barry Goldwater… If zero black respondents in the sample supported Barry Goldwater, then zero black respondents in any bootstrap sample will support Goldwater as well. Indeed, bootstrapping can exacerbate separation by turning near-separation into complete separation for some samples. For example, consider a survey in which only one or two of the black respondents support the Republican candidate. The resulting logistic regression estimate will be noisy but it will be finite.

Here is a video that does a good job of covering the bases on bootstrapping. I suggest sampling portions of it first. It is quite good, but it may feel like a lot to take in at first.

US Growth Stalls

The US Bureau of Economic Analysis (BEA) announced today that,

Real gross domestic product — the output of goods and services produced by labor and property located in the United States — increased at an annual rate of 0.1 percent in the first quarter (that is, from the fourth quarter of 2013 to the first quarter of 2014), according to the “advance” estimate released by the Bureau of Economic Analysis.  In the fourth quarter, real GDP increased 2.6 percent.

This flatline growth number is in stark contrast to the median forecast of 83 economists surveyed by Bloomberg, which called for a 1.2 percent increase for the first quarter.

Bloomberg writes in a confusingly titled report – Dow Hits Record as Fed Trims Stimulus as Economy Improves

The pullback in growth came as snow blanketed much of the eastern half of the country, keeping shoppers from stores, preventing builders from breaking ground and raising costs for companies including United Parcel Service Inc. Another report today showing a surge in regional manufacturing this month adds to data on retail sales, production and employment that signal a rebound is under way as temperatures warm.

Here is the BEA table of real GDP, along with the advance estimate for the first quarter of 2014.

[Table: BEA real GDP and components, with the advance estimate for the first quarter of 2014]

The large slump in equipment investment (-5.5) indicates to me that something more than bad weather is going on.

Indeed, Econbrowser notes that,

Both business fixed investment and new home construction fell in the quarter, which would be ominous developments if they’re repeated through the rest of this year. And a big drop in exports reminds us that America is not immune to weakness elsewhere in the world.

Even the 2% growth in consumption spending is not all that encouraging. As Bricklin Dwyer of BNP Paribas noted, 1.1% of that consumption growth– more than half– was attributed to higher household expenditures on health care.

What May Be Happening

I think there is some amount of “happy talk” about the US economy linked to the urgency about reducing Fed bond purchases. So just think of what might happen if the federal funds rate is still at the zero bound when another recession hits. What tools would the Fed have left? Somehow the Fed has to position itself rather quickly for the inevitable swing of the business cycle.

I have wondered, therefore, whether some recent pronouncements from the Fed have had an unrealistic slant.

So, as the Fed unwinds quantitative easing (QE), dropping bond (mortgage-backed securities) purchases to zero, surely there will be further impacts on the housing markets.

Also, China is not there this time to take up the slack.

And it is always good to remember that new employment numbers are basically a lagging indicator of the business cycle.

Let’s hope for a better second and third quarter, and that this flatline growth for the first quarter is a blip.

More on Automatic Forecasting Packages – Autobox Gold Price Forecasts

Yesterday, my post discussed the statistical programming language R and Rob Hyndman’s automatic forecasting package, written in R – facts about this program, how to download it, and an application to gold prices.

In passing, I said I liked Hyndman’s disclosure of his methods in his R package and “contrasted” that with leading competitors in the automatic forecasting market space – notably Forecast Pro and Autobox.

This roused Tom Reilly, currently Senior Vice-President and CEO of Automatic Forecast Systems – the company behind Autobox.

[Photo: Tom Reilly]

Reilly, shown above, wrote  –

You say that Autobox doesn’t disclose its methods.  I think that this statement is unfair to Autobox.  SAS tried this (Mike Gilliland) on the cover of his book showing something purporting to a black box.  We are a white box.  I just downloaded the GOLD prices and recreated the problem and ran it. If you open details.htm it walks you through all the steps of the modeling process.  Take a look and let me know your thoughts.  Much appreciated!

AutoBox Gold Price Forecast

First, disregarding the issue of transparency for a moment, let’s look at a comparison of forecasts for this monthly gold price series (London PM fix).

A picture tells the story.

[Chart: comparison of monthly gold price forecasts, including Autobox, with actual prices]

So, for this data (2007 to early 2011), Autobox dominates. That is, all the forecasts lie below the respective actual monthly average gold prices, so in any month the higher forecast is the more accurate one, and the Autobox forecast is the more accurate one across the entire forecast horizon.

I guess this does not surprise me. Autobox has been a serious contender in the M-competitions, for example, usually running just behind or perhaps just ahead of Forecast Pro, depending on the accuracy metric and forecast horizon. (For a history of these “accuracy contests” see Makridakis and Hibon’s article on M3).

And, of course, this is just one of many possible forecasts that can be developed with this time series, taking off from various ending points in the historic record.

The Issue of Transparency

In connection with all this, I also talked with Dave Reilly, a founding principal of Autobox, shown below.

[Photo: Dave Reilly]

Among other things, we went over the “printout” Tom Reilly sent, which details the steps in the estimation of a final time series model to predict these gold prices.

A blog post on the Autobox site, called Build or Make your own ARIMA forecasting model?, is especially pertinent. It contains two flow charts describing the process of building a time series model, which I reproduce here by kind permission.

The first provides a plain vanilla description of Box-Jenkins modeling.

[Flowchart 1: basic Box-Jenkins modeling]

The second flowchart adds steps reflecting contributions by Tsay, Tiao, Bell, Reilly, and Gregory Chow (i.e., the Chow test).

[Flowchart 2: Box-Jenkins modeling extended for deterministic elements and parameter constancy]

Both start with plotting the time series to be analyzed and calculating the autocorrelation and partial autocorrelation functions.

But then additional boxes are added for accounting for and removing “deterministic” elements in the time series and checking for the constancy of parameters over the sample.

The analysis Tom Reilly sent suggests to me that “deterministic” elements can include outliers.

Dave Reilly made an interesting point about outliers. He suggested that the true autocorrelation structure can be masked or dampened in the presence of outliers. So the tactic of specifying an intervention variable in the various trial models can facilitate identification of autoregressive lags which otherwise might appear to be statistically not significant.
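The masking effect is easy to check numerically. Here is a small simulation of my own (not Autobox's procedure): an AR(1) series with coefficient 0.7, with and without a single large additive outlier, comparing the lag-1 sample autocorrelation. The outlier size and the seed are arbitrary assumptions.

```python
import numpy as np

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

rng = np.random.default_rng(1)
n, phi = 200, 0.7
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]      # AR(1) with unit-variance shocks

y_out = y.copy()
y_out[100] += 50.0                    # one large additive outlier (hypothetical)

print(round(acf1(y), 2), round(acf1(y_out), 2))
```

The single outlier inflates the denominator of the autocorrelation far more than the numerator, so a clearly autoregressive series can look almost uncorrelated until the outlier is accounted for.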

Really, the point of Autobox model development is to “create an error process free of structure.” That is a Dave Reilly quote.

So, bottom line, Autobox’s general methods are well-documented. There is no problem of transparency with respect to the steps in the recommended analysis in the program. True, behind the scenes, comparisons are being made and alternatives are being rejected which do not make it to the printout of results. But you can argue that any commercial software has to keep some kernel of its processes proprietary.

I expect to be writing more about Autobox. It has a good track record in various forecasting competitions and currently has a management team that actively solicits forecasting challenges.

Automatic Forecasting Programs – the Hyndman Forecast Package for R

I finally started learning R.

It’s a vector and matrix-based statistical programming language, a lot like MathWorks Matlab and GAUSS. The great thing is that it is free. I have friends and colleagues who swear by it, so it was on my to-do list.

The more immediate motivation, however, was my interest in Rob Hyndman’s automatic time series forecast package for R, described rather elegantly in an article in the Journal of Statistical Software.

This is worth looking over, even if you don’t have immediate access to R.

Hyndman and Exponential Smoothing

Hyndman, along with several others, put the final touches on a classification of exponential smoothing models, based on the state space approach. This facilitates establishing confidence intervals for exponential smoothing forecasts, for one thing, and provides further insight into the modeling options.

There are, for example, 15 widely acknowledged exponential smoothing methods, based on whether trend and seasonal components, if present, are additive or multiplicative, and also whether any trend is damped.

[Table: the 15 exponential smoothing methods]

When either additive or multiplicative error processes are added to these models in a state space framework, the number of modeling possibilities rises from 15 to 30.
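The counting is simple to verify: five trend options times three seasonal options gives 15 methods, and two error types doubles that to 30 state space models, written in the ETS(error, trend, seasonal) notation. A quick enumeration (the labels follow that notation; which combinations the package actually fits to a given series may be narrower):

```python
from itertools import product

trends = ["N", "A", "Ad", "M", "Md"]   # none, additive, additive damped, multiplicative, multiplicative damped
seasons = ["N", "A", "M"]              # none, additive, multiplicative
methods = list(product(trends, seasons))
models = [f"ETS({e},{t},{s})" for e in ("A", "M") for t, s in methods]

print(len(methods), len(models))   # 15 30
print(models[:3])                  # ['ETS(A,N,N)', 'ETS(A,N,A)', 'ETS(A,N,M)']
```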

One thing the Hyndman R package does is run all the relevant models from this superset on any time series provided by the user, picking a recommended model for forecasting with the Akaike information criterion (AIC).

Hyndman and Khandakar comment,

Forecast accuracy measures such as mean squared error (MSE) can be used for selecting a model for a given set of data, provided the errors are computed from data in a hold-out set and not from the same data as were used for model estimation. However, there are often too few out-of-sample errors to draw reliable conclusions. Consequently, a penalized method based on the in-sample fit is usually better. One such approach uses a penalized likelihood such as Akaike’s Information Criterion… We select the model that minimizes the AIC amongst all of the models that are appropriate for the data.

Interestingly,

The AIC also provides a method for selecting between the additive and multiplicative error models. The point forecasts from the two models are identical so that standard forecast accuracy measures such as the MSE or mean absolute percentage error (MAPE) are unable to select between the error types. The AIC is able to select between the error types because it is based on likelihood rather than one-step forecasts.
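To see why MSE cannot tell the two error types apart while the AIC can, here is a minimal sketch for simple exponential smoothing. It assumes the concentrated-likelihood form used in the ETS literature: $n\ln\sum\varepsilon_t^2 + 2\sum\ln|\hat{y}_t| + 2k$ for the multiplicative-error model, with relative errors $\varepsilon_t = (y_t - \hat{y}_t)/\hat{y}_t$, and the same expression without the middle term for the additive model. Both versions share one level recursion, hence identical point forecasts, but they score the same errors differently. The smoothing parameter and the data are hypothetical and fixed by hand; in practice each model's parameters would be optimized separately.

```python
import math

def ses_levels(y, alpha):
    """Level path l_0, ..., l_{n-1}; the one-step forecast of y[t] is levels[t-1].
    ETS(A,N,N) and ETS(M,N,N) share this recursion, so their point forecasts match."""
    levels = [y[0]]  # heuristic initial level (not optimized in this sketch)
    for obs in y[1:]:
        levels.append(levels[-1] + alpha * (obs - levels[-1]))
    return levels

def aic_ses(y, alpha, error_type, k=2):
    """AIC up to a constant shared by both error types; k counts alpha and the initial level."""
    forecasts = ses_levels(y, alpha)[:-1]  # one-step forecasts of y[1:]
    actual = y[1:]
    n = len(actual)
    if error_type == "A":
        eps = [a - f for a, f in zip(actual, forecasts)]        # additive errors
        scale_term = 0.0
    elif error_type == "M":
        eps = [(a - f) / f for a, f in zip(actual, forecasts)]  # relative errors
        scale_term = 2 * sum(math.log(abs(f)) for f in forecasts)
    else:
        raise ValueError("error_type must be 'A' or 'M'")
    return n * math.log(sum(e * e for e in eps)) + scale_term + 2 * k

y = [100.0, 120.0, 90.0, 150.0, 130.0, 180.0, 160.0, 210.0]  # hypothetical data
print(aic_ses(y, 0.5, "A"), aic_ses(y, 0.5, "M"))
```

Whichever error type gives the lower value would be selected, even though both produce exactly the same point forecasts.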

So the automatic forecasting algorithm involves the following steps:

1. For each series, apply all models that are appropriate, optimizing the parameters (both smoothing parameters and the initial state variable) of the model in each case.

2. Select the best of the models according to the AIC.

3. Produce point forecasts using the best model (with optimized parameters) for as many steps ahead as required.

4. Obtain prediction intervals for the best model either using the analytical results of Hyndman et al. (2005b), or by simulating future sample paths.
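The four steps listed above can be sketched in a few dozen lines. This is a deliberately small, hypothetical Python version, not the forecast package's code: it compares only two additive-error models, ETS(A,N,N) and ETS(A,A,N), sets the initial states heuristically instead of optimizing them, searches the smoothing parameters on a coarse grid, and skips step 4. Under Gaussian additive errors, minimizing the sum of squared one-step errors is maximum likelihood, and the AIC reduces (up to a constant) to $n\ln(\mathrm{SSE}/n) + 2k$.

```python
import math

GRID = [i / 20 for i in range(1, 20)]  # candidate smoothing parameters 0.05, ..., 0.95

def run_ses(y, alpha):
    """ETS(A,N,N): one-step errors and the final (level, trend=0)."""
    level = y[0]  # heuristic initial level
    errors = []
    for obs in y[1:]:
        e = obs - level
        errors.append(e)
        level += alpha * e
    return errors, (level, 0.0)

def run_holt(y, alpha, beta):
    """ETS(A,A,N) in error-correction form: one-step errors and the final (level, trend)."""
    level, trend = y[0], y[1] - y[0]  # heuristic initial states
    errors = []
    for obs in y[1:]:
        forecast = level + trend
        e = obs - forecast
        errors.append(e)
        level = forecast + alpha * e
        trend += beta * e
    return errors, (level, trend)

def sse(errors):
    return sum(e * e for e in errors)

def aic(errors, k):
    """Gaussian AIC up to an additive constant; guards against a zero SSE."""
    n = len(errors)
    return n * math.log(max(sse(errors), 1e-12) / n) + 2 * k

def auto_forecast(y, h):
    # Step 1: fit each candidate by minimizing SSE over the parameter grid
    a_ses = min(GRID, key=lambda a: sse(run_ses(y, a)[0]))
    a_holt, b_holt = min(((a, b) for a in GRID for b in GRID if b <= a),
                         key=lambda p: sse(run_holt(y, *p)[0]))
    fits = []
    errs, state = run_ses(y, a_ses)
    fits.append(("ETS(A,N,N)", aic(errs, k=2), state))  # alpha, initial level
    errs, state = run_holt(y, a_holt, b_holt)
    fits.append(("ETS(A,A,N)", aic(errs, k=4), state))  # alpha, beta, level, trend
    # Step 2: keep the model with the smallest AIC
    name, _, (level, trend) = min(fits, key=lambda f: f[1])
    # Step 3: point forecasts 1..h steps ahead (step 4, prediction intervals, omitted)
    return name, [level + i * trend for i in range(1, h + 1)]
```

Calling `auto_forecast(series, 12)` on a monthly series returns the selected model's label and twelve point forecasts.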

This package also includes an automatic forecast module for ARIMA time series modeling.

One thing I like about Hyndman’s approach is his disclosure of methods. This, of course, contrasts with leading competitors in the automatic forecasting market space, notably Forecast Pro and Autobox.

Certainly, go to Rob J Hyndman’s blog and website to look over the talk (with slides) Automatic time series forecasting. Hyndman’s blog, mentioned previously in the post on bagging time series, is a must-read for statisticians and data analysts.

Quick Implementation of the Hyndman R Package and a Test

But what about using this package?

Well, first you have to install R on your computer. This is pretty straightforward, with the latest versions of the program available at the CRAN site. I downloaded it to a machine using Windows 8 as the OS. I downloaded both the 32- and 64-bit versions, just to cover my bases.

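When you launch R, a simple menu bar with seven options comes up, with a set of icons underneath. Below that is the work area.

Go to the “Packages” menu option. Choose “Install package(s)…”, scroll down until you come to “forecast”, and install it; then load it with “Load package…”. (Typing install.packages("forecast") and then library(forecast) at the console does the same thing.)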

That’s the Hyndman Forecast Package for R.

So now you are ready to go, but, of course, you need to learn a little bit of R.

You can learn a lot by running the code examples in the documentation for the Hyndman R package. The manual for the version currently available for download is at

http://cran.r-project.org/web/packages/forecast/forecast.pdf

Here are some general tutorials:

http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf

http://cyclismo.org/tutorial/R/

http://cran.r-project.org/doc/manuals/R-intro.html#Simple-manipulations-numbers-and-vectors

http://www.statmethods.net/

And here is a discussion of how to import data into R and then convert it to a time series – which you will need to do for the Hyndman package.

I used the exponential smoothing module to forecast monthly averages of the London gold PM fix price series, comparing the results with a Forecast Pro run. I used data from 2007 to February 2011 as a training sample, and produced forecasts for the next twelve months with both programs.

The Hyndman R package and exponential smoothing module outperformed Forecast Pro in this instance, as the following chart shows.

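[Chart: twelve-month forecasts of the monthly average London gold PM fix, Hyndman R package vs. Forecast Pro]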

Another positive about the R package is that you can write code to produce a whole series of such out-of-sample forecasts, to get an idea of how the module works with a time series under different regimes, e.g. recession or business recovery.

I’m still piecing together the knowledge to write programs like that and save the results appropriately.
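Such a rolling-origin evaluation is straightforward to sketch. The version here is in Python rather than R, and the forecasters are hypothetical stand-ins: a no-change forecast and simple exponential smoothing with a hand-picked alpha of 0.3. At each origin the forecaster sees only the data up to that point, and the h-step mean absolute error is recorded, so the errors can later be grouped by regime.

```python
def naive_forecast(history, h):
    """No-change forecast: repeat the last observation h times."""
    return [history[-1]] * h

def ses_forecast(history, h, alpha=0.3):
    """Simple exponential smoothing with a fixed, hypothetical alpha (not optimized)."""
    level = history[0]
    for obs in history[1:]:
        level += alpha * (obs - level)
    return [level] * h

def rolling_origin_mae(y, first_origin, h, forecaster):
    """Forecast from every origin using only earlier data; return the h-step MAE per origin."""
    maes = []
    for origin in range(first_origin, len(y) - h + 1):
        train, test = y[:origin], y[origin:origin + h]
        fc = forecaster(train, h)
        maes.append(sum(abs(a - f) for a, f in zip(test, fc)) / h)
    return maes
```

For example, with a hypothetical monthly list `prices`, `rolling_origin_mae(prices, 24, 12, ses_forecast)` returns one twelve-month MAE per origin, which can be compared origin by origin with the result for `naive_forecast`.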

But, my introduction to this automatic forecasting package and to R has been positive thus far.

Forecasting Controversies – Impacts of QE

Where there is smoke, there is fire, and other similar adages are suggested by an arcane statistical controversy over quantitative easing (QE) by the US Federal Reserve Bank.

Some say this Fed policy, estimated to have involved $3.7 trillion in asset purchases, has been a bust, a huge waste of money, a give-away program to speculators, but of no real consequence to Main Street.

Others credit QE as the main force behind lower long-term interest rates, which have supported US housing markets.

Into the fray jump two elite econometricians – Jonathan Wright of Johns Hopkins and Christopher Neely, Vice President of the St. Louis Federal Reserve Bank.

The controversy provides an ersatz primer in estimation and forecasting issues with VARs (vector autoregressions). I’m not going to draw out all the nuances, but will highlight the main features of the argument.

The Effect of QE Announcements From the Fed Are Transitory – Lasting Maybe Two or Three Months

Basically, there is the VAR (vector autoregression) analysis of Jonathan Wright of Johns Hopkins University, which finds that –

…stimulative monetary policy shocks lower Treasury and corporate bond yields, but the effects die off fairly fast, with an estimated half-life of about two months.

This is in a paper, What does Monetary Policy do to Long-Term Interest Rates at the Zero Lower Bound?, available as a PDF dated May 2012.

More specifically, Wright finds that

Over the period since November 2008, I estimate that monetary policy shocks have a significant effect on ten-year yields and long-maturity corporate bond yields that wear off over the next few months. The effect on two-year Treasury yields is very small. The initial effect on corporate bond yields is a bit more than half as large as the effect on ten-year Treasury yields. This finding is important as it shows that the news about purchases of Treasury securities had effects that were not limited to the Treasury yield curve. That is, the monetary policy shocks not only impacted Treasury rates, but were also transmitted to private yields which have a more direct bearing on economic activity. There is slight evidence of a rotation in breakeven rates from Treasury Inflation Protected Securities (TIPS), with short-term breakevens rising and long-term forward breakevens falling.
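A half-life translates directly into a decay rate if one assumes, purely for illustration, that the effect decays geometrically as in an AR(1) process. That assumption is mine, not Wright's; impulse responses from a VAR need not decay at a constant rate. Under it, a half-life of two months implies monthly persistence of $0.5^{1/2} \approx 0.71$, and only $0.5^{6/2} = 12.5\%$ of the initial effect remains after six months.

```python
def monthly_persistence(half_life_months):
    """Decay factor rho with rho ** half_life == 0.5 (assumes geometric decay)."""
    return 0.5 ** (1.0 / half_life_months)

def share_remaining(half_life_months, months):
    """Fraction of the initial effect left after `months` (assumes geometric decay)."""
    return 0.5 ** (months / half_life_months)

rho = monthly_persistence(2.0)  # roughly 0.707 per month
for m in (1, 2, 3, 6, 12):
    print(m, round(share_remaining(2.0, m), 3))
```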

Not So, Says A Federal Reserve Vice-President

Christopher Neely at the St. Louis Federal Reserve argues that Wright’s VAR system is unstable and performs poorly in out-of-sample prediction. Hence, he argues, Wright’s conclusions cannot be accepted; furthermore, there are good reasons to believe that QE has had longer-term impacts than a couple of months, although these become more uncertain at longer horizons.


Neely’s retort is in a Federal Reserve working paper, How Persistent are Monetary Policy Effects at the Zero Lower Bound?

A key passage is the following:

Specifically, although Wright’s VAR forecasts well in sample, it forecasts very poorly out-of-sample and fails structural stability tests. The instability of the VAR coefficients imply that any conclusions about the persistence of shocks are unreliable. In contrast, a naïve, no-change model out-predicts the unrestricted VAR coefficients. This suggests that a high degree of persistence is more plausible than the transience implied by Wright’s VAR. In addition to showing that the VAR system is unstable, this paper argues that transient policy effects are inconsistent with standard thinking about risk-aversion and efficient markets. That is, the transient effects estimated by Wright would create an opportunity for risk-adjusted  expected returns that greatly exceed values that are consistent with plausible risk aversion. Restricted VAR models that are consistent with reasonable risk aversion and rational asset pricing, however, forecast better than unrestricted VAR models and imply a more plausible structure. Even these restricted models, however, do not outperform naïve models OOS. Thus, the evidence supports the view that unconventional monetary policy shocks probably have fairly persistent effects on long yields but we cannot tell exactly how persistent and our uncertainty about the effects of shocks grows with the forecast horizon.

And it’s telling, probably, that Neely attempts to replicate Wright’s VAR estimation with the same data, checks the parameters, and then conducts additional tests to show that this model cannot be trusted – it’s unstable.

Pretty serious stuff.

Neely gets some mileage out of research he conducted at the end of the 1990s, Predictability in International Asset Returns: A Re-examination, in which he likewise called into question the longer-term forecasting capability of VAR models, given their instabilities.

What is a VAR model?

We really can’t just highlight this controversy without saying a few words about VAR models.

A simple autoregressive relationship for a time series $y_t$ can be written as

$$y_t = a_1 y_{t-1} + \dots + a_n y_{t-n} + e_t$$

Now suppose we have other variables ($w_t$, $z_t$, …), and we write $y_t$ and each of these other variables as an equation in which its current value is a function of lagged values of all the variables.

The matrix notation is somewhat hairy, but that is a VAR. It is a system of autoregressive equations, where each variable is expressed as a linear sum of lagged values of all the variables in the system, its own lags included.
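In matrix form, a sketch with three variables and $p$ lags looks like this (the stacked vector $\mathbf{x}_t$ and coefficient matrices $A_i$ are my notation):

$$\mathbf{x}_t = \begin{pmatrix} y_t \\ w_t \\ z_t \end{pmatrix}, \qquad \mathbf{x}_t = A_1 \mathbf{x}_{t-1} + A_2 \mathbf{x}_{t-2} + \dots + A_p \mathbf{x}_{t-p} + \mathbf{e}_t,$$

where each $A_i$ is a $3 \times 3$ matrix and $\mathbf{e}_t = (e_{1t}, e_{2t}, e_{3t})'$ holds one error per equation. The first row, written out, is the equation for $y_t$:

$$y_t = \sum_{i=1}^{p} \left[ (A_i)_{11}\, y_{t-i} + (A_i)_{12}\, w_{t-i} + (A_i)_{13}\, z_{t-i} \right] + e_{1t}.$$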

One consequence of setting up a VAR is that there are lots of parameters to estimate. If p lags are important for each of three variables, each equation contains 3p lag coefficients, so altogether you need to estimate 9p parameters (not counting intercepts) – unless it is reasonable to impose certain restrictions.
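The count generalizes: with $k$ variables and $p$ lags, each of the $k$ equations has $kp$ lag coefficients (intercepts and the error covariance matrix aside), so

$$\underbrace{k}_{\text{equations}} \times \underbrace{kp}_{\text{coefficients per equation}} = k^2 p, \qquad \text{and for } k = 3: \quad 3 \times 3p = 9p.$$

With monthly data and a year of lags ($p = 12$), three variables already require $9 \times 12 = 108$ lag coefficients.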

Another implication is that there can be reduced-form expressions for each of the variables, written only in terms of their own lagged values. This, in turn, suggests constructing impulse-response functions to see how effects propagate through the system over time.

Additionally, there is a whole history of Bayesian VARs, especially associated with the Minneapolis Federal Reserve and the University of Minnesota.
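For a VAR with one lag, $\mathbf{x}_t = A\mathbf{x}_{t-1} + \mathbf{e}_t$, impulse responses can be sketched as powers of the coefficient matrix. This is a minimal illustration with a hypothetical two-variable $A$: the response of variable $i$, $h$ periods after a one-unit shock to variable $j$, is entry $(i, j)$ of $A^h$. These are non-orthogonalized (reduced-form) responses; identifying structural shocks, as the policy literature requires, is beyond this sketch.

```python
def mat_mult(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def impulse_responses(A, horizons):
    """Return [A^0, A^1, ..., A^horizons]; entry [h][i][j] is the response of
    variable i, h periods after a one-unit shock to variable j."""
    n = len(A)
    psi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = identity
    responses = [psi]
    for _ in range(horizons):
        psi = mat_mult(A, psi)  # A^h = A * A^(h-1)
        responses.append(psi)
    return responses

A = [[0.5, 0.2],   # hypothetical VAR(1) coefficients
     [0.1, 0.4]]
for h, psi in enumerate(impulse_responses(A, 6)):
    print(h, round(psi[0][1], 4))  # response of variable 0 to a shock in variable 1
```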

My impression is that, ultimately, VARs were big in the 1990s but did not live up to expectations in macroeconomic forecasting. After 2000 they gave way to Stock and Watson-type factor models. For one thing, factor models can encompass more variables than VARs. Also, factor models often beat the naïve benchmark, while VARs frequently did not, at least out-of-sample.

The Naïve Benchmark

The naïve benchmark is a martingale, which often boils down to a simple random walk. The best forecast for the next period value of a martingale is the current period value.

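This is the benchmark which Neely shows the VAR model does not beat, generally speaking, in out-of-sample applications.

[Table: ratio of VAR mean squared forecast error to that of the naïve benchmark]

The ratio here is the VAR’s mean squared forecast error divided by the benchmark’s. A ratio of 1 or greater means the VAR’s mean squared forecast error is at least as large as that of the benchmark model, so the VAR does not beat the no-change forecast.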
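The ratio is easy to compute for one-step forecasts. A minimal sketch, assuming the model's forecasts were already produced out of sample; the series and forecasts in the example are made up.

```python
def msfe(actual, forecast):
    """Mean squared forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def msfe_ratio_vs_naive(y, model_forecasts, start):
    """Model MSFE divided by the naive (no-change) MSFE over y[start:].
    model_forecasts[i] is the model's forecast of y[start + i], made one period earlier."""
    actual = y[start:]
    naive = y[start - 1:-1]  # no-change forecast: next value = current value
    if len(model_forecasts) != len(actual):
        raise ValueError("need one model forecast per evaluated observation")
    return msfe(actual, model_forecasts) / msfe(actual, naive)

y = [1.0, 2.0, 3.0, 5.0, 4.0]
print(msfe_ratio_vs_naive(y, [2.5, 4.0, 4.5], start=2))  # 0.25: the model beats no-change
```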

Reflections

There are many fascinating details of these papers I am not highlighting. As an old Republican Congressman once said, “a billion here and a billion there, and pretty soon you are spending real money.”

So the defense of QE in this instance boils down to invalidating an analysis which suggests the impacts of QE are transitory, lasting a few months.

However, this relatively recent research offers no proof that QE has had lasting impacts on long-term interest rates.

Forecasting Housing Markets – 3

Maybe I jumped to conclusions yesterday. Maybe, in fact, a retrospective analysis of the collapse in US housing prices in the recent 2008-2010 recession has been accomplished – but by major metropolitan area.

The Yarui Li and David Leatham paper Forecasting Housing Prices: Dynamic Factor Model versus LBVAR Model focuses on out-of-sample forecasts of house price indices for 42 metropolitan areas. Forecast models are built with data from 1980:01 to 2007:12. These models – dynamic factor and Large-scale Bayesian Vector Autoregressive (LBVAR) models – are used to generate forecasts of one- to twelve-month-ahead price growth from 2008:01 to 2010:12.

Judging from the graphics and other information, the dynamic factor model (DFM) produces impressive results.

For example, here are out-of-sample forecasts of the monthly growth of housing prices.

[Chart: dynamic factor model out-of-sample forecasts of monthly house price growth]

The house price indices for the 42 metropolitan areas are from the Office of Federal Housing Enterprise Oversight (OFHEO). The data for macroeconomic indicators in the dynamic factor and VAR models are from the DRI/McGraw Hill Basic Economics Database provided by IHS Global Insight.

I have seen forecasting models using Internet search activity which purportedly capture turning points in housing price series, but this is something different.

The claim here is that calculating dynamic or generalized principal components of some 141 macroeconomic time series can lead to forecasting models which accurately capture fluctuations in cumulative growth rates of metropolitan house price indices over a forecasting horizon of up to 12 months.

That’s pretty startling, and I for one would like to see further output of such models by city.

But where is the publication of the final paper? The PDF file linked above was presented at the Agricultural & Applied Economics Association’s 2011 Annual Meeting in Pittsburgh, Pennsylvania, July, 2011. A search under both authors does not turn up a final publication in a refereed journal, but does indicate there is great interest in this research. The presentation paper thus is available from quite a number of different sources which obligingly archive it.

Currently, the lead author, Yarui Li, is a Decision Tech Analyst at JPMorgan Chase, according to LinkedIn, having received her PhD from Texas A&M University in 2013. The second author, David Leatham, is a professor at Texas A&M, most recently publishing on VAR models applied to business failure in the US.

Dynamic Principal Components

It may be that dynamic principal components are the new ingredient accounting for an uncanny capability to identify turning points in these dynamic factor model forecasts.

The key research is associated with Forni and others, who originally documented dynamic factor models in the Review of Economics and Statistics in 2000. Subsequently, there have been two further publications by Forni on this topic:

Do financial variables help forecasting inflation and real activity in the euro area?

The Generalized Dynamic Factor Model, One Sided Estimation and Forecasting

Forni and associates present this method of dynamic principal components as an alternative to the Stock and Watson factor models based on many predictors – an alternative with superior forecasting performance.

Run-of-the-mill standard principal components are, according to Li and Leatham, based on contemporaneous covariances only. So they fail to exploit the potentially crucial information contained in the leading-lagging relations between the elements of the panel.
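A toy example shows what contemporaneous covariances can miss. In this sketch (hypothetical data), the second series is the first delayed by one period. Their contemporaneous covariance, the only kind static PCA uses, is exactly zero, while the lag-one cross-covariance is 0.5. The Forni-style dynamic approach works with spectral density matrices, which summarize these cross-covariances at all leads and lags; the sketch only illustrates why the lags matter, not that method itself.

```python
def cross_cov(a, b, lag):
    """Sample covariance between a[t - lag] and b[t], divided by n."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((a[t - lag] - ma) * (b[t] - mb) for t in range(lag, n)) / n

pattern = [1.0, 0.0, -1.0, 0.0]
x = [pattern[t % 4] for t in range(40)]
y = [pattern[(t - 1) % 4] for t in range(40)]  # y is x delayed by one period

print(cross_cov(x, y, 0))  # 0.0: contemporaneous, what static PCA sees
print(cross_cov(x, y, 1))  # 0.5: x leads y by one period
```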

By contrast, the Forni dynamic component approach is used in this housing price study to

obtain estimates of common and idiosyncratic variance-covariance matrices at all leads and lags as inverse Fourier transforms of the corresponding estimated spectral density matrices, and thus overcome the limitation of static PCA.

There is no question but that any further discussion of this technique must go into high mathematical dudgeon, so I leave that to another time, when I have had an opportunity to make computations of my own.

However, I will say that my explorations with forecasting principal components last year have led me to wonder whether, in fact, it may be possible to pull out some turning points from factor models based on large panels of macroeconomic data.

Forecasting Housing Markets – 2

I am interested in business forecasting “stories,” such as the glitch in Google’s flu forecasting program.

In real estate forecasting, the obvious thing is whether quantitative forecasting models can (or, better yet, did) forecast the collapse in housing prices and starts in the recent 2008-2010 recession (see graphics from the previous post).

There are several ways of going at this.

Who Saw The Housing Bubble Coming?

One is to look back to see whether anyone saw the bursting of the housing bubble coming and what forecasting models they were consulting.

That’s entertaining. Some people, like Ron Paul and Nouriel Roubini, were prescient.

Roubini earned the soubriquet Dr. Doom for an early prediction of housing market collapse, as reported by the New York Times:

On Sept. 7, 2006, Nouriel Roubini, an economics professor at New York University, stood before an audience of economists at the International Monetary Fund and announced that a crisis was brewing. In the coming months and years, he warned, the United States was likely to face a once-in-a-lifetime housing bust, an oil shock, sharply declining consumer confidence and, ultimately, a deep recession. He laid out a bleak sequence of events: homeowners defaulting on mortgages, trillions of dollars of mortgage-backed securities unraveling worldwide and the global financial system shuddering to a halt. These developments, he went on, could cripple or destroy hedge funds, investment banks and other major financial institutions like Fannie Mae and Freddie Mac.


Roubini was spot-on, of course, even though, at the time, jokes circulated such as “even a broken clock is right twice a day.” My guess is that his forecasting model, so to speak, is presented in Crisis Economics: A Crash Course in the Future of Finance, his 2010 book with Stephen Mihm. It is less a model than a whole database of tendencies and institutional facts, including areas in which Roubini correctly identifies moral hazard.

I think Ron Paul, whose projections of collapse came earlier (2003), was operating from some type of libertarian economic model. Paul testified before the House Financial Services Committee on Fannie Mae and Freddie Mac that –

Ironically, by transferring the risk of a widespread mortgage default, the government increases the likelihood of a painful crash in the housing market,” Paul predicted. “This is because the special privileges granted to Fannie and Freddie have distorted the housing market by allowing them to attract capital they could not attract under pure market conditions. As a result, capital is diverted from its most productive use into housing. This reduces the efficacy of the entire market and thus reduces the standard of living of all Americans.

On the other hand, there is Ben Bernanke, who in a CNBC interview in 2005 said:

7/1/05 – Interview on CNBC 

INTERVIEWER: Ben, there’s been a lot of talk about a housing bubble, particularly, you know [inaudible] from all sorts of places. Can you give us your view as to whether or not there is a housing bubble out there?

BERNANKE: Well, unquestionably, housing prices are up quite a bit; I think it’s important to note that fundamentals are also very strong. We’ve got a growing economy, jobs, incomes. We’ve got very low mortgage rates. We’ve got demographics supporting housing growth. We’ve got restricted supply in some places. So it’s certainly understandable that prices would go up some. I don’t know whether prices are exactly where they should be, but I think it’s fair to say that much of what’s happened is supported by the strength of the economy.

Bernanke was backed by one of the most far-reaching economic data collection and analysis operations in the United States, since in 2005 he was a member of the Board of Governors of the Federal Reserve System and then Chairman of the President’s Council of Economic Advisers.

So that’s kind of how it is. Outsiders, like Roubini and perhaps Paul, make the correct call, but highly respected and well-placed insiders like Bernanke simply cannot interpret the data at their fingertips to suggest that a massive bubble was underway.

It is interesting, currently, that Roubini in March promoted the idea that Yellen Is Creating another huge Bubble in the Economy.

But What Are the Quantitative Models For Forecasting the Housing Market?

In a long article in the New York Times in 2009, How Did Economists Get It So Wrong?, Paul Krugman lays the problem at the feet of the efficient market hypothesis –

When it comes to the all-too-human problem of recessions and depressions, economists need to abandon the neat but wrong solution of assuming that everyone is rational and markets work perfectly.

Along these lines, it is interesting that the Zillow home value forecast methodology builds on research which, in one set of models, assumes serial correlation and mean reversion to a long-term price trend.


Key research in housing market dynamics includes Case and Shiller (1989) and Capozza et al. (2004), who show that the housing market is not efficient and house prices exhibit strong serial correlation and mean reversion, where large market swings are usually followed by reversals to the unobserved fundamental price levels.

From the estimated parameters, Capozza et al. are able to characterize each housing market: its degree of serial correlation and mean reversion, and whether its price dynamics are oscillatory, convergent, or divergent.

Here is the abstract of the 2004 research underlying this approach.

An Anatomy of Price Dynamics in Illiquid Markets: Analysis and Evidence from Local Housing Markets

This research analyzes the dynamic properties of the difference equation that arises when markets exhibit serial correlation and mean reversion. We identify the correlation and reversion parameters for which prices will overshoot equilibrium (“cycles”) and/or diverge permanently from equilibrium. We then estimate the serial correlation and mean reversion coefficients from a large panel data set of 62 metro areas from 1979 to 1995 conditional on a set of economic variables that proxy for information costs, supply costs and expectations. Serial correlation is higher in metro areas with higher real incomes, population growth and real construction costs. Mean reversion is greater in large metro areas and faster growing cities with lower construction costs. The average fitted values for mean reversion and serial correlation lie in the convergent oscillatory region, but specific observations fall in both the damped and oscillatory regions and in both the convergent and divergent regions. Thus, the dynamic properties of housing markets are specific to the given time and location being considered.
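The regions named in the abstract can be illustrated with a stylized second-order difference equation. This is an assumption for illustration, not necessarily the paper's exact specification: each price change carries over a fraction $\alpha$ of the previous change (serial correlation) and closes a fraction $\beta$ of the gap to a constant equilibrium price $P^*$ (mean reversion),

$$P_t - P_{t-1} = \alpha\,(P_{t-1} - P_{t-2}) + \beta\,(P^* - P_{t-1}).$$

Writing $x_t = P_t - P^*$ gives $x_t = (1 + \alpha - \beta)\,x_{t-1} - \alpha\,x_{t-2}$, with characteristic equation $\lambda^2 - (1 + \alpha - \beta)\lambda + \alpha = 0$. Deviations converge if both roots lie inside the unit circle, and they oscillate if the roots are complex or negative.

```python
import cmath

def classify(alpha, beta):
    """Classify x_t = (1 + alpha - beta) x_{t-1} - alpha x_{t-2} by its characteristic roots."""
    b = 1 + alpha - beta
    disc = b * b - 4 * alpha
    roots = ((b + cmath.sqrt(disc)) / 2, (b - cmath.sqrt(disc)) / 2)
    convergence = "convergent" if all(abs(r) < 1 for r in roots) else "divergent"
    # Complex roots, or a negative real root, make deviations change sign over time
    oscillation = "oscillatory" if disc < 0 or any(r.real < 0 for r in roots) else "damped"
    return convergence, oscillation

for alpha, beta in [(0.1, 0.05), (0.5, 0.1), (1.2, 0.1)]:  # hypothetical parameters
    print(alpha, beta, classify(alpha, beta))
```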

The article is not available for free download so far as I can determine. But it is based on earlier research, dating back to the late 1990s, in the PDF The Dynamic Structure of Housing Markets.

The more recent Housing Market Dynamics: Evidence of Mean Reversion and Downward Rigidity, by Fannie Mae researchers, lists a lot of relevant research on the serial correlation of housing prices, which is usually locality-dependent.

In fact, the Zillow forecasts are based on ensemble methods, combining univariate and multivariate models – a sign of modernity in the era of Big Data.

So far, though, I have not found a truly retrospective study of the housing market collapse, based on quantitative models. Perhaps that is because only the Roubini approach works with such complex global market phenomena.

We are left, thus, with solid theoretical foundations, validated by multiple housing databases over different time periods, that suggest that people invest in housing based on momentum factors – and that this fairly obvious observation can be shown statistically, too.