Category Archives: accuracy of forecasts

Real Estate Forecasts – 1

Nationally, housing prices peaked in 2006, as the following Case-Shiller chart shows.


The Case-Shiller home price indices have been the gold standard and the focus of many forecasting efforts. A key feature is reliance on the “repeat sales method.” This uses data on properties that have sold at least twice to capture the appreciation of each specific property, holding quality constant.
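To make the repeat sales idea concrete, here is a minimal sketch in Python. It is a simplified, unweighted log-price regression in the spirit of the repeat sales literature, not the actual Case-Shiller methodology, which adds weighting and other refinements. The function names and the layout of the sales pairs are illustrative assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def repeat_sales_index(pairs, n_periods):
    """Estimate a price index (period 0 = 100) from repeat sales.

    pairs: (first_period, first_price, second_period, second_price) tuples.
    Each pair contributes log(second/first) = level[second] - level[first],
    and the log index levels are fitted by ordinary least squares.
    Every period after 0 must appear in at least one pair.
    """
    k = n_periods - 1  # unknown log levels for periods 1..n_periods-1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for t0, p0, t1, p1 in pairs:
        row = [0.0] * k
        if t0 > 0:
            row[t0 - 1] = -1.0
        if t1 > 0:
            row[t1 - 1] = 1.0
        y = math.log(p1 / p0)
        for i in range(k):
            xty[i] += row[i] * y
            for j in range(k):
                xtx[i][j] += row[i] * row[j]
    levels = solve_linear(xtx, xty)
    return [100.0] + [100.0 * math.exp(v) for v in levels]
```

Because each observation is the price change of the same house, differences in quality between houses drop out of the estimate.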

The following chart shows Case-Shiller (C-S) house indexes for four MSAs (metropolitan statistical areas) – Denver, San Francisco, Miami, and Boston.


The price “bubble” was more dramatic in some cities than others.

Forecasting Housing Prices and Housing Starts

The challenge to predictive modeling is more or less the same – how to account for a curve which initially rises, then falls (in some cases dramatically), “stabilizes,” and begins to climb again, although with increased volatility as long term interest rates rise.

Volatility is a feature of housing starts, also, when compared with growth in households and the housing stock, as highlighted in the following graphic taken from an econometric analysis by San Francisco Federal Reserve analysts.

The fluctuations in housing starts track with drivers such as employment, energy prices, prices of construction materials, and real mortgage rates. Short term forecasting models that include variables such as current listings and even Internet search activity are also promising.

Companies operating in this space include CoreLogic, Zillow and Moody’s Analytics. The sweet spot in all these services is to disaggregate housing price forecasts to more local levels – the county level, for example.

Finally, in this survey of resources, one of the best housing and real estate blogs is Calculated Risk.

I’d like to post more on these predictive efforts, their statistical rationale, and their performance.

Also, the Federal Reserve “taper” of Quantitative Easing (QE) currently underway is impacting long term interest rates and mortgage rates.

The key question is whether the US housing market can withstand a return to “normal” interest rate conditions in the next one to two years, and how that will play out.

Loess Seasonal Decomposition as a Forecasting Tool

I’ve applied something called loess decomposition to the London PM Fix gold series previously discussed in this blog.

This suggests insights missing from an application of Forecast Pro – a sort of standard in the automatic forecasting field.

Loess decomposition separates a time series into components – trend, seasonals, and residuals or remainder – based on locally weighted regression smoothing of the data.
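The building block here is the loess smoother itself. The following is a minimal sketch of locally weighted linear regression with tricube weights for equally spaced data. It is not the full STL procedure, which applies smoothers like this repeatedly to the trend and to each seasonal sub-series. The function name, the `span` parameter and the slightly widened bandwidth are my own simplifying assumptions.

```python
def loess_smooth(y, span=0.5):
    """Locally weighted linear regression over equally spaced observations.

    span is the fraction of the data used in each local fit.
    Needs at least three observations.
    """
    n = len(y)
    k = min(n, max(3, int(round(span * n))))  # points used in each local fit
    fitted = []
    for i in range(n):
        # The k observations closest in time to position i.
        neighbours = sorted(range(n), key=lambda j: abs(j - i))[:k]
        # Bandwidth just beyond the farthest neighbour, so all weights are positive.
        h = max(abs(j - i) for j in neighbours) + 1
        w = [(1 - (abs(j - i) / h) ** 3) ** 3 for j in neighbours]  # tricube
        sw = sum(w)
        mx = sum(wi * j for wi, j in zip(w, neighbours)) / sw
        my = sum(wi * y[j] for wi, j in zip(w, neighbours)) / sw
        sxx = sum(wi * (j - mx) ** 2 for wi, j in zip(w, neighbours))
        sxy = sum(wi * (j - mx) * (y[j] - my) for wi, j in zip(w, neighbours))
        # Value of the local weighted regression line at position i.
        fitted.append(my + (sxy / sxx) * (i - mx))
    return fitted
```

A larger `span` gives a smoother curve, which is roughly how a trend component is distinguished from shorter-lived movements.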

I always wondered whether, in fact, there was a seasonal component to the monthly London PM fix time series.

Not every monthly or quarterly time series has credible seasonal components, of course.

The proof would seem to be in the pudding. If a program derives seasonal components for a time series, do those seasonal components improve forecasts? That seems to be the critical issue.

STL Decomposition

STL decomposition – seasonal trend decomposition based on loess – was proposed by Cleveland et al in an interesting-sounding publication called “The Journal of Official Statistics.” I found the citation working through the procedure for bagging exponential smoothing mentioned in the previous post.

Amazingly, there is an online resource which calculates this loess decomposition for data you input, based on a listed R routine. The citation is Wessa P., (2013), Decomposition by Loess (v1.0.2) in Free Statistics Software (v1.1.23-r7), Office for Research Development and Education, URL

Comparison of STL Decomposition and Forecast Pro Gold Price Forecasts

Here’s a typical graph comparing the forecast errors from the Forecast Pro runs with STL Decomposition.


The trend component extracted by the STL decomposition was uncomplicated and easy to forecast by linear extrapolation. I added the seasonal component to these extrapolations to get the monthly forecasts over the six month forecast horizon. Forecast Pro, on the other hand, did not signal the existence of a seasonal component in this series, and, furthermore, identified the optimal forecast model as a random walk and the optimal forecast as the last observed value.
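The mechanics of that comparison can be sketched as follows. This assumes the trend and seasonal components are already available as lists (for example, exported from an STL run). The window used for the straight-line fit and the function names are illustrative assumptions, not the exact settings behind the charts in this post.

```python
def linear_extrapolate(trend, horizon, window=12):
    """Fit a straight line to the last `window` trend values and extend it."""
    recent = trend[-window:]
    n = len(recent)
    mx = (n - 1) / 2
    my = sum(recent) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    slope = sum((x - mx) * (v - my) for x, v in enumerate(recent)) / sxx
    intercept = my - slope * mx
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


def trend_plus_seasonal_forecast(trend, seasonal, horizon, period=12, window=12):
    """Extrapolated trend plus the seasonal value from one period earlier."""
    trend_fc = linear_extrapolate(trend, horizon, window)
    n = len(seasonal)
    seasonal_fc = [seasonal[n - period + (h - 1) % period]
                   for h in range(1, horizon + 1)]
    return [t + s for t, s in zip(trend_fc, seasonal_fc)]


def random_walk_forecast(series, horizon):
    """The benchmark: the last observed value, repeated."""
    return [series[-1]] * horizon
```

The remainder component is left out of the forecast, on the view that it is noise.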

Here is the trend component from the STL decomposition.



Potentially, there is lots more to discuss here.

For example, to establish that forecasts based on the loess decomposition of the gold price outperform Forecast Pro means compiling a large number of forecast comparisons, ideally one for all possible training sets beyond a minimum number of observations required for stable calculation of the STL algorithm. That is, each training set generates somewhat different values for the trend, seasonals, and residuals with loess decomposition. And Forecast Pro needs to be run for all these possible training sets also, with forecasts compared to out-of-sample data.
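This kind of comparison is often called rolling-origin evaluation. Here is a minimal sketch, assuming each method can be wrapped as a function that takes a training set and a horizon and returns point forecasts. The two forecasters shown are simple stand-ins of my own choosing, not STL or Forecast Pro.

```python
def rolling_origin_mae(series, forecaster, min_train, horizon):
    """Mean absolute error over every training set of at least min_train points."""
    abs_errors = []
    for end in range(min_train, len(series) - horizon + 1):
        train = series[:end]                 # data available at the forecast origin
        actual = series[end:end + horizon]   # out-of-sample values
        forecast = forecaster(train, horizon)
        abs_errors.extend(abs(f - a) for f, a in zip(forecast, actual))
    return sum(abs_errors) / len(abs_errors)


def random_walk(train, horizon):
    """Last observed value, repeated over the horizon."""
    return [train[-1]] * horizon


def drift(train, horizon):
    """Last value plus the average historical change per period."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]
```

Calling `rolling_origin_mae` once per method on the same series gives directly comparable error figures.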

While I have not gone to this extent, I have done these computations several times with good results for STL decomposition.

Also, it’s clear that loess decomposition extracts constant variance seasonals. However, the shape of these seasonals changes as the training set changes. It is necessary, thus, to study whether these changes can reflect multiplicative seasonality, for series in which that type of seasonality predominates. For example, perhaps STL seasonals tend to reflect the end points of the training sets.

Bergmeir, Hyndman, and Benítez (BHB) apply a Box-Cox transformation in one of their bagged exponential smoothing methods. This is possibly another way to sidestep problems of multiplicative or heteroskedastic seasonality. It also makes sense when one is attempting to bag a time series.
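For reference, the Box-Cox transformation and its inverse are short enough to write out. This is a minimal sketch for strictly positive data. How the parameter `lam` is chosen is a separate question that I do not address here.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of positive values; lam = 0 is the log transform."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def inv_box_cox(w, lam):
    """Undo box_cox for the same lam."""
    if lam == 0:
        return [math.exp(v) for v in w]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in w]
```

With `lam = 0`, a series of the form trend times seasonal factor becomes log trend plus log seasonal factor, which is why the transformation can turn multiplicative seasonality into the additive kind that loess decomposition extracts.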

However, my explorations suggest the results of STL decomposition are quite flexible, and, in the case of this gold price series, often produce superior forecasts to results from one of the main off-the-shelf automatic forecasting programs.

I personally am going to work on including STL decomposition in my forecasting toolkit.

Bagging Exponential Smoothing Forecasts

Bergmeir, Hyndman, and Benítez (BHB) successfully combine two powerful techniques – exponential smoothing and bagging (bootstrap aggregation) – in ground-breaking research.

I predict the forecasting system described in Bagging Exponential Smoothing Methods using STL Decomposition and Box-Cox Transformation will see wide application in business and industry forecasting.

These researchers demonstrate their algorithms for combining exponential smoothing and bagging outperform all other forecasting approaches in the M3 forecasting competition database for monthly time series, and do better than many approaches for quarterly and annual data. Furthermore, the BHB approach can be implemented with extant routines in the programming language R.

This table compares bagged exponential smoothing with other approaches on monthly data from the M3 competition.


Here BaggedETS.BC refers to a variant of the bagged exponential smoothing model which uses a Box-Cox transformation of the data to reduce the variance of model disturbances. The error metrics are the symmetric mean absolute percentage error (sMAPE) and the mean absolute scaled error (MASE). These are calculated for applications of the various models to out-of-sample, holdout, or test sample data from each of 1428 monthly time series in the competition.
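Both metrics are easy to compute. The sketch below uses common textbook definitions. The scaling lag `m` in MASE is an assumption on my part: `m = 1` scales by the in-sample naive forecast error, and `m = 12` would scale by the seasonal naive error for monthly data. Check the paper for the exact variant behind the table.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    terms = [200.0 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    Out-of-sample mean absolute error divided by the in-sample mean
    absolute error of a naive forecast that looks back m periods.
    """
    scale = sum(abs(train[i] - train[i - m])
                for i in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale
```

A MASE below one means the forecasts beat the naive benchmark, measured in the benchmark's own in-sample errors.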

See the online text by Hyndman and Athanasopoulos for motivations and discussions of these error metrics.

The BHB Algorithm

In a nutshell, here is the BHB description of their algorithm.

After applying a Box-Cox transformation to the data, the series is decomposed into trend, seasonal and remainder components. The remainder component is then bootstrapped using the MBB, the trend and seasonal components are added back, and the Box-Cox transformation is inverted. In this way, we generate a random pool of similar bootstrapped time series. For each one of these bootstrapped time series, we choose a model among several exponential smoothing models, using the bias-corrected AIC. Then, point forecasts are calculated using all the different models, and the resulting forecasts are averaged.

The MBB is the moving block bootstrap. It involves random selection of blocks of the remainders or residuals, preserving the time sequence and, hence, autocorrelation structure in these residuals.
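A minimal sketch of a moving block bootstrap follows. The block size is left as a parameter, and the trick of drawing a couple of extra blocks and trimming from a random offset is one common way to avoid always starting on a block boundary. Treat the details as my assumptions rather than as the BHB implementation.

```python
import random


def moving_block_bootstrap(remainder, block_size, rng):
    """Resample a remainder series in contiguous blocks.

    Blocks of block_size consecutive values are drawn with replacement
    and concatenated, so autocorrelation inside each block is preserved.
    Requires block_size <= len(remainder).
    """
    n = len(remainder)
    n_blocks = n // block_size + 2  # enough to cover n after trimming
    pieces = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_size + 1)
        pieces.extend(remainder[start:start + block_size])
    offset = rng.randrange(block_size)  # random trim from the front
    return pieces[offset:offset + n]


def bootstrap_series(trend, seasonal, remainder, block_size, n_series, seed=0):
    """Add the trend and seasonal components back to each resampled remainder."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_series):
        boot = moving_block_bootstrap(remainder, block_size, rng)
        pool.append([t + s + r for t, s, r in zip(trend, seasonal, boot)])
    return pool
```

Each series in the returned pool would then get its own exponential smoothing model, and the point forecasts would be averaged.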

Several R routines supporting these algorithms have previously been developed by Hyndman et al. In particular, the ets routine developed by Hyndman and Khandakar fits 30 different exponential smoothing models to a time series, identifying the optimal model by an Akaike information criterion.

Some Thoughts

This research lays out an almost industrial-scale effort to extract more information for prediction purposes from time series, and at the same time to use an applied forecasting workhorse – exponential smoothing.

Exponential smoothing emerged as a forecasting technique in applied contexts in the 1950’s and 1960’s. The initial motivation was error correction from forecasts of arbitrary origin, instead of an underlying stochastic model. Only later were relationships between exponential smoothing and time series processes, such as random walks, revealed with the work of Muth and others.

The M-competitions, initially organized in the 1970’s, gave exponential smoothing a big boost, since, by some accounts, exponential smoothing “won.” This is one of the sources of the meme – simpler models beat more complex models.

Then, at the end of the 1990’s, Makridakis and others organized a penultimate M-competition which was, in fact, won by the automatic forecasting software program Forecast Pro. This program typically compares ARIMA and exponential smoothing models, picking the best model through proprietary optimization of the parameters and tests on holdout samples. As in most sales and revenue forecasting applications, the underlying data are time series.

While all this was going on, the machine learning community was ginning up new and powerful tactics, such as bagging or bootstrap aggregation. Bagging can be a powerful technique for focusing on parameter estimates which are otherwise masked by noise.

So this application and research builds interestingly on a series of efforts by Hyndman and his associates and draws in a technique that has been largely confined to machine learning and data mining.

It is really almost the first of its kind, since bagging applications in time series forecasting have been less spectacularly successful than in cross-sectional regression modeling, for example.

A future post here will go through the step-by-step of this approach using some specific and familiar time series from the M competition data.

Interest Rates – Forecasting and Hedging

A lot relating to forecasting interest rates is encoded in the original graph I put up, several posts ago, of two major interest rate series – the federal funds and the prime rates.


This chart illustrates key features of interest rate series and signals several important questions. For one, there is a relationship between a very short term rate and a longer term interest rate – a sort of two point yield curve. Almost always, the federal funds rate is below the prime rate. If for short periods this is not the case, it indicates a radical inversion of the typical slope of the yield curve.

Credit spreads are not illustrated in this figure, but have been shown to be significant in forecasting key macroeconomic variables.

The shape of the yield curve itself can be brought into play in forecasting future rates, as can typical spreads between interest rates.

But the bottom line is that interest rates cannot be forecast with much accuracy beyond about a two quarter forecast horizon.

There is quite a bit of research showing this to be true, including –

Professional Forecasts of Interest Rates and Exchange Rates: Evidence from the Wall Street Journal’s Panel of Economists

We use individual economists’ 6-month-ahead forecasts of interest rates and exchange rates from the Wall Street Journal’s survey to test for forecast unbiasedness, accuracy, and heterogeneity. We find that a majority of economists produced unbiased forecasts but that none predicted directions of changes more accurately than chance. We find that the forecast accuracy of most of the economists is statistically indistinguishable from that of the random walk model when forecasting the Treasury bill rate but that the forecast accuracy is significantly worse for many of the forecasters for predictions of the Treasury bond rate and the exchange rate. Regressions involving deviations in economists’ forecasts from forecast averages produced evidence of systematic heterogeneity across economists, including evidence that independent economists make more radical forecasts

Then, there is research from the London School of Economics – Interest Rate Forecasts: A Pathology

In this paper we have demonstrated that, in the two countries and short data periods studied, the forecasts of interest rates had little or no informational value when the horizon exceeded two quarters (six months), though they were good in the next quarter and reasonable in the second quarter out. Moreover, all the forecasts were ex post and, systematically, inefficient, underestimating (overestimating) future outturns during up (down) cycle phases. The main reason for this is that forecasters cannot predict the timing of cyclical turning points, and hence predict future developments as a convex combination of autoregressive momentum and a reversion to equilibrium

Also, the chapter “Forecasting interest rates” in the Handbook of Forecasting is relevant, although highly theoretical.

Hedging Interest Rate Risk

As if in validation of this basic finding – beyond about two quarters, interest rate forecasts generally do not beat a random walk forecast – interest rate swaps are the largest category of interest rate derivatives contracts, according to the Bank for International Settlements (BIS).


Not only that, but interest rate contracts generally are, by an order of magnitude, the largest category of OTC derivatives – totaling more than half a quadrillion dollars as of the BIS survey in July 2013.

The gross market value of these contracts was only somewhat less than the Gross Domestic Product (GDP) of the US.

A Bank for International Settlements background document defines “gross market values” as follows:

Gross positive and negative market values: Gross market values are defined as the sums of the absolute values of all open contracts with either positive or negative replacement values evaluated at market prices prevailing on the reporting date. Thus, the gross positive market value of a dealer’s outstanding contracts is the sum of the replacement values of all contracts that are in a current gain position to the reporter at current market prices (and therefore, if they were settled immediately, would represent claims on counterparties). The gross negative market value is the sum of the values of all contracts that have a negative value on the reporting date (ie those that are in a current loss position and therefore, if they were settled immediately, would represent liabilities of the dealer to its counterparties).  The term “gross” indicates that contracts with positive and negative replacement values with the same counterparty are not netted. Nor are the sums of positive and negative contract values within a market risk category such as foreign exchange contracts, interest rate contracts, equities and commodities set off against one another. As stated above, gross market values supply information about the potential scale of market risk in derivatives transactions. Furthermore, gross market value at current market prices provides a measure of economic significance that is readily comparable across markets and products.
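A small worked example may help with this definition. The sketch below takes a hypothetical list of contract replacement values, from the reporting dealer's point of view, and adds up the gross positive and gross negative market values without netting. The function name and the numbers are mine, for illustration only.

```python
def gross_market_values(replacement_values):
    """Return (gross positive, gross negative, gross total, net).

    Positive values are contracts in a gain position for the reporter,
    negative values are contracts in a loss position. Gross figures add
    absolute values, so gains and losses are not set off against each other.
    """
    gross_positive = sum(v for v in replacement_values if v > 0)
    gross_negative = sum(-v for v in replacement_values if v < 0)
    gross_total = gross_positive + gross_negative
    net = gross_positive - gross_negative
    return gross_positive, gross_negative, gross_total, net


# Four hypothetical contracts, in millions of dollars.
print(gross_market_values([5.0, -3.0, 2.0, -1.0]))
```

In this example the gross market value is 11 while the netted position is only 3, which is why gross figures are read as a measure of scale rather than of net exposure.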

Clearly, by any account, large sums of money and considerable exposure are tied up in interest rate contracts in the over-the-counter (OTC) market.

A Final Thought

This link between forecastability and financial derivatives is interesting. There is no question but that, in practical terms, business is putting eggs in the basket of managing interest rate risk, as opposed to refining forecasts – which may not be possible beyond a certain point, in any case.

What is going to happen when the quantitative easing maneuvers of central banks around the world cease, as they must, and long term interest rates rise in a consistent fashion? That’s probably where to put the forecasting money.

Credit Spreads As Predictors of Real-Time Economic Activity

Several distinguished macroeconomic researchers, including Ben Bernanke, highlight the predictive power of the “paper-bill” spread.

The following graphs, from a 1993 article by Benjamin M. Friedman and Kenneth N. Kuttner, show the promise of credit spreads in forecasting recessions – indicated by the shaded blocks in the charts.


Credit spreads, of course, are the differences in yields between various corporate debt instruments and government securities of comparable maturity.

The classic credit spread illustrated above is the difference between six-month commercial paper rates and six-month Treasury bill rates.

Recent Research

More recent research underlines the importance of building up credit spreads from metrics relating to individual corporate bonds, rather than a mishmash of bonds with different durations, credit risk and other characteristics.

Credit Spreads as Predictors of Real-Time Economic Activity: A Bayesian Model-Averaging Approach is key research in this regard.

The authors first note that,

the “paper-bill” spread—the difference between yields on nonfinancial commercial paper and comparable-maturity Treasury bills—had substantial forecasting power for economic activity during the 1970s and the 1980s, but its predictive ability vanished in the subsequent decade

They then acknowledge that credit spreads based on indexes of speculative-grade or “junk” corporate bonds work fairly well for the 1990s, but their performance is uneven.

Accordingly, Faust, Gilchrist, Wright, and Zakrajsek, referring to earlier work by Gilchrist, Yankov, and Zakrajsek (GYZ), write that

In part to address these problems, GYZ constructed 20 monthly credit spread indexes for different maturity and credit risk categories using secondary market prices of individual senior unsecured corporate bonds … [measuring] … the underlying credit risk by the issuer’s expected default frequency (EDF™), a market-based default-risk indicator calculated by Moody’s/KMV that is more timely than the issuer’s credit rating

Their findings indicate that these credit spread indexes have substantial predictive power, at both short- and longer-term horizons, for the growth of payroll employment and industrial production. Moreover, they significantly outperform the predictive ability of the standard default-risk indicators, a result that suggests that using “cleaner” measures of credit spreads may, indeed, lead to more accurate forecasts of economic activity.

Their research applies credit spreads constructed from the ground up, as it were, to out-of-sample forecasts of

…real economic activity, as measured by real GDP, real personal consumption expenditures (PCE), real business fixed investment, industrial production, private payroll employment, the civilian unemployment rate, real exports, and real imports over the period from 1986:Q1 to 2011:Q3. All of these series are in quarter-over-quarter growth rates (actually 400 times log first differences), except for the unemployment rate, which is simply in first differences
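The “400 times log first differences” transformation is just an annualized quarterly growth rate in percent. A minimal sketch, with function names of my own choosing:

```python
import math


def annualized_log_growth(levels):
    """400 times the log first difference of a quarterly series, in percent."""
    return [400.0 * math.log(b / a) for a, b in zip(levels, levels[1:])]


def first_differences(values):
    """Plain first differences, as used for the unemployment rate."""
    return [b - a for a, b in zip(values, values[1:])]
```

For example, a series that grows 1 percent in a quarter shows an annualized log growth rate of just under 4 percent.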

The results are forecasts which significantly beat univariate (autoregressive) model forecasts, as shown in the following table.


Here BMA is an abbreviation for Bayesian Model Averaging, the authors’ method of incorporating these calculated credit spreads in predictive relationships.

Additional research validates the usefulness of credit spreads so constructed for predicting macroeconomic dynamics in several European economies –

We find that credit spreads and excess bond premiums, when used alongside monetary policy tightness indicators and leading indicators of economic performance, are highly significant for predicting the growth in the index of industrial production, employment growth, the unemployment rate and real GDP growth at horizons ranging from one quarter to two years ahead. These results are confirmed for individual countries in the euroarea and for the United Kingdom, and are robust to different measures of the credit spread. It is the unpredictable part associated with the excess bond premium that has greater influence on real activity compared to the predictable part of the credit spread. The implications of our results are that careful selection of the bonds used to construct the credit spreads, excluding those with embedded options and or illiquid secondary markets, delivers a robust indicator of financial market tightness that is distinct from tightness due to monetary policy measures or leading indicators of economic activity.

The Situation Today

A Morgan Stanley Credit Report for fixed income, released March 21, 2014, notes that

Spreads in both IG and HY are at the lowest levels we have seen since 2007, roughly 110bp for IG and 415bp for HY. A question we are commonly asked is how much tighter can spreads go in this cycle

So this is definitely something to watch. 

Interest Rates – 2

I’ve been looking at forecasting interest rates, the accuracy of interest rate forecasts, and teasing out predictive information from the yield curve.

This literature can be intensely theoretical and statistically demanding. But it might be quickly summarized by saying that, for horizons of more than a few months, most forecasts (such as from the Wall Street Journal’s Panel of Economists) do not beat a random walk forecast.

At the same time, there are hints that improvements on a random walk forecast might be possible under special circumstances, or for periods of time.

For example, suppose we attempt to forecast the 30 year fixed mortgage rate monthly averages, picking a six month forecast horizon.

The following chart compares a random walk forecast with an autoregressive (AR) model.


Let’s dwell for a moment on some of the underlying details of the data and forecast models.

The thick red line is the 30 year fixed mortgage rate for the prediction period, which extends from 2007 to the most recent monthly average in January 2014. These mortgage rates are downloaded from the St. Louis Fed data site FRED.

This is, incidentally, an out-of-sample period, as the autoregressive model is estimated over data beginning in April 1971 and ending September 2007. The autoregressive model is simple, employing a single explanatory variable, which is the 30 year fixed rate at a lag of six months. It has the following form,

$r_t = k + \beta r_{t-6}$

where the constant term $k$ and the coefficient $\beta$ of the lagged rate $r_{t-6}$ are estimated by ordinary least squares (OLS).

The random walk model forecast, as always, is the most current value projected ahead however many periods there are in the forecast horizon. This works out to using the value of the 30 year fixed mortgage in any month as the best forecast of the rate that will obtain six months in the future.

Finally, the errors for the random walk and autoregressive models are calculated as the forecast minus the actual value.
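For anyone who wants to replicate this, here is a minimal sketch of the two forecasts and their errors, assuming the monthly rates are already loaded into a plain list (downloading from FRED is not shown). The function names and the split index are illustrative assumptions.

```python
def fit_lagged_ols(rates, lag=6):
    """OLS estimates of k and beta in r_t = k + beta * r_{t-lag}."""
    x = rates[:-lag]   # lagged rates
    y = rates[lag:]    # rates lag months later
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    k = my - beta * mx
    return k, beta


def forecast_errors(rates, split, k, beta, lag=6):
    """Errors (forecast minus actual) for every target month t >= split.

    Each forecast uses only the rate observed lag months before the target.
    """
    ar_errors, rw_errors = [], []
    for t in range(split, len(rates)):
        origin = rates[t - lag]
        ar_errors.append(k + beta * origin - rates[t])  # autoregressive model
        rw_errors.append(origin - rates[t])             # random walk
    return ar_errors, rw_errors


def mean_abs(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)
```

The model would be fitted on `rates[:split]` only, so that the errors from `split` onward are genuinely out-of-sample.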

When an Autoregressive Model Beats a Random Walk Forecast

The random walk errors are smaller in absolute value than the autoregressive model errors over most of this out-of-sample period, but there are times when this is not true, as shown in the graph below.


This chart itself suggests that further work could be done on optimizing the autoregressive model, perhaps by adding further corrections from the residuals, which themselves are autocorrelated.

However, just taking this at face value, it’s clear the AR model beats the random walk forecast when the direction of interest rates changes from a downward movement.

Does this mean that going forward, an AR model, probably considerably more sophisticated than developed for this exercise, could beat a random walk forecast over six month forecast horizons?

That’s an interesting and bankable question. It of course depends on the rate at which the Fed “withdraws the punch bowl” but it’s also clear the Fed is no longer in complete control in this situation. The markets themselves will develop a dynamic based on expectations and so forth.

In closing, for reference, I include a longer picture of the 30 year fixed mortgage rates, which as can be seen, resemble the whole spectrum of rates in having a peak in the early 1980’s and showing what amounts to trends before and after that.


Forecasting the Price of Gold – 2

Searching “forecasting gold prices” on Google lands on a number of ARIMA (autoregressive integrated moving average) models of gold prices. Ideally, researchers focus on shorter term forecast horizons with this type of time series model.

I take a look at this approach here, moving onto multivariate approaches in subsequent posts.

Stylized Facts

These ARIMA models support stylized facts about gold prices such as: (1) gold prices constitute a nonstationary time series, (2) first differencing can reduce gold price time series to a stationary process, and, usually, (3) gold prices are random walks.

For example, consider daily gold prices from 1978 to the present.


This chart, based on World Gold Council data and the London PM fix, shows gold prices do not fluctuate about a fixed level, but can move in patterns with a marked trend over several years.

The trick is to reduce such series to a mean stationary series through appropriate differencing and, perhaps, other data transformations, such as detrending and taking out seasonal variation. Guidance in this is provided by tools such as the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series, as well as tests for unit roots.
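To see what the ACF looks like before and after differencing, here is a minimal sketch using a simulated random walk rather than actual gold prices. The seed and series length are arbitrary choices of mine.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - lag] for t in range(lag, n)) / c0
            for lag in range(1, max_lag + 1)]


# Simulated random walk: each value is the previous value plus noise.
rng = random.Random(42)
level = 0.0
walk = []
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

diffs = [b - a for a, b in zip(walk, walk[1:])]

print("ACF of levels:     ", [round(r, 2) for r in acf(walk, 5)])
print("ACF of differences:", [round(r, 2) for r in acf(diffs, 5)])
```

The autocorrelations of the levels decay very slowly from values near one, the signature of nonstationarity, while those of the first differences hover around zero.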

Some Terminology

I want to talk about specific ARIMA models, such as ARIMA(0,1,1) or ARIMA(p,d,q), so it might be a good idea to review what this means.

Quickly, ARIMA models are described by three parameters: (1) the autoregressive parameter p, (2) the number of times d the time series needs to be differenced to reduce it to a mean stationary series, and (3) the moving average parameter q.

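ARIMA(0,1,1) indicates a model where the original time series $y_t$ is differenced once (d=1), and which has one lagged moving average term.

If the original time series is $y_t$, $t = 1, 2, \ldots, n$, the first differenced series is $z_t = y_t - y_{t-1}$, and an ARIMA(0,1,1) model looks like,

$z_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

or converting back into the original series $y_t$,

$y_t = \mu + y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

where $\varepsilon_t$ is a white noise error term. This is a random walk process with a drift term $\mu$ and a moving average error, incidentally; with $\theta_1 = 0$ it reduces to a pure random walk with drift.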
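A quick simulation can make this concrete. The sketch below assumes the usual form of the model with a contemporaneous shock, that is, each new value equals the drift plus the previous value plus the current shock plus theta times the previous shock. The parameter values are arbitrary.

```python
import random


def simulate_arima_011(n, mu, theta, sigma, y0, rng):
    """Simulate y_t = mu + y_{t-1} + eps_t + theta * eps_{t-1}."""
    y = [y0]
    prev_eps = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y.append(mu + y[-1] + eps + theta * prev_eps)
        prev_eps = eps
    return y


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag one."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)


series = simulate_arima_011(5000, 0.1, 0.5, 1.0, 100.0, random.Random(1))
z = [b - a for a, b in zip(series, series[1:])]
print("mean of first differences:", round(sum(z) / len(z), 3))
print("lag-1 autocorrelation of first differences:",
      round(lag1_autocorrelation(z), 3))
```

In theory the first differences have mean mu and lag-one autocorrelation theta / (1 + theta^2), which is 0.4 for theta = 0.5. Setting theta to zero gives uncorrelated differences, the pure random walk case.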

As a note, in the general case the p and q parameters describe the span of the lags and moving average terms in the model. This is often written with backshift operators $L^k$, where $L^k y_t = y_{t-k}$.


So you could have a sum of these backshift operators of different orders operating against yt or zt to generate a series of lags of order p. Similarly a sum of backshift operators of order q can operate against the error terms at various times. This supposedly provides a compact way of representing the general model with p lags and q moving average terms.

Similar terminology can indicate the nature of seasonality, when that is operative in a time series.

These parameters are determined by considering the autocorrelation function ACF and partial autocorrelation function PACF, as well as tests for unit roots.

I’ve seen this referred to as “reading the tea leaves.”

Gold Price ARIMA models

I’ve looked over several papers on ARIMA models for gold prices, and conducted my own analysis.

My research confirms that the ACF and PACF indicate gold prices (of course, always defined as from some data source and for some trading frequency) are, in fact, random walks.

So this means that we can take, for example, the recent research of Dr. M. Massarrat Ali Khan of College of Computer Science and Information System, Institute of Business Management, Korangi Creek, Karachi as representative in developing an ARIMA model to forecast gold prices.

Dr. Massarrat’s analysis uses daily London PM fix data from January 02, 2003 to March 1, 2012, concluding that an ARIMA(0,1,1) has the best forecasting performance. This research also applies unit root tests to verify that the daily gold price series is stationary, after first differencing. Significantly, an ARIMA(1,1,0) model produced roughly similar, but somewhat inferior forecasts.

I think some of the other attempts at ARIMA analysis of gold price time series illustrate various modeling problems.

For example, there is the classic over-reach of research by Australian researchers in An overview of global gold market and gold price forecasting. These academics identify the nonstationarity of gold prices, but attempt a ten year forecast, based on a modeling approach that incorporates jumps as well as standard ARIMA structure.

A new model proposed a trend stationary process to solve the nonstationary problems in previous models. The advantage of this model is that it includes the jump and dip components into the model as parameters. The behaviour of historical commodities prices includes three different components: long-term reversion, diffusion and jump/dip diffusion. The proposed model was validated with historical gold prices. The model was then applied to forecast the gold price for the next 10 years. The results indicated that, assuming the current price jump initiated in 2007 behaves in the same manner as that experienced in 1978, the gold price would stay abnormally high up to the end of 2014. After that, the price would revert to the long-term trend until 2018.

As the introductory graph shows, this forecast issued in 2009 or 2010 was massively wrong, since gold prices slumped significantly after about 2012.

So much for long-term forecasts based on univariate time series.

Summing Up

I have not referenced many of the ARIMA forecasting papers on gold prices that I have seen, but have focused on a couple – one which “gets it right” and another which makes a heroically wrong but interesting ten year forecast.

Gold prices appear to be random walks in many frequencies – daily, monthly average, and so forth.

Attempts at superimposing long term trends or even jump patterns seem destined to fail.

However, multivariate modeling approaches, when carefully implemented, may offer some hope of disentangling longer term trends and changes in volatility. I’m working on that post now.

Flu Forecasting and Google – An Emerging Big Data Controversy

It started innocently enough, when an article in the scientific journal Nature caught my attention – When Google got flu wrong. This highlights big errors in Google flu trends in the 2012-2013 flu season.


Then digging into the backstory, I’m intrigued to find real controversy bubbling below the surface. Phrases like “big data hubris” are being thrown around, and there are insinuations Google is fudging model outcomes, at least in backtests. Beyond that, there are substantial statistical criticisms of the Google flu trends model – relating to autocorrelation and seasonality of residuals.

I’m using this post to keep track of some of the key documents and developments.

Background on Google Flu Trends

Google flu trends, launched in 2008, targets public health officials, as well as the general public.

Cutting lead-time on flu forecasts can support timely stocking and distribution of vaccines, as well as encourage health practices during critical flu months.

What’s the modeling approach?

There seem to be two official Google-sponsored reports on the underlying prediction model.

Detecting influenza epidemics using search engine query data appears in Nature in early 2009, and describes a logistic regression model estimating the probability that a random physician visit in a particular region is related to an influenza-like illness (ILI). This approach is geared to historical logs of online web search queries submitted between 2003 and 2008, and publicly available data series from the CDC’s US Influenza Sentinel Provider Surveillance Network (

The second Google report – Google Disease Trends: An Update – came out recently, in response to the algorithm’s overestimate of influenza-like illness (ILI) and the 2013 Nature article. It mentions in passing corrections discussed in a 2011 research study, but focuses on explaining the over-estimate in peak doctor visits during the 2012-2013 flu season.

The current model, while a well performing predictor in previous years, did not do very well in the 2012-2013 flu season and significantly deviated from the source of truth, predicting substantially higher incidence of ILI than the CDC actually found in their surveys. It became clear that our algorithm was susceptible to bias in situations where searches for flu-related terms on were uncharacteristically high within a short time period. We hypothesized that concerned people were reacting to heightened media coverage, which in turn created unexpected spikes in the query volume. This assumption led to a deep investigation into the algorithm that looked for ways to insulate the model from this type of media influence

The antidote – “spike detectors” and more frequent updating.

The Google Flu Trends Still Appears Sick Report

A just-published critique – Google Flu Trends Still Appears Sick – available as a PDF download from a site at Harvard University – provides an in-depth review of the errors and failings of Google’s foray into predictive analytics. This latest critique of Google flu trends even raises the issue of “transparency” of the modeling approach and seems to insinuate less than impeccable honesty at Google with respect to model performance and model details.

This white paper follows the March 2014 publication of The Parable of Google Flu: Traps in Big Data Analysis in Science magazine. The Science magazine article identifies substantive statistical problems with the Google flu trends modeling, such as the fact that,

..the overestimation problem in GFT was also present in the 2011‐2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google’s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models – what the article labeled “algorithm dynamics” and “big data hubris” respectively.

Google Flu Trends Still Appears Sick follows up on the very recent Science article, pointing out that the 2013-2014 flu season also shows fairly large errors, and asking –

So have these changes corrected the problem? While it is impossible to say for sure based on one subsequent season, the evidence so far does not look promising. First, the problems identified with replication in GFT appear to, if anything, have gotten worse. Second, the evidence that the problems in 2012‐2013 were due to media coverage is tenuous. While GFT engineers have shown that there was a spike in coverage during the 2012‐2013 season, it seems unlikely that this spike was larger than during the 2005‐2006 A/H5N1 (“bird flu”) outbreak and the 2009 A/H1N1 (“swine flu”) pandemic. Moreover, it does not explain why the proportional errors were so large in the 2011‐2012 season. Finally, while the changes made have dampened the propensity for overestimation by GFT, they have not eliminated the autocorrelation and seasonality problems in the data.

The white paper authors also highlight continuing concerns with Google’s transparency.

One of our main concerns about GFT is the degree to which the estimates are a product of a highly nontransparent process… GFT has not been very forthcoming with this information in the past, going so far as to release misleading example search terms in previous publications (2, 3, 8). These transparency problems have, if anything, become worse. While the data on the intensity of media coverage of flu outbreaks does not involve privacy concerns, GFT has not released this data nor have they provided an explanation of how the information was collected and utilized. This information is critically important for future uses of GFT. Scholars and practitioners in public health will need to be aware of where the information on media coverage comes from and have at least a general idea of how it is applied in order to understand how to interpret GFT estimates the next time there is a season with both high flu prevalence and high media coverage.

They conclude by stating that GFT is still ignoring data that could help it avoid future problems.

Finally, to really muddy the waters, Columbia University medical researcher Jeffrey Shaman recently announced First Real-Time Flu Forecast Successful. Shaman’s model apparently keys off Google flu trends.

What Does This Mean?

I think the Google flu trends controversy is important for several reasons.

First, predictive models drawing on internet search activity and coordinated with real-time clinical information are an ambitious and potentially valuable undertaking, especially if they can provide quicker feedback on prospective ILI in specific metropolitan areas. And the Google teams involved in developing and supporting Google flu trends have been somewhat forthcoming in presenting their modeling approach and acknowledging problems that have developed.

“Somewhat” but not fully forthcoming – and that seems to be the problem. Unlike research authored by academicians or the usual scientific groups, the authors of the two main Google reports mentioned above remain difficult to reach directly, apparently. So questions linger and critics start to get impatient.

And it appears that there are some standard statistical issues with the Google flu forecasts, such as autocorrelation and seasonality in residuals that remain uncorrected.
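For readers who want to see what “autocorrelation and seasonality in residuals” looks like in practice, here is a minimal sketch in Python. The error series is synthetic (it is not GFT or CDC data), and the function names are my own; the point is only that well-behaved forecast errors should show lag-1 autocorrelation near zero and a Durbin-Watson statistic near 2.

```python
import numpy as np

def lag_autocorr(resid, lag=1):
    """Sample autocorrelation of a residual series at the given lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[lag:], r[:-lag]) / np.dot(r, r))

def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    r = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))

# Synthetic weekly forecast errors (NOT real GFT data): an AR(1) component
# plus a 52-week seasonal cycle, mimicking the pattern the critics describe.
rng = np.random.default_rng(0)
n_weeks = 260
noise = rng.normal(0.0, 1.0, n_weeks)
ar = np.zeros(n_weeks)
for t in range(1, n_weeks):
    ar[t] = 0.7 * ar[t - 1] + noise[t]
seasonal = 1.5 * np.sin(2 * np.pi * np.arange(n_weeks) / 52)
errors = ar + seasonal

# Independent errors for comparison.
white = rng.normal(0.0, 1.0, n_weeks)

print("lag-1 autocorrelation:", lag_autocorr(errors, 1), "vs", lag_autocorr(white, 1))
print("lag-52 autocorrelation:", lag_autocorr(errors, 52), "vs", lag_autocorr(white, 52))
print("Durbin-Watson:", durbin_watson(errors), "vs", durbin_watson(white))
```

A model whose errors look like the first series is leaving predictable structure on the table, which is the substance of the statistical criticism.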

I guess I am not completely surprised, since the Google team may have come from the data mining or machine learning community, and not be sufficiently indoctrinated in the “old ways” of developing statistical models.

Craig Venter has been able to do science, and yet operate in private spaces, rather than in the government or nonprofit sector. Whether Google as a company will allow scientific protocols to be followed – as apparently clueless as these are to issues of profit or loss – remains to be seen. But if we are going to throw the concept of “data scientist” around, I guess we need to think through the whole package of stuff that goes with that.

Forecasting Gold Prices – Goldman Sachs Hits One Out of the Park

On March 25, 2009, Goldman Sachs’ Commodity and Strategy Research group published Global Economics Paper No 183: Forecasting Gold as a Commodity.

This offers a fascinating overview of supply and demand in global gold markets and an immediate prediction –

This “gold as a commodity” framework suggests that gold prices have strong support at and above current price levels should the current low real interest rate environment persist. Specifically, assuming real interest rates stay near current levels and the buying from gold-ETFs slows to last year’s pace, we would expect to see gold prices stay near $930/toz over the next six months, rising to $962/toz on a 12-month horizon.

The World Gold Council maintains an interactive graph of gold prices based on the London PM fix.

Now, of course, the real interest rate is an inflation-adjusted nominal interest rate. It’s usually estimated as the difference between some representative interest rate and a relevant rate of inflation. Thus, the real interest rate in the Goldman Sachs report is really an extrapolation from extant data provided, for example, by the US Federal Reserve FRED database.
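As a quick illustration of that arithmetic, here is a small Python sketch. The input rates are hypothetical (they are not taken from the Goldman Sachs report or from FRED), and the function names are mine. It compares the simple difference with the exact Fisher relation, which gives nearly the same answer at low rates.

```python
def real_rate_approx(nominal, inflation):
    """Real rate as the simple difference (rates as decimals, e.g. 0.03)."""
    return nominal - inflation

def real_rate_fisher(nominal, inflation):
    """Exact Fisher relation: (1 + nominal) / (1 + inflation) - 1."""
    return (1.0 + nominal) / (1.0 + inflation) - 1.0

# Hypothetical inputs for illustration only.
nominal, inflation = 0.03, 0.02
print(real_rate_approx(nominal, inflation))
print(real_rate_fisher(nominal, inflation))
```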

Courtesy of Paul Krugman’s New York Times blog from last August, we have this time series for real interest rates –


The graph shows that “real interest rates stay near current levels” (from spring 2009), putting the Goldman Sachs group authoring Report No 183 on record as producing one of the most successful longer term forecasts that you can find.

I’ve been collecting materials on forecasting systems for gold prices, and hope to visit that topic in coming posts here.

Three Pass Regression Filter – New Data Reduction Method

Malcolm Gladwell’s 10,000 hour rule (for cognitive mastery) is sort of an inspiration for me. I picked forecasting as my field for “cognitive mastery,” as dubious as that might be. When I am directly engaged in an assignment, at some point or other, I feel the need for immersion in the data and in estimations of all types. This blog, on the other hand, represents an effort to survey and, to some extent, get control of new “tools” – at least in a first pass. Then, when I have problems at hand, I can try some of these new techniques.

Ok, so these remarks preface what you might call the humility of my approach to new methods currently being innovated. I am not putting myself on a level with the innovators, for example. At the same time, it’s important to retain perspective and not drop a critical stance.

The Working Paper and Article in the Journal of Finance

Probably one of the most widely-cited recent working papers is Kelly and Pruitt’s three pass regression filter (3PRF). The authors, shown above, are with the University of Chicago, Booth School of Business and the Federal Reserve Board of Governors, respectively, and judging from the extensive revisions to the 2011 version, they had a bit of trouble getting this one out of the skunk works.

Recently, however, Kelly and Pruitt published an important article in the prestigious Journal of Finance called Market Expectations in the Cross-Section of Present Values. This article applies a version of the three pass regression filter to show that returns and cash flow growth for the aggregate U.S. stock market are highly and robustly predictable.

I learned of a published application of the 3PRF from Francis X. Diebold’s blog, No Hesitations, where Diebold – one of the most published authorities on forecasting – writes

Recent interesting work, moreover, extends PLS in powerful ways, as with the Kelly-Pruitt three-pass regression filter and its amazing apparent success in predicting aggregate equity returns.

What is the 3PRF?

The working paper from the Booth School of Business cited at a couple of points above describes what might be cast as a generalization of partial least squares (PLS). Certainly, the focus in the 3PRF and PLS is on using latent variables to predict some target.

I’m not sure, though, whether 3PRF is, in fact, more of a heuristic, rather than an algorithm.

What I mean is that the three pass regression filter involves a procedure, described below.

Here’s the basic idea –

Suppose you have a large number of potential regressors xi ∈ X, i = 1,..,N. In fact, it may be impossible to calculate an OLS regression, since N may exceed T, the number of observations or time periods.

Furthermore, you have proxies zj ∈ Z, j = 1,..,L – where L is significantly less than the number of observations T. These proxies could be the first several principal components of the data matrix, or underlying drivers which theory proposes for the situation. The authors even suggest an automatic procedure for generating proxies in the paper.

And, finally, there is the target variable yt which is a column vector with T observations.

Latent factors in a matrix F drive both the proxies in Z and the predictors in X. Based on macroeconomic research into dynamic factors, there might be only a few of these latent factors – just as typically only a few principal components account for the bulk of variation in a data matrix.

Now here is a key point – as Kelly and Pruitt present the 3PRF, it is a leading indicator approach when applied to forecasting macroeconomic variables such as GDP, inflation, or the like. Thus, the time index for yt ranges from 2,3,…T+1, while the time indices of all X and Z variables and the factors range from 1,2,..T. This means really that all the x and z variables are potentially leading indicators, since they map conditions from an earlier time onto values of a target variable at a subsequent time.
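In array terms, this leading-indicator setup is just a one-row shift. Here is a minimal sketch, assuming X, Z, and the target are all observed on the same T+1 dates with rows in time order; the function name and toy numbers are hypothetical.

```python
import numpy as np

def align_one_step_ahead(X, Z, y):
    """Pair rows t = 1..T of X and Z with the target at t = 2..T+1.

    Assumes X, Z, and y cover the same T+1 dates, in time order.
    """
    return X[:-1], Z[:-1], y[1:]

# Toy series on 5 dates, which leaves T = 4 usable pairs.
dates = np.arange(1, 6)
X = dates.reshape(-1, 1) * 10.0    # a single predictor observed at each date
Z = dates.reshape(-1, 1) * 100.0   # a single proxy observed at each date
y = dates.astype(float)            # target observed at each date

X_lag, Z_lag, y_lead = align_one_step_ahead(X, Z, y)
print(X_lag.ravel())   # predictor values at dates 1..4
print(y_lead)          # target values at dates 2..5
```

After this shift, row t of the predictors sits next to the target value from the following period, so every regression below is a forecasting regression.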

What Table 1 above tells us to do is –

  1. Run an ordinary least squares (OLS) regression of each xi in X onto the zj in Z, where t ranges from 1 to T and there are N variables in X and L << T variables in Z. So, in the example discussed below, we concoct a spreadsheet example with 3 variables in Z, or three proxies, and 10 predictor variables xi in X (I could have used 50, but I wanted to see whether the method worked with lower dimensionality). The example assumes 40 periods, so t = 1,…,40. There will be 10 different sets of coefficients of the zj, one per predictor, as a result of estimating these regressions, each with a matched constant term.
  2. OK, then we take this stack of estimated coefficients of the zj and their associated constants and map them onto the cross-sectional slices of X for t = 1,..,T. This means that, at each period t, the cross-section of values xi,t is taken as the dependent variable, and the 10 sets of coefficients (plus constant) estimated in the previous step become the predictors.
  3. Finally, we extract the estimates of the factor loadings which result, and use these in a regression with the target variable as the dependent variable.
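The three passes can be sketched in Python with NumPy as follows. This is a minimal sketch under my own assumptions, not the authors’ Matlab code: the function names (ols, three_pass), the coefficient choices, and the simulated data are all hypothetical; the target is taken to be already shifted one period ahead; and each second-pass regression gets its own intercept, which is then dropped. Note that Kelly and Pruitt’s paper treats the second-pass output as estimates of the latent factors.

```python
import numpy as np

def ols(y, X):
    """OLS of y on a constant and the columns of X; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return coef[0], coef[1:]

def three_pass(X, Z, y):
    """X is T x N, Z is T x L, y has length T (already shifted one period ahead)."""
    T, N = X.shape
    # Pass 1: N time-series regressions, each predictor on the proxies.
    phi = np.array([ols(X[:, i], Z)[1] for i in range(N)])   # N x L slopes
    # Pass 2: T cross-sectional regressions, each period's predictors on the
    # first-pass slopes. The second-pass intercepts are discarded.
    F = np.array([ols(X[t, :], phi)[1] for t in range(T)])   # T x L estimates
    # Pass 3: one regression of the target on the second-pass estimates.
    b0, b = ols(y, F)
    return phi, F, b0, b

# Simulated example with the dimensions used in this post (all values hypothetical).
rng = np.random.default_rng(0)
T, N, L = 40, 10, 3
factors = rng.random((T, 2)) * np.array([10.0, 5.0])
x_load = np.vstack([np.linspace(-2, 2, N), np.tile([1.5, -1.0], N // 2)])  # 2 x N
z_load = np.array([[1.0, 0.5, -1.0], [0.5, -1.0, 1.0]])                    # 2 x L
X = factors @ x_load + rng.normal(size=(T, N))
Z = factors @ z_load + rng.normal(size=(T, L))
y = factors @ np.array([1.0, -2.0])

phi, F, b0, b = three_pass(X, Z, y)
fitted = b0 + F @ b
print("in-sample correlation of fitted and target:", np.corrcoef(fitted, y)[0, 1])
```

The loops are deliberately explicit so that each pass lines up with one numbered step; a production version would vectorize them.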

This is tricky, and I have questions about the symbolism in Kelly and Pruitt’s papers, but the procedure they describe does work. There is some Matlab code here alongside the reference to this paper in Professor Kelly’s research.

At the same time, all this can be short-circuited (if you have adequate data without a lot of missing values, apparently) by a single humungous formula –


Here, the source is the 2012 paper.
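As a hedged numerical check on the idea that the three passes collapse into one matrix expression, the sketch below codes a closed form and compares it with the step-by-step regressions on arbitrary random data. The expression is my own working of the algebra (J_T and J_N are demeaning matrices, and W, S_xx, s_xy are names I use for intermediate products), so it should be checked against the paper before being relied on.

```python
import numpy as np

def fitted_closed_form(X, Z, y):
    """Fitted values from a single matrix expression (my transcription)."""
    T, N = X.shape
    J_T = np.eye(T) - np.ones((T, T)) / T   # removes time-series means
    J_N = np.eye(N) - np.ones((N, N)) / N   # removes cross-sectional means
    W = J_N @ X.T @ J_T @ Z                 # N x L
    S_xx = X.T @ J_T @ X                    # N x N
    s_xy = X.T @ J_T @ y                    # length N
    beta = W @ np.linalg.solve(W.T @ S_xx @ W, W.T @ s_xy)
    return y.mean() + J_T @ X @ beta

def fitted_three_pass(X, Z, y):
    """The same fitted values from three rounds of OLS with intercepts."""
    T, N = X.shape
    ones_T, ones_N = np.ones((T, 1)), np.ones((N, 1))
    # Pass 1: all N time-series regressions at once; drop the intercept row.
    phi = np.linalg.lstsq(np.hstack([ones_T, Z]), X, rcond=None)[0][1:].T    # N x L
    # Pass 2: all T cross-sectional regressions at once; drop the intercept row.
    F = np.linalg.lstsq(np.hstack([ones_N, phi]), X.T, rcond=None)[0][1:].T  # T x L
    # Pass 3: the target on the second-pass estimates, with an intercept.
    A = np.hstack([ones_T, F])
    return A @ np.linalg.lstsq(A, y, rcond=None)[0]

# The equivalence is algebraic, so arbitrary random data is enough to test it.
rng = np.random.default_rng(3)
T, N, L = 40, 10, 3
X = rng.normal(size=(T, N))
Z = rng.normal(size=(T, L))
y = rng.normal(size=T)

same = np.allclose(fitted_closed_form(X, Z, y), fitted_three_pass(X, Z, y))
print("closed form matches three passes:", same)
```

The reason the two agree is that passes 1 and 2 only rescale the columns of the demeaned product of X, its cross-sectionally demeaned transpose, and Z, so the third-pass projection lands on the same subspace either way.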

Spreadsheet Implementation

Spreadsheets help me understand the structure of the underlying data and the order of calculation, even if, for the most part, I work with toy examples.

So recently, I’ve been working through the 3PRF with a small spreadsheet.

Generating the factors: I generated the factors as two columns of random variables (=rand()) in Excel. I gave the factors different magnitudes by multiplying by different constants.

Generating the proxies Z and predictors X. Kelly and Pruitt call for the predictors to be variance standardized, so I generated 40 observations on ten sets of xi by selecting ten different coefficients to multiply into the two factors, and in each case I added a normal error term with mean zero and standard deviation 1. In Excel, this is the formula =norminv(rand(),0,1).

Basically, I did the same drill for the three zj — I created 40 observations for z1, z2, and z3 by multiplying three different sets of coefficients into the two factors and added a normal error term with zero mean and variance equal to 1.

Then, finally, I created yt by multiplying randomly selected coefficients times the factors.
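For anyone who prefers code to a spreadsheet, the same data-generating recipe can be sketched in Python with NumPy. The scaling constants and coefficient ranges are hypothetical stand-ins for the ones I picked by hand, and the explicit standardization at the end is an added step reflecting the variance-standardization requirement, not something the spreadsheet did in exactly this way.

```python
import numpy as np

rng = np.random.default_rng(42)
T, N, L = 40, 10, 3

# Two factors: uniform draws, like Excel's =rand(), scaled by different constants.
factors = rng.random((T, 2)) * np.array([10.0, 5.0])   # constants are illustrative

# Predictors and proxies: chosen coefficients times the factors, plus a
# standard normal error, the counterpart of =norminv(rand(),0,1).
x_coefs = rng.uniform(-2.0, 2.0, size=(2, N))   # illustrative coefficient choices
z_coefs = rng.uniform(-2.0, 2.0, size=(2, L))
X = factors @ x_coefs + rng.normal(0.0, 1.0, size=(T, N))
Z = factors @ z_coefs + rng.normal(0.0, 1.0, size=(T, L))

# Target: coefficients times the factors, with no added error term.
y_coefs = rng.uniform(-2.0, 2.0, size=2)
y = factors @ y_coefs

# Assumed extra step: scale each predictor to unit sample variance.
X_std = X / X.std(axis=0, ddof=1)

print(X.shape, Z.shape, y.shape)
```

Because the target here is an exact linear combination of the two factors, any forecast error in this toy setup comes from estimating the factors, not from noise in the target itself.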

After generating the data, the first pass regression is easy. You just develop a regression with each predictor xi as the dependent variable and the three proxies as the independent variables, case-by-case, across the time series for each. This gives you a bunch of regression coefficients which, in turn, become the explanatory variables in the cross-sectional regressions of the second step.

The regression coefficients I calculated for the three proxies, including a constant term, were as follows – where the 1st row indicates the regression for x1 and so forth.


This second step is a little tricky, but you just take all the values of the predictor variables for a particular period and designate these as the dependent variables, with the constant and coefficients estimated in the previous step as the independent variables. Note, the number of predictors pairs up exactly with the number of rows in the above coefficient matrix.

This then gives you the factor loadings for the third step, where you can actually predict yt (really yt+1 in the 3PRF setup). The only wrinkle is you don’t use the constant terms estimated in the second step, on the grounds that these reflect “idiosyncratic” effects, according to the 2011 revision of the paper.

Note the authors describe this as a time series approach, but do not indicate how to get around some of the classic pitfalls of regression in a time series context. Obviously, first differencing might be necessary for nonstationary time series like GDP, and other data massaging might be in order.

Bottom line – this worked well in my first implementation.

To forecast, I just used the last regression for yt+1 and then added ten more cases, calculating new values for the target variable with the new values of the factors. I used the new values of the predictors to update the second step estimate of factor loadings, and applied the last third pass regression to these values.

Here are the forecast errors for these ten out-of-sample cases.


Not bad for a first implementation.

Why Is Three Pass Regression Important?

3PRF is a fairly “clean” solution to an important problem, relating to the issue of “many predictors” in macroeconomics and other business research.

Noting that if the predictors number near or more than the number of observations, the standard ordinary least squares (OLS) forecaster is known to be poorly behaved or nonexistent, the authors write,

How, then, does one effectively use vast predictive information? A solution well known in the economics literature views the data as generated from a model in which latent factors drive the systematic variation of both the forecast target, y, and the matrix of predictors, X. In this setting, the best prediction of y is infeasible since the factors are unobserved. As a result, a factor estimation step is required. The literature’s benchmark method extracts factors that are significant drivers of variation in X and then uses these to forecast y. Our procedure springs from the idea that the factors that are relevant to y may be a strict subset of all the factors driving X. Our method, called the three-pass regression filter (3PRF), selectively identifies only the subset of factors that influence the forecast target while discarding factors that are irrelevant for the target but that may be pervasive among predictors. The 3PRF has the advantage of being expressed in closed form and virtually instantaneous to compute.

So, there are several advantages, such as (1) the solution can be expressed in closed form (in fact as one complicated but easily computable matrix expression), and (2) there is no need to employ maximum likelihood estimation.

Furthermore, 3PRF may outperform other approaches, such as principal components regression or partial least squares.

The paper illustrates the forecasting performance of 3PRF with real-world examples (as well as simulations). The first relates to forecasts of macroeconomic variables using data such as from the Mark Watson database mentioned previously in this blog. The second application relates to predicting asset prices, based on a factor model that ties individual assets’ price-dividend ratios to aggregate stock market fluctuations in order to uncover investors’ discount rates and dividend growth expectations.