# Automatic Forecasting Programs – the Hyndman Forecast Package for R

I finally started learning R.

It’s a vector and matrix-based statistical programming language, a lot like MathWorks Matlab and GAUSS. The great thing is that it is free. I have friends and colleagues who swear by it, so it was on my to-do list.

The more immediate motivation, however, was my interest in Rob Hyndman’s automatic time series forecast package for R, described rather elegantly in an article in the Journal of Statistical Software.

This is worth looking over, even if you don’t have immediate access to R.

Hyndman and Exponential Smoothing

Hyndman, along with several others, put the final touches on a classification of exponential smoothing models, based on the state space approach. This facilitates establishing confidence intervals for exponential smoothing forecasts, for one thing, and provides further insight into the modeling options.

There are, for example, 15 widely acknowledged exponential smoothing methods, based on whether trend and seasonal components, if present, are additive or multiplicative, and also whether any trend is damped.

When either additive or multiplicative error processes are added to these models in a state space framewoprk, the number of modeling possibilities rises from 15 to 30.

One thing the Hyndman R Package does is run all the relevant models from this superset on any time series provided by the user, picking a recommended model for use in forecasting with the Aikaike information criterion.

Forecast accuracy measures such as mean squared error (MSE) can be used for selecting a model for a given set of data, provided the errors are computed from data in a hold-out set and not from the same data as were used for model estimation. However, there are often too few out-of-sample errors to draw reliable conclusions. Consequently, a penalized method based on the in-sample  t is usually better.One such approach uses a penalized likelihood such as Akaike’s Information Criterion… We select the model that minimizes the AIC amongst all of the models that are appropriate for the data.

Interestingly,

The AIC also provides a method for selecting between the additive and multiplicative error models. The point forecasts from the two models are identical so that standard forecast accuracy measures such as the MSE or mean absolute percentage error (MAPE) are unable to select between the error types. The AIC is able to select between the error types because it is based on likelihood rather than one-step forecasts.

So the automatic forecasting algorithm, involves the following steps:

1. For each series, apply all models that are appropriate, optimizing the parameters (both smoothing parameters and the initial state variable) of the model in each case.

2. Select the best of the models according to the AIC.

3. Produce point forecasts using the best model (with optimized parameters) for as many steps ahead as required.

4. Obtain prediction intervals for the best model either using the analytical results of Hyndman et al. (2005b), or by simulating future sample paths..

This package also includes an automatic forecast module for ARIMA time series modeling.

One thing I like about Hyndman’s approach is his disclosure of methods. This, of course, is in contrast with leading competitors in the automatic forecasting market space –notably Forecast Pro and Autobox.

Certainly, go to Rob J Hyndman’s blog and website to look over the talk (with slides) Automatic time series forecasting. Hyndman’s blog, mentioned previously in the post on bagging time series, is a must-read for statisticians and data analysts.

Quick Implementation of the Hyndman R Package and a Test

But what about using this package?

Well, first you have to install R on your computer. This is pretty straight-forward, with the latest versions of the program available at the CRAN site. I downloaded it to a machine using Windows 8 as the OS. I downloaded both the 32 and 64-bit versions, just to cover my bases.

Then, it turns out that, when you launch R, a simple menu comes up with seven options, and a set of icons underneath. Below that there is the work area.

Go to the “Packages” menu option. Scroll down until you come on “forecast” and load that.

That’s the Hyndman Forecast Package for R.

So now you are ready to go, but, of course, you need to learn a little bit of R.

You can learn a lot by implementing code from the documentation for the Hyndman R package. The version corresponding to the R file that can currently be downloaded is at

http://cran.r-project.org/web/packages/forecast/forecast.pdf

Here are some general tutorials:

And here is a discussion of how to import data into R and then convert it to a time series – which you will need to do for the Hyndman package.

I used the exponential smoothing module to forecast monthly averages from London gold PM fix price series, comparing the results with a ForecastPro run. I utilized data from 2007 to February 2011 as a training sample, and produced forecasts for the next twelve months with both programs.

The Hyndman R package and exponential smoothing module outperformed Forecast Pro in this instance, as the following chart shows.

Another positive about the R package is it is possible to write code to produce a whole number of such out-of-sample forecasts to get an idea of how the module works with a time series under different regimes, e.g. recession, business recovery.

I’m still caging together the knowledge to put programs like that together and appropriately save results.

But, my introduction to this automatic forecasting package and to R has been positive thus far.

# Forecasting the Price of Gold – 2

Searching “forecasting gold prices” on Google lands on a number of ARIMA (autoregressive integrated moving average) models of gold prices. Ideally, researchers focus on shorter term forecast horizons with this type of time series model.

I take a look at this approach here, moving onto multivariate approaches in subsequent posts.

Stylized Facts

These ARIMA models support stylized facts about gold prices such as: (1) gold prices constitute a nonstationary time series, (2) first differencing can reduce gold price time series to a stationary process, and, usually, (3) gold prices are random walks.

For example, consider daily gold prices from 1978 to the present.

This chart, based World Gold Council data and the London PM fix, shows gold prices do not fluctuate about a fixed level, but can move in patterns with a marked trend over several years.

The trick is to reduce such series to a mean stationary series through appropriate differencing and, perhaps, other data transformations, such as detrending and taking out seasonal variation. Guidance in this is provided by tools such as the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series, as well as tests for unit roots.

Some Terminology

I want to talk about specific ARIMA models, such as ARIMA(0,1,1) or ARIMA(p,d,q), so it might be a good idea to review what this means.

Quickly, ARIMA models are described by three parameters: (1) the autoregressive parameter p, (2) the number of times d the time series needs to be differenced to reduce it to a mean stationary series, and (3) the moving average parameter q.

ARIMA(0,1,1) indicates a model where the original time series yt is differenced once (d=1), and which has one lagged moving average term.

If the original time series is yt, t=1,2,..n, the first differenced series is zt=yt-yt-1, and an ARIMA(0,1,1) model looks like,

zt = θ1εt-1

or converting back into the original series yt,

yt = μ + yt-1 + θ1εt-1

This is a random walk process with a drift term μ, incidentally.

As a note in the general case, the p and q parameters describe the span of the lags and moving average terms in the model.  This is often done with backshift operators Lk (click to enlarge)

So you could have a sum of these backshift operators of different orders operating against yt or zt to generate a series of lags of order p. Similarly a sum of backshift operators of order q can operate against the error terms at various times. This supposedly provides a compact way of representing the general model with p lags and q moving average terms.

Similar terminology can indicate the nature of seasonality, when that is operative in a time series.

These parameters are determined by considering the autocorrelation function ACF and partial autocorrelation function PACF, as well as tests for unit roots.

I’ve seen this referred to as “reading the tea leaves.”

Gold Price ARIMA models

I’ve looked over several papers on ARIMA models for gold prices, and conducted my own analysis.

My research confirms that the ACF and PACF indicates gold prices (of course, always defined as from some data source and for some trading frequency) are, in fact, random walks.

So this means that we can take, for example, the recent research of Dr. M. Massarrat Ali Khan of College of Computer Science and Information System, Institute of Business Management, Korangi Creek, Karachi as representative in developing an ARIMA model to forecast gold prices.

Dr. Massarrat’s analysis uses daily London PM fix data from January 02, 2003 to March 1, 2012, concluding that an ARIMA(0,1,1) has the best forecasting performance. This research also applies unit root tests to verify that the daily gold price series is stationary, after first differencing. Significantly, an ARIMA(1,1,0) model produced roughly similar, but somewhat inferior forecasts.

I think some of the other attempts at ARIMA analysis of gold price time series illustrate various modeling problems.

For example there is the classic over-reach of research by Australian researchers in An overview of global gold market and gold price forecasting. These academics identify the nonstationarity of gold prices, but attempt a ten year forecast, based on a modeling approach that incorporates jumps as well as standard ARIMA structure.

A new model proposed a trend stationary process to solve the nonstationary problems in previous models. The advantage of this model is that it includes the jump and dip components into the model as parameters. The behaviour of historical commodities prices includes three differ- ent components: long-term reversion, diffusion and jump/dip diffusion. The proposed model was validated with historical gold prices. The model was then applied to forecast the gold price for the next 10 years. The results indicated that, assuming the current price jump initiated in 2007 behaves in the same manner as that experienced in 1978, the gold price would stay abnormally high up to the end of 2014. After that, the price would revert to the long-term trend until 2018.

As the introductory graph shows, this forecast issued in 2009 or 2010 was massively wrong, since gold prices slumped significantly after about 2012.

So much for long-term forecasts based on univariate time series.

Summing Up

I have not referenced many ARIMA forecasting papers relating to gold price I have seen, but focused on a couple – one which “gets it right” and another which makes a heroically wrong but interesting ten year forecast.

Gold prices appear to be random walks in many frequencies – daily, monthly average, and so forth.

Attempts at superimposing long term trends or even jump patterns seem destined to failure.

However, multivariate modeling approaches, when carefully implemented, may offer some hope of disentangling longer term trends and changes in volatility. I’m working on that post now.

# Forecasting Gold Prices – Goldman Sachs Hits One Out of the Park

March 25, 2009, Goldman Sachs’ Commodity and Strategy Research group published Global Economics Paper No 183: Forecasting Gold as a Commodity.

This offers a fascinating overview of supply and demand in global gold markets and an immediate prediction –

This “gold as a commodity” framework suggests that gold prices have strong support at and above current price levels should the current low real interest rate environment persist. Specifically, assuming real interest rates stay near current levels and the buying from gold-ETFs slows to last year’s pace, we would expect to see gold prices stay near \$930/toz over the next six months, rising to \$962/toz on a 12-month horizon.

The World Gold Council maintains an interactive graph of gold prices based on the London PM fix.

Now, of course, the real interest rate is an inflation-adjusted nominal interest rate. It’s usually estimated as a difference between some representative interest rate and relevant rate of inflation. Thus, the real interest rates in the Goldman Sachs report is really an extrapolation from extant data provided, for example, by the US Federal Reserve FRED database.

Gratis of Paul Krugman’s New York Times blog from last August, we have this time series for real interest rates –

The graph shows that “real interest rates stay near current levels” (from spring 2009), putting the Goldman Sachs group authoring Report No 183 on record as producing one of the most successful longer term forecasts that you can find.

I’ve been collecting materials on forecasting systems for gold prices, and hope to visit that topic in coming posts here.