Forecasting the Price of Gold – 1

I’m planning posts on forecasting the price of gold this week. This is an introductory post.

The Question of Price

What is the “price” of gold, or, rather, is there a single, integrated global gold market?

This is partly an anthropological question. Clearly in some locales, perhaps in rural India, people bring their gold jewelry to a local merchant or craftsman, and get widely varying prices. Presumably, though, this merchant negotiates with a broker in a larger Indian city, and trades at prices which converge toward some global average. Very similar considerations apply to interest rates, which are significantly higher at pawnbrokers and other informal lenders.

The World Gold Council uses the London PM fix, which at the time of this writing was $1,379 per troy ounce.

The Wikipedia article on gold fixing recounts the history of this twice daily price setting, dating back, with breaks for wars, to 1919.

One thing is clear, however. The “price of gold” varies with the currency unit in which it is stated. The World Gold Council, for example, supplies extensive historical data upon registering with them. Here is a chart of the monthly gold prices based on the PM or afternoon fix, dating back to 1970.


Another insight from this chart is that the price of gold may be correlated with the price of oil, which also ramped up at the end of the 1970s and again in 2007, recovering quickly from the Great Recession of 2008-09 to surge again by 2010-11.

But that gets ahead of our story.

The Supply and Demand for Gold

Here are two valuable tables on gold supply and demand fundamentals, based on World Gold Council sources, via the article An overview of global gold market and gold price forecasting. I’ll have more to say about the forecasting model in that article, but the descriptive material is helpful (click to enlarge).

These tables give an idea of the main components of gold supply and demand over several recent years.

Gold is an unusual commodity in that one of its primary demand components – jewelry – can contribute to the supply side. Thus, gold is in some sense renewable and recyclable.

Table 1 above shows that annual supplies in this period of the last decade ran on the order of three to four thousand tonnes, where a tonne is 1,000 kilograms, or about 2,204.6 pounds (not the 2,240-pound imperial long ton).

Demand for jewelry accounts for a good proportion of this annual supply, with demand from ETFs, or exchange-traded funds, rising rapidly in this period. Industrial and dental demand is an order of magnitude lower and steady.

One of the basic distinctions is between the monetary and nonmonetary uses of, or demands for, gold.

In total, central banks held about 30,000 tonnes of gold as reserves in 2008.

Another estimated 30,000 tonnes was held in inventory for industrial uses, with a whopping 100,000 tonnes being held as jewelry.

India and China are the largest single countries in terms of consumer holdings of gold, where it clearly functions as a store of value and a hedge against uncertainty.

Gold Market Activity

In addition to actual purchases of gold, there are gold futures. The CME Group hosts a website with gold future listings. The site states,

Gold futures are hedging tools for commercial producers and users of gold. They also provide global gold price discovery and opportunities for portfolio diversification. In addition, they offer ongoing trading opportunities, since gold prices respond quickly to political and economic events, and serve as an alternative to investing in gold bullion, coins, and mining stocks.

Some of these contracts are recorded at exchanges, but it seems the bulk of them are over-the-counter.

A study by the London Bullion Market Association estimates that 10.9bn ounces of gold, worth $15,200bn, changed hands in the first quarter of 2011 just in London’s markets. That’s 125 times the annual output of the world’s gold mines – and twice the quantity of gold that has ever been mined.
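As a quick consistency check on those figures, dividing the dollar value by the ounces traded implies an average price close to the London PM fix quoted at the top of this post:

```python
# Back-of-envelope check on the LBMA turnover figures
ounces_bn = 10.9     # billions of troy ounces traded in Q1 2011
value_bn = 15_200    # value in billions of US dollars

implied_price = value_bn / ounces_bn   # dollars per troy ounce
print(f"Implied average price: ${implied_price:,.0f}/oz")
# Roughly consistent with the ~$1,379 PM fix cited earlier
```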

The Forecasting Problem

The forecasting problem for gold prices, accordingly, is complex. Established gold price series underpin a lot of the market activity at central exchanges, but the total volume of contracts and gold changing hands is many times the actual physical quantity of the product. And there is a definite political dimension to gold pricing, because of the monetary uses of gold and the actions of central banks in increasing and decreasing their reserves.

But the standard approaches to the forecasting problem are the same as can be witnessed in any number of other markets. These include the usual time series methods, focused around ARIMA, or autoregressive integrated moving average, models and multivariate regression models. More up-to-date tactics revolve around tests of cointegration of time series and VAR models. And, of course, one of the fundamental questions is whether gold prices, in their many incarnations, are best considered a random walk.
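On the random walk question, the basic check can be sketched in a few lines of plain Python: fit an AR(1) regression to a price series and see whether the slope is close to one. (A toy illustration on simulated data; a serious test would use Dickey-Fuller critical values rather than eyeballing the estimate.)

```python
import random

random.seed(42)

# Simulate a random-walk "price" series: p_t = p_{t-1} + e_t
prices = [1000.0]
for _ in range(2000):
    prices.append(prices[-1] + random.gauss(0, 5))

# Estimate b in the AR(1) regression p_t = a + b * p_{t-1} by least squares;
# b near 1 is the signature of a random walk (unit root)
x, y = prices[:-1], prices[1:]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
print(f"Estimated AR(1) coefficient: {b:.4f}")
```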

Flu Forecasting and Google – An Emerging Big Data Controversy

It started innocently enough, when an article in the scientific journal Nature caught my attention – When Google got flu wrong. This highlights big errors in Google flu trends in the 2012-2013 flu season.


Then digging into the backstory, I’m intrigued to find real controversy bubbling below the surface. Phrases like “big data hubris” are being thrown around, and there are insinuations Google is fudging model outcomes, at least in backtests. Beyond that, there are substantial statistical criticisms of the Google flu trends model – relating to autocorrelation and seasonality of residuals.

I’m using this post to keep track of some of the key documents and developments.

Background on Google Flu Trends

Google flu trends, launched in 2008, targets public health officials, as well as the general public.

Cutting lead time on flu forecasts can support timely stocking and distribution of vaccines, as well as encourage health practices during critical flu months.

What’s the modeling approach?

There seem to be two official Google-sponsored reports on the underlying prediction model.

Detecting influenza epidemics using search engine query data appeared in Nature in early 2009, and describes a logistic regression model estimating the probability that a random physician visit in a particular region is related to an influenza-like illness (ILI). The model was fit to historical logs of online web search queries submitted between 2003 and 2008, and to publicly available data series from the CDC’s US Influenza Sentinel Provider Surveillance Network.
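The core of that approach is a single regression on the log-odds scale: the log-odds of the ILI visit fraction against the log-odds of the flu-query fraction. A toy sketch with synthetic data (the series here are made up for illustration and are not Google’s data):

```python
import math
import random

def logit(p):
    """Log-odds transform applied to both sides of the regression."""
    return math.log(p / (1.0 - p))

random.seed(1)

# Hypothetical data: Q = flu-query fraction of searches, I = ILI visit fraction
Q = [random.uniform(0.001, 0.05) for _ in range(200)]
I = [min(0.2, max(1e-4, 0.5 * q + random.gauss(0, 0.002))) for q in Q]

# Fit logit(I) = b0 + b1 * logit(Q) by ordinary least squares
x = [logit(q) for q in Q]
y = [logit(v) for v in I]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
print(f"logit(I) = {b0:.2f} + {b1:.2f} * logit(Q)")
```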

The second Google report – Google Disease Trends: An Update – came out more recently, in response to the algorithm’s overestimates of influenza-like illness (ILI) and the 2013 Nature article. It mentions in passing corrections discussed in a 2011 research study, but focuses on explaining the over-estimate in peak doctor visits during the 2012-2013 flu season.

The current model, while a well performing predictor in previous years, did not do very well in the 2012-2013 flu season and significantly deviated from the source of truth, predicting substantially higher incidence of ILI than the CDC actually found in their surveys. It became clear that our algorithm was susceptible to bias in situations where searches for flu-related terms were uncharacteristically high within a short time period. We hypothesized that concerned people were reacting to heightened media coverage, which in turn created unexpected spikes in the query volume. This assumption led to a deep investigation into the algorithm that looked for ways to insulate the model from this type of media influence.

The antidote – “spike detectors” and more frequent updating.

The Google Flu Trends Still Appears Sick Report

A just-published critique – Google Flu Trends Still Appears Sick – available as a PDF download from a site at Harvard University – provides an in-depth review of the errors and failings of Google’s foray into predictive analytics. This latest critique of Google flu trends even raises the issue of “transparency” of the modeling approach, and seems to insinuate less than impeccable honesty at Google with respect to model performance and model details.

This white paper follows the March 2014 publication of The Parable of Google Flu: Traps in Big Data Analysis in Science magazine. The Science magazine article identifies substantive statistical problems with the Google flu trends modeling, such as the fact that,

..the overestimation problem in GFT was also present in the 2011‐2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google’s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models – what the article labeled “algorithm dynamics” and “big data hubris” respectively.

Google Flu Trends Still Appears Sick follows up on the very recent Science article, pointing out that the 2013-2014 flu season also shows fairly large errors, and asking –

So have these changes corrected the problem? While it is impossible to say for sure based on one subsequent season, the evidence so far does not look promising. First, the problems identified with replication in GFT appear to, if anything, have gotten worse. Second, the evidence that the problems in 2012‐2013 were due to media coverage is tenuous. While GFT engineers have shown that there was a spike in coverage during the 2012‐2013 season, it seems unlikely that this spike was larger than during the 2005‐2006 A/H5N1 (“bird flu”) outbreak and the 2009 A/H1N1 (“swine flu”) pandemic. Moreover, it does not explain why the proportional errors were so large in the 2011‐2012 season. Finally, while the changes made have dampened the propensity for overestimation by GFT, they have not eliminated the autocorrelation and seasonality problems in the data.

The white paper authors also highlight continuing concerns with Google’s transparency.

One of our main concerns about GFT is the degree to which the estimates are a product of a highly nontransparent process… GFT has not been very forthcoming with this information in the past, going so far as to release misleading example search terms in previous publications (2, 3, 8). These transparency problems have, if anything, become worse. While the data on the intensity of media coverage of flu outbreaks does not involve privacy concerns, GFT has not released this data nor have they provided an explanation of how the information was collected and utilized. This information is critically important for future uses of GFT. Scholars and practitioners in public health will need to be aware of where the information on media coverage comes from and have at least a general idea of how it is applied in order to understand how to interpret GFT estimates the next time there is a season with both high flu prevalence and high media coverage.

They conclude by stating that GFT is still ignoring data that could help it avoid future problems.

Finally, to really muddy the waters, Columbia University medical researcher Jeffrey Shaman recently announced First Real-Time Flu Forecast Successful. Shaman’s model apparently keys off Google flu trends.

What Does This Mean?

I think the Google flu trends controversy is important for several reasons.

First, predictive models drawing on internet search activity and coordinated with real-time clinical information are an ambitious and potentially valuable undertaking, especially if they can provide quicker feedback on prospective ILI in specific metropolitan areas. And the Google teams involved in developing and supporting Google flu trends have been somewhat forthcoming in presenting their modeling approach and acknowledging problems that have developed.

“Somewhat” but not fully forthcoming – and that seems to be the problem. Unlike research authored by academics or the usual scientific groups, the authors of the two main Google reports mentioned above apparently remain difficult to reach directly. So questions linger and critics start to get impatient.

And it appears that there are some standard statistical issues with the Google flu forecasts, such as autocorrelation and seasonality in residuals that remain uncorrected.
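A minimal version of that diagnostic is the lag-1 autocorrelation of the forecast residuals, which should be near zero for a well-specified model. A sketch on synthetic residuals (constructed here to be autocorrelated, so the red flag is visible):

```python
import random

random.seed(7)

# Hypothetical forecast residuals with deliberate positive autocorrelation
resid = [0.0]
for _ in range(300):
    resid.append(0.6 * resid[-1] + random.gauss(0, 1))

# Lag-1 sample autocorrelation: should be near zero for healthy residuals
n = len(resid)
m = sum(resid) / n
num = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, n))
den = sum((r - m) ** 2 for r in resid)
r1 = num / den
print(f"Lag-1 autocorrelation of residuals: {r1:.2f}")  # well above zero here
```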

I guess I am not completely surprised, since the Google team may have come from the data mining or machine learning community, and may not be sufficiently steeped in the “old ways” of developing statistical models.

Craig Venter has been able to do science while operating in the private sector, rather than in government or the nonprofit world. Whether Google as a company will allow scientific protocols to be followed – indifferent as these are to issues of profit or loss – remains to be seen. But if we are going to throw the concept of “data scientist” around, I guess we need to think through the whole package of practices that goes with it.

The Worst Bear Market in History – Guest Post

This is a fascinating case study of financial aberration, authored by Bryan Taylor, Ph.D., Chief Economist, Global Financial Data.


Which country has the dubious distinction of suffering the worst bear market in history?

To answer this question, we ignore countries where the government closed down the stock exchange, leaving investors with nothing, as occurred in Russia in 1917 or Eastern European countries after World War II. We focus on stock markets that continued to operate during their equity-destroying disaster.

There is a lot of competition in this category.  Almost every major country has had a bear market in which share prices have dropped over 80%, and some countries have had drops of over 90%. The Dow Jones Industrial Average dropped 89% between 1929 and 1932, the Greek Stock market fell 92.5% between 1999 and 2012, and adjusted for inflation, Germany’s stock market fell over 97% between 1918 and 1922.

The only consolation to investors is that the maximum loss on their investment is 100%, and one country almost achieved that dubious distinction. Cyprus holds the record for the worst bear market of all time in which investors have lost over 99% of their investment! Remember, this loss isn’t for one stock, but for all the shares listed on the stock exchange.

The Cyprus Stock Exchange All Share Index hit a high of 11443 on November 29, 1999, fell to 938 by October 25, 2004, a 91.8% drop.  The index then rallied back to 5518 by October 31, 2007 before dropping to 691 on March 6, 2009.  Another rally ensued to October 20, 2009 when the index hit 2100, but collapsed from there to 91 on October 24, 2013.  The chart below makes any roller-coaster ride look boring by comparison (click to enlarge).


The fall from 11443 to 91 means that someone who invested at the top in 1999 would have lost 99.2% of their investment by 2013.  And remember, this is for ALL the shares listed on the Cyprus Stock Exchange.  By definition, some companies underperform the average and have done even worse, losing their shareholders everything.
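The 99.2% figure follows directly from the index levels quoted above:

```python
# Peak-to-trough loss on the Cyprus SE All Share Index (levels from the text)
peak = 11443    # November 29, 1999
trough = 91     # October 24, 2013

drawdown = (peak - trough) / peak
print(f"Loss from peak: {drawdown:.1%}")
```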

For the people in Cyprus, this achievement only adds insult to injury.  One year ago, in March 2013, Cyprus became the fifth Euro country to have its financial system rescued by a bail-out.  At its height, the banking system’s assets were nine times the island’s GDP. As was the case in Iceland, that situation was unsustainable.

Since Germany and other paymasters for Ireland, Portugal, Spain and Greece were tired of pouring money down the bail-out drain, they demanded not only the usual austerity and reforms to put the country on the right track, but they also imposed demands on the depositors of the banks that had created the crisis, creating a “bail-in”.

As a result of the bail-in, debt holders and uninsured depositors had to absorb bank losses. Although some deposits were converted into equity, given the decline in the stock market, this provided little consolation. Banks were closed for two weeks and capital controls were imposed upon Cyprus.  Not only did depositors who had money in banks beyond the insured limit lose money, but depositors who had money in banks were restricted from withdrawing their funds. The impact on the economy has been devastating. GDP has declined by 12%, and unemployment has gone from 4% to 17%.


On the positive side, when Cyprus finally does bounce back, large profits could be made by investors and speculators.  The Cyprus SE All-Share Index is up 50% so far in 2014, and could move up further. Of course, there is no guarantee that the October 2013 low will be the final low in the island’s fourteen-year bear market.  To coin a phrase, Cyprus is a nice place to visit, but you wouldn’t want to invest there.

Forecasting Gold Prices – Goldman Sachs Hits One Out of the Park

On March 25, 2009, Goldman Sachs’ Commodity and Strategy Research group published Global Economics Paper No. 183: Forecasting Gold as a Commodity.

This offers a fascinating overview of supply and demand in global gold markets and an immediate prediction –

This “gold as a commodity” framework suggests that gold prices have strong support at and above current price levels should the current low real interest rate environment persist. Specifically, assuming real interest rates stay near current levels and the buying from gold-ETFs slows to last year’s pace, we would expect to see gold prices stay near $930/toz over the next six months, rising to $962/toz on a 12-month horizon.

The World Gold Council maintains an interactive graph of gold prices based on the London PM fix.

Now, of course, the real interest rate is an inflation-adjusted nominal interest rate. It is usually estimated as the difference between some representative interest rate and a relevant rate of inflation. Thus, the real interest rates in the Goldman Sachs report are really an extrapolation from extant data provided, for example, by the US Federal Reserve FRED database.
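In code, the usual approximation and the exact Fisher relation look like this (the rates below are illustrative, not taken from the report):

```python
# Real interest rate: the simple approximation subtracts inflation from the
# nominal rate; the exact Fisher relation divides the growth factors.
nominal = 0.035    # e.g., a 3.5% representative nominal yield (illustrative)
inflation = 0.021  # e.g., 2.1% inflation (illustrative)

approx_real = nominal - inflation
exact_real = (1 + nominal) / (1 + inflation) - 1
print(f"approx: {approx_real:.4f}, exact: {exact_real:.4f}")
```

For rates this low the two versions differ only in the fourth decimal place, which is why the simple difference is the common shorthand.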

Courtesy of Paul Krugman’s New York Times blog from last August, we have this time series for real interest rates –


The graph shows that real interest rates did indeed “stay near current levels” from spring 2009 onward, putting the Goldman Sachs group authoring Report No. 183 on record as producing one of the most successful longer-term forecasts you can find.

I’ve been collecting materials on forecasting systems for gold prices, and hope to visit that topic in coming posts here.

Forecasting – Climate Change and Infrastructure

You really have to become something like a social philosopher to enter the climate change and infrastructure discussion. I mean this several ways.

Of course, there is first the continuing issue of whether or not climate change is real, or whether it is currently in a “pause,” with the oceans or changes in trade winds absorbing some of the heat increase. So for purposes of discussion, I’m going to assume that climate change is real, and with a new El Niño this year global temperatures and a whole panoply of related weather phenomena – like major hurricanes – will come back in spades.

But then can we do anything about it? Is it possible for a developed or “mature” society to plan for an uncertain, but increasingly likely future? With this question come visions of the amazingly dysfunctional US Congress, mordantly satirized in the US TV show House of Cards.

The National Society of Professional Engineers points out that major infrastructure bills relating to funding the US highway system and water systems are coming up in Congress in 2014.

Desperately needed long-term infrastructure projects were deferred to address other national priorities or simply fell victim to the ongoing budget crisis. In fact, federal lawmakers extended the surface transportation authorization an unprecedented 10 times between 2005 and 2012, when Congress finally authorized the two-year Moving Ahead for Progress in the 21st Century Act (MAP-21). Now, with MAP-21 set to expire before the end of 2014, two of the most significant pieces of infrastructure legislation are taking center stage in Congress. The Water Resources Reform and Development Act (WRRDA) and the reauthorization of the surface transportation bill present a rare opportunity for Congress to set long-term priorities and provide needed investment in our nation’s infrastructure. Collectively, these two bills cover much, though not all, of US infrastructure. The question then becomes, can Congress overcome continuing partisan gridlock and a decades-long pattern of short-term fixes to make a meaningful commitment to the long-term needs of US infrastructure?

Yes, for sure, that is the question.

Hurricane Sandy – really a fierce tropical storm by the time it hit New Jersey and New York – wreaked havoc on Far Rockaway and flooded the New York City subway system in 2012. This gave rise to talk of sea walls after the event. And I assume something like that is on the drawing boards somewhere on the East Coast. But the cost of “ten story tall pilings” on which would be hinged giant gates is on the order of billions of US dollars.


I notice interesting writing coming out of California, pertaining to the smart grid and the need to extend this concept from electricity to water.

The California Energy Commission (CEC) publishes an Integrated Energy Policy Report (IEPR – pronounced eye-per) every two years, and the 2013 IEPR was just approved. Let’s look at two climate change impacts – temperature and precipitation.

From a temperature perspective, the IEPR anticipates that as the thermometer rises, so does the demand for electricity to run AC.  San Francisco Peninsula communities that never had a need for AC will install a couple million units to deal with summer temperatures formerly confined to the Central Valley.  PG&E and municipal utilities in Northern California will notice impacts in seasonal demand for electricity in both the duration of heat waves and peak apexes during the hottest times of day.  In the southern part of the state, the demand will also grow as AC units work harder to offset hotter days.

At the same time, increased temperatures decrease power plant efficiencies, whether the plant generates electricity from natural gas, solar thermal, nuclear, or geothermal.  Their cooling processes are also negatively impacted by heat waves.  Increased temperatures also impact transmission lines – reducing their efficiency and creating line sags that can trigger service disruptions.

Then there’s precipitation.  Governor Jerry Brown just announced a drought emergency for the state.  A significant portion of California’s water storage system relies on the Sierra Mountains snowpack, which is frighteningly low this winter.  This snowpack supplies most of the water sourced within the state, and hydropower derived from it supplies about 15% of the state’s homegrown electricity.  A hotter climate means snowfall becomes rainfall, and it is no longer freely stored as snow that obligingly melts as temperatures rise.  It may not be as reliably scheduled for generation of hydro power as snowfalls shift to rainfalls.  We may also receive less precipitation as a result of climate change – that’s a big unknown right now.  One thing is certain.  A hotter climate will require more water for agriculture – a $45 billion economy in California – to sustain crops.  And whether it is water for industrial, commercial, agricultural, or residential uses, what doesn’t fall from the skies will require electricity to pump it, transport it, desalinate it, or treat it.

Boom – A Journal of California packs more punch in discussing the “worst case”

“The choice before us is not to stop climate change,” says Jonathan Parfrey, executive director of Climate Resolve in Los Angeles. “That ship has sailed. There’s no going back. There will be impacts. The choice that’s before humanity is how bad are we going to do it to ourselves?”

So what will it be? Do you want the good news or the bad news first?

The bad news. OK.

If we choose to do nothing, the nightmare scenario plays out something like this: amid prolonged drought conditions, wildfires continuously burn across a dust-dry landscape, while potable water has become such a precious commodity that watering plants is a luxury only residents of elite, gated communities can afford. Decimated by fires, the power grid infrastructure that once distributed electricity—towers and wires—now looms as ghostly relics stripped of function. Along the coast, sea level rise has decimated beachfront properties while flooding from frequent superstorms has transformed underground systems, such as Bay Area Rapid Transit (BART), into an unintended, unmanaged sewer system.

This article goes on to the “good news” which projects a wave of innovations and green technology by 2050 to 2075 in California.

Sea Level Rise

No one knows, at this point, the extent of the rise in sea level in coming years. Interestingly, I have never seen a climate change denier also, in the same breath, deny that sea levels have been rising historically.

There are interesting resources on sea level rise, although projections of how much rise over what period are uncertain, because no one knows whether a big ice mass, such as part of the Antarctic ice shelf, is going to melt on an accelerated schedule sometime soon.

An excellent scientific summary of the sea level situation historically can be found in Understanding global sea levels: past, present and future.

Here is an overall graph of Global Mean Sea Level –


This inexorable trend has given rise to map resources suggesting coastal areas that would be underwater, or adversely affected by sea surges, in the future.

The New York Times’ interactive What Could Disappear suggests Boston might look like this, with a five-foot rise in sea level expected by 2100.


The problem, of course, is that globally populations are concentrated in coastal areas.

Also, storm surges are nonlinearly related to sea level. Thus, a one-foot rise in sea level could be linked with significantly more than a one-foot increase in the height of storm surges.

Longer Term Forecasts

Some years back, an interesting controversy arose over present value discounting in calculating impacts of climate change.

So, currently, the medium term forecasts of climate change impacts – sea level rises of maybe 1 to 2 feet, average temperature increases of one or two degrees, and so forth – seem roughly manageable. The problem always seems to come in the longer term – after 2100 for example in the recent National Academy of Sciences study funded, among others, by the US intelligence community.

The problem with calculating the impacts and significance of these longer term impacts today is that the present value accounting framework just makes things that far into the future almost insignificant.

Currently, for example, global output is on the order of 80 trillion dollars. Suppose we accept a discount rate of 4 percent. Then, calculating the discount factor for 150 years from today, we have about 0.003. So according to this logic, the loss of 80 trillion dollars worth of production 150 years from now has a present value of roughly 250 billion dollars. Thus, losing an amount of output in 150 years equal to the total productive activity of the planet today is worth a mere 250 billion dollars in present value terms, or about the current GDP of Ireland.
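The arithmetic behind that paragraph is a one-liner:

```python
# Present value of losing $80 trillion of output 150 years from now
future_loss = 80e12   # dollars
rate = 0.04           # 4% annual discount rate
years = 150

discount_factor = 1 / (1 + rate) ** years
present_value = future_loss * discount_factor
print(f"Discount factor: {discount_factor:.4f}")   # about 0.003
print(f"Present value: ${present_value / 1e9:,.0f} billion")
```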

Now I may have rounded and glossed some of the arithmetic, but the point stands no matter how you make the computation.

This is totally absurd. As a guide to policy, it says that avoiding the loss of $80 trillion of output in a century and a half is worth no more than a one-time expenditure of about $35 per person today, when per capita global output is on the order of $11,000 per person.

So we need a better accounting framework.

Of course, there are counterarguments. For example, in 150 years, perhaps science will have discovered how to boost the carbon dioxide processing capabilities of plants, so we can have more pollution. And looking back 150 years to the era of the horse and buggy, we can see that there has been tremendous technological change.

But this is a little like waiting for the amazing “secret weapons” to be unveiled in a war you are losing.

Header photo courtesy of NASA

Geopolitical Outlook 2014

One service forecasting “staff” can provide executives and managers is a sort of list of global geopolitical risks. This is compelling only at certain times – and 2014, and maybe 2015, seem to be shaping up as one such period.

Just a theory, but in my opinion the sustained lackluster economic performance of the global economy – especially in Europe, and also, by historic standards, in the United States – adds fuel to the fire of many conflicts. Conflict intensifies as people fight over an economic pie that is shrinking, or at least not getting appreciably bigger, despite population growth and the arrival of new generations of young people on the scene.

Some Hotspots


First, the recent election in Thailand solved nothing, so far. The tally of results looks like it is going to take months – sustaining a kind of political vacuum after many violent protests. Economic growth is impacted, and the situation looks to be fluid.

But the big issue is whether China is going to experience significantly slower economic growth in 2014-2015, and perhaps some type of debt crisis.

For the first time, we are seeing corporate bond defaults in China, and the knock-on effects are not pretty.

The default on a bond payment by China’s Chaori Solar last week signalled a reassessment of credit risk in a market where even high-yielding debt had been seen as carrying an implicit state guarantee. On Tuesday, another solar company announced a second year of net losses, leading to a suspension of its stock and bonds on the Shanghai stock exchange and stoking fears that it, too, may default.

There are internal and external forces at work in the Chinese situation. It’s important to remember lackluster growth in Europe, one of China’s biggest customers, is bound to exert continuing downward pressure on Chinese economic growth.


Michael Pettis addresses some of these issues in his recent post Will emerging markets come back?, concluding that –

Emerging markets may well rebound strongly in the coming months, but any rebound will face the same ugly arithmetic. Ordinary households in too many countries have seen their share of total GDP plunge. Until it rebounds, the global imbalances will only remain in place, and without a global New Deal, the only alternative to weak demand will be soaring debt. Add to this continued political uncertainty, not just in the developing world but also in peripheral Europe, and it is clear that we should expect developing country woes only to get worse over the next two to three years.

Indonesia is experiencing persisting issues with the stability of its currency.


In general, economic growth in Europe is very slow, tapering to static and negative growth in key economies and the geographic periphery.

The European Commission, the executive arm of the European Union, on Tuesday forecast growth in the 28-country EU at 1.5 per cent this year and 2 per cent in 2015. But growth in the 18 euro zone countries, many of which are weighed down by high debt and lingering austerity, is forecast at only 1.2 per cent this year, up marginally from 1.1 per cent in the previous forecast, and 1.8 per cent next year.

France avoided recession by posting 0.3 per cent GDP growth in the final quarter of calendar year 2013.

Since the margin of error for real GDP forecasts is on the order of +/- 2 percent, current forecasts are, in many cases, indistinguishable from a prediction of another recession.

And what could cause such a wobble?

Well, possibly increases in natural gas prices, as a result of political conflict between Russia and the west, or perhaps the outbreak of civil war in various eastern European locales?

The Ukraine

The issue of the Ukraine is intensely ideological and politicized and hard to evaluate without devolving into propaganda.

The population of the Ukraine has been in radical decline. Between 1991 and 2011, the Ukrainian population decreased by 11.8%, from 51.6 million to 45.5 million, apparently the result of very low fertility rates and high death rates. Transparency International also ranks the Ukraine 144th out of 177 countries on corruption, with 177th being worst.


“Market reforms” such as would come with an International Monetary Fund (IMF) loan package would probably cause further hardship in the industrialized eastern areas of the country.

Stratfor and others emphasize the role of certain “oligarchs” in the Ukraine, operating more or less behind the scenes. I take it these immensely rich individuals in many cases were the beneficiaries of the privatization of former state enterprise assets.

The Middle East

Again, politics is supreme. Political alliances between Saudi Arabia and others seeking to overturn Assad in Syria create special conditions, for sure. The successive governments in Egypt, apparently returning to rule by a strongman, are one layer – another layer is the increasingly challenged economic condition in the country – where fuel subsidies are commonly doled out to many citizens. Israel, of course, is a focus of action and reaction, and under Netanyahu is more than ready to rattle the sword. After Iraq and Afghanistan, it seems always possible for conflict to break out in unexpected directions in this region of the world.

A situation seems to be evolving in Turkey which I do not understand, but it may involve corruption scandals and spillovers from conflicts not only in Syria but also in Crimea.

The United States

A good part of the US TV viewing audience has watched part or all of House of Cards, the dark, intricate story of corruption and intrigue at the highest levels of the US Congress. This show reinforces the view, already widely prevalent, that US politicians are just interested in fund-raising and feathering their own nest, and that they operate more or less in callous disregard or clear antagonism to the welfare of the people at large.


This is really too bad, in a way, since more than ever the US needs people to participate in the political process.

I wonder whether the consequence of this general loss of faith in the powers that be might fall naturally into the laps of more libertarian forces in US politics. State control and policies are so odious – how about trimming back the size of the central government significantly, including its ability to engage in foreign military and espionage escapades? Shades of Ron Paul and maybe his son, Senator Rand Paul of Kentucky. 

South and Central America

Brazil snagged the Summer 2016 Olympics and is rushing to construct an ambitious number of venues around that vast country.

While the United States was absorbed in wars in the Middle East, an indigenous, socialist movement emerged in South America – centered around Venezuela and perhaps Bolivia, or Chile and Argentina. At least in Venezuela, sustaining these left governments after the charismatic leader passes from the scene is proving difficult.


Africa

Observing the ground rule that this sort of inventory has to be fairly easy, in order to be convincing – it seems that conflict is the order of the day across Africa. At the same time, the continent is moving forward, experiencing economic development and dealing with AIDS. Perhaps the currency situation in South Africa is the biggest geopolitical risk.

Bottom Line

The most optimistic take is that the outlook and risks now define a sort of interim period, perhaps lasting several years, when the level of conflict will increase at various hotspots. The endpoint, hopefully, will be the emergence of new technologies and products, new industries, which will absorb everyone in more constructive growth – perhaps growth defined ecologically, rather than merely in counting objects.

Three Pass Regression Filter – New Data Reduction Method

Malcolm Gladwell’s 10,000 hour rule (for cognitive mastery) is sort of an inspiration for me. I picked forecasting as my field for “cognitive mastery,” as dubious as that might be. When I am directly engaged in an assignment, at some point or other, I feel the need for immersion in the data and in estimations of all types. This blog, on the other hand, represents an effort to survey and, to some extent, get control of new “tools” – at least in a first pass. Then, when I have problems at hand, I can try some of these new techniques.

Ok, so these remarks preface what you might call the humility of my approach to new methods currently being innovated. I am not putting myself on a level with the innovators, for example. At the same time, it’s important to retain perspective and not drop a critical stance.

The Working Paper and Article in the Journal of Finance

Probably one of the most widely-cited recent working papers is Kelly and Pruitt’s three pass regression filter (3PRF). The authors are with the University of Chicago Booth School of Business and the Federal Reserve Board of Governors, respectively, and judging from the extensive revisions to the 2011 version, they had a bit of trouble getting this one out of the skunk works.

Recently, however, Kelly and Pruitt published an important article in the prestigious Journal of Finance called Market Expectations in the Cross-Section of Present Values. This article applies a version of the three pass regression filter to show that returns and cash flow growth for the aggregate U.S. stock market are highly and robustly predictable.

I learned of a published application of the 3PRF from Francis X. Diebold’s blog, No Hesitations, where Diebold – one of the most published authorities on forecasting – writes

Recent interesting work, moreover, extends PLS in powerful ways, as with the Kelly-Pruitt three-pass regression filter and its amazing apparent success in predicting aggregate equity returns.

What is the 3PRF?

The working paper from the Booth School of Business cited at a couple of points above describes what might be cast as a generalization of partial least squares (PLS). Certainly, the focus in the 3PRF and PLS is on using latent variables to predict some target.

I’m not sure, though, whether 3PRF is, in fact, more of a heuristic than an algorithm.

What I mean is that the three pass regression filter involves a procedure, described below.

(click to enlarge).


Here’s the basic idea –

Suppose you have a large number of potential regressors xi ∈ X, i = 1,..,N. In fact, it may be impossible to calculate an OLS regression, since N > T, where T is the number of observations or time periods.

Furthermore, you have proxies zj ∈ Z, j = 1,..,L – where L is significantly less than the number of observations T. These proxies could be the first several principal components of the data matrix, or underlying drivers which theory proposes for the situation. The authors even suggest an automatic procedure for generating proxies in the paper.

And, finally, there is the target variable yt which is a column vector with T observations.

Latent factors in a matrix F drive both the proxies in Z and the predictors in X. Based on macroeconomic research into dynamic factors, there might be only a few of these latent factors – just as typically only a few principal components account for the bulk of variation in a data matrix.

Now here is a key point – as Kelly and Pruitt present the 3PRF, it is a leading indicator approach when applied to forecasting macroeconomic variables such as GDP, inflation, or the like. Thus, the time index for yt ranges from 2,3,…T+1, while the time indices of all X and Z variables and the factors range from 1,2,..T. This means really that all the x and z variables are potentially leading indicators, since they map conditions from an earlier time onto values of a target variable at a subsequent time.
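In code terms, this leading-indicator setup just means shifting the target series one period ahead of the predictors and proxies. A toy illustration in Python (the numbers are made up purely to show the alignment):

```python
import numpy as np

# Toy illustration of the leading-indicator alignment: the target is
# shifted one period ahead of the predictors.
X_all = np.arange(12, dtype=float).reshape(6, 2)   # X_t for t = 1,..,6
y_all = np.array([2.0, 4.0, 5.0, 4.0, 6.0, 7.0])   # y_t for t = 1,..,6

# Pair X_t with y_{t+1}: drop the last X row and the first y value
X_train, y_train = X_all[:-1], y_all[1:]
```

Every row of X_train then maps conditions at time t onto the target's value at t+1.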

What Table 1 above tells us to do is –

1. Run an ordinary least squares (OLS) regression of each xi in X onto the zj in Z, over the time series t = 1,..,T, where there are N variables in X and L << T variables in Z. So, in the example discussed below, we concoct a spreadsheet example with 3 variables in Z, or three proxies, and 10 predictor variables xi in X (I could have used 50, but I wanted to see whether the method worked with lower dimensionality). The example assumes 40 periods, so t = 1,..,40. This produces ten different sets of coefficients of the zj – one per predictor – with ten matched constant terms.
2. OK, then we take this stack of estimated coefficients of the zj and their associated constants and map them onto the cross-sectional slices of X for t = 1,..,T. This means that, at each period t, the values of the cross-section, xi,t, are taken as the dependent variable, and the ten sets of coefficients (plus constants) estimated in the previous step become the predictors, yielding a set of factor loadings for each period.
3. Finally, we extract the estimates of the factor loadings which result, and use these in a regression with the target variable as the dependent variable.

This is tricky, and I have questions about the symbolism in Kelly and Pruitt’s papers, but the procedure they describe does work. There is some Matlab code here alongside the reference to this paper in Professor Kelly’s research.
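For concreteness, the three passes can be sketched in Python/NumPy on simulated data. This is my own minimal translation of the steps above, not Kelly and Pruitt's code; the dimensions (two latent factors, three proxies, ten predictors, forty periods) mirror the spreadsheet example discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, L = 40, 10, 3

# Simulate two latent factors driving the proxies Z, predictors X, and target y
F = rng.normal(size=(T, 2))
X = F @ rng.normal(size=(2, N)) + rng.normal(size=(T, N))
Z = F @ rng.normal(size=(2, L)) + rng.normal(size=(T, L))
y = F @ rng.normal(size=2) + 0.1 * rng.normal(size=T)

def ols(A, b):
    # Least-squares coefficients, intercept in position 0
    A1 = np.column_stack([np.ones(len(A)), A])
    return np.linalg.lstsq(A1, b, rcond=None)[0]

# Pass 1: time-series regression of each predictor on the proxies
# -> one row of coefficients per predictor (intercept dropped)
phi = np.array([ols(Z, X[:, i])[1:] for i in range(N)])     # N x L

# Pass 2: cross-sectional regression of X_t on those coefficients, each period
# -> one row of estimated factors ("factor loadings" in the post) per period
Fhat = np.array([ols(phi, X[t, :])[1:] for t in range(T)])  # T x L

# Pass 3: regress the target on the pass-2 estimates
beta = ols(Fhat, y)
yfit = np.column_stack([np.ones(T), Fhat]) @ beta
```

With adequate signal in the proxies, yfit tracks y closely in-sample; forecasting then amounts to feeding new cross-sections through passes 2 and 3.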

At the same time, all this can be short-circuited (if you have adequate data without a lot of missing values, apparently) by a single humongous formula –


Here, the source is the 2012 paper.

Spreadsheet Implementation

Spreadsheets help me understand the structure of the underlying data and the order of calculation, even if, for the most part, I work with toy examples.

So recently, I’ve been working through the 3PRF with a small spreadsheet.

Generating the factors: I generated the factors as two columns of random variables (=rand()) in Excel. I gave the factors different magnitudes by multiplying by different constants.

Generating the proxies Z and predictors X. Kelly and Pruitt call for the predictors to be variance standardized, so I generated 40 observations on ten sets of xi by selecting ten different coefficients to multiply into the two factors, and in each case I added a normal error term with mean zero and standard deviation 1. In Excel, this is the formula =norminv(rand(),0,1).

Basically, I did the same drill for the three zj — I created 40 observations for z1, z2, and z3 by multiplying three different sets of coefficients into the two factors and added a normal error term with zero mean and variance equal to 1.

Then, finally, I created yt by multiplying randomly selected coefficients times the factors.

After generating the data, the first pass regression is easy. You just develop a regression with each predictor xi as the dependent variable and the three proxies as the independent variables, case-by-case, across the time series for each. This gives you a bunch of regression coefficients which, in turn, become the explanatory variables in the cross-sectional regressions of the second step.

The regression coefficients I calculated for the three proxies, including a constant term, were as follows – where the 1st row indicates the regression for x1 and so forth.


This second step is a little tricky, but you just take all the values of the predictor variables for a particular period and designate these as the dependent variables, with the constant and coefficients estimated in the previous step as the independent variables. Note, the number of predictors pairs up exactly with the number of rows in the above coefficient matrix.

This then gives you the factor loadings for the third step, where you can actually predict yt (really yt+1 in the 3PRF setup). The only wrinkle is you don’t use the constant terms estimated in the second step, on the grounds that these reflect “idiosyncratic” effects, according to the 2011 revision of the paper.

Note the authors describe this as a time series approach, but do not indicate how to get around some of the classic pitfalls of regression in a time series context. Obviously, first differencing might be necessary for nonstationary time series like GDP, and other data massaging might be in order.

Bottom line – this worked well in my first implementation.

To forecast, I just used the last regression for yt+1 and then added ten more cases, calculating new values for the target variable with the new values of the factors. I used the new values of the predictors to update the second step estimate of factor loadings, and applied the last third pass regression to these values.

Here are the forecast errors for these ten out-of-sample cases.


Not bad for a first implementation.

Why Is Three Pass Regression Important?

3PRF is a fairly “clean” solution to an important problem, relating to the issue of “many predictors” in macroeconomics and other business research.

Noting that if the predictors number near or more than the number of observations, the standard ordinary least squares (OLS) forecaster is known to be poorly behaved or nonexistent, the authors write,

How, then, does one effectively use vast predictive information? A solution well known in the economics literature views the data as generated from a model in which latent factors drive the systematic variation of both the forecast target, y, and the matrix of predictors, X. In this setting, the best prediction of y is infeasible since the factors are unobserved. As a result, a factor estimation step is required. The literature’s benchmark method extracts factors that are significant drivers of variation in X and then uses these to forecast y. Our procedure springs from the idea that the factors that are relevant to y may be a strict subset of all the factors driving X. Our method, called the three-pass regression filter (3PRF), selectively identifies only the subset of factors that influence the forecast target while discarding factors that are irrelevant for the target but that may be pervasive among predictors. The 3PRF has the advantage of being expressed in closed form and virtually instantaneous to compute.

So, there are several advantages, such as (1) the solution can be expressed in closed form (in fact as one complicated but easily computable matrix expression), and (2) there is no need to employ maximum likelihood estimation.

Furthermore, 3PRF may outperform other approaches, such as principal components regression or partial least squares.

The paper illustrates the forecasting performance of 3PRF with real-world examples (as well as simulations). The first relates to forecasts of macroeconomic variables using data such as from the Mark Watson database mentioned previously in this blog. The second application relates to predicting asset prices, based on a factor model that ties individual assets’ price-dividend ratios to aggregate stock market fluctuations in order to uncover investors’ discount rates and dividend growth expectations.

Partial Least Squares and Principal Components

I’ve run across outstanding summaries of “partial least squares” (PLS) research recently – for example Rosipal and Kramer’s Overview and Recent Advances in Partial Least Squares and the 2010 Handbook of Partial Least Squares.

Partial least squares (PLS) evolved somewhat independently from related statistical techniques, owing to what you might call family connections. The technique was first developed by the Swedish statistician Herman Wold; his son, Svante Wold, applied the method in particular to chemometrics. Rosipal and Kramer suggest that the success of PLS in chemometrics resulted in a lot of applications in other scientific areas, including bioinformatics, food research, medicine, and pharmacology.

Someday, I want to look into “path modeling” with PLS, but for now, let’s focus on the comparison between PLS regression and principal component (PC) regression. This post develops a comparison with Matlab code and macroeconomics data from Mark Watson’s website at Princeton.

The Basic Idea Behind PC and PLS Regression

Principal component and partial least squares regression share a couple of features.

Both, for example, offer an approach or solution to the problem of “many predictors” and multicollinearity. Also, with both methods, computation is not transparent, in contrast to ordinary least squares (OLS). Both PC and PLS regression are based on iterative or looping algorithms to extract either the principal components or underlying PLS factors and factor loadings.

PC Regression

The first step in PC regression is to calculate the principal components of the data matrix X. This is a set of orthogonal (which is to say completely uncorrelated) vectors which are weighted sums of the predictor variables in X.

This is an iterative process involving transformation of the variance-covariance or correlation matrix to extract the eigenvalues and eigenvectors.

Then, the data matrix X is multiplied by the eigenvectors to obtain the new basis for the data – an orthogonal basis. Typically, the first few (the largest) eigenvalues – which explain the largest proportion of variance in X – and their associated eigenvectors are used to produce one or more principal components which are regressed onto Y. This involves a dimensionality reduction, as well as elimination of potential problems of multicollinearity.
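The steps above can be sketched in a few lines of NumPy. This is an illustrative toy (the simulated data and the choice to retain three components are my own assumptions), but it follows the eigendecomposition route just described:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 100, 8
# Correlated predictors and a target driven mainly by the first two columns
X = rng.normal(size=(T, k)) @ rng.normal(size=(k, k))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=T)

# Standardize X, then eigendecompose its correlation matrix
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: the data in the new orthogonal basis
m = 3                                           # retain the first m components
scores = Xs @ eigvecs[:, :m]

# Regress y on the retained components (intercept in column 0)
A = np.column_stack([np.ones(T), scores])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Note the dimensionality reduction: the regression uses m = 3 orthogonal scores in place of the 8 original, collinear predictors.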

PLS Regression

The basic idea behind PLS regression, on the other hand, is to identify latent factors which explain the variation in both Y and X, then use these factors, which typically are substantially fewer in number than the k predictors in X, to predict Y values.

Clearly, just as in PC regression, the acid test of the model is how it performs on out-of-sample data.

The reason why PLS regression often outperforms PC regression, thus, is that factors which explain the most variation in the data matrix may not, at the same time, explain the most variation in Y. It’s as simple as that.
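To make the contrast concrete, here is a NumPy sketch of the classic single-target PLS (PLS1) algorithm in its textbook NIPALS form. This is my own illustrative version, not the routine behind Matlab's plsregress; the key line is the weight vector w, which points in the direction of maximum covariance with y rather than maximum variance of X.

```python
import numpy as np

def pls1(X, y, ncomp):
    # NIPALS-style PLS1: extract latent factors that explain X *and* y
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)        # weight: direction of max covariance with y
        t = Xc @ w                       # latent factor scores
        p = Xc.T @ t / (t @ t)           # X loadings
        q.append(yc @ t / (t @ t))       # y loading
        Xc = Xc - np.outer(t, p)         # deflate before extracting the next factor
        yc = yc - q[-1] * t
        W.append(w); P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # implied regression coefficients
    return beta, x_mean, y_mean

# Toy data: 5 predictors, target loads on a mix of them
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + 0.1 * rng.normal(size=60)

beta, x_mean, y_mean = pls1(X, y, ncomp=2)
yfit = (X - x_mean) @ beta + y_mean
```

Because each factor is chosen for its covariance with y, a couple of components usually suffice where PC regression might need more.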

Matlab example

I grabbed some data from Mark Watson’s website at Princeton – from the links to a recent paper, Generalized Shrinkage Methods for Forecasting Using Many Predictors (with James H. Stock), Journal of Business and Economic Statistics, 30:4 (2012), 481-493. The data include the following variables, all expressed as year-over-year (yoy) growth rates. The first variable – real GDP – is taken as the forecasting target. The time periods of all other variables are lagged one period (1 quarter) behind the quarterly values of this target variable.


Matlab makes calculation of both principal component and partial least squares regressions easy.

The command to extract principal components is

[coeff, score, latent]=princomp(X)

Here X is the data matrix, and the entities in the square brackets are vectors or matrices produced by the algorithm. It’s possible to compute a principal components regression with the contents of the matrix score. Generally, the first several principal components are selected for the regression, based on the importance of a component or its associated eigenvalue in latent. The following scree chart illustrates the contribution of the first few principal components to explaining the variance in X.


The relevant command for regression in Matlab is


where b is the column vector of estimated coefficients and the first six principal components are used in place of the X predictor variables.

The Matlab command for a partial least squares regression is

[XL,YL,XS,YS,beta] = plsregress(X,Y,ncomp)

where ncomp is the number of latent variables or components to be utilized in the regression. There are issues of interpreting the matrices and vectors in the square brackets, but I used this code –

data=xlsread('stock.xls'); X=data(1:47,2:79); y=data(2:48,1);

[XL,yl,XS,YS,beta] = plsregress(X,y,10); yfit = [ones(size(X,1),1) X]*beta;

lookPLS=[y yfit]; ZZ=data(48:50,2:79); newy=data(49:51,1);

new=[ones(3,1) ZZ]*beta; out=[newy new];

The bottom line is to test the estimates of the response coefficients on out-of-sample data.

The following chart shows that PLS outperforms PC, although the predictions of both are not spectacularly accurate.



There are nuances to what I have done which help explain the dominance of PLS in this situation, as well as the weakly predictive capabilities of both approaches.

First, the target variable is quarterly year-over-year growth of real US GDP. The predictor set X contains 78 other macroeconomic variables, all expressed in terms of yoy (year-over-year) percent changes.

Again, note that the time period of all the variables or observations in X are lagged one quarter from the values in Y, or the values or yoy quarterly percent growth of real US GDP.

This means that we are looking for a real, live leading indicator. Furthermore, there are plausibly common factors in the Y series shared with at least some of the X variables. For example, the percent changes of a block of variables contained in real GDP are included in X, and by inspection move very similarly with the target variable.

Other Example Applications

There are at least a couple of interesting applied papers in the Handbook of Partial Least Squares – a downloadable book in the Springer Handbooks of Computational Statistics. See –

Chapter 20 A PLS Model to Study Brand Preference: An Application to the Mobile Phone Market

Chapter 22 Modeling the Impact of Corporate Reputation on Customer Satisfaction and Loyalty Using Partial Least Squares

Another macroeconomics application from the New York Fed –

“Revisiting Useful Approaches to Data-Rich Macroeconomic Forecasting”

Finally, the software company XLStat has a nice, short video on partial least squares regression applied to a marketing example.

Links – March 7, 2014

Stuff is bursting out all over, more or less in anticipation of the spring season – or World War III, however you might like to look at it. So I offer an assortment of links to topics which are central and interesting below.

Human Longevity Inc. (HLI) Launched to Promote Healthy Aging Using Advances in Genomics and Stem Cell Therapies Craig Venter – who launched a competing private and successful effort to map the human genome – is involved with this. Could be important.

MAA Celebrates Women’s History Month In celebration of Women’s History Month, the MAA has collected photographs and brief bios of notable female mathematicians from its Women of Mathematics poster. Emmy Noether shown below – “mother” of Noetherian rings and other wondrous mathematical objects.


Three Business Benefits of Cloud Computing – price, access, and security

Welcome to the Big Data Economy This is the first chapter of a new eBook that details the 4 ways the future of data is cleaner, leaner, and smarter than its storied past. Download the entire eBook, Big Data Economy, for free here

Financial Sector Ignores Ukraine, Pushing Stocks Higher From March 6, video on how the Ukraine crisis has been absorbed by the market.

Employment-Population ratio Can the Fed reverse this trend?


How to Predict the Next Revolution

…few people noticed an April 2013 blog post by British academic Richard Heeks, who is director of the University of Manchester’s Center for Development Informatics. In that post, Heeks predicted the Ukrainian revolution.

An e-government expert, Heeks devised his “Revolution 2.0” index as a toy or a learning tool. The index combines three elements: Freedom House’s Freedom on the Net scores, the International Telecommunication Union’s information and communication technology development index, and the Economist’s Democracy Index (reversed into an “Outrage Index” so that higher scores mean more plutocracy). The first component measures the degree of Internet freedom in a country, the second shows how widely Internet technology is used, and the third supplies the level of oppression.

“There are significant national differences in both the drivers to mass political protest and the ability of such protest movements to freely organize themselves online,” Heeks wrote. “Both of these combine to give us some sense of how likely ‘mass protest movements of the internet age’ are to form in any given country.”

Simply put, that means countries with little real-world democracy and a lot of online freedom stand the biggest chance of a Revolution 2.0. In April 2013, Ukraine topped Heeks’s list, closely followed by Argentina and Georgia. The Philippines, Brazil, Russia, Kenya, Nigeria, Azerbaijan and Jordan filled out the top 10.

Proletarian Robots Getting Cheaper to Exploit Good report on a Russian robot conference recently.

The Top Venture Capital Investors By Exit Activity – Which Firms See the Highest Share of IPOs?


Complete Subset Regressions

A couple of years or so ago, I analyzed a software customer satisfaction survey, focusing on larger corporate users. I had firmagraphics – specifying customer features (size, market segment) – and customer evaluation of product features and support, as well as technical training. Altogether, there were 200 questions that translated into metrics or variables, along with measures of customer satisfaction. In all, the survey elicited responses from about 5,000 companies.

Now this is really sort of an Ur-problem for me. How do you discover relationships in this sort of data space? How do you pick out the most important variables?

Since researching this blog, I’ve learned a lot about this problem. And one of the more fascinating approaches is the recent development named complete subset regressions.

And before describing some Monte Carlo work exploring this approach, I’m pleased that Elliott, Gargano, and Timmermann (EGT) validate an intuition I had with this “Ur-problem.” In the survey I mentioned above, I calculated a whole bunch of univariate regressions with customer satisfaction as the dependent variable and each questionnaire variable as the explanatory variable – sort of one step beyond calculating simple correlations. Then, it occurred to me that I might combine all these 200 simple regressions into a predictive relationship. To my surprise, EGT’s research indicates that this might have worked, though not as effectively as complete subset regression.

Complete Subset Regression (CSR) Procedure

As I understand it, the idea behind CSR is you run regressions with all possible combinations of some number r less than the total number n of candidate or possible predictors. The final prediction is developed as a simple average of the forecasts from these regressions with r predictors. While some of these regressions may exhibit bias due to specification error and covariance between included and omitted variables, these biases tend to average out, when the right number r < n is selected.

So, maybe you have a database with m observations or cases on some target variable and n predictors.

And you are in the dark as to which of these n predictors or potential explanatory variables really do relate to the target variable.

That is, in a regression y = β0 + β1x1 + … + βnxn, some of the beta coefficients may in fact be zero, since there may be zero influence between the associated xi and the target variable y.

Of course, calling all the n variables xi i=1,…n “predictor variables” presupposes more than we know initially. Some of the xi could in fact be “irrelevant variables” with no influence on y.

In a nutshell, the CSR procedure involves taking all possible combinations of some subset r of the n total number of potential predictor variables in the database, and mapping or regressing all these possible combinations onto the dependent variable y. Then, for prediction, an average of the forecasts of all these regressions is often a better predictor than can be generated by other methods – such as the LASSO or bagging.
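In code, this averaging procedure is short. Here is a NumPy sketch (the function name and the toy "weak predictor" data are my own illustrations; EGT's empirical work uses actual stock return data):

```python
import itertools
import numpy as np

def csr_forecast(X, y, X_new, r):
    # Complete subset regressions: average the OLS forecasts from every
    # subset of r predictors (intercept always included)
    n = X.shape[1]
    forecasts = []
    for subset in itertools.combinations(range(n), r):
        A = np.column_stack([np.ones(len(X)), X[:, subset]])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        A_new = np.column_stack([np.ones(len(X_new)), X_new[:, subset]])
        forecasts.append(A_new @ b)
    return np.mean(forecasts, axis=0)

# Toy use: 6 weak predictors, subsets of size 3 (20 regressions in all)
rng = np.random.default_rng(4)
X, X_new = rng.normal(size=(40, 6)), rng.normal(size=(10, 6))
beta_true = np.array([0.3, 0.2, 0.2, 0.1, 0.1, 0.0])
y = X @ beta_true + rng.normal(size=40)
y_hat = csr_forecast(X, y, X_new, r=3)
```

The individual subset regressions are biased by the omitted variables, but the simple average tends to wash those biases out.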

EGT offer a time series example as an empirical application, based on quarterly stock returns from 1947-2010 and twelve (12) predictors. The authors determine that the best results are obtained with a small subset of the twelve predictors, and compare these results with ridge regression, bagging, the Lasso, and Bayesian Model Averaging.

The article in The Journal of Econometrics is well-worth purchasing, if you are not a subscriber. Otherwise, there is a draft in PDF format from 2012.

The number of combinations of n things taken r at a time is n!/[(n-r)!(r!)], which grows very rapidly as n increases. For large n, accordingly, it is necessary to sample from the set of possible combinations – a procedure which still can generate improvements in forecast accuracy over a “kitchen sink” regression (under circumstances further delineated below). Otherwise, you need a quantum computer to process very fat databases.

When CSR Works Best – Professor Elliott

I had email correspondence with Professor Graham Elliott, one of the co-authors of the above-cited paper in the Journal of Econometrics.

His recommendation is that CSR works best when there are “weak predictors” sort of buried among a superset of candidate variables:

If a few (say 3) of the variables have large coefficients such as that they result in a relatively large R-square for the prediction regression when they are all included, then CSR is not likely to be the best approach. In this case model selection has a high chance of finding a decent model, the kitchen sink model is not all that much worse (about 3/T times the variance of the residual where T is the sample size) and CSR is likely to be not that great… When there is clear evidence that a predictor should be included then it should be always included…, rather than sometimes as in our method. You will notice that in section 2.3 of the paper that we construct properties where beta is local to zero – what this math says in reality is that we mean the situation where there is very little clear evidence that any predictor is useful but we believe that some or all have some minor predictive ability (the stock market example is a clear case of this). This is the situation where we expect the method to work well. ..But at the end of the day, there is no perfect method for all situations.

I have been toying with “hidden variables” and, then, measurement error in the predictor variables in simulations that further validate Graham Elliott’s perspective that CSR works best with “weak predictors.”

Monte Carlo Simulation

Here’s the spreadsheet for a relevant simulation (click to enlarge).


It is pretty easy to understand this spreadsheet, but it may take a few seconds. It is a case of latent variables, or underlying variables disguised by measurement error.

The z values determine the y value. The z values are multiplied by the bold face numbers in the top row, added together, and then the epsilon error ε value is added to this sum of terms to get each y value. You have to associate the first bold face coefficient with the first z variable, and so forth.

At the same time, an observer only has the x values at his or her disposal to estimate a predictive relationship.

These x variables are generated by adding a Gaussian error to the corresponding value of the z variables.

Note that z5 is an irrelevant variable, since its coefficient loading is zero.

This is a measurement error situation (see the lecture notes on “measurement error in X variables” ).
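A self-contained Python sketch of this data-generating setup follows. The specific loadings and error scales here are my own stand-ins for the spreadsheet's bold-face numbers, but the structure is the same: latent z's determine y, z5 carries a zero loading, and only the error-contaminated x's are observed.

```python
import numpy as np

# Latent z variables drive y; the analyst only observes x = z + measurement error.
# Loading values are illustrative, not the spreadsheet's actual figures.
rng = np.random.default_rng(5)
n_obs = 50
z = rng.normal(size=(n_obs, 6))
loadings = np.array([10.0, 6.0, 4.0, 2.0, 0.0, 1.0])   # z5 is irrelevant (zero loading)
eps = rng.normal(scale=5.0, size=n_obs)

y = z @ loadings + eps                   # target determined by the latent z's plus ε
x = z + rng.normal(size=z.shape)         # observed predictors, standard-normal error
```

Regressing y on the x columns then reproduces the "weak predictor" pattern: every coefficient is attenuated by the measurement error.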

The relationship with all six regressors – the so-called “kitchen-sink” regression – clearly shows a situation of “weak predictors.”

I consider all possible combinations of these 6 variables, taken 3 at a time, or 20 possible distinct combinations of regressors and resulting regressions.

In terms of the mechanics of doing this, it’s helpful to set up the following type of listing of the combinations.


Each digit in the above numbers indicates a variable to include. So 123 indicates a regression with y and x1, x2, and x3. Note that writing the combinations in this way so they look like numbers in order of increasing size can be done by a simple algorithm for any r and n.
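A listing of exactly this kind is what Python's itertools.combinations produces, since it emits subsets in lexicographic order:

```python
from itertools import combinations

# Build the "digit" labels for all subsets of size r = 3 from n = 6 variables
labels = ["".join(str(i) for i in combo) for combo in combinations(range(1, 7), 3)]
print(len(labels), labels[:4])   # 20 ['123', '124', '125', '126']
```

Because each equal-length label reads as a number, the list comes out in increasing numeric order automatically, for any r and n up to nine variables.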

And I can generate thousands of cases by allowing the epsilon ε values and other random errors to vary.

In the specific run above, the CSR average soundly beats the kitchen sink specification in forecasts over ten out-of-sample values, as measured by mean square error (MSE): the MSE of the CSR average is 2,440, while the MSE of the kitchen sink regression specifying all six regressors is 2,653. It’s also true that picking the lowest within-sample MSE among the 20 possible combinations for r = 3 does not produce a lower MSE in the out-of-sample run.

This is characteristic of results in other draws of the random elements. I hesitate to characterize the totality without further studying the requirements for the number of runs, given the variances, and so forth.

I think CSR is exciting research, and hope to learn more about these procedures and report in future posts.