The Arc Sine Law and Competitions

There is a topic I think you can call the “structure of randomness.” Power laws are included, as are various “arcsine laws” governing the probability of leads and changes in scores in competitive games and, of course, in winnings from gambling.

I ran across a recent article showing how basketball scores follow arcsine laws.

“Safe Leads and Lead Changes in Competitive Team Sports” is based on comprehensive data from league games over several seasons in the National Basketball Association (NBA).

“..we find that many …statistical properties are explained by modeling the evolution of the lead time X as a simple random walk. More strikingly, seemingly unrelated properties of lead statistics, specifically, the distribution of the times t: (i) for which one team is leading..(ii) for the last lead change..and (iii) when the maximal lead occurs, are all described by the ..celebrated arcsine law..”

The chart below shows the arcsine probability density function (PDF). This curve is, in a sense, the reverse of the familiar bell-shaped normal distribution. Instead of a maximum probability in the middle, the arcsine distribution has the unusual property that probability is greatest near the lower and upper bounds of the range. Of course, what makes both curves probability distributions is that the area under each adds up to 1.

[Chart: the arcsine probability density function]

So, apparently, the fraction of game time that a basketball team holds the lead is well described by the arcsine distribution. In most games, one team is in front nearly the whole time or hardly at all, and the last lead change is most likely to occur near the very beginning or very end of the game, not in the middle.

An earlier piece in the Financial Analysts Journal (The Arc Sine Law and the Treasury Bill Futures Market) notes,

..when two sports teams play, even though they have equal ability, the arc sine law dictates that one team will probably be in the lead most of the game. But the law also says that games with a close final score are surprisingly likely to be “last minute, come from behind” affairs, in which the ultimate winner trailed for most of the game..[Thus] over a series of games in which close final scores are common, one team could easily achieve a string of several last minute victories. The coach of such a team might be credited with being brilliantly talented, for having created a “second half” team..[although] there is a good possibility that he owes his success to chance.

There is nice mathematics underlying all this.

The name “arc sine distribution” derives from the integration of the PDF in the chart – a PDF which has the formula –

f(x) = 1/(π √(x(1 − x)))

Here, the integral of f(x) yields the cumulative distribution function F(x) and involves an arcsine function,

F(x) = (2/π) arcsin(√x)

Fundamentally, the arcsine law relates to processes of sequential trials in which there are fixed probabilities of winning and losing each trial. The PDF follows from applying Stirling’s formula to estimate expressions with factorials, such as the number of combinations of p+q things taken p at a time, which quickly becomes computationally cumbersome as p+q increases in size.
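To see the occupation-time version of the law in action, here is a minimal Python sketch (my own illustration, not drawn from the paper). It simulates symmetric ±1 random walks as a crude stand-in for the evolving score difference and compares the distribution of the fraction of steps one “team” spends in the lead against the arcsine CDF above; tie steps are simply counted as not leading, which matters little for long walks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_games, n_steps = 5000, 1000

# Symmetric +/-1 steps; the cumulative sum plays the role of the score difference
steps = rng.choice([-1, 1], size=(n_games, n_steps))
walks = np.cumsum(steps, axis=1)

# Fraction of the "game" during which team A is strictly ahead
frac_in_lead = (walks > 0).mean(axis=1)

# Compare with the arcsine CDF, F(x) = (2/pi) arcsin(sqrt(x))
xs = np.linspace(0.1, 0.9, 9)
empirical = np.array([(frac_in_lead <= x).mean() for x in xs])
theoretical = (2 / np.pi) * np.arcsin(np.sqrt(xs))
for x, e, t in zip(xs, empirical, theoretical):
    print(f"x = {x:.1f}   simulated = {e:.3f}   arcsine CDF = {t:.3f}")
```

The simulated column tracks the theoretical one closely, and most walks spend nearly all or almost none of their time in the lead.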

There is probably no better introduction to the relevant mathematics than Feller’s exposition in his classic An Introduction to Probability Theory and Its Applications, Volume I.

Feller had an unusual ability to write lucidly about mathematics. His Chapter III, “Fluctuations in Coin Tossing and Random Walks,” in IPTAIA is remarkable, as I have convinced myself by returning to study it again.


He starts Chapter III with these comments:

We shall encounter theoretical conclusions which not only are unexpected but actually come as a shock to intuition and common sense. They will reveal that commonly accepted notions concerning chance fluctuations are without foundation and that the implications of the law of large numbers are widely misconstrued. For example, in various applications it is assumed that observations on an individual coin-tossing game during a long time interval will yield the same statistical characteristics as the observation of the results of a huge number of independent games at one given instant. This is not so..

Most pointedly, for example, “contrary to popular opinion, it is quite likely that in a long coin-tossing game one of the players remains practically the whole time on the winning side, the other on the losing side.”

The same underlying mathematics produces the Ballot Theorem, which gives the probability that a candidate stays ahead throughout the vote count, given the final number of votes for each candidate.
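The classical statement is easy to check numerically. Here is a small sketch (my own illustration, not from Feller’s text): if the winner ends with p votes and the loser with q, the probability that the winner is strictly ahead throughout the entire count is (p − q)/(p + q).

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n_trials = 60, 40, 50_000   # final vote totals and number of simulated counts

ballots = np.array([1] * p + [-1] * q)   # +1 = vote for the winner, -1 = for the loser
always_ahead = 0
for _ in range(n_trials):
    rng.shuffle(ballots)                  # a random order of counting the ballots
    if np.all(np.cumsum(ballots) > 0):    # winner strictly ahead at every stage?
        always_ahead += 1

print("simulated  :", always_ahead / n_trials)
print("theoretical:", (p - q) / (p + q))  # = 0.2 for p = 60, q = 40
```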

This application, of course, comes very much to the fore in TV coverage of the ongoing primaries. CNN’s initial announcement, for example, that Bernie Sanders beat Hillary Clinton in the New Hampshire primary came when less than half the precincts had reported their vote totals.

In returning to Feller’s Volume I, I recommend something like Shlomo Sternberg’s Lecture 8. If you read Feller, you have to be prepared to make little derivations to see the links between formulas. Sternberg cleared up some puzzles for me, which, alas, otherwise might have absorbed hours of my time.

The arc sine law may be significant for social and economic inequality, which perhaps can be considered in another post.

Business Forecasting – Practical Problems and Solutions

Forecasts in business are unavoidable, since annual budgets, shorter-term operational plans, and investment decisions all depend on them.

And regardless of approach, practical problems arise.

For example, should output from formal algorithms be massaged, so final numbers include judgmental revisions? What about error metrics? Is the mean absolute percent error (MAPE) best, because everybody is familiar with percents? What are the pluses and minuses of the various forecast error metrics? And, organizationally, where should forecasting teams sit – in marketing, production, finance, or maybe in a free-standing unit?
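On the error-metric question, one well-known drawback of MAPE is easy to demonstrate with a toy example (hypothetical numbers): a single near-zero actual can dominate the average percent error, while MAE and RMSE remain on the scale of the errors themselves.

```python
import numpy as np

actual   = np.array([100.0, 100.0, 2.0])   # note the near-zero actual in the last period
forecast = np.array([110.0,  90.0, 4.0])

errors = actual - forecast
mae  = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
mape = 100 * np.mean(np.abs(errors / actual))

print(f"MAE = {mae:.1f}, RMSE = {rmse:.1f}, MAPE = {mape:.1f}%")
# The 2-unit actual contributes a 100% term to MAPE, pulling the average up to 40%,
# even though the absolute error in that period is only 2 units.
```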

The editors of Business Forecasting – Practical Problems and Solutions integrate dozens of selections to focus on these and other practical forecasting questions.

Here are some highlights.

In my experience, many corporate managers, even VPs and executives, understand surprisingly little about fitting models to data.

So guidelines for reporting results are important.

In “Dos and Don’ts of Forecast Accuracy Measurement: A Tutorial,” Len Tashman advises “distinguish in-sample from out-of-sample accuracy,” calling it “the most basic issue.”

The acid test is how well the forecast model does “out-of-sample.” Holdout samples and cross-validation simulate how the forecast model will perform going forward. “If your average error in-sample is found to be 10%, it is very probable that forecast errors will average substantially more than 10%.” That’s because model parameters are calibrated to the sample over which they are estimated. There is a whole discussion of “over-fitting,” R2, and model complexity hinging on similar issues. Don’t fool yourself. Try to find ways to test your forecast model on out-of-sample data.
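Tashman’s point is easy to demonstrate on synthetic data. In the sketch below (hypothetical monthly series, a deliberately over-flexible polynomial trend), the in-sample MAPE flatters the model relative to its error on a 12-period holdout.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(60)
y = 100 + 0.5 * t + rng.normal(0, 5, size=60)   # synthetic monthly sales

train, test = slice(0, 48), slice(48, 60)        # hold out the last 12 periods
x = t / t.max()                                  # rescale time to keep the fit well-conditioned

# A deliberately over-flexible trend model, fit to the training window only
coefs = np.polyfit(x[train], y[train], deg=8)
fitted   = np.polyval(coefs, x[train])
forecast = np.polyval(coefs, x[test])

def mape(actual, pred):
    return 100 * np.mean(np.abs((actual - pred) / actual))

print("in-sample MAPE    :", round(mape(y[train], fitted), 1))
print("out-of-sample MAPE:", round(mape(y[test], forecast), 1))
```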

The discussion of fitting models when there is “extreme seasonality” broke new ground for me. In retail forecasting, there might be a toy or product that sells only at Christmastime. Demand is highly intermittent. As Udo Sglavo reveals, one solution is “time compression.” Collapse the time series data into two periods – the holiday season and the rest of the year. Then, the on-off characteristics of sales can be more adequately modeled. Clever.
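I can only guess at the details of Sglavo’s procedure, but one way to implement the basic idea looks something like the sketch below (hypothetical data): collapse each year of an intermittent, holiday-only series into two buckets before modeling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
months = pd.date_range("2018-01-01", "2022-12-01", freq="MS")

# A toy item that sells only in November and December
sales = pd.Series(0.0, index=months)
holiday = months.month.isin([11, 12])
sales[holiday] = rng.poisson(500, size=holiday.sum())

# "Time compression": two periods per year - the holiday season and everything else
period = np.where(holiday, "holiday", "off-season")
compressed = sales.groupby([months.year, period]).sum()
print(compressed)
```

The compressed series can then be modeled with ordinary methods, since the on/off intermittency has been folded into the period definition.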

John Mello’s “The Impact of Sales Forecast Game Playing on Supply Chains” is probably destined to become a kind of classic, since it rolls up a lot of what we have all heard and observed about strategic behavior vis-à-vis forecasts.

Mello describes stratagems including

  • Enforcing – maintaining a higher forecast than actually anticipated, to keep forecasts in line with goals
  • Filtering – changing forecasts to reflect product on hand for sale
  • Hedging – overestimating sales to garner more product or production capability
  • Sandbagging – underestimating sales to set expectations lower than actually anticipated demand
  • Second-guessing – changing forecasts to reflect instinct or intuition
  • Spinning – manipulating forecasts to get favorable reactions from individuals or departments in the organization
  • Withholding – refusing to share current sales information

I’ve seen “sandbagging” at work, when the salesforce is allowed to generate the forecasts, setting expectations for future sales lower than they objectively should be. Purely by coincidence, of course, sales quotas are then easier to meet and bonuses easier to achieve.

I’ve always wondered why Gonik’s system, mentioned in an accompanying article by Michael Gilliland on the “Role of the Sales Force in Forecasting,” is not deployed more often. Gonik, in a classic article in the Harvard Business Review, ties sales bonuses jointly to the level of sales that are forecast by the field, and also to how well actual sales match the forecasts that were made. It literally provides incentives for field sales staff to come up with their best, objective estimate of sales in the coming period. (See Sales Forecasts and Incentives)
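The logic of Gonik’s scheme is easy to sketch. The following is a stylized, piecewise-linear payout in the same spirit (the coefficients are hypothetical, not taken from the HBR article): as long as the reward for sales above the forecast is smaller than the base rate on forecast sales, and the penalty for shortfalls is larger, the bonus is maximized by forecasting honestly.

```python
def bonus(forecast, actual, a=1.0, b=0.5, c=1.5):
    """Stylized Gonik-type payout; requires b < a < c for truth-telling to be optimal."""
    if actual >= forecast:
        # Full rate on the forecast amount, a lower rate on the pleasant surprise
        return a * forecast + b * (actual - forecast)
    # A stiffer penalty for every unit of shortfall against the forecast
    return a * forecast - c * (forecast - actual)

actual = 100
for f in (80, 90, 100, 110, 120):
    print(f"forecast {f}: bonus {bonus(f, actual):.0f}")
# Prints 90, 95, 100, 95, 90 - the payout peaks at the truthful forecast of 100,
# and for any fixed forecast it still rises with actual sales.
```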

Finally, Larry Lapide’s “Where Should the Forecasting Function Reside?” asks a really good question.

The following graphic (apologies for the scan reproduction) summarizes some of his key points.

[Graphic: Lapide’s considerations on where the forecasting function should reside]

There is no fixed answer; Lapide provides a list of things to consider for each organization.

This book is a good accompaniment for Rob Hyndman and George Athanasopoulos’s online Forecasting: Principles and Practice.