
Distributions of Stock Returns and Other Asset Prices

This is a kind of wrap-up discussion of probability distributions and daily stock returns.

When I did autoregressive models for daily stock returns, I kept getting this odd, pointy, sharp-peaked distribution of residuals with heavy tails. Recent posts have been about fitting a Laplace distribution to such data.

I have recently been working with the first differences of the logarithm of daily closing prices – an entity the quantitative finance literature frequently calls “daily returns.”

It turns out many researchers have analyzed the distribution of stock returns, finding fundamental similarities in the resulting distributions. There are also similarities for many stocks in many international markets in the distribution of trading volumes and the number of trades. These similarities exist at a range of frequencies – over a few minutes, over trading days, and longer periods.

The paradigmatic distribution of returns looks like this:


This is based on closing prices of the NASDAQ 100 from October 1985 to the present.

There also are power laws that can be extracted from the probabilities that the absolute value of returns will exceed a certain amount.

For example, again with daily returns from the NASDAQ 100, we get a steeply declining curve if we plot these probabilities of exceedance. This curve can be fit by a power-law relationship ~ x^(-θ), where θ is between 2.7 and 3.7, depending on where among the top, or largest, probabilities you start the estimation.


These magnitudes of the exponent are significant, because they seem to rule out whole classes, such as Levy stable distributions, which require θ < 2.
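A minimal sketch of how such an exponent might be estimated, assuming a straight-line fit to the log-log exceedance curve. The function name `tail_exponent` and the `tail_fraction` cutoff are illustrative choices, and the data are simulated Pareto draws with a known exponent of 3, not NASDAQ 100 returns.

```python
import math
import random

def tail_exponent(abs_returns, tail_fraction=0.10):
    """Estimate theta in P(|r| > x) ~ x^(-theta) by least squares on the
    log-log exceedance curve, using only the largest tail_fraction of values."""
    xs = sorted(x for x in abs_returns if x > 0)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    # Empirical probability that an observation is at least xs[i] is (n - i) / n.
    pts = [(math.log(xs[i]), math.log((n - i) / n)) for i in range(start, n)]
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return -sxy / sxx  # slope is -theta

# Simulated stand-in for absolute returns: Pareto draws with true exponent 3.
random.seed(0)
sample = [random.paretovariate(3.0) for _ in range(50000)]
for frac in (0.20, 0.10, 0.05):
    print(frac, round(tail_exponent(sample, frac), 2))
```

Changing `tail_fraction` moves the starting point of the fit, which is one way an estimate can drift across a range like the 2.7 to 3.7 quoted above.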

Also, let me tell you why I am not “extracting the autoregressive components” here. There are probably nonlinear lag effects in these stock price data. So my linear autoregressive equations probably cannot extract all the time dependence that exists in the data. For that reason, and also because it seems pro forma in quantitative finance, my efforts have turned to analyzing what you might call the raw daily returns calculated with price data and suitable transformations.

Levy Stable Distributions

At the turn of the century, Mandelbrot, then Sterling Professor of Mathematics at Yale, wrote an introductory piece for a new journal, Quantitative Finance, titled Scaling in financial prices: I. Tails and dependence. In that piece, which is strangely convoluted by my lights, Mandelbrot discusses how he began working with Levy-stable distributions in the 1960s to model the heavy tails of various stock and commodity price returns.

The terminology is a challenge, since there appear to be various ways of discussing so-called stable distributions: distributions that yield other distributions of the same type under operations like summing random variables, or taking their ratios.

The Quantitative Finance section of Stack Exchange has a useful Q&A on Levy-stable distributions in this context.

Answers refer readers to Nolan’s 2005 paper Modeling Financial Data With Stable Distributions which tells us that the class of all distributions that are sum-stable is described by four parameters. The distributions controlled by these parameters, however, are generally not accessible as closed algebraic expressions, but must be traced out numerically by computer computations.
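To make "sum-stable" and "traced out numerically" concrete, here is a rough sketch that draws symmetric stable variates with the Chambers-Mallows-Stuck simulation formula. That method is my choice for illustration, not something taken from Nolan's paper. Only the symmetric, unit-scale case is covered, so two of the four parameters are fixed by assumption. The check is that the sum of two independent draws, rescaled by 2^(1/α), has about the same interquartile range as a single draw.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """One draw from a symmetric alpha-stable law (skewness 0, unit scale),
    for 0 < alpha <= 2, via the Chambers-Mallows-Stuck formula."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # the Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def iqr(values):
    """Interquartile range; usable even when the variance is infinite."""
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]

rng = random.Random(1)
alpha = 1.7  # illustrative value only
n = 20000
x = [symmetric_stable(alpha, rng) for _ in range(n)]
y = [symmetric_stable(alpha, rng) for _ in range(n)]
# Sum-stability: (X + Y) / 2^(1/alpha) should be distributed like X itself.
rescaled_sum = [(a + b) / 2 ** (1.0 / alpha) for a, b in zip(x, y)]
print(round(iqr(x), 3), round(iqr(rescaled_sum), 3))
```

The interquartile range is used instead of the standard deviation because, for α below 2, the variance of these draws does not exist.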

Nolan gives several applications, for example, to currency data, illustrated with the following graphs.


So, the characteristics of the Laplace distribution I find so compelling are replicated to an extent by the Levy-stable distributions.

While Levy-stable distributions continue to be the focus of research in some areas of quantitative finance – risk assessment, for instance – it’s probably true that applications to stock returns are less popular lately. There are two reasons in particular. First, Levy stable distributions apparently have infinite variance, and as Cont writes, there is conclusive evidence that stock prices have finite second moments. Secondly, Levy stable distributions imply power laws for the probability of exceedance of a given level of absolute value of returns, but unfortunately these power laws have an exponent less than 2.

Neither of these “facts” need prove conclusive, though. Various truncated versions of Levy stable distributions have been used in applications like estimating Value at Risk (VaR).

Nolan also maintains a webpage which addresses some of these issues, and provides tools to apply Levy stable distributions.

Why Do These Regularities in Daily Returns and Other Price Data Exist?

If I were to recommend a short list of articles as “must-reads” in this field, Rama Cont’s 2001 survey in Quantitative Finance would be high on the list, as well as Gabaix et al.’s 2003 paper on power laws in finance.

Cont provides a list of 11 stylized facts regarding the distribution of stock returns.

1. Absence of autocorrelations: (linear) autocorrelations of asset returns are often insignificant, except for very small intraday time scales (≃ 20 minutes) for which microstructure effects come into play.

2. Heavy tails: the (unconditional) distribution of returns seems to display a power-law or Pareto-like tail, with a tail index which is finite, higher than two and less than five for most data sets studied. In particular this excludes stable laws with infinite variance and the normal distribution. However the precise form of the tails is difficult to determine.

3. Gain/loss asymmetry: one observes large drawdowns in stock prices and stock index values but not equally large upward movements.

4. Aggregational Gaussianity: as one increases the time scale t over which returns are calculated, their distribution looks more and more like a normal distribution. In particular, the shape of the distribution is not the same at different time scales.

5. Intermittency: returns display, at any time scale, a high degree of variability. This is quantified by the presence of irregular bursts in time series of a wide variety of volatility estimators.

6. Volatility clustering: different measures of volatility display a positive autocorrelation over several days, which quantifies the fact that high-volatility events tend to cluster in time.

7. Conditional heavy tails: even after correcting returns for volatility clustering (e.g. via GARCH-type models), the residual time series still exhibit heavy tails. However, the tails are less heavy than in the unconditional distribution of returns.

8. Slow decay of autocorrelation in absolute returns: the autocorrelation function of absolute returns decays slowly as a function of the time lag, roughly as a power law with an exponent β ∈ [0.2, 0.4]. This is sometimes interpreted as a sign of long-range dependence.

9. Leverage effect: most measures of volatility of an asset are negatively correlated with the returns of that asset.

10. Volume/volatility correlation: trading volume is correlated with all measures of volatility.

11. Asymmetry in time scales: coarse-grained measures of volatility predict fine-scale volatility better than the other way round.
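Facts 1, 6 and 8 can be seen side by side in a small simulation. The sketch below assumes a GARCH(1,1) process with made-up parameters (`omega`, `a`, `b` are illustrative, not estimates from any market) and compares the autocorrelation of returns with that of absolute returns.

```python
import math
import random

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def simulate_garch(n, omega=1e-6, a=0.10, b=0.85, seed=0):
    """Simulate n returns from a GARCH(1,1) process with Gaussian shocks."""
    rng = random.Random(seed)
    var = omega / (1.0 - a - b)  # start at the unconditional variance
    out = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        out.append(r)
        var = omega + a * r * r + b * var  # today's shock feeds tomorrow's variance
    return out

returns = simulate_garch(20000)
abs_returns = [abs(r) for r in returns]
for lag in (1, 5, 20):
    print(lag, round(autocorr(returns, lag), 3), round(autocorr(abs_returns, lag), 3))
```

In a run like this the first column of autocorrelations should sit near zero while the second stays clearly positive. A plain GARCH(1,1) decays faster than the slow power-law decay Cont describes in fact 8, so this is only a qualitative illustration.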

There’s a huge amount here, and it’s very plainly and well stated.

But then why?

Gabaix et al. address this question in a short paper published in Nature.

Insights into the dynamics of a complex system are often gained by focusing on large fluctuations. For the financial system, huge databases now exist that facilitate the analysis of large fluctuations and the characterization of their statistical behavior. Power laws appear to describe histograms of relevant financial fluctuations, such as fluctuations in stock price, trading volume and the number of trades. Surprisingly, the exponents that characterize these power laws are similar for different types and sizes of markets, for different market trends and even for different countries suggesting that a generic theoretical basis may underlie these phenomena. Here we propose a model, based on a plausible set of assumptions, which provides an explanation for these empirical power laws. Our model is based on the hypothesis that large movements in stock market activity arise from the trades of large participants. Starting from an empirical characterization of the size distribution of those large market participants (mutual funds), we show that the power laws observed in financial data arise when the trading behaviour is performed in an optimal way. Our model additionally explains certain striking empirical regularities that describe the relationship between large fluctuations in prices, trading volume and the number of trades.

The kernel of this paper in Nature is as follows:


Thus, Gabaix links the distribution of purchases in stock and commodity markets with the resulting distribution of daily returns.

I like this hypothesis and see ways it connects with the Laplace distribution and its variants. Probably, I will write more about this in a later post.

Microsoft Stock Prices and the Laplace Distribution

The history of science, like the history of all human ideas, is a history of irresponsible dreams, of obstinacy, and of error. But science is one of the very few human activities perhaps the only one in which errors are systematically criticized and fairly often, in time, corrected. This is why we can say that, in science, we often learn from our mistakes, and why we can speak clearly and sensibly about making progress there. — Karl Popper, Conjectures and Refutations

Microsoft daily stock prices and oil futures seem to fall in the same class of distributions as those for the S&P 500 and NASDAQ 100 – what I am calling the Laplace distribution.

This is contrary to the conventional wisdom. The whole thrust of Box-Jenkins time series modeling seems to be to arrive at Gaussian white noise. Most textbooks on econometrics prominently feature normally distributed error processes ~ N(0,σ).

Benoit Mandelbrot, of course, proposed alternatives as far back as the 1960s, but still we find aggressive application of Gaussian assumptions in applied work – as for example in widespread use of the results of the Black-Scholes formula or in computing value at risk in portfolios.

Basic Steps

I’m taking a simple approach.

First, I collect daily closing prices for a stock index, stock, or, as you will see, for commodity futures.

Then, I do one of two things: (a) I take the natural logarithms of the daily closing prices and first-difference them, or (b) I simply calculate first differences of the daily closing prices.

I did not favor option (b) initially, because I can show that the first differences, in every case I have looked at, are autocorrelated at various lags. In other words, these differences have an algorithmic structure, although this structure usually has weak explanatory power.

However, it is interesting that the first differences, again in every case I have looked at, are distributed according to one of these sharp-peaked or pointy distributions which are highly symmetric.

Take the daily closing prices of the stock of the Microsoft Corporation (MSFT), as an example.

Here is a graph of the daily closing prices.


And here is a histogram of the raw first differences of those closing prices over this period since 1990.


Now in close reading of The Laplace Distribution and Generalizations I can see there are a range of possibilities in modeling distributions of the above type.

And here is another peaked, relatively symmetric distribution based on the residuals of an autoregressive equation calculated on the first differences of the logarithm of the daily closing prices. That’s a mouthful, but the idea is to extract at least some of the algorithmic component of the first differences.


That regression is as follows.


Note how far back the longest lags reach.

This type of regression, incidentally, makes money in out-of-sample backcasts, although possibly not enough to exceed trading costs unless the size of the trade is large. However, it’s possible that some advanced techniques, such as bagging and boosting, regression trees and random forests, could enhance the profitability of trading strategies.

Next, a quick look at daily oil futures (CLQ4) from 2007 to the present.


Not quite as symmetric, but still profoundly not a Gaussian distribution.

The Difference It Makes

I’ve got to go back and read Mandelbrot carefully on his analysis of stock and commodity prices. It’s possible that these peaked distributions all fit in a broad class including the Laplace distribution.

But the basic issue here is that the characteristics of these distributions are substantially different from the Gaussian or normal probability distribution. This would affect maximum likelihood estimation of parameters in models, and therefore could affect regression coefficients.
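One concrete way the distribution matters for estimation: under Gaussian errors the maximum likelihood estimate of a location parameter is the sample mean, while under Laplace errors it is the sample median. The sketch below is a simulation with made-up sample sizes. It draws repeated Laplace samples and compares how much the two estimators vary.

```python
import random
import statistics

def laplace_draw(rng, loc=0.0, scale=1.0):
    """One Laplace variate, built as the difference of two Exp(1) draws."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

rng = random.Random(42)
means, medians = [], []
for _ in range(400):              # 400 repeated samples (illustrative)
    sample = [laplace_draw(rng) for _ in range(200)]
    means.append(sum(sample) / len(sample))     # Gaussian ML location estimate
    medians.append(statistics.median(sample))   # Laplace ML location estimate

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print(round(var_mean, 5), round(var_median, 5))
```

With Laplace data the median should come out as the less variable estimator. The regression analogue is least absolute deviations in place of least squares.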

Furthermore, the risk characteristics of assets whose prices have these distributions can be quite different.

And I think there is a moral here about the conventional wisdom and the durability of incorrect ideas.

Top pic is Karl Popper, the philosopher of science

The Laplace Distribution and Financial Returns

Well, using EasyFit from Mathwave, I fit a Laplace distribution to the residuals of the regression on S&P daily returns I discussed yesterday.

Here is the result.


This beats a normal distribution hands down. It also appears to beat the Matlab fit of a t distribution, but I have to run down more details on forms of the t-distribution to completely understand what is going on in the Matlab setup.

Note that EasyFit is available for a free 30-day trial download. It’s easy to use and provides metrics on goodness of fit to make comparisons between distributions.

There is a remarkable book online called The Laplace Distribution and Generalizations. If you have trouble downloading it from the site linked here, Google the title and find the download for a free PDF file.

This book, dating from 2001, runs to 458 pages, has a good introductory discussion, extensive mathematical explorations, as well as applications to engineering, physical science, and finance.

The French mathematical genius Pierre Simon Laplace proposed the distribution named after him as a first law of errors when he was 25, before his later discussions of the normal distribution.

The normal probability distribution, of course, “took over” – in part because of its convenient mathematical properties and also, probably, because a lot of ordinary phenomena are linked with Gaussian processes.

John Maynard Keynes, the English economist, wrote an early monograph (Keynes, J.M. (1911). The principal averages and the laws of error which lead to them, J. Roy. Statist. Soc. 74, New Series, 322-331) which substantially focuses on the Laplace distribution, highlighting the importance it gives to the median, rather than average, of sample errors.

The question I’ve struggled with is “why should stock market trading, stock prices, stock indexes lead, after logarithmic transformation and first differencing, to the Laplace distribution?”

Of course, the Laplace distribution can be generated as a difference of exponential distributions, or as combination of a number of distributions, as the following table from Kotz, Kozubowski, and Podgorski’s book shows.
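Two such constructions are easy to check by simulation: the difference of two independent exponentials, and a normal variate whose variance is itself exponentially distributed. These are standard representations of the standard Laplace law, and I am not claiming they match particular rows of the book's table. For a standard Laplace variate, P(|X| > 1) = exp(-1), and the variance is 2.

```python
import math
import random

def laplace_from_exponentials(rng):
    """Standard Laplace variate as the difference of two independent Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def laplace_from_normal_mixture(rng):
    """Standard Laplace variate as a normal with random variance 2W, W ~ Exp(1)."""
    return math.sqrt(2.0 * rng.expovariate(1.0)) * rng.gauss(0.0, 1.0)

def exceedance(values, level):
    """Fraction of values whose absolute value exceeds level."""
    return sum(1 for v in values if abs(v) > level) / len(values)

rng = random.Random(5)
n = 50000
diff_draws = [laplace_from_exponentials(rng) for _ in range(n)]
mix_draws = [laplace_from_normal_mixture(rng) for _ in range(n)]

target = math.exp(-1.0)  # theoretical P(|X| > 1) for the standard Laplace
print(round(target, 4), round(exceedance(diff_draws, 1.0), 4), round(exceedance(mix_draws, 1.0), 4))
variance_diff = sum(v * v for v in diff_draws) / n  # mean is zero by symmetry
print(round(variance_diff, 3))
```

The mixture form is suggestive for markets: returns that are conditionally normal, but with a variance that changes randomly from day to day, can look Laplace in aggregate.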


This is all very suggestive, but how can it be related to the process of trading?

Indeed, there are quite a number of questions which follow from this hypothesis – that daily trading activity is fundamentally related to a random component following a Laplace distribution.

What about regression, if the error process is not normally distributed? By following the standard rules on “statistical significance,” might we be led to disregard variables which are drivers for daily returns or accept bogus variables in predictive relationships?

Distributional issues are important, but too frequently disregarded.

I recall a blog discussion by a hedge fund trader lamenting excesses in the application of the Black-Scholes formula to options in 2007 and thereafter.

Possibly, the problem is as follows. The residuals of autoregressions on daily returns and their various related transformations tend to cluster right around zero, but have big outliers. This clustering creates false confidence, making traders vulnerable to swings or outliers that occur much more frequently than suggested by a normal or Gaussian error distribution.
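The gap in outlier frequency can be computed in closed form. The sketch below compares the probability of a move larger than k standard deviations under a normal and under a Laplace distribution with the same standard deviation. It uses the fact that a Laplace law with scale b has standard deviation b√2. The function names are mine.

```python
import math

def normal_two_sided_tail(k):
    """P(|X| > k standard deviations) for a normal distribution."""
    return math.erfc(k / math.sqrt(2.0))

def laplace_two_sided_tail(k):
    """P(|X| > k standard deviations) for a Laplace distribution.
    With scale b the standard deviation is b * sqrt(2), so k sigma = k * sqrt(2) * b."""
    return math.exp(-k * math.sqrt(2.0))

for k in (2, 3, 4, 5):
    p_norm = normal_two_sided_tail(k)
    p_lap = laplace_two_sided_tail(k)
    print(k, f"{p_norm:.2e}", f"{p_lap:.2e}", round(p_lap / p_norm, 1))
```

At four standard deviations the Laplace probability is roughly 55 times the normal one, and the ratio keeps growing further out in the tail.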

The Distribution of Daily Stock Market Returns

I think it is about time for another dive into stock market forecasting. The US market is hitting new highs with the usual warnings of excess circulating around. And there are interesting developments in volatility.

To get things rolling, consider the following distribution of residuals from an autoregressive model on the difference in the natural logarithms of the S&P 500.

This is what I am calling “the distribution of daily stock market returns.” I’ll explain that further in a minute.


Now I’ve seen this distribution before, and once asked in a post, “what type of distribution is this?”

Now I think I have the answer – it’s a Laplace distribution, sometimes known as the double exponential distribution.

Since this might be important, let me explain the motivation and derivation of these residuals, and then consider some implications.

Derivation of the Residuals

First, why not just do a histogram of the first differences of daily returns to identify the underlying distribution? After all, people say movements of stock market indexes are a random walk.

OK, well you could do that, and the resulting distribution would also look “Laplacian” with a pointy peak and relative symmetry. However, I am bothered in developing this by the fact that these first differences show significant, even systematic, autocorrelation.

I’m influenced here by the idea that you always want to try to graph independent draws from a distribution to explore the type of distribution.

OK, now to details of my method.

The data are based on daily closing values for the S&P 500 index from December 4, 1989 to February 7, 2014.

I took the natural log of these closing values and then took first differences – subtracting the log of the previous trading day’s closing value from the log of the current day’s closing value. These numbers encode the critical part of the daily returns, which are usually calculated as day-over-day percent changes. The difference of natural logs is in fact the log of the ratio of the original numbers, which for small moves is close to the percent change from one trading day to the next.

So I generate a conventional series of first differences of the natural log of this nonstationary time series. This transforms the original nonstationary series into one that basically fluctuates around a level – essentially zero. Furthermore, the log transform tends to reduce the swings in the variability of the series, although significant variability remains.
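The transformation can be sketched in a few lines. The closing prices below are made up for illustration, not S&P 500 data, and `log_returns` is a hypothetical helper name.

```python
import math

def log_returns(closes):
    """First differences of the natural log of a list of closing prices."""
    logs = [math.log(p) for p in closes]
    return [logs[i] - logs[i - 1] for i in range(1, len(logs))]

# Made-up closing prices, for illustration only.
closes = [100.0, 110.0, 99.0]
r = log_returns(closes)
pct = [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
for lr, pc in zip(r, pct):
    print(f"log return {lr:+.4f}   percent change {pc:+.4f}")
```

Each log difference equals the log of the price ratio, so the log returns add up over time: their sum here is the log of 99/100. For daily moves of a percent or so the log difference and the percent change are nearly identical; the 10 percent moves in this toy series exaggerate the gap.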


Removing Serial Correlation

The series graphed above exhibits first order serial correlation. It also exhibits second order serial correlation, or correlation between values at a lag of 2.

Based on the correlations for the first 24 lags, I put together this regression equation. Of course, the “x’s” refer to time dated first differences of the natural log of the S&P daily closing values.


Note that most of the t-statistics pass our little test of significance (which I think is predicated to an extent on the error process belonging to certain distributions, but...). The coefficient of determination, or R², is minuscule – at 0.017. This autoregressive equation thus explains only about 2 percent of the variation in this series of differenced log daily closing values.

Now one of the things I plan to address is how, indeed, that faint predictive power can exert significant influence on earnings from stock trading, given trading rules.

But let me leave that whole area – how you make money with such a relationship – to a later discussion, since I’ve touched on this before.

Instead, let me just observe that if you subtract the predicted values from this regression from the actuals trading day by trading day, you get the data for the pointy, highly symmetric distribution of residuals.

Furthermore, these residuals do not exhibit first or second, or higher, autocorrelation, so far as I am able to determine.

This means we have separated out algorithmic components of this series from random components that are not serially correlated.
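The same separation can be illustrated on simulated data. The sketch below is a deliberately reduced version: it assumes only two lags, fit by the Yule-Walker equations, whereas the regression above uses lags chosen from the first 24. The simulated coefficients (0.10 and -0.05) and the Laplace shocks are made up for the example.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def fit_ar2(x):
    """Yule-Walker estimates of the two coefficients of an AR(2) model."""
    r1, r2 = autocorr(x, 1), autocorr(x, 2)
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 * r1)
    phi2 = (r2 - r1 * r1) / (1.0 - r1 * r1)
    return phi1, phi2

def ar2_residuals(x, phi1, phi2):
    """Actual minus predicted values, period by period."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return [d[t] - phi1 * d[t - 1] - phi2 * d[t - 2] for t in range(2, len(d))]

# Simulated series with Laplace shocks (difference of two exponentials).
rng = random.Random(7)
x = [0.0, 0.0]
for _ in range(20000):
    shock = 0.01 * (rng.expovariate(1.0) - rng.expovariate(1.0))
    x.append(0.10 * x[-1] - 0.05 * x[-2] + shock)
x = x[2:]

phi1, phi2 = fit_ar2(x)
resid = ar2_residuals(x, phi1, phi2)
print(round(phi1, 3), round(phi2, 3), round(autocorr(resid, 1), 3))
```

The residuals lose their serial correlation but keep the shape of the shocks that generated them, which is the point of the exercise: uncorrelated does not mean Gaussian.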

So you might jump to the conclusion that these residuals are then white noise, and I think many time series modelers have gotten to this point, simply assuming they are dealing with Gaussian white noise.

Nothing could be further from the truth, as shown by the following Matlab fit of a normal distribution to some fairly crude bins for these numbers.


A Student-t distribution does better, as the following chart shows.


But the t-distribution still misses the pointed peak.

The Laplace distribution is also called the double exponential, since it can be considered to be a composite of exponentials on the right and to the left of the mean – symmetric but mirror images of each other.

The following chart shows how this works over the positive residuals.


Now, of course, there are likelihood ratios and those sorts of metrics, and I am busy putting together a comparison between the t-distribution fit and Laplace distribution fit.
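As a sketch of what such a comparison looks like, the code below computes maximized log-likelihoods for a normal and a Laplace fit, both of which have closed-form maximum likelihood estimates. The t-distribution is left out because fitting it needs a numerical optimizer, so this is a stand-in for the comparison described above, run on simulated data rather than the S&P residuals.

```python
import math
import random
import statistics

def normal_loglik(data):
    """Maximized log-likelihood of a normal fit (ML mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((v - mu) ** 2 for v in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def laplace_loglik(data):
    """Maximized log-likelihood of a Laplace fit (ML location is the median,
    ML scale is the mean absolute deviation from the median)."""
    n = len(data)
    loc = statistics.median(data)
    b = sum(abs(v - loc) for v in data) / n
    return -n * (math.log(2.0 * b) + 1.0)

rng = random.Random(11)
laplace_data = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
gauss_data = [rng.gauss(0.0, 1.0) for _ in range(5000)]

for name, data in (("laplace sample", laplace_data), ("gaussian sample", gauss_data)):
    print(name, round(normal_loglik(data), 1), round(laplace_loglik(data), 1))
```

Both fits have two parameters, so the log-likelihoods can be compared directly: the larger value marks the better-fitting family for that sample.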

There is a connection between the Laplace distribution and power laws, too, and I note considerable literature on this distribution in finance and commodities.

I think I have answered the question I put out some time back, though, and, of course, it raises other questions in its wake.

Predicting the Market Over Short Time Horizons

Google “average time a stock is held.” You will come up with figures that typically run around 20 seconds. High frequency trades (HFT) dominate trading volume on the US exchanges.

All of which suggests the focus on the predictability of stock returns needs to position more on intervals lasting seconds or minutes, rather than daily, monthly, or longer trading periods.

So, it’s logical that Michael Rechenthin, a newly minted Iowa Ph.D., and Nick Street, a Professor of Management, are getting media face time from research which purportedly demonstrates the existence of predictable short-term trends in the market (see Using conditional probability to identify trends in intra-day high-frequency equity pricing).

Here’s the abstract –

By examining the conditional probabilities of price movements in a popular US stock over different high-frequency intra-day timespans, varying levels of trend predictability are identified. This study demonstrates the existence of predictable short-term trends in the market; understanding the probability of price movement can be useful to high-frequency traders. Price movement was examined in trade-by-trade (tick) data along with temporal timespans between 1 s to 30 min for 52 one-week periods for one highly-traded stock. We hypothesize that much of the initial predictability of trade-by-trade (tick) data is due to traditional market dynamics, or the bouncing of the price between the stock’s bid and ask. Only after timespans of between 5 to 10 s does this cease to explain the predictability; after this timespan, two consecutive movements in the same direction occur with higher probability than that of movements in the opposite direction. This pattern holds up to a one-minute interval, after which the strength of the pattern weakens.

The study examined price movements of the exchange traded fund SPY, during 2005, finding that

.. price movements can be predicted with a better than 50-50 accuracy for anywhere up to one minute after the stock leaves the confines of its bid-ask spread. Probabilities continue to be significant until about five minutes after it leaves the spread. By 30 minutes, the predictability window has closed.
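The basic quantity in that kind of study, the probability that a move is followed by another move in the same direction, can be sketched as below. This is a simplified illustration and not the authors' procedure: it ignores bid-ask bounce and time spacing, and `continuation_probability` is a hypothetical helper name.

```python
def continuation_probability(prices):
    """Estimate P(next move has the same direction as the current move),
    skipping observations where the price did not change."""
    moves = [b - a for a, b in zip(prices, prices[1:])]
    signs = [1 if m > 0 else -1 for m in moves if m != 0]
    pairs = list(zip(signs, signs[1:]))
    if not pairs:
        return float("nan")
    same = sum(1 for s, t in pairs if s == t)
    return same / len(pairs)

# Made-up price path: up, up, down, up, up, up.
prices = [10, 11, 12, 11, 12, 13, 14]
print(continuation_probability(prices))
```

For this toy path three of the five consecutive pairs of moves share a direction, so the estimate is 0.6. A value persistently above 0.5 at some sampling interval is the kind of pattern the paper reports.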

Of course, the challenges of generalization in this world of seconds and minutes are tremendous. Perhaps, for example, the patterns the authors identify are confined to the year of the study. Without any theoretical basis, brute force generalization means riffling through additional years of 31.5 million seconds each.

Then, there are the milliseconds, and the recent blockbuster written by Michael Lewis – Flash Boys: A Wall Street Revolt.

I’m on track for reading this book for a bookclub to which I belong.

As I understand it, Lewis, who is one of my favorite financial writers, has uncovered a story whereby high frequency traders, operating with optical fiber connections to the New York Stock Exchange, sometimes being geographically as proximate as possible, can exploit more conventional trading – basically buying a stock after you have put in a buy order, but before your transaction closes, thus raising your price if you made a market order.


The LA Times has a nice review of the book and ran the above photo of Lewis.

Tornado Frequency Distribution

Data analysis, data science, and advanced statistics have an important role to play in climate science.

James Elsner’s blog Hurricane & Tornado Climate offers salient examples, in this regard.

Yesterday’s post was motivated by an Elsner suggestion that the time trend in maximum wind speeds of larger or more powerful hurricanes is strongly positive since weather satellite observations provide better measurement (post-1977).

Here’s a powerful, short video illustrating the importance of proper data segmentation and statistical characterization for tornado data – especially for years of tremendous devastation, such as 2011.

Events that year have a more than academic interest for me, incidentally, since my city of birth – Joplin, Missouri – suffered the effects of an immense supercell which touched down and destroyed everything in its path, including my childhood home. The path of this monster was, at points, nearly a mile wide, and it gouged out a track several miles through this medium size city.

Here is Elsner’s video integrating data analysis with matters of high human import.

There is a sort of extension, in my mind, of the rational expectations issue to impacts of climate change and extreme weather. The question is not exactly one people living in areas subject to these events might welcome. But it is highly relevant to data analysis and statistics.

The question simply is whether US property and other insurance companies are up-to-speed on the type of data segmentation and analysis that is needed to adequately capture the probable future impacts of some of these extreme weather events.

This may be where the rubber hits the road with respect to Bayesian techniques – popular with at least some prominent climate researchers, because they allow inclusion of earlier, less-well documented historical observations.