The Role and Nature of High Impact Events (Black Swans): Technical Commentary and Empirical Data - Written by N. N. Taleb
This is an appendix to the Edge piece. It is striking how some simple, simple tests (of the stability of the 4th moment and of failures of stress testing) can invalidate tens of thousands of research papers on prediction using “least squares”, and those based on “standard deviation”, “variance”, “correlation”, “GARCH”, “VAR”, etc. Indeed one or two tests can transform anything quantitative/statistical in social science (outside psychology) into a facade of knowledge.
Data: Note that the analysis here is exhaustive: it is done systematically on almost ALL transacted macro data representing >98% of worldwide volume. I used interest rates, commodities (oil, agricultural), all available equity indices (US, UK, Continental Europe, Russia, Indonesia, Brazil), and the main traded currencies. I selected tradability because of its “cleanliness” compared to merely computed data. I also added some micro data: although indices encompass single equities, I processed >18 million pieces of single stock daily data, and select industry data such as drug sales, movie returns, etc. (what “clean” data I could find). While we have a plethora of data for business variables, we don’t have enough in epidemics, terrorism, wars, etc.
Logical and Mathematical Commentary
1) Telescope problem, insufficiency of data in the tails; consequences for left-skewed and right-skewed distributions;
2) Preasymptotics of probability distributions, classification of convergence, or why the central limit theorem is too Platonic;
What empirical data shows:
1) The severity of the fatness of the tails – and our inability to say “how” fat (my central problem). Not only is kurtosis > 3 everywhere; it is unstable. One single observation in 10,000 can represent 80% of the total fourth moment. Aside from the unpredictability, this means that notions in the L2 norm (like variance, standard deviation, correlation) are meaningless as an expression of any of the attributes of the probability distributions.
2) The “atypicality” of moves discussed in the article – or why “stress testing” is dangerous, and why the data cannot be captured by conventional “stress” tests or a Poisson (except after the fact). Also, while we are certain of power laws, we just can’t see the tails very clearly.
3) Past deviations (expressed as shortfalls) do not predict future deviations – at any lag you use collectively. There was no need to do it, but I tried anyway out of curiosity.
Acknowledgments: These went into three pieces of formal technical work: a paper in the journal Complexity (about the problem of separation of fractal power laws into two basins and the effect of the preasymptotics of fat-tailed processes), another in the International Journal of Forecasting (about the role of measurement error with fat tails and what to do about it), and another under review (about the problems of model selection when we only observe the data, not the process). The arguments were presented at a special panel at the American Statistical Association Joint Statistical Meeting on August 6, 2008, in Denver. I thank Peter Westfall, Aaron Brown, Stan Young, Donald Rubin, and Robert Lund for helpful discussions. I also thank Benoit Mandelbrot, David Freedman, and Philip Stark for comments, and my long-time colleague Pallop Angsupun for help with data. I thank David Shaywitz for help with payoffs from innovation in drugs. I also thank Scott Patterson for alerting me to the “great moderation” theories.
Figure 1 – The stochastic rectangle: probability times deviation shows the contribution of an event to the total properties. With low probabilities the rectangle becomes very unstable.
Technical Difference between Fat Tails and Thin Tails: Another way to recover power laws. To take again my metaphor of the stochastic rectangle, but complicating it by considering the m-th power of the payoff, λ^m, there are two distinct basins of distributions:
1) those for which λ^m f(λ) declines rapidly, so the contribution becomes insignificant (as the probability becomes smaller) for all values of m. If you move to a continuous variable you get, as a solution, exponential decline: for large λ, f(λ) = K e^{-λ/c}, which brings us (thanks to a convolution) to the Gaussian as a special limiting case.
2) others for which these terms stay significant, so here you get as a unique solution, for large λ, the power law f(λ) = K λ^{-α}. The value of m for which the rectangle λ^m f(λ) explodes to infinity is the exponent of the power law minus 1.
In other words, unless the higher terms E[X^m] become insignificant, the usual expansion around values of X does not work – the higher orders increase in importance.
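The two basins can be checked numerically: for a thin-tailed density the partial integral of λ^m f(λ) stabilizes as the integration cutoff grows, while for a power law it explodes once m exceeds the critical order. A minimal sketch; the helper name, the α = 2.5 example, and the crude midpoint integration are mine, not from the text:

```python
import math

def partial_moment(density, m, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of x^m * f(x) over [1, upper]."""
    lo = 1.0
    h = (upper - lo) / steps
    return sum((lo + (i + 0.5) * h) ** m * density(lo + (i + 0.5) * h) * h
               for i in range(steps))

exp_density = lambda x: math.exp(-x)   # thin tail: f(λ) = e^(-λ)
alpha = 2.5
pow_density = lambda x: x ** (-alpha)  # fat tail: f(λ) = λ^(-α)

# For the exponential, the 4th-order partial integral converges as the cutoff
# grows; for the power law, moments of order m >= α - 1 explode with the cutoff.
for upper in (10.0, 100.0, 1000.0):
    print(upper,
          round(partial_moment(exp_density, 4, upper), 4),
          round(partial_moment(pow_density, 4, upper), 1))
```

With α = 2.5 the critical order is α − 1 = 1.5, so the m = 4 "rectangle" grows without bound while the exponential's settles.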
The problem of lack of knowledge of the distribution. It is a fact rarely noticed that the absence of knowledge of the parameters of a distribution generates different classes of fat tails (depending on the structure of the ignorance). Stochastic volatility models, for instance, can come out of simply not knowing the standard deviation of a Gaussian – and having to estimate it. See the section on preasymptotics.
Preasymptotics of Platonic Distributions
Background: People discuss the central limit theorem: how the sum of N random variables (with finite variance and some independence) converges to the Gaussian basin. This is mathematically wrong as used: you converge – but not at a reasonable speed, and not in the tails. Fat tails imply that the higher moments implode – not just the 4th.
The additivity of the Log of the Characteristic function under convolution makes it easy to examine the speed of the convergence to the Gaussian basin. Some distributions have strong asymptotics, others don’t.
Table of Normalized Cumulants – Speed of Convergence: Take the log of the Fourier transform (the characteristic function) of the distribution, divide by σ^m where m is the order of the cumulant (and σ² the variance). Derive at 0, m times. You would observe convergence to the Gaussian when the higher scaled moments of order > 2 go to 0 as N becomes large (in a way that facilitates the collapse of the higher orders of the distribution). We can see that some distributions reach the Gaussian easily (the scaled 4th cumulant of the exponential is 6/N and that, slower, of the Poisson is 1/(Nλ)) – others (power laws under any parametrization) NEVER do so for some higher moment, whether variance is finite or not. Later, looking at the data, I will examine the empirical cumulant (N from 1 to 50) and show how we typically observe NO convergence beyond the small sample effect.
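The decay rates just quoted follow from the additivity of cumulants under convolution (the m-th cumulant of a sum of N iid variables is N times the individual one). A small self-contained sketch, assuming the standard cumulant formulas for the Exponential (κ_m = (m−1)!) and the Poisson (κ_m = λ); all function names are mine:

```python
import math

def scaled_cumulant(kappa, m, N):
    """Scaled m-th cumulant of the sum of N iid variables with cumulant
    function kappa(m): N*kappa(m) / (N*kappa(2))^(m/2)."""
    return N * kappa(m) / (N * kappa(2)) ** (m / 2)

exp_kappa = lambda m: math.factorial(m - 1)    # Exponential(1): κ_m = (m-1)!
poisson_kappa = lambda lam: (lambda m: lam)    # Poisson(λ): κ_m = λ

lam = 2.0
for N in (1, 10, 100):
    print(N,
          scaled_cumulant(exp_kappa, 4, N),          # equals 6/N
          scaled_cumulant(poisson_kappa(lam), 4, N)) # equals 1/(N*λ)
```

The same helper shows the 3rd scaled cumulant of the exponential decaying as 2/√N, while for a power law κ_4 is infinite and the ratio never collapses.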
Now Bouchaud and Potters showed the slowness of convergence for power laws (you converge to the Gaussian only within a central zone of order ± √(N Log N), meaning the tails stay heavy). Mandelbrot and I used extreme value theory to get the same result: the ratio of the extremum to the mean stays significant until we hit a huge N.
Table 1: Behavior under convolution of common distributions in the Gaussian family (cumulants of the N-sum, scaled by σ^m):

| Distribution | N-convoluted log characteristic | 2nd Cum (scaled) | 3rd Cum | 4th Cum |
|---|---|---|---|---|
| Gaussian(μ, σ) | N(izμ − z²σ²/2) | 1 | 0 | 0 |
| Poisson(λ) | Nλ(e^{iz} − 1) | 1 | 1/√(Nλ) | 1/(Nλ) |
| Exponential(λ) | N Log(λ/(λ − iz)) | 1 | 2/√N | 6/N |
| Gamma(a, b) | −Na Log(1 − izb) | 1 | 2/√(Na) | 6/(Na) |
Table 2: Behavior under convolution of more complicated distributions (stochastic-volatility style mixtures of Gaussians vs. the Student T).
We see from Table 2 a huge qualitative difference between stochastic vol and the Student T: the mixture of Gaussians eventually behaves like a Gaussian under convolution, while the Student T (a power law) keeps its tail – for low enough tail exponents its 4th cumulant stays infinite no matter how large N gets.
Note: What do we mean by “Infinite Kurtosis” or “infinite moment”? It simply means that the number is unstable: it does not converge as the number of observations grows; its measurement is sample dependent. I typically use “indeterminate”.
Letting the Data Speak
Sampling Error of the Fourth Moment
Kurtosis in the normal framework measures “departure from Gaussian”. So can you imagine that people talk about “kurtosis” – and measure it – when one single observation in 40 years (10,000 data points) can represent 90% of the measured fourth moment!
The implication is that 1) most of the work about fat tails and 2) any measure of “volatility” in the L2 norm are simply inoperative!
I take here the maximum variable to the fourth power to see its contribution to the kurtosis. For a Gaussian, with N ~ 10,000, the number is expected to be ridiculously small, ~.008.
Implication: we don’t know how “fat” the tails are –if we want to stay in the regular world of assuming that a distribution has three attributes: centrality, dispersion, symmetry. But we need a fourth dimension: tail indicator, and power laws have it. So, again, we need to escape the L2 norm.
This also tells us that GARCH should not work – indeed it DOES NOT work out of sample.
In the Gaussian world it has a small dispersion, around .008 for N = 10,000 (based on Monte Carlo simulation). Even then, one observation in 10,000 synthetic series represented a max of ~.037.
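To illustrate the point (this is my toy Monte Carlo, not the author's original simulation), here is a comparison of the maximal quartic contribution under a Gaussian and under a fat-tailed Student T with 3 degrees of freedom; names and parameters are mine:

```python
import random
random.seed(7)

def max_quartic_share(sample):
    """Share of the sample fourth moment contributed by the single
    largest observation."""
    quartics = [x ** 4 for x in sample]
    return max(quartics) / sum(quartics)

N = 10_000
gauss = [random.gauss(0.0, 1.0) for _ in range(N)]

# Student T with 3 degrees of freedom via the ratio-of-normals construction:
# T = Z / sqrt(V/3), where V is chi-squared(3).
def student_t3():
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / (v / 3) ** 0.5

fat = [student_t3() for _ in range(N)]

print("Gaussian max-quartic share:   ", round(max_quartic_share(gauss), 4))
print("Student-t(3) max-quartic share:", round(max_quartic_share(fat), 4))
```

The Gaussian share comes out near the ~.008 quoted above, while for the Student T a single observation routinely carries a large fraction of the fourth moment.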
Saying “Fat Tails” Implies Difficulties with the Distribution
The instability of the fourth moment. We have KURT, the “raw” kurtosis for daily observations, KURT10 for biweekly ones, and KURT66 for 3-month observations of log changes in the macro variables. “Max Quartic” is the maximal contribution to the fourth moment coming from one single observation.
| Variable | KURT | KURT10 | KURT66 | Max Quartic | Years |
|---|---|---|---|---|---|
| Australia TB 10y | 7.5 | 6.2 | 3.5 | 0.08 | 25 |
| Australia TB 3y | 7.5 | 5.4 | 4.2 | 0.06 | 21 |
| Eurodollar Depo 1M | 41.5 | 28. | 6. | 0.31 | 19 |
| Eurodollar Depo 3M | 21.1 | 8.1 | 7. | 0.25 | 28 |
| Jakarta Stock Index | 40.5 | 6.2 | 4.2 | 0.19 | 16 |
Behavior of the Fourth Moment under temporal aggregation
The discussion of the preasymptotics table shows the theoretical effect of the central limit theorem, if it worked. Yet we see NONE beyond the regular sampling error, with “infinite” (i.e. nonexistent) moments.
With Δt as the lag in days (here the lag is 1 through 45):
A slight technicality: I avoid the notion of an ex post “mean” in the computation of kurtosis. Most of the data is continuous futures, with 0 expected mean.
Note: Some data is “controlled”, making it less wild, owing to circuit breakers (markets shut down if they move more than, say, 3 points), which causes an artificial thinning of the tails and lowers the Maximum Quartic contribution. For instance on Oct 20, 1987, the 30y bond moved 10 points in the real market, but only a move of 3 was registered as the circuit breakers were activated.
Longitudinal 4th moment: no sign of stability. Typical graph.
First Conclusion: Avoid the use of “variance” metrics. Mean-variance is inadequate.
Evidence of Scalability – or Why Observed “Fat Tails” are not (Standard) Poisson and why there is no TYPICAL deviation
Thanks to the need for the probabilities to add up to 1 (something even economists seem to agree with), scalability in the tails is the sole possible model for such data. We may not be able to write the model for the full distribution – but we know what it looks like in the tails, where it matters.
The Behavior of Conditional Averages: With a scalable (or “scale-free”) distribution, when K is “in the tails” (say you reach the point where f(x) = C x^{-α-1}, where C is a constant and α the power law exponent), the relative conditional expectation of X (knowing that X > K) divided by K, that is, E[X | X > K]/K, is a constant and does not depend on K. More precisely, it is α/(α − 1).
This provides for a handy way to ascertain scalability by raising K and looking at the averages in the data.
Note further that, for a standard Poisson (it is too obvious for a Gaussian), not only does the conditional expectation depend on K, but it “wanes”, i.e. E[X | X > K]/K goes to 1 as K grows.
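The contrast between a constant conditional ratio and a “waning” one can be simulated directly. A sketch assuming a Pareto tail with exponent α = 2.5 and a unit-rate exponential as the thin-tailed comparison; the helper names and parameters are mine:

```python
import random
random.seed(42)

ALPHA = 2.5
n = 1_000_000

# Pareto(α) samples on [1, ∞) by inverse transform: X = U^(-1/α), U in (0, 1]
pareto = [(1.0 - random.random()) ** (-1.0 / ALPHA) for _ in range(n)]
expo = [random.expovariate(1.0) for _ in range(n)]

def cond_ratio(sample, K):
    """Empirical E[X | X > K] / K."""
    tail = [x for x in sample if x > K]
    return sum(tail) / (len(tail) * K)

for K in (2.0, 5.0, 10.0):
    print(K,
          round(cond_ratio(pareto, K), 3),  # ~ α/(α-1) ≈ 1.667, flat in K
          round(cond_ratio(expo, K), 3))    # ~ (K+1)/K, wanes toward 1
```

Raising K leaves the Pareto ratio pinned near α/(α − 1) while the exponential's drifts down toward 1, which is exactly the diagnostic suggested in the text.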
Other Decompositions: This result of course invalidates representations such as the Duffie-Pan-Singleton model, which decomposes the generating process into a sum of jumps and some diffusion. Unless they have an infinity of power-law sized jumps, the conditional average would lose its scalability beyond the worst jump.
Calibrating Tail Exponents. In addition, we can calibrate power laws. Using K as the cross-over point, we get the α exponent above it –the same as if we used the Hill estimator or ran a regression above some point.
Individual Stocks Data
Stocks are interesting because there are so many. This test, using 12 million pieces of exhaustive single-stock returns, shows how equity prices do not have a characteristic scale. No method other than a Paretian tail, albeit of imprecise calibration, can characterize them.
Data: Pallop Angsupun ran the following test: we collected the most recent 10 years of daily prices for stocks (no survivorship bias effect, as we included companies that were delisted up to the last trading day), n = 11,674,825, with deviations expressed as logarithmic returns.
We focused on negative deviations. For instance, in the table below, the average move below “10 standard deviations”, −10, is −15.6 standard deviations, that is, a multiple of 1.56. We kept moving K up to the equivalent of 100 “sigmas” (indeed) – and we still had observations.
Note the tail estimator implied by the conditional multiple m = E[X | X > K]/K: for a Paretian tail, α = m/(m − 1).
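For a Paretian tail this estimator simply inverts the relation m = α/(α − 1) between the conditional multiple and the exponent. A one-line sketch (the function name is mine), applied to the multiple of 1.56 reported above:

```python
def tail_alpha(multiple):
    """Implied Paretian tail exponent from the conditional multiple
    m = E[X | X > K] / K, inverting m = alpha / (alpha - 1)."""
    return multiple / (multiple - 1.0)

# Average move of -15.6 "sigmas" beyond the -10 sigma threshold,
# i.e. a conditional multiple of 1.56:
print(round(tail_alpha(1.56), 2))  # prints 2.79
```

A sanity check: a multiple of 1.5 would imply α = 3, and the closer the multiple gets to 1, the thinner the implied tail.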
Daily Returns (Stocks)
I normalized by STD (to communicate the result in the lingo), but we get the same results with MAD (mean absolute deviation).
Longer Window (Stocks)
A longer window, taking time-aggregates such as weeks and months, does not show any different result – which is additional evidence of the failure of the Poisson. For instance, weekly tails exhibit thickening instead of flattening: the implied α drops!
I used the set – again, the same pattern, particularly with the large deviations.
Positive Domain (Cond Exp is the expectation of the excess over a certain number)
Negative Domains: Drops below a certain Threshold
EuroDollars Front Month 1986-2006
UK Rates 1990-2007
Literally, you do not even reach a K large enough for scalability to drop off from a small-sample effect.
USD-JPY (1971-2007) (Negative Domain)
We get scalability as far as the eye can see. Usually, small-sample effects cause us not to observe much of the tails, with the consequence of an apparent “thinning” near the upper bound. We do not even witness such an effect.
Past Shortfall Does Not Predict Future Shortfall – at All Lags
The picture shows the predictability of a 7% shortfall, i.e. the expectation of X conditional on X < −.07. With discrete data, we see if a given shortfall after a date t can be predicted from data before that date t. Here X = Log[P_{t+Δt}/P_t]. The result is presented in log space. Note that here Δt = 1 day and I lagged by 252 days. But the result does not change in a perceptible way when I change the observation period or vary the lag (next graph).
Lagging does not help. Someone datamining might be able to find some “rule”, but these have failed out of sample.
However, regular events tend to predict regular events.
The graph shows the predictability of mean deviation between one period (252 days) and the next.
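The measurement itself is easy to sketch. The following toy version applies it to synthetic iid Student-T returns – an assumption of mine: the real dataset is replaced by a generator, and the shortfall is proxied by the mean of the worst 5% of each 252-day window; all names are mine:

```python
import random
random.seed(1)

def student_t3():
    """Student T, 3 degrees of freedom, via the ratio-of-normals construction."""
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / (v / 3) ** 0.5

# 200 consecutive "years" of 252 synthetic daily returns
returns = [0.01 * student_t3() for _ in range(200 * 252)]

def expected_shortfall(window, q=0.05):
    """Mean of the worst q fraction of a window of returns."""
    worst = sorted(window)[: max(1, int(len(window) * q))]
    return sum(worst) / len(worst)

windows = [returns[i * 252:(i + 1) * 252] for i in range(200)]
es = [expected_shortfall(w) for w in windows]

def corr(xs, ys):
    """Plain Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Correlate each period's shortfall with the next period's
print(round(corr(es[:-1], es[1:]), 3))
```

With independent fat-tailed returns the period-to-period correlation of shortfalls hovers near zero, which is the kind of scatter the graphs above display for real data.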
A Brief Discussion of Drug and Movie Successes
I look at drug sales for existing drugs. The problem is that when the Max is 167 STD away from the mean, you have a problem. That number could double if I included some marginal drugs not in my sample, as these would affect the mean. I could not get a convincing tail exponent.
With movies it is even worse. We don’t know the baseline.
But I can derive conclusions: there is a “potential” in the tails that I could fill in – which would raise the expected mean considerably. But by how much? I don’t know, and I don’t want to play like the academic charlatans.