Why it's necessary

Uncertainty matters

Humans don’t like uncertainty. They would prefer to ignore it if possible. Psychologists have observed that humans avoid situations in which they have to face uncertainties. Presented with uncertainty or ambiguity about possible choices, they often refuse to make decisions, or decide based on clearly suboptimal criteria.

Yet evidence also shows that being aware of the uncertainties about predictions improves decision making among experts and nonexperts (Nadav-Greenberg and Joslyn 2009). Clear interpretation requires consideration of the underlying uncertainties. Uncertainty provides context in which one can make comparisons and tradeoffs. Ignoring uncertainty in statistical calculations produces specious patterns that are not reliable characterizations of reality, or even of the available data (cf. Jones 2013).

Flaws of modern predictive analytics

A critical misconception about predictive analytics is that huge data sets are the keys to unlocking any statistical mystery. This misconception festers into what The New York Times has called “big data hubris” (Lohr 2014) when it is accompanied by massive overfitting, neglect of structural uncertainty, and statistical self-deception.

Predictive failures are far more common than they should be in probabilistic and statistical modeling. There have been many examples:

NASA estimated the risk of catastrophic failure of the Shuttle to be 1/10,000 per flight, but its observed failure rate was about 1 per 70 flights (Hamlin et al. 2011).
The National Weather Service and the Army Corps of Engineers strongly underestimate risks of “100-year floods” and miscommunicate uncertainties about flooding risks (Morss and Wahl 2007; NRC 2006; NWS 1998).
Observed failures and near-misses found at the Diablo Canyon Nuclear Power Plant reveal gross understatement of its assessed risks (Lochbaum 2011).
Failure cascades in electricity distribution systems are much more common than they are forecasted to be (RAWG 2005; USCPSOTF 2006).
Probabilistic assessments grossly and systematically understating risks in the financial industry by factors of hundreds precipitated the 2008 recession (Savage 2012).
The error that caused the loss of NASA’s Mars Climate Orbiter was the confusion of English and metric units (Isbell et al. 1999; JPL 1999).
The spectacular failure of Google Flu Trends was apparently due to massive overfitting and unaccounted autocorrelation (Lazer et al. 2014).
The financial debacle of 2008 has been attributed by Nate Silver to the failure to distinguish between aleatory and epistemic uncertainty: “...the ratings agencies' problem was in being unable or uninterested in appreciating the distinction between risk and uncertainty” (Silver 2012, chapter 1).
The Kansai International Airport was built on a man-made island in Osaka Bay. Engineers knew that the fill material making up the island would settle over time and lower the elevation of the island, but planners ignored the uncertainty of this forecast. The airport will sink below sea level over the next 50 years (Mesri and Funk 2014).
The very successful drug Vioxx was voluntarily withdrawn from the pharmaceutical market because uncertainty about possible side effects was misjudged. FDA practices in fact shift the burden of proof about such risks when drugs are reconsidered after original approval.

These are not random rare events in disciplines that estimate and control risks well overall. They are not the unlucky but expected tail events sometimes called “noble failures”. They seem instead to be the result of pervasive and systemic methodological errors that undermine the credibility of all predictive analytics. The public comment by a prominent engineer that the 2007 bridge collapse in Minneapolis was a “billion to one” event typifies the raw fact that uncertainties are sometimes not well understood in engineering.
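The claim that such failures are not merely unlucky tail events can be checked with simple arithmetic. The sketch below uses the Shuttle figures quoted above and asks how probable the observed record would be if the stated risk estimate were right. The flight count (2 losses in 140 flights, matching "about 1 per 70") and the independence of flights are assumptions made only for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of k or more failures in n independent flights."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Hypothetical record consistent with "about 1 per 70 flights" (an assumption).
n_flights, n_failures = 140, 2
p_claimed = 1 / 10_000   # risk estimate quoted in the text
p_observed = 1 / 70      # observed rate quoted in the text

tail_claimed = prob_at_least(n_failures, n_flights, p_claimed)
tail_observed = prob_at_least(n_failures, n_flights, p_observed)

print(f"P(>= 2 failures | p = 1/10,000) = {tail_claimed:.2e}")
print(f"P(>= 2 failures | p = 1/70)     = {tail_observed:.2f}")
```

Under the claimed risk, a record this bad would be extremely improbable, which suggests the estimate itself was wrong rather than that the outcome was bad luck.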

Form over function

With the rise of advanced graphics and high-performance computing, predictive analytics in engineering has become beautiful as a result of impressive and sometimes dazzling visualizations, but this has largely been a triumph of style over substance, of format over content. If the predictions miss things that can happen, or seriously underestimate their chances of happening, the models are really only Potemkin Villages of proper analyses. They fail to show us just how bad things could be, and thus we are unprepared when exigencies arise. Even worse, they offer a pretense of serious analysis where there is none, which precludes expenditures of effort that would otherwise be recognized as useful.

These predictive failures are intolerable when the stakes involved in analyses are large. What can be done about this situation? Is a source of this problem the complexity of the models? Are we just unable to compute enough samples to reasonably describe the breadth of possible outcomes? These are, no doubt, persistent problems, but there are other, more important reasons for the predictive failures. We believe that a pervasive cause of these predictive failures is using analytic methods without the empirical information that they require.

There are twin myths in predictive analytics that inhibit benefits from the broader application of uncertainty quantification and analysis. The first myth is that the results of calculations are reliable and precise merely because they are expressed with 7 or 14 decimal places spit out by computers. It is unlikely that many analysts actually believe this myth, but they behave as though they do. Few modelers construct comprehensive uncertainty or sensitivity analyses that might explore the full implications of the limits of knowledge about model structure and parameter values. Without such analyses, we cannot really say whether or to what extent our model outputs should be trusted. The second myth is that a full proper accounting of the uncertainty inherent in the data and models would “blow up” to a vacuous conclusion that says nothing because any signal would be lost in the noise of uncertainty. Many people actually seem to believe this pernicious myth which is as misguided as the first myth, albeit in the opposite direction. What is needed are convenient software tools that automatically undertake comprehensive sensitivity and uncertainty analyses whenever models are executed and do it correctly without falling into the trap of either myth.

Such careful analyses are not always required. Sometimes approximate answers are good enough. But more rigor is needed in important practical cases. For instance, only a rigorous analysis that considers all potential errors in data and modeling can tell us definitely that a newly discovered asteroid will not impact the planet. Data scientists need tools that can provide results whose surety can be calibrated.

References

Jones, B. (2013). Holy mackerel, uncertainty matters! DataRemixed [blog, retrieved 2 March 2016].

Lohr, S. (2014) Google flu trends: the limits of big data. The New York Times 31 March 2014, B6, under the title “The limits of big data in a flu tracker”.

Lazer, D., R. Kennedy, G. King, and A. Vespignani (2014). “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (14 March): 1203-1205. Copy at http://j.mp/1ii4ETo. See also Lazer, D., R. Kennedy, G. King, and A. Vespignani (2014). “Google Flu Trends Still Appears Sick: An Evaluation of the 2013‐2014 Flu Season”. Copy at http://j.mp/1m6JBX6

Mesri, G., and J.R. Funk (2014). Settlement of the Kansai International Airport islands. Journal of Geotechnical and Geoenvironmental Engineering 141 (2). 10.1061/(ASCE)GT.1943-5606.0001224, 04014102.

Nadav-Greenberg, L., and S.L. Joslyn (2009). Uncertainty forecasts improve decision making among nonexperts. Journal of Cognitive Engineering and Decision Making 3: 209-226.

NASA Says Shuttle Risk Overstated; Yet Some Risk Unavoidable

Savage, S.L. (2012). The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. Wiley.

Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail―but Some Don't. Penguin Group.

Common uncertainty approaches are flawed

Uncertainty: Introduction

Engineers and risk analysts now commonly distinguish between two main forms of uncertainty: epistemic and aleatory.

Aleatory uncertainty refers to the variability or stochastic fluctuations in a quantity through time, variation across space, manufacturing differences among components or individuals, or similar heterogeneity within some ensemble or population. This is considered to be a form of uncertainty because the value of the quantity can change each time one looks, and we cannot predict precisely what the next value will be (although the distribution of values may be known).

Epistemic uncertainty, on the other hand, refers to the lack of full knowledge about a quantity that arises from imperfect measurement, limited sampling effort, or incomplete scientific understanding about the underlying processes that govern a quantity.

These two forms of uncertainty have important differences. Epistemic uncertainty can in principle be reduced by empirical effort; investing more in measurement and study of a system should yield better precision. Aleatory uncertainty, in contrast, can sometimes be better characterized, but cannot generally be reduced by empirical effort. Epistemic uncertainty depends on the observer and the observations made. Aleatory uncertainty does not depend on an observer. Although epistemic and aleatory uncertainty can sometimes be like ice and snow in that their distinction can be difficult to discern through complicating details (and sometimes one can change into the other depending on the scale and perspective of the analyst), the macroscopic differences between these two forms of uncertainty are usually obvious and often significant in practical settings.
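A small simulation may make the contrast concrete. In this sketch the population mean and standard deviation are hypothetical values chosen for illustration, and the normal-approximation interval for the mean is a simplification. The spread of the quantity itself (aleatory) stays about the same however many measurements are taken, while the uncertainty about its mean (epistemic) shrinks with sampling effort.

```python
import random
import statistics

random.seed(1)
TRUE_MEAN, TRUE_SD = 10.0, 2.0   # hypothetical population, for illustration only

def summarize(n):
    """Draw n measurements; return (sample SD, approximate 95% half-width for the mean)."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    sd = statistics.stdev(sample)        # variability of the quantity (aleatory)
    half_width = 1.96 * sd / n ** 0.5    # uncertainty about the mean (epistemic)
    return sd, half_width

results = {n: summarize(n) for n in (10, 100, 10_000)}
for n, (sd, hw) in results.items():
    print(f"n={n:>6}: sample SD = {sd:.2f}, 95% half-width for mean = {hw:.3f}")
```

More data narrows the interval for the mean but leaves the sample standard deviation hovering near the population value, which is the sense in which aleatory uncertainty can be better characterized but not reduced.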

The Monte Carlo approach

A traditional sampling approach to uncertainty projection in engineered systems routinely and often profoundly underestimates the overall uncertainty that should be associated with outcomes (Atwood 1985; Ferson and Ginzburg 1996; Oberkampf and Roy 2010; Beer et al. 2013; Savage 2012). The figure below illustrates this underestimation, showing Monte Carlo results in black and the true uncertainty about the outcome circumscribed by blue bounds. The actual uncertainty is often much wider, sometimes orders of magnitude larger than that predicted by ordinary probabilistic simulation schemes such as Monte Carlo simulation, Latin hypercube simulation, second-order Monte Carlo or other methods.

The outer bounds are not necessarily worst-case outcomes, but they may be envelopes of the plausible scenarios in a proper accounting of uncertainty that, unlike traditional Monte Carlo approaches, does not make unjustified assumptions about independence among variables, and does not use equiprobability or uniform distribution assumptions to represent incertitude (empirical ignorance).

The difference between the Monte Carlo simulations and the outer bounds can sometimes exceed several orders of magnitude. Sometimes the Monte Carlo results are in the middle of the true possible range of outcomes on some scale as depicted in the figure, but this is not always the case. Sometimes, or for some time intervals, the Monte Carlo results seem to ‘fill up’ the possible range, but this is rare and, when it does happen in time-dependent models, it is usually an ephemeral occurrence at the start of a simulation or at pinch points or ranges where the directions of trajectories shift. Monte Carlo sees uncertainty through a glass darkly in general. Variants of Monte Carlo methods are only marginally different. Latin hypercube sampling, for example, is usually worse at identifying extreme events. Second-order Monte Carlo does generally produce slightly wider results, but these trajectories form a halo around the regular Monte Carlo results, and they also come nowhere near to filling up the range of possible outcomes in most cases.
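A toy calculation illustrates how the gap arises. The model here is hypothetical: the output is the sum of ten inputs, each known only to lie in the interval [0, 1]. A conventional Monte Carlo analysis represents each interval as an independent uniform distribution, whereas an interval analysis makes neither assumption. This is only a sketch of the effect, not a reconstruction of the figure described above.

```python
import random

random.seed(0)
N_INPUTS = 10        # hypothetical model: output is the sum of ten inputs
LO, HI = 0.0, 1.0    # each input is known only to lie in [LO, HI]

# Conventional Monte Carlo: treat each interval as an independent uniform.
samples = sorted(
    sum(random.uniform(LO, HI) for _ in range(N_INPUTS)) for _ in range(10_000)
)
mc_low, mc_high = samples[250], samples[-251]   # roughly the central 95% of outputs

# Interval analysis: the output can lie anywhere between the sums of the endpoints.
bound_low, bound_high = N_INPUTS * LO, N_INPUTS * HI

print(f"Monte Carlo 95% range: [{mc_low:.2f}, {mc_high:.2f}]")
print(f"Interval bounds:       [{bound_low:.2f}, {bound_high:.2f}]")
```

The simulated outputs cluster around the middle of the range because independent uniform draws tend to cancel one another. Nothing in the stated knowledge about the inputs justifies that cancellation.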

Reasons traditional techniques underestimate uncertainty

Why do regular probabilistic analyses often badly underestimate uncertainty? There are several reasons, but the most common and significant ones are

Unjustified use of independence assumptions or overly precise dependence assumptions,
Inappropriate use of equiprobability and uniformity assumptions,
Underestimation of the uncertainties of original measurements,
Modeling volitional choices with distributions as though they are random,
Using averaging as an aggregation, and
Making assumptions for the sake of mathematical convenience (wishful thinking).

These general issues are discussed in the technical paper “What Monte Carlo methods cannot do”.
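The first item in the list above can be illustrated with the classical Fréchet inequalities, which bound the probability of a conjunction when nothing is assumed about dependence. The failure probabilities below are hypothetical, and the two-component system that fails only when both components fail is an assumed example.

```python
def and_independent(a, b):
    """P(A and B) under an assumption of independence."""
    return a * b

def and_frechet(a, b):
    """Bounds on P(A and B) when nothing is assumed about dependence."""
    return max(0.0, a + b - 1.0), min(a, b)

p_a = p_b = 0.01   # hypothetical component failure probabilities

point = and_independent(p_a, p_b)
low, high = and_frechet(p_a, p_b)

print(f"Assuming independence: {point:.6f}")
print(f"No dependence assumption: [{low:.6f}, {high:.6f}]")
print(f"Upper bound is {high / point:.0f} times the independent estimate")
```

If the components share a common cause of failure, the true probability can approach the upper bound, which the independence calculation understates by two orders of magnitude in this example.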

Another way uncertainty can arise: probabilistic dilation

Question: Is it ever possible for an analyst to face a situation where there is available data which will necessarily increase uncertainty about some key quantity in such a way that the information has negative value to the analyst? If so, doesn't this turn the analyst into a money-pump? The analyst should pay to avoid receiving the data, and would need to keep paying to avoid receiving the data.

Probabilistic dilation

The question is asking about a phenomenon described by Seidenfeld and Wasserman (1993) known as probabilistic dilation. It occurs when new evidence leads different Bayesian investigators into greater disagreement than they had prior to their getting the new evidence. Such evidence is not merely surprising in the sense that it contradicts one's prior conceptions; it expands everyone's uncertainty. This effect is counter-intuitive because it does not depend on what the new information is actually saying.

A simple example

It's hard to explain dilation with a simple example, but let me try. Suppose Lucius Malfoy tosses a fair coin twice, but the second 'toss' depends on the outcome of the first toss. It could be that Malfoy just lets the coin ride, and the second outcome is exactly the same as the first outcome. Or he could just flip the coin over so that the second outcome is the opposite of the first. You don't know what he will do. The outcome of the first toss is either heads H1 or tails T1. Because the first toss is fair (and no spells are cast midair), you judge the probability P(H1) = ½. Whether Malfoy lets the coin ride or flips it, you judge the probability the second 'toss' ends up heads to be the same, P(H2) = ½. So what happens when you see the outcome of Malfoy's first toss? Suppose it was a head. What is your probability now that the second 'toss' will also be a head? It turns out that once you condition on the first observation, the probability of the second toss being a head dilates. It is now either zero or one, but you don't know which. It doesn't depend on chance now; it depends on Malfoy's choice, about which you have no knowledge (unless maybe you too dabble in the dark arts). Dilation occurs because the observation H1 has caused the earlier precise unconditional probability P(H2) = ½ to devolve into the vacuous interval P(H2 | H1) = [0,1].
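The coin example can be checked numerically. In this sketch, theta is a hypothetical parameter for the unknown probability that Malfoy lets the coin ride, and it is assumed not to depend on the outcome of the first toss. The values 0 and 1 correspond to the two deterministic choices in the text.

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # the first toss is fair

def p_h2(theta):
    """Unconditional P(H2) when Malfoy lets the coin ride with probability theta."""
    # Ride: H2 happens exactly when H1 did. Flip: H2 happens exactly when T1 did.
    return HALF * theta + HALF * (1 - theta)

def p_h2_given_h1(theta):
    """P(H2 | H1): after a head, the second outcome is a head only if the coin rides."""
    joint = HALF * theta   # P(H1 and H2)
    return joint / HALF

thetas = [Fraction(k, 10) for k in range(11)]      # candidate values of theta
unconditional = {p_h2(t) for t in thetas}
conditional = [p_h2_given_h1(t) for t in thetas]

print("P(H2) over all theta:", unconditional)
print("P(H2 | H1) ranges from", min(conditional), "to", max(conditional))
```

Every candidate theta gives the same unconditional probability of ½, yet the conditional probability equals theta itself and so spans the whole unit interval.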

A practical example

A medical example of dilation described by de Cooman and Zaffalon (2004) can perhaps convince you of the importance of this issue. Suppose 1 out of 100 people in a population has a disease that is easy to test for. In fact, let's say the test has perfect sensitivity and perfect specificity so that, if the test result is positive, the patient surely has the disease, and if it's negative the patient surely doesn't. If we take a random person from the population, what is the probability before any tests are done that he or she has the disease? Well, we said the prevalence was 1 out of 100, so the probability would be 1 out of 100. Now suppose we are told that the person has been tested for the disease but that the test result has gone missing.

What can we now say about the probability that the person has the disease? You might think it would be reasonable to revert to the earlier answer that it's just 1 out of 100, but that conclusion is wrong. It's wrong because that conclusion depends on knowing something about why the test went missing. It would only be reasonable to say that the probability is still 1 out of 100 if the reason the test result went missing had nothing to do with its value, that is, it was missing at random. Unfortunately, it is generally quite hard to be confident about why information is not available in such cases.

Suppose there's a stigma associated with having the disease, and a positive result was hidden because of the stigma. In this case, the only reason a test result might go missing could be that the test was positive. If so, then the fact that it's missing reveals that the patient certainly has the disease. But, in our ignorance, it might just as well be the case that the result was unobservable because it showed a negative value, in which case the patient is surely disease-free. Intermediate cases are also possible and thus the probability of disease may, for all we know, be anywhere in the interval [0,1], and we cannot say that one value in that range is more likely than another.

This is an example of dilation: on learning that the patient has been tested but that the result is missing, our initial probability of 1 out of 100 dilates to the vacuous statement that the probability might now be anywhere from zero to one. We started out with some pretty good knowledge about the chance the patient was healthy, but some seemingly irrelevant information forces us into total ignorance about whether the patient has the disease or not.
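The dependence on the missingness mechanism follows directly from Bayes' rule. In the sketch below, miss_if_pos and miss_if_neg are hypothetical names for the unknown probabilities that a positive or a negative result goes missing, and the specific values are chosen only for illustration. Because the test is perfect, a positive result is equivalent to having the disease.

```python
def p_disease_given_missing(prevalence, miss_if_pos, miss_if_neg):
    """Bayes' rule for P(disease | result missing) with a perfect test.

    miss_if_pos and miss_if_neg are the (unknown) probabilities that a
    positive or a negative result goes missing; they must not both be zero.
    """
    numerator = prevalence * miss_if_pos
    denominator = numerator + (1 - prevalence) * miss_if_neg
    return numerator / denominator

prevalence = 0.01   # 1 out of 100, as in the text

missing_at_random = p_disease_given_missing(prevalence, 0.2, 0.2)
only_positives_hidden = p_disease_given_missing(prevalence, 0.2, 0.0)
only_negatives_hidden = p_disease_given_missing(prevalence, 0.0, 0.2)

print(f"Missing at random:      {missing_at_random:.4f}")
print(f"Only positives hidden:  {only_positives_hidden:.4f}")
print(f"Only negatives hidden:  {only_negatives_hidden:.4f}")
```

Only when the two missingness probabilities are equal does the answer revert to the prevalence. Without knowledge of the mechanism, every value between the two extremes remains possible.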

The Monty Hall problem

Another example of dilation is the infamous Monty Hall problem. Suppose you are a contestant on the game show Let’s Make a Deal and the host Monty Hall shows you three doors. You get to choose a door. There's a car behind one of the doors and goats behind the other two. If you pick the door concealing the car, you get the car as a prize. If you pick one of the doors with the goats, you get nothing. You pick a door and tell Monty, but before the door you picked is opened, Monty opens one of the other two doors revealing a goat. Monty then asks whether you’d like to change your pick to the other closed door. Should you switch to the other door to improve your chances of getting the car, or stick with the one you first picked? Many people believe that switching makes no difference, and several prominent people publicly embarrassed themselves arguing the point (Crocket 2015).

The Monty Hall problem is actually pretty subtle. When you first pick a door to open, it's reasonable to say you have a ⅓ chance of winning the car with that choice, that is, before Monty opens the door showing the goat. Once the goat is revealed, there are two doors left. As explained by de Cooman and Zaffalon (2004), the probability of getting the car depends not only on whether you stick with or switch your door, but also on why Monty chose the door he opened. It could go different ways. For instance, suppose Monty has previously decided he will open door three (if you haven't picked it) whenever the car is behind door one. In that case, if you pick door one and Monty opens door two, then the car must be behind door three and you should definitely switch doors. However, if Monty decided beforehand to open door two whenever the car is behind door one, and he opens door two after you pick door one, then there are two equally likely possibilities. The car is either behind door one and you should stick with it, or the car is behind door three and you should switch. Because you can't read Monty's mind and you don't know which plan he used, once he opens a door the probability the car is behind the door you first picked is now somewhere in the interval [0, ½], and the probability it is behind the other door is now in the interval [½, 1].
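The intervals can be derived with Bayes' rule. In this sketch, q is a hypothetical parameter for the unknown probability that Monty opens door two when the car is behind door one, the door you picked. If the car is behind door three he must open door two, and if it is behind door two he cannot.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)   # prior probability that the car is behind any given door

def posterior_stick(q):
    """P(car behind door one | you picked door one, Monty opened door two)."""
    opens_two_and_car_one = THIRD * q     # Monty had a choice and opened door two
    opens_two_and_car_three = THIRD * 1   # Monty was forced to open door two
    return opens_two_and_car_one / (opens_two_and_car_one + opens_two_and_car_three)

qs = [Fraction(k, 10) for k in range(11)]   # candidate values of q
stick = [posterior_stick(q) for q in qs]
switch = [1 - s for s in stick]

print("Stick: ", min(stick), "to", max(stick))
print("Switch:", min(switch), "to", max(switch))
print("Classic answer at q = 1/2:", posterior_stick(Fraction(1, 2)))
```

The familiar answer of ⅓ for sticking and ⅔ for switching corresponds to the single value q = ½, that is, to the added assumption that Monty chooses at random when he has a choice.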

The probabilities in the Monty Hall problem are intervals rather than scalar values because we don't know Monty's decision process when he has a choice of doors to open. Notice what seeing Monty open the door has wrought. The probabilities before we see the goat were evenly spread, but they were precise. After we see the goat, the probabilities are imprecise. There's more knowledge (the three doors have been whittled down to two), but there's also a sense in which there's more uncertainty, namely the imprecision about the probabilities. Thus this is another example of dilation.

As a game show contestant you certainly wouldn't mind your increase in uncertainty that comes from Monty opening a door to show the goat. You couldn't become a money pump in the Bayesian sense to prevent this, even theoretically, because the imprecision is not the same as probability. Such behavior would only be reasonable in this case if you believed that all uncertainty, no matter its source or nature, must be expressed as precise probability. It seems clear, however, that uncertainties come in different flavors that are not really directly interchangeable.

(See de Cooman et al. 2010, §7.) Although dilation seems highly counterintuitive to some people, others consider it a natural consequence of the interactions of partial knowledge (Walley 1991, 298f). Perhaps the phenomenon is evidence that uncertainty is a much richer idea than is usually assumed in probability theory.

One way to avoid dilation is not to use conditionalization as the updating rule for new information. Interestingly, it is possible to do this with imprecise probabilities. Grove and Halpern (1998) point out that the standard justifications for conditionalization may no longer apply when we consider sets of probabilities. And it may turn out that conditionalization may not be the most natural way to update sets of probabilities in the first place (de Cooman and Zaffalon 2004). Instead, a constraint-based updating rule may sometimes be more sensible. It's also interesting to note that dilation does not occur in interval analysis (Seidenfeld and Wasserman 1993), which is a kind of constraint analysis.

References

de Cooman, G., F. Hermans, A. Antonucci, and M. Zaffalon (2010). Epistemic irrelevance in credal nets: the case of imprecise Markov trees. International Journal of Approximate Reasoning 51: 1029–1052. http://dx.doi.org/10.1016/j.ijar.2010.08.011

de Cooman, G., and M. Zaffalon (2004). Updating beliefs with incomplete observations. Artificial Intelligence 159(1−2): 75−125. http://www.sciencedirect.com/science/article/pii/S0004370204000827, http://arxiv.org/pdf/cs/0305044v2.pdf

Crocket, Z. (2015). The time everyone “corrected” the world’s smartest woman. Priceonomics [blog]. http://priceonomics.com/the-time-everyone-corrected-the-worlds-smartest/

Ferson, S., and J. Siegrist (2012). Verified computation with probabilities. Uncertainty Quantification in Scientific Computing, edited by Andrew Dienstfrey and R.F. Boisvert, pages 95–122, Springer, New York.

Grove, A.J., and J.Y. Halpern (1998). Updating sets of probabilities. Proceedings of the Fourteenth Conference on Uncertainty in AI, pages 173−182, http://arxiv.org/abs/0906.4332

Seidenfeld, T., and L. Wasserman (1993). Dilation for sets of probabilities. The Annals of Statistics 21: 1139−1154. http://www.hss.cmu.edu/philosophy/seidenfeld/relating%20to%20Dilation/Dilation%20for%20Sets%20of%20Probabilities.pdf

Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London.
