Saturday, November 14, 2009

Bayesian 'Reasoning' and Fallacies

That the process of rational reasoning works cannot be justified by reason. The simplest way of looking at why I have to use scare quotes on Bayesian 'reasoning' is that it attempts to justify itself. Why would anyone hold up Bayesian thinking as better than regular thinking? Because they think it's a good idea - they come to the conclusion that Bayesian conclusions are better than their regular conclusions.

Oops.

Bayesian reasoning pre-supposes regular reasoning, and further supposes that it can be reasonably justified, which puts Bayesian reasoning entirely inside regular reasoning. Indeed, Bayes worked out and proved his theorem as a special case with existing tools in an existing framework, just like all mathematical theorems.

While boring to do so, I must disclaim that I do use simple Bayesian kinds of inference, though it seems I've just found the natural Bayesian network in the brain and I use it - I don't ever actually do the Bayes' theorem calculation.

If you haven't considered it deeply before, there's an introduction I can recommend, both for showcasing the advantages of Bayes' theorem and for diving with both feet into many of the associated fallacies.


Tribes, Grass, and Rain

As lethal as my single point is, were I Bayesian Bicurious, I would find it unsatisfying.[1] The rest of this section is illustration. The goal is that each point will also be fatal on its own, while being a special case of the overall error.

To start with, Bayes' theorem requires three pieces of information (as you can see in La Wik): the prior probability of A, the same of B, and the probability of B given A. That scans as utter gibberish to me, so here's an example.

I'm going to use wet grass as evidence of rain. To calculate the probability it has rained if the grass is wet, p(A|B) (as opposed to the probability it's wet because someone has been washing their car), you need the probability the grass is wet in general, p(B), the probability it has rained in general, p(A), and the probability that the grass gets wet if it has rained, p(B|A). The probability p(B) can also be expressed like this: p(B|A)*p(A) + p(B|~A)*p(~A), which has applications when you don't know p(B) per se but you do know of a lot of possibilities that aren't rain, like car washing. That second part essentially reads, "the probability the grass gets wet from not-rain, times the probability it hasn't rained."
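For the curious, here is what that bookkeeping looks like as a quick Python sketch - every probability in it is invented purely for illustration, not measured from anything:

    # Bayes' theorem for the wet-grass example; all numbers are made up.
    p_A = 0.2              # p(A): probability it has rained
    p_B_given_A = 0.9      # p(B|A): grass gets wet if it has rained
    p_B_given_notA = 0.1   # p(B|~A): grass gets wet from not-rain (car washing, etc.)

    # p(B) = p(B|A)*p(A) + p(B|~A)*p(~A)
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # p(A|B) = p(B|A)*p(A) / p(B)
    p_A_given_B = p_B_given_A * p_A / p_B
    print(round(p_A_given_B, 3))   # about 0.692 with these invented numbers

Note that three of the four numbers had to be supplied from somewhere outside the theorem.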

The first problem is that this is an awful lot of probabilities to need. The second, more serious problem is that in a controversy, it is easy to the point of inevitability that these probabilities are going to be fudged.

Notice how this computation spirals out of control as more priors are added; p(A|BCDE) isn't too bad, as it's essentially just four of these calculations, but what if, say, C and D are also in dispute, and you need to calculate p(C|XY) as well, and so on? It rapidly becomes a hierarchy of tears, except maybe for a few specialists in the field of A-ology. You, poor schmuck layman, are forced to simply accept the probability calculations of expert A-ologists. (Oh wait, don't we already do this? Err...revolution where now?) To be fully 'rational,' apparently you need a multimillion-dollar research budget.
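To put a toy number on that spiral: suppose, purely for illustration, that every disputed quantity drags in roughly three more probabilities, and some of those are disputed in turn. A sketch of the count:

    # Toy count of cascading probability requirements; the branching factor
    # of three is an assumption made up for this illustration.
    def probabilities_needed(depth, branching=3):
        if depth == 0:
            return 1
        return 1 + branching * probabilities_needed(depth - 1, branching)

    for d in range(5):
        print(d, probabilities_needed(d))   # 1, 4, 13, 40, 121

A few layers of disputed sub-questions and you're into the hundreds of numbers, each one a potential fudge.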

However, the second problem is the pernicious one. Take a moment and realize how ridiculous it is that water falls from the sky. What, do little leprechauns take it up there with buckets? And it just hangs around, ignoring gravity for a while, before it gets into the mood to hang around on the ground again? If it weren't so pervasive, and thus familiar and normalized, rain would be far more mysterious and mystical to stone-age humans than silly little volcanoes. I strongly recommend it - open a door or window next time it rains, and consider how bizarre it is this liquid is just falling willy nilly out of nowhere.

Right, in that frame of mind, let's insert a rain controversy - democracy in the stone age. One clan thinks that when the rain fairies alight on the ground, they leave behind drops on the grass. Another clan thinks that the grass fairies excrete water when it rains, basically to say 'hi,' like they do to greet the morning. These lead to different calculations of p(A|B) - how likely it is that it has rained if the grass is wet. (Through convoluted logic, which of the two most of the clans believe determines who gets to go yak hunting.)

The first clan is certainly less wrong, and they see that the chance of wet grass if it rains is 100%: p(B|A) = 1. The second clan is not about to go down without fighting; observing grass on a hot, dry day, (only slightly missing the actual end of the storm) they notice the grass is dry. They declare that the grass fairies were just unhappy that day, conclusively showing that p(B|A) is actually somewhat less than one. The first clan is outraged, "You were obviously wrong about the rain! There's no way it can rain without the grass getting wet!"

(Technically you're not supposed to ever use probabilities 1 or 0, because they break Bayes' theorem, something I'll detail further down.)

This conflict cannot be resolved by Bayesian reasoning. To work out p(B|A) requires p(A|B) - the very thing that was in dispute to begin with. It's one equation with two unknowns. This is an example of the general problem of assigning probabilities to various pieces of data; the assigned probability unavoidably depends on that non-Bayesian reasoning stuff Bayesians are trying to improve upon.
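You can see the underdetermination by just plugging in numbers - with p(A) and p(B) held fixed, Bayes' relation is happy with any number of (p(B|A), p(A|B)) pairs. All values below are made up:

    # One equation, two unknowns: p(A|B) * p(B) = p(B|A) * p(A).
    p_A, p_B = 0.3, 0.4   # p(rain) and p(wet grass), picked arbitrarily
    for p_B_given_A in (0.6, 0.8, 1.0):
        p_A_given_B = p_B_given_A * p_A / p_B
        print(p_B_given_A, round(p_A_given_B, 3))   # 0.45, 0.6, 0.75

Each clan's preferred p(B|A) yields its own p(A|B), and nothing inside the theorem picks between them.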

Note also that neither p(A) nor p(B) can even be properly collected by the clansmen. Their sample size is too small and the demands of hunting and gathering don't help. When our society attempts to probe questions at the limit of our understanding, we run into the same problem - plus, just like the clansmen, it is difficult to even know that our sample is somehow flawed.

In the end, the first clan was mostly from a warm, wet sub region, while the second clan's region had days that were significantly hotter and drier. Their error was not in their logical progression - the first thought that they could tell from the grass if it rained while they slept, the second realized that they couldn't tell for sure - but in how they tried to state their observations. Ultimately, both were right, and both were wrong. And, obviously, I should be the one who gets to go yak hunting.

This is just not compatible with how human reasoning naturally works. Setting aside true random events - for which Bayes' theorem undoubtedly works better than human reasoning - the actual probability of any hypothesis being true is either 0 or 1. (Within certain tolerances - otherwise all hypotheses are false, probability 0, because they're not exactly true.) Bayesians aside, all human reasoning reflects this by backing one horse above all others. Personally I find this handy, as for some reason once I've strongly stated the hypothesis that wins given my available information, I find it much easier to gather and remember related information, often poking holes in my own ideas within days.


Fallacies

Hot Hand Fallacy

So rather than focusing on how it might be true, (ironically, the best way to activate one's confirmation bias) let's turn this fallacy around and see what it might be useful for. (As inspired by a comment.)

How many fair rolls were there in the ancestral environment? Look at how much effort has to go into making fair dice and roulette wheels, and how much effort goes into certifying and inspecting them. Fair gambling is high tech; the stone age would never have seen a truly fair gamble.

However, look at the exact detail of people's beliefs about dice. A die that is giving up wins easily is likely to give up more wins, but each win depletes the die, making further wins less likely. I can't think of a situation where that kind of belief would not have been useful for a hunter-gatherer. Plants grow in patches. Animals need to find each other to breed. But, every time you actually bag a particularly juicy plant or animal, that's one less left to find.

Even setting that aside, I can bring up the issue of how random distributions actually look:

[Image: random noise in 2D]

Notice that even if a plant is actually randomly distributed, it will still end up in clumps and veins. There will be hot spots and cold spots. Similarly, if animal activity is linked to weather, the good days and bad days will often occur in clumps. In general, if your hunting or gathering is going well, it is highly rational to expend some extra effort that day, because it will likely be rewarded more efficiently than usual.
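If you don't believe the picture, it's easy to roll your own - a minimal sketch, with grid size and point count chosen arbitrarily:

    # Scatter points uniformly at random over a grid and count each cell;
    # even genuinely uniform noise comes out lumpy.
    import random
    random.seed(0)

    SIZE, POINTS = 10, 100
    counts = [[0] * SIZE for _ in range(SIZE)]
    for _ in range(POINTS):
        x, y = random.randrange(SIZE), random.randrange(SIZE)
        counts[y][x] += 1

    flat = [c for row in counts for c in row]
    print("empty cells:", flat.count(0))
    print("busiest cell:", max(flat))

With 100 points in 100 cells you'd naively expect one per cell, but typically a third or so of the cells come up empty while a few hold four or five - hot spots and cold spots, from pure randomness.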


Base Rate Fallacy

The Base Rate Fallacy actually has a point, as these kinds of calculations come up almost exclusively in professional contexts. However...I can get them right, and it's not by trying Bayes' theorem. I instead use the physicist approach. "Probability is defined as the prevalence divided by the total population," I say to myself. Then, using this definition, I simply have to find the relevant numbers, which is straightforward as long as I understand their definitions. In the case linked, it's the number of homosexuals with the disease divided by everyone with the disease, and indeed they also dodge Bayes' theorem by doing the calculation explicitly.

Moreover, when I get these wrong, the reason is that I'm using shortcuts learned during probability classes in public school. I strongly suspect that a physician mis-estimating, say, the odds of a positive mammography meaning cancer, is being misled by exactly the same training. The cure is not Bayesianism. The cure is to teach math properly.
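Here's the physicist approach in full, applied to the mammography example - the numbers are the usual illustrative ones, not real clinical figures: out of 10,000 women, suppose 1% have cancer, 80% of those test positive, and 10% of the healthy ones also test positive.

    # Count everything explicitly; no Bayes' theorem required.
    population = 10_000
    with_cancer = int(population * 0.01)                        # 100
    true_positives = int(with_cancer * 0.80)                    # 80
    false_positives = int((population - with_cancer) * 0.10)    # 990

    everyone_positive = true_positives + false_positives        # 1070
    print(round(true_positives / everyone_positive, 3))         # about 0.075

The number of cancer patients who test positive, divided by everyone who tests positive: prevalence over total population, and the 'surprising' answer falls out of simple counting.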

Conjunction Fallacy

This is a fun one. First, I will put forward a hypothesis as to why framing is important. It's because survey subjects will always read things into the question that aren't there, and survey designers not only have no idea what those things are, but are often ignorant that they need to design surveys to account for this. My source for this hypothesis is something I think every philosopher will empathize with: how long it took me to train myself to only read what was there, and not hallucinate all sorts of interesting random things and subsequently ascribe them to the author.

With this in mind, between the possibilities,
  1. The United States will withdraw all troops from Iraq.
  2. The United States will withdraw all troops from Iraq and bomb Iranian nuclear facilities.
Why would survey subjects respond that #2 is more likely? I can think of myriad possibilities. Here's one: into option #1, they read, "will withdraw for no reason at all," while into #2 they read, "will withdraw because they want to bomb Iranian nuclear facilities." Both are narratives, but only the second makes any damn sense.

Here's another: in amateur introspectors, the positive feeling of reading a narrative and the positive feeling of high probability are not distinguishable, as they're quite similar.

Oh great Bayesian survey-writer, Accept your Ignorance. You have no idea what your surveys are actually testing, which kind of means more surveys will not help very much.

As below, I like testing things. Let's take these ideas for a spin.
"Is it more likely that Linda is a bank teller, or a bank teller and feminist?"
The average person will read that as,
  1. Linda is a bank teller and not a feminist.
  2. Linda is a bank teller and a feminist.
Oddly, they ascribe more likelihood to #2.

Why will they read it this way? Because that's how the average person talks. For me, it took quite a bit of mathematical education before I read 'x > 3' as what it means; x can be four, or ten, or a million, or 29 436 502 974, or infinity. Before that point, I read 'x is more than three' as meaning something like eight or so, but probably not beyond ten. As indeed, if you want to talk about things that are in the millions, you need to say, 'x is somewhere over a million.' If it's absolutely necessary to talk about things with a wide range, people say so explicitly, "The variable 'x' can vary widely, sometimes as low as four, but can reach heights vastly exceeding a billion." Further, what non-mathematical object has properties like that? Certainly, that quoted sentence is not one a journalist will ever have cause to use.

Note that you probably just made the exact error I'm detailing. 'have cause to use' does not mean it's the best option, only that it could be accurately used. I'm saying that a journalist will never cover a subject where this sentence will even be useful. (Point of prose: I could say that, but it doesn't seem any less likely to produce the same error, and is drier.)

These ideas spin well. Let's do it again.

"Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequences of greens (G) and reds (R) will be recorded. You are asked to select one sequence, from a set of three, and you will win $25 if the sequence you chose appears on successive rolls of the die. Please check the sequence of greens and reds on which you prefer to bet.

1. RGRRR
2. GRGRRR
3. GRRRRR

65% of the subjects chose sequence 2, which is most representative of the die, since the die is mostly green and sequence 2 contains the greatest proportion of green rolls. However, sequence 1 dominates sequence 2, because sequence 1 is strictly included in 2. 2 is 1 preceded by a G; that is, 2 is the conjunction of an initial G with 1. This clears up possible misunderstandings of "probability", since the goal was simply to get the $25."



Um...no it doesn't? Again, I have no idea what the subjects are actually perceiving when they read those options, but it's almost certainly not strictly what's there.

Perhaps they're taking the examples as characteristic of a longer sequence.

Or...it's quite difficult to set aside one's general problem solving habits - which must work, or they'd be rapidly changed - just because some researcher is offering you $25. They may solve the problem this way: "What does the most probable sequence look like?" (In me, this part is involuntary.) They then compare the presented sequences to their ideal sequence. This is much, much faster and more efficient than actually solving the problem at hand. (Employers constantly complain that their employees can't follow directions, incidentally.) The gains from efficiency, in real life, outweigh the costs in efficacy. As a bonus, this interpretation post-dicts their responses.

Or...as amateur introspectors, they're solving it by comparing the greenness feeling of the dice to the greenness feeling of the sequences. For some bizarre reason, this process works better with ratios than with precise numbers. This error is solvable with better math education; mine makes me accept that, oddly, my feelings don't perform math well, and that I have to do the calculation explicitly if I want a precise answer. Err, kind of like all logic.
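And doing the calculation explicitly for the dice question is easy enough to brute-force. A quick Monte Carlo sketch, with the trial count picked arbitrarily:

    # Which of the three sequences is most likely to show up somewhere
    # in 20 rolls of a die with four green faces and two red faces?
    import random
    random.seed(0)

    sequences = ["RGRRR", "GRGRRR", "GRRRRR"]
    trials = 100_000

    def roll_20():
        return "".join(random.choice("GGGGRR") for _ in range(20))

    wins = {s: 0 for s in sequences}
    for _ in range(trials):
        rolls = roll_20()
        for s in sequences:
            if s in rolls:
                wins[s] += 1

    for s in sequences:
        print(s, round(wins[s] / trials, 3))

Sequence 1 comes out ahead of sequence 2, as it must - any run of rolls containing GRGRRR contains RGRRR inside it.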

Moreover, it's not even guaranteed they have an accurate definition of 'probability.' So...are we assuming people are born knowing what 'probability' actually means, and we just later learn the word for it? Are we assuming that, since everyone assumes their own reason is reliable, (as indeed you must) that the default assumption is that each person will assume they can accurately calculate probabilities? Before speculating at the assumptions of the subjects, it's kind of necessary to know the assumptions of the researchers.

While this section is largely made up of just-so stories, these plausible scenarios are not even acknowledged, let alone addressed, by Bayesian proponents.

Ironically, all these fallacies are the result of a Bayesian process. Evolution picks priors by chance, and then the priors are decremented by killing people holding the wrong ones. Literally, your possible ancestors who thought more 'rationally' all got killed. At the very least they got killed in the contest of baby counts - our actual ancestors were just better baby factories. While it may throw up some interesting bugs in a modern environment, are you really comfortable saying that all those people who died were right, and the ones who lived were wrong? So what...does that mean their rightness or wrongness was irrelevant at the time, or have humans, as a species, just been monumentally unlucky?

Bayesian Reasoning Tells Us About Ignorance

The standard thing to do when you have multiple possibilities and no further information is to assign equal probabilities to each one. Say I find my car is flattened, and the possibilities are bigfoot and alien landing site, but as apparently alien ships are foot-shaped, there's no way to tell the difference, and thus they get 50% each. I can sue bigfoot, but the alien landing means we're being invaded, so it matters to get it right.

From this, I can narrow my search - if I find no bigfoot scat, it's probably aliens, and if I find no green glowing alien exhaust, it's probably bigfoot. However, as additional possibilities accumulate - 33% each, 20%, 16%, and so on - the first effect is that it starts telling me more about my own mind than about the singular event that actually happened. The possibilities I come up with start being less about my smushed car and more about what sheet metal reminds me of, or even ideas I just happen to find evocative. (Heh heh.)

What is the probability of dark matter, given our gravitational lensing pictures of colliding galaxies? The converse, the probability of not-dark-matter given the same pictures, is basically a function of how many other theories you can come up with - as a Bayesian, you have to share out equal probabilities in the case of ignorance, so as you add more possibilities, dark matter starts out as a smaller and smaller piece of the pie, but the relative growth from the pictures is the same...

More importantly, as possibilities multiply, the odds that even one of them is anywhere near correct drop. So it's not really 33-33-33, it's more like there's three possibilities, plus the possibility I have no idea what I'm talking about, so 20-20-20-40. As ignorance grows, the chance you don't even know enough to properly quantify your ignorance grows as well.
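A sketch of that arithmetic, with the 'I have no idea' share picked out of thin air:

    # Equal-ignorance priors get diluted as hypotheses pile up; numbers
    # here are illustrative, not measurements of anything.
    def equal_priors(n_hypotheses, p_clueless=0.0):
        # Reserve p_clueless for "none of my hypotheses are right",
        # then split the remainder evenly among the named hypotheses.
        share = (1.0 - p_clueless) / n_hypotheses
        priors = [share] * n_hypotheses
        return priors + ([p_clueless] if p_clueless else [])

    print(equal_priors(3))          # three named hypotheses: 33-33-33
    print(equal_priors(3, 0.4))     # plus cluelessness: 20-20-20-40
    print(equal_priors(10, 0.4))    # ten hypotheses: each starts at a mere 6%

The more possibilities you can dream up, the thinner each slice gets - and the slice for 'my list is wrong' should really be growing at the same time.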

Our ignorance of dark matter is probably on par with the ancient Greeks' ignorance of regular matter. It turns out Democritus was less wrong about this, but even he was wrong all over the place. Similarly, the odds that dark matter is neither MACHOs nor WIMPs is pretty high.

So whenever I learn something new, I want to try it out. I'm going to try this one out on extra-terrestrial civilization.

A lot of people want to know if there's other intelligence out there.

Actually, SETI is kind of ironic, as they're so sure ETI is out there, it's hardly worth their time to actually find it - a bit like the first clan was so sure the grass has to be wet after rain.

But, looking around the universe, we don't see evidence of any engineered structures. Now, p(ETI|solar-scale engineering) equals pretty much one, but what about p(ETI|~solar-scale engineering)? Well, it's not like we produce any solar-scale structures, but who knows what we'll be capable of in the future?

So, err, precisely - who knows what we will be capable of a million years in the future? Who knows what we'll find worthwhile a million years in the future?

Going back to the original equation - so what's p(solar engineering) - the general odds of looking at a solar system and finding engineering there? What's p(aliens) - the general odds of looking at a planet and seeing complex life there? We can't even gather these numbers. Drawing in the many-possibilities thread, we don't even know if life has to look all carbony like we do or not. You can run spectrographic analysis on exoplanets that transit the star, and you might find both water vapour and methane in the atmosphere, but it doesn't tell you very much.

If you limit yourself to Bayesian reasoning, these undefinable numbers will remain so forever. Sure, you can define any hypothesis[2] of p(aliens) and p(~aliens) you want, such as 50% each, and revise p(aliens) downward every time you see a planet with no life on it, but you'll run into something of a barrier in knowing how much to revise it downward; calculating p(aliens|lifeless planet) uses p(lifeless planet|aliens), which is zero, and this zero reduces Bayes' theorem to the trivial 0=0. (This is why you're not supposed to use 1 or 0 as probabilities, but it's not like 0.0001 is much better, and indeed even how many zeros to put after the decimal is a judgement call in this case.) Without knowing beforehand the actual p(aliens) and p(~aliens) for a random planet, Bayes' theorem is powerless. In other words, you have to already know the answer to get the right answer.
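To see how much hangs on that judgement call, here's a sketch where the only thing that changes is how many zeros you put after the decimal - every input is invented:

    # One Bayesian update on seeing a lifeless planet, for various guesses
    # at p(lifeless planet | aliens); p(lifeless planet | ~aliens) is
    # assumed to be 1 for simplicity.
    def posterior(prior, likelihood, likelihood_given_not):
        evidence = likelihood * prior + likelihood_given_not * (1 - prior)
        return likelihood * prior / evidence

    prior = 0.5   # the "50% each" starting point
    for lik in (0.1, 0.001, 0.00001):
        print(lik, posterior(prior, lik, 1.0))
    # Revised p(aliens): roughly 0.09, 0.001, and 0.00001 - the answer is
    # whatever you smuggled in as the likelihood.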

The summation of this debacle is thus: are you a philosopher? No? Then, if you have a philosophical thing to say, you should Google 'philosophy forums' and pick one, then sign up and post, "Hey there good Mr. and Ms. philosophy dudes, does this make any bloody sense at all?" They'll generally reply, "No! And get off my lawn!" In addition, they'll be happy to tell you what you should be doing instead, as vetted by thousands of years of the smartest people on the planet. (On many issues, philosophy hasn't advanced beyond Aristotle because he got it right.) If that doesn't work, I understand professors are more than happy to answer the curiosity of the public - find one's email and email them. (At least, it works for me. If it doesn't for you, try a second philosopher.) I don't run around saying I know statistics better than you, so how about you do me the honour of not pretending you know philosophy better than I do?

Tangents:

[1] Actual Bayesians, I would guess, have roughly zero percent chance of being persuaded, p(open-mind-on-this-issue|Bayesian) -> 0.

[2] Hypotheses come, surprisingly enough, from outside Bayesian reasoning, even according to Bayesians. How one comes up with a good hypothesis is not addressed.

As an epistemologist, I almost always should avoid probability. That particular problem is a problem for ontologists.