
February 05, 2009

Comments

This was a very helpful post.

The article implies that, in order to combine likelihood ratios, you need to reason from the likelihood ratio, together with your knowledge of the other person's prior, back to the outcome of the other person's experiment. Even this procedure isn't given algebraically; only the "forward" direction is, from the outcome of the other person's experiment to the likelihood ratio.

Is there an algebraic way to combine likelihood ratios, without going through the outcome of the other person's experiment?

in our example with the balls

Should this be "coin"?

To combine likelihood ratios, you just multiply them. That's the simplicity of it.

If you see evidence with odds of 4 to 1 if a hypothesis is true versus false, and I see evidence with odds of 2 to 1 if the hypothesis is true versus false, then together we have seen evidence with a likelihood ratio of 8 to 1 in favor of the hypothesis's truth.
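A minimal sketch of this rule in Python, assuming a shared 50% prior and independent evidence (the 4:1 and 2:1 figures are the ones above):

```python
def combine(prior_prob, *likelihood_ratios):
    """Odds-form Bayes update: multiply independent likelihood ratios."""
    odds = prior_prob / (1 - prior_prob)  # prior odds for the hypothesis
    for lr in likelihood_ratios:
        odds *= lr                        # each independent LR multiplies the odds
    return odds / (1 + odds)              # convert back to a probability

# Your 4:1 evidence plus my 2:1 evidence, from a 50% prior:
print(combine(0.5, 4, 2))  # combined LR 8:1 -> posterior 8/9 ~ 0.889
```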

Didn't Jaynes suggest logging the likelihood ratios as well? Then you could just add them. Of course, maybe logging is trickier to do in the bar.

Chris Hibbert's method seems simpler, and I like Hal Finney's suggestions for updating. Maybe Chris & Hal will weigh in with their pleasant methods soon. Overall, very nice Anna!

Anna and Steve: Oh, hey, that's cool, thanks. Never really realized trading likelihoods makes it that easy, though in retrospect...

Anyways, thanks for that!

Carlie: Logging may be good when the numbers are well behaved and when the conversion to nats or bits or whatever is the obviously right thing to do. However, it's known from numerical analysis that when actually computing with a finite amount of precision (and hence truncation error and so on), adding incurs a higher error rate than multiplying, as a general rule. So, especially with limited precision, better to multiply the likelihood ratios than to add the logs, IMHO.
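(Either way, the two routes agree up to floating-point error; a quick sketch with made-up ratios:)

```python
import math

lrs = [4.0, 2.0, 1.5, 0.5, 3.0]  # some made-up likelihood ratios

direct = math.prod(lrs)                              # multiply the ratios
via_logs = math.exp(sum(math.log(r) for r in lrs))   # add the logs, then exponentiate

print(direct, via_logs)  # both ~18.0, differing only in the last few bits
```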

These calculations seem to assume that the evidence available to the different parties is non-overlapping. Outside of cases like coin tossing or forming judgments about a population from random samples, that's a dubious assumption.

Which, of course, you mention at the end.

This may well be the most useful post ever on Overcoming Bias. It is also right on the target defined by the title...

One problem: the post assumes readers know what likelihood ratios are. Random people trying to make good decisions won't know (and won't know the point Eliezer makes either). Also, as you pointed out and as several commenters noted, the hard part is independence.

So you need to back out a bit and write a post that explains how to know your likelihood ratio and how to judge the independence of LRs from different sources. Preferably write the explanation mostly with examples and without relying on the term "likelihood".

Then XKCD can translate it into a cartoon poster and the world will be saved.

One other point. This seems very closely related to "saving the appearances" and similar observations from the history of science. Basically it looks like scientific revolutions change the priors of the scientists involved, but rarely change evidence that has been accepted by consensus up to then. Of course the resulting beliefs can change radically... but if everyone is talking in LRs this is less of a problem.

Aumann assumes common priors so the posterior estimates contain the same info as the likelihood ratios. I agree LRs are more arithmetically convenient in cases like this where background assumptions cause convergence in a single step, but what happens with many iterations?

Another advantage of using likelihood ratios is that they're well-defined even in situations where prior probabilities cannot be sensibly quantified.
<can of worms>

(In more general examples, combining likelihood ratios may not lead to more extreme beliefs, but it almost always leads to more specific beliefs.)

What does "more specific beliefs" mean? "The Flying Spaghetti Monster personally directly causes every individual particle's movement constantly" is a very specific belief.

I think it's easier to see the role of likelihood ratios if you look at the Bayes' formula expressed in the right way. Here I shamelessly plug an old introductory blog post of mine that contains those.

"James, to end up with a 39% posterior on X being heads-weighted, must have seen four heads and one tail:"

Or 81 heads and 46 tails ~ 39.45%

Your perfect Bayesian needs a prior on the number of trials seen by each participant.

(And since ln(2/3)/ln(2) is irrational, there are infinitely many arbitrarily close approximations of 39%: you need to find a and b such that a ln(2/3) + b ln(2) ~ ln((1/0.39 - 1) * 0.2/0.8).)

(16 and 8 -> ~ 39.08% is even closer)
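These figures are easy to check under the post's setup (a 20% prior that the coin is heads-weighted, 75% heads if weighted, 50% if fair); a small sketch:

```python
def posterior(h, t):
    """P(heads-weighted) after h heads and t tails, from a 20% prior."""
    odds = 0.25 * 1.5**h * 0.5**t  # prior odds 1:4, times per-flip likelihood ratios
    return odds / (1 + odds)

for h, t in [(4, 1), (16, 8), (81, 46)]:
    print(h, t, round(posterior(h, t), 4))  # ~0.3876, ~0.3908, ~0.3945
```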

steven: If there are common priors, and Jane and James want to know the value of θ and are communicating likelihood ratios or relative likelihood functions p(data|θ), and background assumptions do not cause convergence in a single step, then there must be another relevant variable ζ whose value Jane and James do not know, and Jane and James must be communicating marginal likelihoods of the data with uncertainty about ζ integrated out. What you have described is almost parallel Aumann updating of conditional beliefs about ζ for each value of θ as ζ relates to the data. We don't know how to write that post yet. Until then, Jane and James should share conditional likelihood functions p(data|θ;ζ).

Aaron, thanks. Fixed.

Arthur, if your Bayesians trade posterior beliefs (as Jane and James do in our initial example) then, as you point out, they need a prior on how many coins the other party has seen. (We hoped the "five times" would be read as part of the problem specification, but our writing was ambiguous, so it's good you pointed it out.) Also, you may well know this already, but for anyone else: if Jane and James instead trade likelihood ratios (as they do in our second example), they don't need to know how many coins the other party has seen. It's another nice feature of working with likelihood ratios. Likelihood ratios combine "how much data have you seen?" and "how strongly did your data point to [the coin's unfairness / Jack's amazingness / whatever]?" into a single number.

Greg, we mean for example that the region of "how much amazingness Jack might plausibly have" will shrink as you pool more data about Jack (e.g., after a while, maybe we're 90% certain that Jack's amazingness is between the 73rd and 74th percentiles). The more data you pool, the more sharply your data can distinguish between differing theories, including theories that are fairly close to one another (e.g., "Jack is at the 73rd percentile of amazingness" vs "at the 74th"), and so, in that sense, the more sharp (which we were glossing as "specific") your posterior is likely to be.
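To make the shrinking concrete, here is a sketch with hypothetical numbers: treat each observation of Jack as a success/failure signal, start from a uniform prior, and watch the 90% credible interval narrow as data accumulates (the Beta model is an illustration, not the post's exact setup):

```python
from scipy.stats import beta

# With a uniform Beta(1,1) prior, s successes and f failures give a
# Beta(1+s, 1+f) posterior over Jack's "amazingness" parameter.
for s, f in [(3, 1), (30, 10), (300, 100)]:
    lo, hi = beta.interval(0.9, 1 + s, 1 + f)  # central 90% credible interval
    print(f"{s + f:4d} observations: [{lo:.3f}, {hi:.3f}]")
# The interval shrinks from roughly [0.34, 0.92] toward a tight band near 0.75.
```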

A helpful and well-written post. Aumann's agreement theorem gets mentioned so much, I'm surprised we haven't had an example like this on OB earlier. In particular, I had been wondering whether the agreement theorem says that the two parties can end up with an estimate which is not between their two individual estimates.

I want to write an OB post too...

Under what conditions does the naive "average posterior probability weighted by expertise" heuristic work?

Liron: "Aumann's agreement theorem gets mentioned so much, I'm surprised we haven't had an example like this on OB earlier."

See Hal Finney's "Coin Guessing Game" from two years ago.

>>> Likelihood ratios combine "how much data have you seen?" and "how strongly did your data point to [the coin's unfairness / Jack's amazingness / whatever]?" into a single number.

Suppose we have two sets of observations:

Set 1: 2000 heads, 1000 tails
Set 2: 2 heads, 1 tail

If I understand the term 'likelihood ratio' correctly, the likelihood ratios here are the same for both observation sets, 2:1. If so, I can't tell "how much data have you seen" judging from the ratio alone. Yes, I can get a good guess for a ratio like 1562:1, but that won't work with ratios like 2:1.

Vladimir,
A "likelihood ratio" is how likely your observations are under alternative theories. In the weighted coin example, it is
your likelihood ratio for your first set of observations is P( Set 1 | weighted coin ) / P ( Set 1 | fair coin ) = ( .75^2000 * .25^1000 ) / (.5^2000 * .5^1000) = 10^51. (I.e., that first set of observations is *very* strong support for the theory that the coin is weighted.)

In contrast, the likelihood ratio for your second observation set is P( Set 2 | weighted coin ) / P( Set 2 | fair coin ) = ( .75^2 * .25 ) / ( .5^2 * .5 ) = 1.125, i.e. the second set of observations is 1.125 times as likely to occur if you have the weighted coin as if you have the fair coin.

That said, I did not mean to say you can infer "how much data I've seen" from my likelihood ratio. What I meant to say is that everything the two of you need, if you are to correctly update your posterior beliefs, is contained in your prior plus your and the other person's likelihood ratios. That is, with likelihood ratios you do not need to keep separate track of "how much data have I seen?" and "how extreme was the data?" -- a single number tells you the important part of both measurements.

Vladimir: Likelihood ratio isn't "guess the probability the coin lands heads", it's "which is more likely, a fair coin or a 75% heads coin?"

P(Set1|75%) = (0.75)^2000*(0.25)^1000*C(3000,2000) and
P(Set1|50%) = (0.5)^3000*C(3000,2000)

so the likelihood ratio from Set 1 is P(Set1|75%)/P(Set1|50%) = 1.4 x 10^51, and from Set 2 it's only 1.125. Set 1 is wildly improbable but it's hugely more likely to result from a 75% coin than a 50% coin.
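Both numbers are easy to verify in log space, where the huge exponents stay manageable (the binomial coefficients cancel, as noted):

```python
import math

def log10_lr(heads, tails):
    """log10 of P(data | 75%-heads coin) / P(data | fair coin)."""
    return heads * math.log10(0.75 / 0.5) + tails * math.log10(0.25 / 0.5)

print(10 ** log10_lr(2, 1))    # Set 2: 1.125
print(log10_lr(2000, 1000))    # Set 1: ~51.15, i.e. a ratio of ~1.4e51
```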

Jed, Thanks for the encouragement and suggestions. I'll play with your suggestions, re: scientists and re: a more general audience. Do you know any good writeups to draw from? By far the best I've found are Eliezer's An intuitive explanation and A technical explanation (and Jaynes, if we include books, though I haven't yet read most of it).

Frelkins, thanks. Could you point me to the explanations by Hal and by Chris Hibbert?

Liron, good question. One simple example where the heuristic roughly works is if you have a weighted coin, with a uniform prior over coin-weights, and you and your partner are each estimating the probability that the coin will come up heads on the next toss. ("More expert" here equates to the number of coins you have each seen). A second is if you and your partner are both estimating a random variable (e.g., a person's "true math ability") and each of your measurements is the sum of the person's "true math ability" and a normally distributed random error term. (The "more expert" of you either has more of these measurements or a smaller error term). Anyone want to step in here with a general analysis?
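Here's a quick numerical check of that first example, under my own construction (a uniform prior over the coin's weight, so each person's predictive probability of heads is (h+1)/(n+2) after h heads in n flips):

```python
h1, n1 = 7, 10     # hypothetical: first person's heads and total flips
h2, n2 = 70, 100   # second person is "more expert": ten times the data

est1 = (h1 + 1) / (n1 + 2)  # ~0.667
est2 = (h2 + 1) / (n2 + 2)  # ~0.696

pooled = (h1 + h2 + 1) / (n1 + n2 + 2)  # pooling the raw data: ~0.696
weighted = ((n1 + 2) * est1 + (n2 + 2) * est2) / (n1 + n2 + 4)  # ~0.693

print(pooled, weighted)  # the expertise-weighted average lands close to the pooled answer
```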

On the SAT score adjustment, most people do not know that male scores have a higher variance than female scores, nor how much more variance, nor how to combine gender means and variances with one or more particular scores to produce a posterior estimate. So in practice just publishing raw scores will mostly result in ignoring those differing distributions. Yes, you'd want to adjust an estimate based on multiple tests differently than one from a single test, but it still seems to me that in practice we'd be better off if the testing agency did this math and published a single best estimate. After all, if you really knew what you were doing, you could use your knowledge of the means and variances to invert their calculation to obtain the likelihood ratio you seek.

In a comment to my old posting on the coin guessing game linked to above by Z.M. Davis, I gave an example I'd like some help with (slightly modified here):

Jane and James each privately toss a coin. They want to guess the probability that both coins land heads. Let's suppose the coins are in fact both heads.

The prior for both heads is 1/4, but having observed that their own coin is heads, each estimates the probability to be 1/2. So round one goes:

Jane: P(both heads) = 1/2
James: P(both heads) = 1/2

They seemingly agree. However, upon exchanging these values, each can immediately change their probability estimate to 1. That's because hearing "1/2" from the other player means their coin must have landed heads, since if it had been tails they would have known the probability for both heads was 0. Round 2:

Jane: P(both heads) = 1
James: P(both heads) = 1

So this is another example where exchanging Bayesian estimates leads to an updated estimate outside the range of the two. And it's even more curious to me, because they seemingly agreed from the first, and yet they both changed. In my article I raised the question of whether, when exchanging Bayesian estimates, one might see several rounds of disagreement, then agreement, then more disagreement. I also claimed that in the famous 3-hats puzzle, and its generalization to N hats, one might see multiple rounds of estimates that agree, followed by a change in estimate (but I haven't tried to verify that claim). This leads to the question of how Bayesians can know that they have truly reached agreement.

I tried to work my problem with likelihood ratios, but I got the wrong answer. P(I observe heads | both are heads) = 1. P(I observe heads | not both are heads) = 1/3, because there are 3 non-both-heads possibilities: TH, HT, TT. This gives a likelihood ratio of 3. Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?

This leads to the question of how Bayesians can know that they have truly reached agreement.

I think they know when they could have predicted each other's stated estimates with certainty, because that means the estimates provided no new information. In this example, they couldn't have predicted each other's estimates in the first round, which could have been 0 or 1/2 with equal probability. But they could have predicted each other's estimates in the second round.

Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?

Independence is missing in this case, so you can't just multiply them. If you want independence, you have to let Jane and James each observe a coin toss chosen randomly and independently from two coin tosses, instead of letting them each observe a different coin toss.

I think in general, it's easier to constructively force likelihood ratios to be independent, than to know that two arbitrary likelihood ratios are independent. It's a bit similar to how you can write a program with certain properties, but can't know whether an arbitrary program has that property.

Wei, thanks, that makes sense about convergence in Bayesian updating. It's very surprising that Jane and James each observing a private coin flip is not independent! Of course their observations are not independent of the outcome, but that would always be the case for relevant information. I certainly would have thought that observing a private coin flip would be independent information.

For the example you describe, we have 2 coins flipped out of sight, then each player is shown a randomly chosen coin, and they don't know if they saw the same coin or different ones? Let's assume again that both coins are heads. I think the likelihood ratio, upon seeing heads, is still 3. A priori odds are 1:3. This checks out, multiplying prior odds times likelihood gives odds of 1:1 or 50% for heads, which is correct. Exchanging likelihood ratios and multiplying them gives 9, for final odds of 3:1 in favor of both heads, or probability of 3/4.

But I get a different answer if I count. There are 16 possibilities for the two coins and each observing the Left or Right coin:
HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, HTRL, HTRR, THLL, THLR, THRL, THRR, TTLL, TTLR, TTRL, TTRR.
After observing heads, Jane (the 1st player) knows it is one of:
HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, THRL, THRR.
This gives P=1/2 for both heads, which is correct. After learning that James also saw heads (otherwise he would have said P=0), she knows it is one of:
HHLL, HHLR, HHRL, HHRR, HTLL, THRR.
Of these 6, 4 are both heads, giving P=2/3 for both heads, or odds ratio of 2:1, not the same as what I got before. This answer seems more likely to be correct.

Maybe I'm making a dumb mistake, or perhaps I misunderstood your example for independence?
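Hal's hand count can be automated; a brute-force enumeration of the 16 equally likely cases (two coins, and which coin each player is shown) reproduces the 2/3:

```python
from itertools import product

def seen(c1, c2, pick):
    return c1 if pick == "L" else c2  # each player sees the left or right coin

cases = list(product("HT", "HT", "LR", "LR"))  # (coin1, coin2, Jane's pick, James's pick)

# Condition on both players observing heads:
consistent = [(c1, c2) for c1, c2, pj, pm in cases
              if seen(c1, c2, pj) == "H" and seen(c1, c2, pm) == "H"]
p_both_heads = sum(c1 == c2 == "H" for c1, c2 in consistent) / len(consistent)
print(p_both_heads)  # 4/6 = 0.666..., matching the count above
```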

Hal, you're right and my example isn't independent either. This is trickier than it seems. In order to multiply odds ratios, we need

P(we both observe heads | not both are heads) =
P(I observe heads | not both are heads) * P(you observe heads | not both are heads)

In general,

P(we both observe heads | not both are heads) =
P(I observe heads | not both are heads and you observe heads) * P(you observe heads | not both are heads)

So we need

P(I observe heads | not both are heads) = P(I observe heads | you observe heads and not both are heads)

which fails to hold in both examples. In Hal's example, knowing you observed heads makes it less likely (actually impossible) for me to observe heads if not both are heads. In my example, knowing you observed heads makes it more likely for me to observe heads because it rules out the "both tails" possibility.

I certainly would have thought that observing a private coin flip would be independent information.

In order to multiply odds ratios, we need our individual observations to be independent conditional on the hypothesis being true, and independent conditional on the hypothesis being false. In Hal's example, the observations are unconditionally independent, but not independent when conditioned on "not both heads". (In my example, the two observations are just not independent, period. Don't know what I was thinking!)
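The failed conditional independence in the randomly-shown-coin example can be checked by the same enumeration: conditional on "not both heads", each player sees heads with probability 1/3, but both see heads with probability 1/6 rather than 1/9:

```python
from itertools import product

def seen(c1, c2, pick):
    return c1 if pick == "L" else c2

# Keep only the cases consistent with "not both heads":
cases = [(c1, c2, pj, pm) for c1, c2, pj, pm in product("HT", "HT", "LR", "LR")
         if not (c1 == "H" and c2 == "H")]

p_jane = sum(seen(c1, c2, pj) == "H" for c1, c2, pj, pm in cases) / len(cases)
p_james = sum(seen(c1, c2, pm) == "H" for c1, c2, pj, pm in cases) / len(cases)
p_both = sum(seen(c1, c2, pj) == "H" and seen(c1, c2, pm) == "H"
             for c1, c2, pj, pm in cases) / len(cases)

print(p_jane, p_james, p_both)  # 1/3, 1/3, 1/6 -- and 1/6 != 1/3 * 1/3
```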

One example of multiplying odds ratios Eliezer gave in http://yudkowsky.net/rational/bayes is three independent tests for breast cancer. But in real life, it is impossible to find three lab tests that are independent, conditional on both breast cancer, and on no breast cancer. In the no breast cancer case, especially, getting one false positive should increase the probability of the lab being sloppy, or having one's blood mixed up, or having a benign tumor, or something else that increases the probability of getting false positive on another test.

I'm not sure what lesson can be drawn from these examples, except "beware dependence"?

Great post! I would like to read more math-dependent posts like this one, and less fiction.

Since people are asking for help, I'll take the liberty of asking for help on the problem of the cab fare and the extra twenty.
