« Predicting the Future with Futures | Main | Just A Smile? »

April 11, 2007

Comments

I'm confused when you say that the prior represents all your starting information plus the way you learn from experience. Isn't the way you learn from experience fixed, in this framework? Given that you are using Bayesian methods, so that the idea of a prior is well defined, then doesn't that already tell how you will learn from experience?

Hal, with a poor prior, "Bayesian updating" can lead to learning in the wrong direction or to no learning at all. Bayesian updating guarantees a certain kind of consistency, but not correctness. (If you have five city maps that agree with each other, they might still disagree with the city.) You might think of Bayesian updating as a kind of lower level of organization - like a computer chip that runs programs, or the laws of physics that run the computer chip - underneath the activity of learning. If you start with a maxentropy prior that assigns equal probability to every sequence of observations, and carry out strict Bayesian updating, you'll still never learn anything; your marginal probabilities will never change as a result of the Bayesian updates. Conversely, if you somehow had a good prior but no Bayesian engine to update it, you would stay frozen in time and no learning would take place. To learn you need a good prior and an updating engine. Taking a picture requires a camera, light - and also time.

This probably deserves its own post.

Another thing I don't fully understand is the process of "updating" a prior. I've seen different flavors of Bayesian reasoning described. In some, we start with a prior, get some information and update the probabilities. This new probability distribution now serves as our prior for interpreting the next incoming piece of information, which then causes us to further update the prior. In other interpretations, the priors never change; they are always considered the initial probability distribution. We then use those prior probabilities plus our sequence of observations since then to make new interpretations and predictions. I gather that these can be considered mathematically identical, but do you think one or the other is a more useful or helpful way to think of it?

In this example, you start off with uncertainty about which process put in the balls, so we give 1/3 probability to each. But then as we observe balls coming out, we can update this prior. Once we see 6 red balls for example, we can completely eliminate Case 1 which put in 5 red and 5 white. We can think of our prior as our information about the ball-filling process plus the current state of the urn, and this can be updated after each ball is drawn.

Hal,

You are being a bad boy. In his earlier discussion Eliezer made it clear that he did not approve of this terminology of "updating priors." One has posterior probability distributions. The prior is what one starts with. However, Eliezer has also been a bit confusing with his occasional use of such language as a "prior learning." I repeat, agents learn, not priors, although in his view of the post-human computerized future, maybe it will be computerized priors that do the learning.

The only way one is going to get "wrong learning" at least somewhat asymptotically is if the dimensionality is high and the support is disconnected. Eliezer is right that if one starts off with a prior that is far enough off, one might well have "wrong learning," at least for awhile. But, unless the conditions I just listed hold, eventually the learning will move in the right direction and head towards the correct answer, or probability distribution, at least that is what Bayes' Theorem asserts.

OTOH, the reference to "deep Bayesianism" raises another issue, that of fundamental subjectivism. There is this deep divide among Bayesians between the ones that are ultimately classical frequentists but who argue that Bayesian methods are a superior way of getting to the true objective distribution, and the deep subjectivist Bayesians. For the latter, there are no ultimately "true" probability distributions. We are always estimating something derived out of our subjective priors as updated by more recent information, wherever those priors came from.

Also, saying a prior should the known probability distribution, say of cancer victims, assumes that this probability is somehow known. The prior is always subject to how much information the assumer of a prior has when they being their process of estimation.

Eliezer ,

Just to be clear . . . going back to your first paragraph, that 0.5 is a prior probability for the outcome of one draw from the urn (that is, for the random variable that equals 1 if the ball is red and 0 if the ball is white). But, as you point out, 0.5 is not a prior probability for the series of ten draws. What you're calling a "prior" would typically be called a "model" by statisticians. Bayesians traditionally divide a model into likelihood, prior, and hyperprior, but as you implicitly point out, the dividing line between these is not clear: ultimately, they're all part of the big model.

Barkley, I think you may be regarding likelihood distributions as fixed properties held in common by all agents, whereas I am regarding them as variables folded into the prior - if you have a probability distribution over sequences of observables, it implicitly includes beliefs about parameters and likelihoods. Where agents disagree about prior likelihood functions, not just prior parameter probabilities, their beliefs may trivially fail to converge.

Andrew's point may be particularly relevant here - it may indeed be that statisticians call what I am talking about a "model". (Although in some cases, like the Laplace's Law of Succession inductor, I think they might call it a "model class"?) Jaynes, however, would have called it our "prior information" and he would have written "the probability of A, given that we observe B" as p(A|B,I) where I stands for all our prior beliefs including parameter distributions and likelihood distributions. While we may often want to discriminate between different models and model classes, it makes no sense to talk about discriminating between "prior informations" - your prior information is everything you start out with.

The comments to this entry are closed.

Less Wrong (sister site)

May 2009

Sun Mon Tue Wed Thu Fri Sat
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31