
November 03, 2008

Comments

Can you give an example of the kind of argument that you are trying to refute here?

The argument:

"a (real-world) agent such as a human or AI cannot self improve because the K-complexity of a *closed* system is constant"

seems obviously silly. A human or AI is not a closed system, it receives sensory inputs, and no-one would argue that self-improvement has to happen without any sensory inputs.

Aside: I would caution, echoing Eliezer, that theoretical results about what is computable and what isn't, and about Turing machines with infinite memory and how many 1's they can output before they halt, are hard to apply to the real world without making subtle errors that falsely pump your intuition. The real-world problems we face rely upon finite machines operating in finite time.

"Then the "Kolmogorov complexity" of that entire universe - throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...would be 500 bits."

I must be missing something... what exactly do you mean by the phrase "the 'complexity' of that entire universe... would be 500 bits."

I would think the "complexity of the universe" would include the location of every bit of matter at all times, among other things. So, maybe I just don't understand what you mean by the "complexity of" something.

Also, I'm not sure whether you believe that all human behavior is purely deterministic in the most extreme sense. Or, do you think people can actually make their own decisions, i.e., "free will"? Obviously, if someone can turn either left or right at any given time, then that decision is still not explained (in any predictive sense) by the laws of physics.

Wouldn't the 500 bits have to include understanding all psychological/economic behavior?

Roko: here.

Henry: Kolmogorov complexity.

what exactly do you mean by the phrase "the 'complexity' of that entire universe... would be 500 bits."

He means 500 bits plus log2 the size of the initial conditions, plus log2 the number of instants elapsed since then - which is the number of bits you would need to specify a current multiverse state under the specified assumptions.
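
In symbols, a sketch of that accounting (treating the 500 bits as the length of the laws and t as the number of elapsed instants):

```latex
K(\text{state}_t) \;\le\; \underbrace{500}_{\text{laws}} \;+\; \log_2(\text{size of initial conditions}) \;+\; \log_2 t \;+\; O(1)
```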

Tim, I already specified I was talking about the whole universe throughout time and space (and inflation and decoherence). The laws probably don't have any initial conditions under that specification, even if different things happen in different places.

Henry, follow the link to the Mandelbrot set, then look up how the Mandelbrot set is computed (a single equation).

You're right about what you said about the instants. However, we don't know how big the initial conditions were - and I'm not clear on what it would mean for them not to exist. Maybe you are talking about eternal return? Even then there's a specification of which path you are on.

@Nick Tarleton:

Thanks, Nick. I can see I'm going to have to spend an exciting evening or 10 alone with the SL4 archives...

Perhaps one could consider Eliezer's OB posts as an "edited and sanitized summary of the morass of SL4"?

Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more "complex" than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can't be any more "complex" than that.

and later: ...then this intrauniversal civilization - and everything it's ever learned by theory or experiment over the last up-arrow years - is said to contain 400 bits of complexity, or however long the original program was.

"You say it as if it were a good thing."

But this seems, on the contrary, to indicate that Kolmogorov complexity might not be a good candidate for the formalized notion of intuitive complexity, including the one used by Occam's Razor. The fact that it can often be counterintuitive doesn't necessarily mean that it's our intuition of what is simple and what is complicated that must change; the mere fact that the notion of Kolmogorov complexity is itself very simple and easy to play with mathematically doesn't by itself entitle it to anything.

There exists a proof of Fermat's Last Theorem with very small Kolmogorov complexity, much smaller than, say, the size of this post of yours in characters, or the Kolmogorov complexity of some well-known formal proof of a basic theorem in calculus. Does that mean that this proof of FLT is a much simpler object than these others? It does if you insist on interpreting "simple" to mean "small Kolmogorov complexity", but why must you so insist? If you show this proof to a mathematician, they won't say "oh, this is very simple, simpler than basic calculus proofs." If you then tell the mathematician that in fact this very complex proof was obtained by running this very short Turing machine, they'll just shrug in response - so what? As a piece of mathematics, the proof is still evidently extremely complicated. Maybe your notion of complexity just isn't so good at capturing that.
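
The bound itself - the part the quoted tournament passage asserts, and which this comment does not dispute - can be made concrete. Below is a toy Python sketch with made-up parameters and fitness function; because the run is fully deterministic (fixed seed, fixed rules), the winning genome is a pure function of the program text, so K(winner) is bounded by the program's length plus a constant, however intricate the winner looks.

```python
import random

def tournament(generations=1000, pop_size=100, genome_len=64, seed=0):
    """Deterministic toy evolution: the winner is implicit in this code."""
    rng = random.Random(seed)   # fixed seed, so no outside information enters
    fitness = lambda g: sum(g)  # toy criterion: number of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(genome_len)] ^= 1  # one point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# However complicated the winner looks, K(winner) <= len(this program) + O(1).
print(''.join(map(str, tournament())))
```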

The "500 bits" only works if you take a hidden variable or Bohmian position on
quantum mechanics. If (as the current consensus would say) non-linear dynamics can amplify quantum noise then enormous amounts of new information are being "produced" locally everywhere all the time. The current state of the universe incorporates much or all of that information. (Someone who understands the debates about black holes and the holographic principle should chime in with more precise analysis.)

I couldn't follow the whole argument so I'm not sure how this affects it, but given that Eliezer keeps referring to this claim I guess it is important.

Tim: The initial wavefunction may be completely determined by the laws, and under MWI that's all you need.

Roko: I actually don't notice much duplication between OB and SL4. It's definitely worth reading.

Will: Why don't you take it seriously? You know the Mandelbrot set is generated from very few bits, and I don't imagine you expect there to be a simple derivation of its structures. Taken literally, your request is highly unreasonable because it involves so many levels of poorly understood processes - inflation, baryogenesis, galaxy formation... and the answer is probably infinite anyway. Still, I wouldn't be surprised if a better physicist could produce a reasonable first-principles estimate of the number of baryons in the observable universe.

Jed: Or many-worlds.

Will Pearson: Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Been done. It didn't work out, and Eddington's theory is now only history, but the example shows that it is not out of the question that there be such laws.

It might be worthwhile to note that cogent critiques of the proposition that a machine intelligence might very suddenly "become a singleton Power" do not deny that the inefficiencies of the human cognitive architecture offer room for improvement via recursive introspection and recoding, nor do they deny the improvements readily available via substitution and expansion of more capable hardware and I/O.

They do, however, highlight the distinction between a vastly powerful machine madly exploring vast reaches of a much vaster "up-arrow" space of mathematical complexity, and a machine of the same power bounded in growth of intelligence -- by definition necessarily relevant -- due to starvation for relevant novelty in its environment of interaction.

If, Feynman-like, we imagine the present state of knowledge about our world in terms of a distribution of vertical domains, like silos, some broader with relevance to many diverse facets of real-world interaction, some thin and towering into the haze of leading-edge mathematical reality, then we can imagine the powerful machine quickly identifying and making a multitude of latent connections and meta-connections, filling in the space between the silos and even somewhat above -- but to what extent, given the inevitable diminishing returns among the latent, and the resulting starvation for the novel?

Given such boundedness, speculation is redirected to growth in ecological terms, and the Red Queen's Race continues ever faster.

I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states.

There exists a constant C so that if you want to solve the halting problem for all Turing machines up to N states, you can use a particular Turing machine with N+C states. Moreover, this constant C is small (definitely less than 100). In other words, there exists a 500-state Turing machine that solves the halting problem for all the Turing machines up to 400 states.

Here's the algorithm: given as input a Turing machine M with at most 400 states,
1. Compute a number k greater than BB(400).
2. Simulate M for k steps.
3. If it halts by then, then output "M halts". If it doesn't, then output "M doesn't halt".

Correctness proof: If it halts in less than k steps, then obviously it halts. If it doesn't, then by the definition of BB(400), it never will.

Analysis of number of states: This is possible with 400+C states because you use about 400 states for step 1, and a constant number of overhead states for simulating Turing machines. Because there exist universal Turing machines with less than 25 states, you can arrange for C to be less than 100.

There's a small amount of subtlety in actually doing step 1. Let me know if you want more detail. However, I believe the overall result and method should be clear; moreover, I hope the result is unsurprising once you see the method. As such, please don't drop dead of shock.
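
To make the shape of Boris's argument concrete, here is a hypothetical Python sketch. The uncomputable part is exactly step 1: no one can exhibit a constant K exceeding BB(400), so the bound below is a toy stand-in; steps 2 and 3 are ordinary simulation.

```python
def run_tm(delta, steps):
    """Simulate a Turing machine delta[(state, symbol)] = (write, move, new_state).
    Return True iff it reaches the halt state 'H' within `steps` steps."""
    tape, pos, state = {}, 0, 0
    for _ in range(steps):
        if state == 'H':
            return True
        write, move, state = delta[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
    return state == 'H'

# Step 1 is the uncomputable part: a real 500-state decider would hard-wire
# some constant K > BB(400). No one can exhibit such a K, but it exists,
# and existence is all Boris's argument needs. Toy stand-in:
K = 10**6

def halts(delta):
    return run_tm(delta, K)  # steps 2-3: simulate for K steps, report the result

# Toy example: a 2-state machine that writes two 1s and halts.
print(halts({(0, 0): (1, 1, 1), (1, 0): (1, 1, 'H')}))  # -> True
```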

I thought you had some reason for giving us *extra* detail -- why the number 500? Yeah, it could be as short as 500 bits, but it could equally be very long.

I will prefer a shorter explanation for the universe to a longer one if I am presented with both, but there is no reason to expect an explanation of a certain length. Since we don't have any that actually manage the job, why attach any number to it?

I don't think there is a Theory of Everything for the universe, just as there isn't one for mathematics.

What if the universe is not continuous, the simulation does lazy evaluation, and the laws of nature can only be described by an infinite number of bits? How would this change your reasoning?

*drops dead of shock*

Oh... well, at least there's no reasonable way for a superintelligence that can perform any finite computation, to obtain that particular 500-bit Turing machine, through any amount of internal thinking or induction from sensory experience, without using a reasoning method guaranteed at some point to make a mistake.

I don't understand. Am I too dumb or is this gibberish?

@pk I don't understand. Am I too dumb or is this gibberish?

It's not so complicated; it's just that we're so formal...

The Kolmogorov complexity of the universe is more than just a description of its physical laws.

First of all, besides the physical laws, K should include the initial state of the universe. It may be large or even infinite. And it should be properly expressed in qubits.

Second, you'd need to pick out the "present state", whatever that means, out of all possible future and past states. In a classical universe that may be only 100 bits or so; in a quantum multiverse, it's at least 1 bit per branch.

Third, it is not at all obvious that the laws of the universe are computable. Maybe it uses an oracle (the multiverse branching process can provide such an oracle), or maybe it uses real values in a non-trivial way. If this is true, then we should either say that K is infinite, or use K relative to an oracle.

Fourth, K is defined only up to an additive constant, so it's not really correct to talk about The Kolmogorov Complexity. You can only talk about Kolmogorov complexity relative to a fixed Turing machine. For different Turing machines, the difference between the Ks will be roughly equal to the number of bits in a translator between their machine languages. In the extreme case where the reference Turing machine IS the universe, K(universe) is just 0.
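
For reference, this is the invariance theorem; one standard statement of the translator point:

```latex
\forall x:\quad \lvert K_U(x) - K_V(x) \rvert \;\le\; c_{U,V}
```

where the constant c_{U,V} depends only on the universal machines U and V, not on x - roughly the length of a compiler from one machine language to the other.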

In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Eliezer, you're making an important error here. I don't think it affects the main argument you're making in this article (that considerations of "complexity" don't rule out self-improving minds), but this error may have grave consequences elsewhere. The error is that while the environment does have to be magic, you don't need to be born with faith in this; you just need to not have faith that it's impossible.

Suppose you get a hold of a black box that seems to act as a halting-problem oracle. You've thrown thousands of problems at it, and have never seen an incorrect or inconsistent answer. What are the possibilities here? Either (A) the environment really is magic (i.e. there is uncomputable physics that enables implementation of actual halting-problem oracles), or (B) the box is just giving random answers that happen to be correct by chance, or (C) you're part of a simulation where the box is giving all possible combinations of answers and you happen to be in the part of the simulation where the box is giving correct answers. As long as your prior probability for (A) is not zero, as you do more and more tests and keep getting correct answers, its posterior probability will eventually dominate (B) and (C).

Why is this so important? Well in standard Solomonoff Induction, the prior for (A) is zero, and if we program that into an AI, it won’t do the right thing in this situation. This may have a large effect on expected utility (of us, people living today), because while the likelihood of us living in an uncomputable universe with halting-problem oracles is low, the utility we gain from correctly recognizing and exploiting such a universe could be huge.
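
A toy Python sketch of the update Wei describes, with made-up priors and per-answer likelihoods (a true oracle is always right; (B) and (C) are modeled here as chance-level), showing a tiny nonzero prior on (A) coming to dominate:

```python
# Made-up numbers for illustration; the point is only the dynamics.
posterior = {'A: real oracle': 1e-9,
             'B: lucky random answers': 0.5,
             'C: lucky simulation branch': 0.5 - 1e-9}
likelihood = {'A: real oracle': 1.0,           # a true oracle is always right
              'B: lucky random answers': 0.5,  # a coin-flip box is right half the time
              'C: lucky simulation branch': 0.5}

for _ in range(100):  # observe 100 correct answers in a row
    posterior = {h: p * likelihood[h] for h, p in posterior.items()}
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

print(posterior)  # (A) dominates despite its 1e-9 prior
```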

Wei, essentially the black box has to be such that it seems more likely for it to be a halting oracle than something only slightly smarter than you are - since when it says "doesn't halt", and you don't know if it halts or not, it's not like you can easily check.

Suppose, though, that the black box has the apparent qualities of a closed timelike curve in a universe that permits CTCs to be constructed with perfection (even if our own universe permitted CTCs they would break down well before up-arrow time). Then it would seem pretty plausible, after constructing the CTC and verifying its apparent performance on the cases you knew how to test, that the CTC wasn't a trickster SI in disguise.

But -

- in this case, you can just build a CTC inside yourself, as a closed system, and examine the results with your eyes shut. Your "complexity" won't change, it'll just have to be defined outside the Turing formalism.

So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn't theoretically do with eyes closed.

Good point, but when the box says "doesn't halt", how do I know it's correct?

If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon?

Suppose they run a BB evaluator for all of time. They would, indeed, have no way at any point of being certain that the current champion 100-bit program is the actual champion that produces BB(100). However, if they decide to anthropically reason that "for any time t, I am probably alive after time t, even though I have no direct evidence one way or the other once t becomes too large", then they will believe (with arbitrarily high probability) that the current champion program is the actual champion program, and an arbitrarily high percentage of them will be correct in their belief.

Boris: There's a small amount of subtlety in actually doing step 1.

Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable.

Nick wrote: Good point, but when the box says "doesn't halt", how do I know it's correct?

A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.

If P!=NP, you should be able to convince yourself that the black box has at least exponentially more computational power than you do. So if you are an AI with say the computational resources of a solar system, you should be able to verify that the black box either contains exotic physics or has access to more resources than the rest of the universe put together.

Eliezer wrote: So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn't theoretically do with eyes closed.

It seems to me that an AI should/can never completely rule out the possibility that the universe contains physics that is mathematically more powerful than what it has already incorporated into itself, so it should always keep its eyes open. Even if it has absorbed the entire universe into itself, it might still be living inside a simulation, right?

+1 to Anatoly Vorobey. Using K-complexity to capture the human notion of complexity seems to be even worse than using game-theoretic rationality to capture human rationality - something that's been attacked to death already.

Boris: There's a small amount of subtlety in actually doing step 1.

Douglas: Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable.

It's impossible to bound BB(n) computably in n, but that leaves open the following question, on which Boris's argument depends.

Let BBB(n), the Busy Beaver Bound function, be the size of the smallest Turing machine that, starting from a blank tape, prints out an upper bound for BB(n). Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.
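
In symbols, writing |M| for the number of states of M:

```latex
\mathrm{BBB}(n) \;=\; \min\bigl\{\, \lvert M \rvert \;:\; M \text{ halts on a blank tape and prints some } b \ge \mathrm{BB}(n) \,\bigr\},
\qquad \text{Boris's claim: } \mathrm{BBB}(n) \le n + c.
```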

Richard: Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.
c is the size of a universal Turing machine which, in addition to simulating its argument, keeps track of how many steps it simulates.
BB(n) is the maximum number of steps that any halting TM of size n performs, therefore there is some particular TM of size n that performs BB(n) steps. Give that TM as input to the aforementioned UTM, and it will compute BB(n).
The uncomputability of BB prevents you from finding the right TM, and from proving it correct if it's handed to you by a magic light in the sky. But the TM still exists even if you'll never know which it is.

Eliezer, I'm not quite sure why you are obsessed with the ability to find mathematical truths. You can get different sets of mathematical truths depending on whether you take the parallel postulate as true or not. Which ones are applicable in the real world depends on where you are.

Unless you know where you are in the universe, you cannot know what is friendly for you to do (or even useful).

In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

As Wei mentioned, P≠NP is basically the conjecture that this isn't true: i.e., that you can exponentially increase your ability to solve math problems by your environment being magic and your not being born with faith in that fact. So for example, if your environment immediately inverted any one-way function, that would be evidence (no faith required) that your environment is not merely 'slightly' smarter than you are, but astoundingly smarter. In qualitative terms, I think it would be almost as astounding as if the environment solved the halting problem.

I would drop dead of shock

Eliezer, just as it was interesting to ask what probability estimate 'Nuts!' amounted to, I think it would be very useful for the forum of Overcoming Bias to ask what your implicit probability estimate was for a 500-state TM being able to solve the halting problem for all TMs of up to 50 states.

I imagine that 'I would drop dead of shock' was intended to convey a probability of less than 1 in 10,000, or maybe 1 in 1,000,000?

@Toby: Why, yes, I was feeling rather grateful at that point that I hadn't quantified the probability. It's fair to assume that it would have been low enough that I couldn't plausibly recover from the calibration hit, like 1% or something.

@Scott: This entire discussion assumes unlimited finite internal computing power, so P and NP cut no dice here. Otherwise, of course a larger environment can outsmart you mathematically.

@Will: Mathematical truths are about which axioms imply which theorems.

@Wei: A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?

Also @Wei: I don't recall if I've mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant.

In other words, if there's any computable reaction that you have to discovering what looks like a black box halting solver - any computable reasoning that decides that "this looks like an uncomputable halting solver" and produces new distributions over computably related events as a result - then that's in the Solomonoff mixture.

Solomonoff is not really as bad as it sounds.

But when it comes to making use of the results to incorporate the halting oracle via self-modification - then Solomonoff blows a fuse, of course, because it was never designed for self-modification in the first place; it's a Cartesian formalism that puts the universe irrevocably on the outside.

Let me expand my point. Assume you run your hyper civilisation for 1 googol years and ask it: "Do the angles in a three-straight-sided shape add up to 180 degrees?"

This can be answered yes or no, depending upon which axiom scheme you adopt: Euclidean, hyperbolic, or something more obscure. Having a system that has thought of all mathematical truths doesn't get you a Powerful system, unless it knows which of them apply to solving the problems you ask.

Beautiful post, Eli. Seriously, one of the better --- and more useful, at least pedagogically --- ones. Should be required reading for, well, just about everyone.

Oops, I mis-stated the result. If we fix a universal Turing machine (equivalently, a programming language), then there exists a constant C so that deciding the halting problem for programs of length less than N can be done by a program of length less than N+C. Pengvado's solution works here.

Unfortunately, as Richard observed, you can't do this with Turing machines, essentially because Turing machines cannot inspect their own finite control, unlike the stored program on a von Neumann machine. In fact, it appears that my claim---in Richard's notation, that BBB(N) is N+O(1)---is open. It is known that BBB(N) is O(N); moreover, a 2002 paper "Improved Bounds for Functions Related to Busy Beavers" by Ben-Amram and Petersen showed that BBB(N) is N+o(N). I haven't found better results, but on the off-chance that I do, I'll post another comment here.

Now, because my claim is open, we still have to check if Eliezer's original intuition (that 500-state Turing machines can't solve the halting problem for 50-state ones) holds up. Luckily for me, it doesn't. It's known that BBB(N) is less than 3N+6 (that's the O(N) result above), so at worst, there exists a 500-state Turing machine that solves the halting problem for 100-state ones. This is less impressive than what I originally claimed, because it's off by a multiplicative constant rather than an additive one, but it still does the trick.

What's the moral here? I think it's that you really should define Kolmogorov complexity in terms of the number of bits in the shortest program doing whatever you're analyzing, as is standard, rather than the number of states in a Turing machine. Indeed, this is how Eliezer defines it in this post; furthermore, Eliezer alludes to it when he responds to my post by mentioning a "500-bit Turing machine" [emphasis mine]. Only in the aside beginning "I would drop dead of shock" was the number of states actually mentioned.

Otherwise, of course a larger environment can outsmart you mathematically.

No, not of course. For example, suppose P were equal to PSPACE. Then even though a larger environment could fundamentally outsmart you mathematically (say by solving the halting problem), it couldn't prove to you that it was doing so. In other words, the situation with polynomial-time computation would be more-or-less the same as it is with unlimited computation: superintelligent machines could only prove their superintelligence to other superintelligent machines.

That the situation with efficient computation appears to be different---i.e., that it appears superintelligent machines can indeed prove their superintelligence to fundamentally dumber machines---is (if true) a profound fact about the world that seems worth calling attention to. Sure, of course you can nullify it by assuming away all complexity considerations, but why? :-)

Scott, while I don't mean to deflate in any sense the shattering import of your wisdom...

...there's a pretty large segment of the planetary population upon whom the distinction between "polynomially more computing power" and "exponentially more computing power" would be lost.

These unenlightened fools, if they had computing power N, would be just as impressed by a superintelligence that had N^googol computing power as a superintelligence that had 2^N computing power. They'd just lump both of those together as "the environment is smarter than me, and can prove things I can't and then exhibit the proof". Poor bastards.

Um, except that we also don't know whether there are computations that can be checked in N time but only performed in N^googol time. The situation is qualitatively the same as for N versus 2^N.

The initial wavefunction may be completely determined by the laws, and under MWI that's all you need.

Bundling the initial state into the laws of physics seems like a confused way of looking at things to me. The state and the laws are different things - if you muddle them together conceptually, your accounting is likely to get equally muddled. Also, most of the rationale for the size of the "laws of physics" being small evaporates. The initial state might be enormous, for all anyone knows. The laws of physics could be infinite too - but there the hope that they are small and finite seems slightly more reasonable.

Eliezer and Nick: I don't think "Kolmogorov complexity" and "Mandelbrot set" actually answer what I was trying to understand.

Both of those concepts seem completely apt for describing *perfectly* deterministic systems. But, in describing the "complexity" of the universe even in something as simple as the "pattern of stars that exists" one would still have to take into account potential non-deterministic factors such as human behavior. Unless you are a strict determinist, you have to allow for the potential for a sufficiently advanced human (or non-human) intelligence to eradicate a particular star, or even a galaxy, or to produce an artificial black hole that alters the pattern of stars.

How are these factors accounted for in a "500 bit" code? Or, are you saying that you are a strict determinist?

Henry: "Both of those concepts seem completely apt for describing *perfectly* deterministic systems. But, in describing the "complexity" of the universe even in something as simple as the 'pattern of stars that exists' one would still have to take into account potential non-deterministic factors such as human behavior. [...] [A]re you saying that you are a strict determinist?"

I'll take this one. Yes, we're presuming determinism here, although the determinism of the Many Worlds Interpretation is a little different from the single-world determinism we're used to thinking about. Also, I notice that in an earlier comment, you spoke of free will and determinism as if the concepts were opposed, but depending on exactly what you mean by free will, this is not necessarily the case. For Eliezer's take, see, e.g., "Timeless Control," or for the philosophical mainstream, google compatibilism.

POSTSCRIPT-- Re compatibilism, the short version is that no, you don't have free will, but you shouldn't feel bad about it.

A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?

It's easy to use such an oracle to produce proofs and halting times. The following assumes that the oracle outputs 1 if the input TM halts, and 0 otherwise.

For proofs: Write a program p which on inputs x and i, enumerates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, otherwise it loops forever. Now query the oracle with (p,x,0), (p,x,1), ..., and you get a proof for x if it has a proof.

Halting times: Write a program p which on inputs x and i, runs x for i steps. If x halts before i steps, then it halts, otherwise it loops forever. Now query the oracle with (p,x,2), (p,x,4), (p,x,8), ..., until you get an output of "1" and then use binary search to get the exact halting time.
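
A Python sketch of that halting-time construction, treating the composed query "does x halt within i steps?" as a callable; the stand-in oracle below just pretends x halts at step 1337:

```python
def halting_time(oracle):
    """oracle(i) answers: does x halt within i steps? (built from the 0/1
    halting oracle by wrapping x in the program p that Wei describes)"""
    hi = 1
    while not oracle(hi):   # doubling phase: find some sufficient bound
        hi *= 2
    lo = hi // 2            # the exact halting time now lies in (lo, hi]
    while lo + 1 < hi:      # binary-search phase
        mid = (lo + hi) // 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy stand-in: pretend x halts at exactly step 1337.
print(halting_time(lambda i: i >= 1337))  # -> 1337
```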

I don't recall if I've mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant.

Eliezer, if what you say is true, then it shouldn't be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a counterexample. Suppose the input sequence consists of the unary encodings of Busy Beaver numbers BB(1), BB(2), BB(3), …, that is, BB(1) number of 1s followed by a zero, then BB(2) number of 1s followed by a 0, and so on. Let’s ask the predictor, after seeing n input symbols, what is the probability that it will eventually see a 0 again, and call this p(n). With Solomonoff Induction, p(n) will approach arbitrarily close to 0 as you increase n. A human mathematician on the other hand will recognize that the input sequence may not be computable and won’t let p(n) fall below some non-zero bound.
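
For concreteness, here are the only terms of this sequence anyone can actually write down, using the known values of the max-ones Busy Beaver function (Sigma(1)..Sigma(4) = 1, 4, 6, 13); that no further terms can be computed is the point of the counterexample:

```python
KNOWN_BB = [1, 4, 6, 13]  # Sigma(1)..Sigma(4); higher values are out of reach

def bb_prefix():
    """Yield the known prefix of Wei's sequence: BB(n) ones, then a zero."""
    for bb in KNOWN_BB:
        yield from [1] * bb
        yield 0

print(''.join(map(str, bb_prefix())))  # -> 1011110111111011111111111110
```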

I don't quite see this. With Solomonoff induction, as with a computable human mathematician, the probability of the next symbol being 0 will approach 0. I don't see why a Solomonoff inductor using mixtures (that is, evaluating computable probability distributions rather than computable sequences) will ever assign a probability arbitrarily close to 0 of seeing another 0.

Ask the human mathematician, over and over, what their probability of the next symbol being 0 is. They're computable, so this distribution is in the mixture. What other distribution is it necessarily dominated by, in the Solomonoff mixture?

Or are we allowing the human mathematician to have an inconsistent probability distribution where he says "You'll see another 0 eventually, I'm sure of it, but I'm also pretty sure that no matter how large a number of 1s I pick, it won't be high enough." If so, to be fair, we should factor out the symbol for "see another 0 eventually" and just ask the Solomonoff inductor about that separately via some input encoding, the same way we ask the human about it separately.

Good question, Eliezer. If the human mathematician is computable, why isn't it already incorporated into Solomonoff Induction? It seems to me that the human mathematician does not behave like a Bayesian. Let H be the hypothesis that the input sequence is the unary encodings of Busy Beaver numbers. The mathematician will try to estimate, as best as he can, P(next symbol is 0|H). But when the next symbol turns out to be 1, he doesn't do a Bayesian update and decrease P(H), but instead says "Ok, so I was wrong. The next Busy Beaver number is bigger than I expected."

I'm not sure I understand what you wrote after "to be fair". If you think a Solomonoff inductor can duplicate the above behavior with an alternative setup, can you elaborate how?

@ Wei. Thanks for the response. I will look at the refs but haven't yet done so. I'm skeptical about whether they'll change my mind on the subject, but I'll take a look.

It seems to me that the belief in pure determinism is axiomatic (I'm not even sure what it means to be a Bayesian in a purely deterministic system!), so most of this discussion appears to be pure conjecture. I'm not sure that it has any meaningful application.

@Wei: p(n) will approach arbitrarily close to 0 as you increase n.

This doesn't seem right. A sequence that requires knowledge of BB(k) has O(2^-k) probability according to our Solomonoff inductor. If the inductor compares a BB(k)-based model with a BB(k+1)-based model, then BB(k+1) will on average be about half as probable as BB(k).

In other words, P(a *particular* model of K-complexity k is correct) goes to 0 as k goes to infinity, but the conditional probability, P(a *particular* model of K-complexity k is correct | a sub-model of that particular model with K-complexity k-1 is correct), does not go to 0 as k goes to infinity.
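
In symbols, writing M_k for a particular BB(k)-based model (an illustrative restatement, not this comment's own notation):

```latex
P(M_k) = O(2^{-k}) \longrightarrow 0,
\qquad
P(M_k \mid M_{k-1}) \approx \tfrac{1}{2} \not\longrightarrow 0.
```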
