**Followup to**: Building Something Smarter, Say Not "Complexity", That Alien Message

One of the Godel-inspired challenges to the idea of self-improving minds is based on the notion of "complexity".

Now "complexity", as I've previously mentioned, is a dangerous sort of word. "Complexity" conjures up the image of a huge machine with incomprehensibly many gears inside - an *impressive* sort of image. Thanks to this impressiveness, "complexity" sounds like it could be explaining *all sorts* of things - that *all sorts* of phenomena could be happening because of "complexity".

It so happens that "complexity" also names another meaning, strict and mathematical: the *Kolmogorov complexity* of a pattern is the size of the program code of the shortest Turing machine that *produces the pattern as an output*, given unlimited tape as working memory.
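Kolmogorov complexity is uncomputable in general, but any lossless compressor gives a computable *upper bound* on it: the compressed string, plus a fixed-size decompressor, suffices to reproduce the pattern. A minimal sketch (Python's zlib is used purely for illustration; any lossless compressor would do):

```python
import zlib

def k_upper_bound(data: bytes) -> int:
    # A lossless compressor yields an upper bound on Kolmogorov
    # complexity (up to the constant-sized decompressor): the pattern
    # can be regenerated from the compressed form alone.
    return len(zlib.compress(data, 9))

structured = b"ab" * 5000            # 10,000 bytes produced by a tiny rule
print(len(structured))               # 10000
print(k_upper_bound(structured))     # far below 10,000
```

The true Kolmogorov complexity may be far smaller still; compression only ever bounds it from above.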

I immediately note that this mathematical meaning is *not* the same as the intuitive image that comes to mind when you say "complexity". The vast impressive-looking collection of wheels and gears? That's not what the math term means.

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe - not just the region of space we see around us, but all regions of space and all quantum branches. (There are strong indications that our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)

Then the "Kolmogorov complexity" of that entire universe - throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...

...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

The Kolmogorov complexity of just a *single* planet, like Earth, would of course be much *higher* than the "complexity" of the entire universe that contains it.

"Eh?" you say. "What's this?" you say. "How can a single planet contain more wheels and gears, more *complexity,* than the whole vast turning universe that embeds it? Wouldn't one planet contain fewer books, fewer brains, fewer species?"

But the new meaning that certain mathematicians have formulated and attached to the English word "complexity", is not like the visually impressive complexity of a confusing system of wheels and gears.

It is, rather, the size of the smallest Turing machine that *unfolds into* a certain pattern.

If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.

If you want to print out just Earth by itself, then it's not enough to specify the laws of physics. You also have to point to *just Earth* within the universe. You have to narrow it down somehow. And, in a big universe, that's going to take a lot of information. It's like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city. Indeed, with all those quantum branches existing side-by-side, it's going to take around as much information to *find* Earth in the universe, as to just directly specify the positions of all the atoms.
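The cost of "pointing" can be made concrete: singling out one item among N equally likely candidates takes about log2(N) bits. A toy calculation (the 10^80 atom count for the observable universe is the usual rough figure, used here only for illustration):

```python
import math

def bits_to_locate(n_candidates: int) -> float:
    # Information needed to single out one item among n equally
    # likely candidates.
    return math.log2(n_candidates)

print(bits_to_locate(10**80))   # roughly 266 bits to pick out one atom
print(bits_to_locate(2**500))   # exactly 500.0, a 500-bit "address"
```

In a branching multiverse the candidate count is vastly larger than 10^80, which is why locating Earth can cost about as much as specifying its atoms directly.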

Kolmogorov complexity is the sense in which we zoom into the endless swirls of the Mandelbrot fractal and think, not "How complicated", but rather, "How simple", because the Mandelbrot set is defined using a very simple equation. But if you wanted to find a subset of the Mandelbrot set, one *particular* swirl, you would have to specify *where* to look, and that would take more bits.
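The "very simple equation" is just the iteration z -> z*z + c; a point c belongs to the set if the iterates stay bounded. A minimal membership test (the iteration cap of 100 is an arbitrary cutoff, not part of the definition):

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    # c is in the Mandelbrot set if z -> z*z + c never escapes
    # the disk |z| <= 2.
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

print(in_mandelbrot(0j))        # True: the origin never escapes
print(in_mandelbrot(1 + 0j))    # False: 0, 1, 2, 5, ... escapes quickly
```

All those endless swirls unfold from the few characters of that loop, which is the whole point.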

That's why we use the *Kolmogorov* complexity in Occam's Razor
to determine how "complicated" or "simple" something is. So that, when
we think of the *entire* universe, all the stars we can see and all the
implied stars we can't see, and hypothesize laws of physics
standing behind it, we will think "what a simple universe" and not "what a complicated universe" - just like looking into the Mandelbrot fractal and thinking "how simple". We could never accept a theory of physics as probably
true, or even remotely possible, if it got an Occam penalty the size of the universe.

As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity. (At least, no closed system without a 'true randomness' source.) A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase "the size of the smallest computer program that unfolds into it", by definition.

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an *intuitive* sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more "complex" than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can't be any more "complex" than that.
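A toy version of such a tournament, with the genome format, fitness rule, and parameters all invented for illustration: because the seed and the rules are fixed, the winning organism, however intricate it looks, is fully determined by this short program, so its Kolmogorov complexity is bounded by the program's length.

```python
import random

def run_tournament(seed: int = 0, pop: int = 50, gens: int = 200) -> str:
    # Deterministic evolution: fixed seed, fixed rules, so the final
    # winner is implicit in this short specification.
    rng = random.Random(seed)
    genomes = ["".join(rng.choice("01") for _ in range(64)) for _ in range(pop)]
    fitness = lambda g: g.count("1")                # arbitrary criterion
    for _ in range(gens):
        genomes.sort(key=fitness, reverse=True)
        survivors = genomes[: pop // 2]
        children = []
        for g in survivors:
            i = rng.randrange(len(g))               # one-bit mutation
            children.append(g[:i] + ("1" if g[i] == "0" else "0") + g[i + 1:])
        genomes = survivors + children
    return max(genomes, key=fitness)

print(run_tournament())   # identical on every run: implicit in the spec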

But if, on the other hand, you reached into the biology and made a few million random changes here and there, the *Kolmogorov complexity* of the whole system would shoot way up: anyone wanting to specify it exactly, would have to describe the random changes that you made.

I specify "random" changes, because if you made the changes with beneficence aforethought - according to some criterion of goodness - then I could just talk about the compact criterion you used to make the changes. Only random information is incompressible on average, so you have to make *purposeless* changes if you want to increase the Kolmogorov complexity as fast as possible.
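The compressibility gap is easy to see empirically; here zlib stands in for "the shortest description we can find", with os.urandom as the randomness source:

```python
import os
import zlib

structured = bytes(range(256)) * 40      # 10,240 bytes from a short rule
noise = os.urandom(10_240)               # 10,240 random bytes

# A pattern made by a short rule compresses well...
print(len(zlib.compress(structured, 9)))
# ...but random bits are incompressible on average; the compressor
# can only add overhead.
print(len(zlib.compress(noise, 9)))
```

The structured bytes shrink by an order of magnitude; the random bytes come out slightly *larger* than they went in.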

So! As you've probably guessed, the argument against self-improvement is that since closed systems cannot increase their "complexity", the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.

*If,* that is, you believe that "increasing Kolmogorov complexity" is prerequisite to increasing real-world effectiveness.

(We will dispense with the idea that if system A builds system B, then system A must be "by definition" as smart as system B. This is the "Einstein's mother must have been one heck of a physicist" sophistry. Even if a future planetary ecology is in some sense "implicit" in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)

Now, how does one argue that "increasing Kolmogorov complexity" has something to do with increasing intelligence? Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?

One of the other things that a closed computable system provably can't do, is solve the general halting problem - the problem of telling whether *any* Turing machine halts.

A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for *all* machines of that size or less, then this omniscience is bounded by at most the solver's own complexity plus a small constant.

So... isn't *increasing your Kolmogorov complexity* through outside sensory inputs, the key to learning to solve the halting problem for ever-larger systems?

And doesn't this show that no closed system can "self-improve"?

In a word, no.

I mean, if you were to try to write it out as logic, you'd find that one of the steps involves saying, "If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof). Therefore, by increasing your complexity, you increase the range of systems you can solve." This is formally a non-sequitur.

It's also a non-sequitur in practice.

I mean, sure, if we're not dealing with a closed system, you can't *prove* that it won't solve the halting problem. You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.

But unless you *already* have that kind of mathematical ability yourself, you won't *know* just from looking at the light that it's giving you true solutions to the halting problem. You must have just been constructed with faith in the light, and the light must just happen to work.

(And in any computable universe, any light in the sky that you see *won't* happen to solve the halting problem.)

It's not easy for "sensory information" to give you *justified new mathematical knowledge* that you could not *in principle* obtain with your eyes closed.

It's not a matter, for example, of seeing written in the sky a brilliant proof, that you would never have thought of on your own. A closed system with infinite RAM can close its eyes, and write out *every possible sensory experience it could have, along with its own reactions to them,* that could occur within some bounded number of steps. Doing this does not increase its Kolmogorov complexity.

So the notion can't be that the environment tells you something that you recognize as a proof, but didn't think of on your own. Somehow, having that sensory experience in particular, has to increase your mathematical ability *even after you perfectly imagined that experience and your own reaction to it in advance.*

Could it be the healing powers of having a larger universe to live in, or other people to talk to? But you can simulate incredibly large universes - vastly larger than anything we see in our telescopes, up-arrow large - *within* a closed system without increasing its Kolmogorov complexity. Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.

What *justified novel mathematical* knowledge - about the halting problem in particular - could you gain from a sensory experience, that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?

Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.

But what's actually going on is that the non-sequitur is coming back to bite: Increasing your Kolmogorov complexity doesn't necessarily increase your ability to solve math problems. Swallowing a bucket of random bits will increase your Kolmogorov complexity too.

You aren't likely to learn any *mathematics* by gazing up at the external stars, that you couldn't learn from "navel-gazing" into an equally large simulated universe. Looking at the *actual* stars around you is good for figuring out *where* in the universe you are (the *extra* information that specifies your location) but not much good for learning *new math* that you couldn't learn by navel-gazing into a simulation as large as our universe.

In fact, it's just *bloody hard* to *fundamentally* increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant, is misleading. That's an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver. The vast majority of 500-state Turing machines that implement something that looks like a "halting problem solver" will go *nowhere near* solving the halting problem up to 500 states (but see this comment). I would *drop dead of shock* if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states.
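The "constructive way" is the standard diagonalization: from any claimed solver you can build a program the solver must misjudge. A schematic Python sketch (the oracle below is a deliberately silly stand-in, since no correct one can exist):

```python
def make_diagonal(halts):
    # Given any claimed halting oracle, construct a program g that
    # does the opposite of whatever the oracle predicts about g.
    def g():
        if halts(g):
            while True:   # oracle said "halts", so loop forever
                pass
        return            # oracle said "loops", so halt at once
    return g

# Any candidate oracle is wrong about its own diagonal program:
claims_everything_halts = lambda f: True
g = make_diagonal(claims_everything_halts)
# The oracle says g halts, yet by construction g() would loop forever.
print(claims_everything_halts(g))   # True, and provably mistaken
```

The diagonal program costs only a constant number of states beyond the solver itself, which is where the "plus a small constant" in the bound comes from.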

Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...

...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...

...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn't pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...

...and finally - after a long, long time - your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...

...then this intrauniversal civilization - and everything it's ever learned by theory or experiment over the last up-arrow years - is said to contain 400 bits of complexity, or however long the original program was.

But where did it get its math powers, from inside the closed system?

If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math, came from updating on some observed event. That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind's brain... but in all cases, the update will occur because of a logic that says, "If I see this, it means that". Before you can learn, you must have the prior that structures learning. If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y. That's how it would be for superintelligences, I expect.

If you keep tracing back through that simulated universe, you arrive at something *before* the dawn of superintelligence - the *first* intelligent beings, produced by evolution. These first minds are the ones who'll look at Peano Arithmetic and think, "This has never produced a contradiction, so it probably never will - I'll go ahead and program that into my AI." These first minds are the ones who'll think, "Induction seems like it works pretty well, but how do I formalize the notion of induction?" And these first minds are the ones who'll think, "If I build a self-improving AI, how should it update itself - including changing its own updating process - from the results of observation?"

And how did the first minds get the ability to think those thoughts? From natural selection, that generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: "keep what works".

And in turn, natural selection in this universe popped out of the laws of physics.

So everything that this hypergalactic civilization ever believes about math, is really just induction in one form or another. All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations. It would all just unfold out of the inductive principle...

...running in a box sealed as tightly shut as our own universe appears to be.

And I don't see how we, in our own closed universe, are going to do any better. Even if we have the ability to look up at the stars, it's not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.

If you wrote that 400-bit simulated universe over the course of a couple of months
using human-level intelligence and some mysterious source of unlimited
computing power, then *you* are much more complex than that
hypergalactic civilization. *You* take much more than 400 bits to find within the space of possibilities, because you are only *one particular* brain.

But y'know, I think that your mind, and the up-arrow mind of that
inconceivable civilization, would still be worth distinguishing as Powers. Even if *you* can figure out how to ask *them*
questions. And even if you're asking them questions by running an
internal simulation, which makes it all part of your own "complexity" as defined in the math.

To locate an up-arrow-sized mind within an up-arrow-sized civilization, would require up-arrow bits - even if the *entire*
civilization unfolded out of a 400-bit machine as compact as our own
laws of physics. But which would be more powerful, that one "complex"
mind, or the "simple" civilization it was part of?

None of this violates Godelian limitations. You can transmit to the hypergalactic civilization a similar Turing machine to the one that built it, and ask it how *that*
Turing machine behaves. If you can fold a hypergalactic civilization
into a 400-bit Turing machine, then even a hypergalactic civilization
can confront questions about the behavior of 400-bit Turing machines
that are *real stumpers.*

And 400 bits is going to be an overestimate. I bet there's at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that. If that seems unreasonable, you are not acquainted with the Busy-Beaver problem.
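For anyone not acquainted with it: the Busy Beaver question asks, among all *halting* n-state Turing machines, which runs longest or writes the most ones, and the answer grows faster than any computable function. A minimal simulator, run on the known 2-state, 2-symbol champion (its standard transition table runs 6 steps and leaves 4 ones):

```python
def run_tm(rules, max_steps=10_000):
    # rules: (state, symbol) -> (write, move, next_state); "H" halts.
    tape, pos, state, steps = {}, 0, "A", 0
    while state != "H" and steps < max_steps:
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

# The 2-state, 2-symbol Busy Beaver champion.
bb2 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
print(run_tm(bb2))   # (6, 4): 6 steps, 4 ones written
```

Already at 5 and 6 states the champions run for astronomical step counts, which is why pinning down even BB(15) or BB(20) would be a heroic achievement.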

You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations. But unfortunately, there are other small Turing machines that are hellishly difficult problems - perhaps unfolding into hypercivilizations themselves, or things even less comprehensible. So even the tremendous mathematical minds that can unfold out of small Turing machines, won't be able to solve all Turing machines up to a larger size bound. Hence no Godelian contradiction.

(I wonder: If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon? 15? Maybe even as large as 20? Surely not 100 - you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)

Olie Lamb said: "I don't see anything good about complexity. There's nothing artful about complexity. There's nothing mystical about complexity. It's just complex." This is true even when you're talking about wheels and gears, never mind Kolmogorov complexity. It's simplicity, not complexity, that is the admirable virtue.

The *real* force behind this whole debate is that the word "complexity" sounds impressive and can act as an explanation
for anything you don't understand. Then the word gets captured by a mathematical property that's spelled using the same letters, which happens to be provably constant for closed systems.

That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.

This argument is promulgated even by some people who can write proofs about complexity - but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life. That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs? I don't think they do that. On the last step of interpretation, they just make a quick appeal to the sound of the words.

But I *will* agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs, shouldn't be able to improve its mathematical abilities beyond the level of an up-arrow-sized civilization - for example, it shouldn't be able to solve Busy-Beaver(100).

It might *perhaps* be more limited than this in mere *practice,* if it's just running on a laptop computer or something. But if *theoretical* mathematical arguments about closed systems show anything, *that* is what they show.

Can you give an example of the kind of argument that you are trying to refute here?

The argument:

"a (real-world) agent such as a human or AI cannot self improve because the K-complexity of a *closed* system is constant" seems obviously silly. A human or AI is not a closed system; it receives sensory inputs, and no-one would argue that self-improvement has to happen without any sensory inputs.

Aside: I would caution, echoing Eliezer, that theoretical results about what is computable and what isn't, and about Turing machines with infinite memory and how many 1's they can output before they halt are hard to apply to the real world without making subtle errors causing you to falsely pump your intuition. The real world problems that we face rely upon finite machines operating in finite time.

Posted by: Roko | November 03, 2008 at 04:02 PM

"Then the "Kolmogorov complexity" of that entire universe - throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...would be 500 bits."

I must be missing something... what exactly do you mean by the phrase "the 'complexity' of that entire universe... would be 500 bits."

I would think the "complexity of the universe" would include the location of every bit of matter at all times, among other things. So, maybe I just don't understand what you mean by the "complexity of" something.

Also, I'm not sure whether you believe that all human behavior is purely deterministic in the most extreme sense. Or, do you think people can actually make their own decisions, i.e., "free will"? Obviously, if someone can turn either left or right at any given time, then that decision is still not explained (in any predictive sense) by the laws of physics.

Wouldn't the 500 bits have to include understanding all psychological/economic behavior?

Posted by: Henry V | November 03, 2008 at 04:25 PM

Roko: here.

Henry: Kolmogorov complexity.

Posted by: Nick Tarleton | November 03, 2008 at 04:33 PM

He means 500 bits plus log2 the size of the initial conditions, plus log2 the number of instants elapsed since then - which is the number of bits you would need to specify a current multiverse state under the specified assumptions.

Posted by: Tim Tyler | November 03, 2008 at 04:34 PM


Tim, I already specified I was talking about the whole universe throughout time and space (and inflation and decoherence). The laws probably don't have any initial conditions under that specification, even if different things happen in different places.

Henry, follow the link to the Mandelbrot set, then look up how the Mandelbrot set is computed (a single equation).

Posted by: Eliezer Yudkowsky | November 03, 2008 at 04:43 PM

You're right about what you said about the instants. However, we don't know how big the initial conditions were - and I'm not clear on what it would mean for them not to exist. Maybe you are talking about eternal return? Even then there's a specification of which path you are on.

Posted by: Tim Tyler | November 03, 2008 at 05:44 PM

@Nick Tarleton:

Thanks, Nick. I can see I'm going to have to spend an exciting evening or 10 alone with the SL4 archives...

Perhaps one could consider Eliezer's OB posts as an "edited and sanitized summary of the morass of SL4"?

Posted by: Roko | November 03, 2008 at 05:59 PM

Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Posted by: Will Pearson | November 03, 2008 at 06:08 PM

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more "complex" than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can't be any more "complex" than that.

and later:

...then this intrauniversal civilization - and everything it's ever learned by theory or experiment over the last up-arrow years - is said to contain 400 bits of complexity, or however long the original program was.

"You say it as if it were a good thing."

But this seems, on the contrary, to indicate that Kolmogorov complexity might not be a good candidate for the formalized notion of intuitive complexity, including the one used by Occam's Razor. The fact that it often can be anti-intuitive doesn't necessarily mean that it's our intuition of what is simple and what is complicated that must change; the mere fact that the notion of Kolmogorov complexity is itself very simple and easy to play with mathematically doesn't by itself entitle it to anything.

There exists a proof of Fermat's Last Theorem with very small Kolmogorov complexity, much smaller than, say, the size of this post of yours in characters, or the Kolmogorov complexity of some well-known formal proof of a basic theorem in calculus. Does that mean that this proof of FLT is a much simpler object than these others? It does if you insist on interpreting "simple" to mean "small Kolmogorov complexity", but why must you so insist? If you show this proof to a mathematician, he or she won't say "oh, this is very simple, simpler than basic calculus proofs." If you then tell the mathematician that in fact this very complex proof was obtained by running this very short Turing machine, they'll just shrug in response - so what? As a piece of mathematics, the proof is still evidently extremely complicated. Maybe your notion of complexity just isn't so good at capturing that.

Posted by: Anatoly Vorobey | November 03, 2008 at 06:15 PM

The "500 bits" only works if you take a hidden variable or Bohmian position on quantum mechanics. If (as the current consensus would say) non-linear dynamics can amplify quantum noise then enormous amounts of new information are being "produced" locally everywhere all the time. The current state of the universe incorporates much or all of that information. (Someone who understands the debates about black holes and the holographic principle should chime in with more precise analysis.)

I couldn't follow the whole argument so I'm not sure how this affects it, but given that Eliezer keeps referring to this claim I guess it is important.

Posted by: Jed Harris | November 03, 2008 at 06:25 PM

Tim: The initial wavefunction may be completely determined by the laws, and under MWI that's all you need.

Roko: I actually don't notice much duplication between OB and SL4. It's definitely worth reading.

Will: Why don't you take it seriously? You know the Mandelbrot set is generated from very few bits, and I don't imagine you expect there to be a simple derivation of its structures. Taken literally, your request is highly unreasonable because it involves so many levels of poorly understood processes - inflation, baryogenesis, galaxy formation... and the answer is probably infinite anyway. Still, I wouldn't be surprised if a better physicist could produce a reasonable first-principles estimate of the number of baryons in the observable universe.

Jed: Or many-worlds.

Posted by: Nick Tarleton | November 03, 2008 at 06:34 PM

Will Pearson:

Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Been done. It didn't work out, and Eddington's theory is now only history, but the example shows that it is not out of the question that there be such laws.

Posted by: Richard Kennaway | November 03, 2008 at 06:40 PM

It might be worthwhile to note that cogent critiques of the proposition that a machine intelligence might very suddenly "become a singleton Power" do not deny the inefficacies of the human cognitive architecture offering improvement via recursive introspection and recoding, nor do they deny the improvements easily available via hardware substitution and expansion of more capable hardware and I/O.

They do, however, highlight the distinction between a vastly powerful machine madly exploring vast reaches of a much vaster "up-arrow" space of mathematical complexity, and a machine of the same power bounded in growth of intelligence - bounded, by definition, to what is *relevant* - due to starvation for *relevant* novelty in its environment of interaction.

If, Feynman-like, we imagine the present state of knowledge about our world in terms of a distribution of vertical domains, like silos, some broader with relevance to many diverse facets of real-world interaction, some thin and towering into the haze of leading-edge mathematical reality, then we can imagine the powerful machine quickly identifying and making a multitude of latent connections and meta-connections, filling in the space between the silos and even somewhat above - but to what extent, given the inevitable diminishing returns among the latent, and the resulting starvation for the novel?

Given such boundedness, speculation is redirected to growth in ecological terms, and the Red Queen's Race continues ever faster.

Posted by: Jef Allbright | November 03, 2008 at 06:46 PM

there exists a 500-state Turing machine that solves the halting problem for all the Turing machines up to 400 states.

Here's the algorithm: given as input a Turing machine M with less than 400 states,

1. Compute a number k greater than BB(400).

2. Simulate M for k steps.

3. If it halts by then, then output "M halts". If it doesn't, then output "M doesn't halt".

Correctness proof: If it halts in less than k steps, then obviously it halts. If it doesn't, then by the definition of BB(400), it never will.

Analysis of number of states: This is possible with 400+C states because you use about 400 states for step 1, and a constant number of overhead states for simulating Turing machines. Because there exist universal Turing machines with less than 25 states, you can arrange for C to be less than 100.

There's a small amount of subtlety in actually doing step 1. Let me know if you want more detail. However, I believe the overall result and method should be clear. To summarize:

There exists a constant C so that if you want to solve the halting problem for all Turing machines up to N states, you can use a particular Turing machine with N+C states. Moreover, this constant C is small (definitely less than 100).

In other words, I hope the result is unsurprising once you see the method. As such, please don't drop dead of shock.

Posted by: Boris | November 03, 2008 at 07:21 PM
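[Editor's note: Boris's three steps can be sketched in runnable form. This is an illustrative sketch in Python, not Boris's code: the toy machine encoding is invented for the example, and the bound k is simply *assumed* to exceed the relevant Busy Beaver value, which is exactly the uncomputable part of step 1.]

```python
def simulate(tm, steps):
    """Run a toy Turing machine for at most `steps` steps.

    `tm` maps (state, symbol) -> (new_state, new_symbol, move);
    a missing entry means the machine halts. Returns the number of
    steps taken, or None if it is still running after `steps` steps.
    """
    tape, pos, state = {}, 0, 0
    for t in range(steps):
        key = (state, tape.get(pos, 0))
        if key not in tm:
            return t           # halted
        state, tape[pos], move = tm[key]
        pos += move
    return None                # still running after `steps` steps

def decides_halting(tm, k):
    """Boris's steps 2-3. Sound only if k exceeds BB(n) for machines
    of tm's size -- computing such a k is the uncomputable step 1."""
    return "halts" if simulate(tm, k) is not None else "doesn't halt"

# A machine that writes a 1 and halts, and one that moves right forever.
halts = {(0, 0): (1, 1, 1)}    # state 1 has no rules, so it halts
loops = {(0, 0): (0, 0, 1)}    # loops forever
print(decides_halting(halts, 1000))   # halts
print(decides_halting(loops, 1000))   # doesn't halt (k assumed > BB)
```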

I thought you had some reason for giving us that *extra* detail, the number 500. Yes, it could be as short as 500 bits, but it could equally be very long.

I would prefer a shorter explanation for the universe over a longer one, if presented with both, but there is no reason to expect one of a certain length. Since we don't have any explanation that actually manages the job, why attach any number to it?

Posted by: Will Pearson | November 03, 2008 at 07:30 PM

I don't think there is a Theory of Everything for the universe, just as there isn't one for mathematics.

What if the universe is not continuous, the simulation does lazy evaluation, and the laws of nature can only be described by an infinite number of bits? How would this change your reasoning?

Posted by: gmlk | November 03, 2008 at 07:31 PM

*drops dead of shock*

Oh... well, at least there's no reasonable way for a superintelligence that can perform any finite computation, to obtain that particular 500-bit Turing machine, through any amount of internal thinking or induction from sensory experience, without using a reasoning method guaranteed at some point to make a mistake.

Posted by: Eliezer Yudkowsky | November 03, 2008 at 07:38 PM

I don't understand. Am I too dumb or is this gibberish?

Posted by: PK | November 03, 2008 at 08:02 PM

@pk

"I don't understand. Am I too dumb or is this gibberish?"

It's not so complicated; it's just that we're so formal...

Posted by: Jef Allbright | November 03, 2008 at 08:33 PM

The Kolmogorov complexity of the universe is more than just a description of its physical laws.

First of all, besides the physical laws, K should include the initial state of the universe. It may be large or even infinite. And it should be properly expressed in qubits.

Second, you'd need to pick the "present state", whatever that means, out of all possible future and past states. In a classical universe it may be only 100 bits or so; in a quantum multiverse, it's at least 1 bit per every branch.

Third, it is not at all obvious that the laws of the universe are computable. Maybe it uses an oracle (the multiverse branching process can provide such an oracle), or maybe it uses real values in a non-trivial way. If this is true, then we should either say that K is infinite, or use K relative to an oracle.

Fourth, K is defined only up to an additive constant, so it's not really correct to talk about The Kolmogorov Complexity. You can only talk about Kolmogorov complexity relative to a fixed Turing machine. For different Turing machines the difference between the Ks will be roughly equal to the number of bits in a translator between their machine languages. In the extreme case when the reference Turing machine IS the universe, then K(universe) is just 0.

Posted by: Oleg | November 03, 2008 at 08:59 PM
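[Editor's note: Oleg's fourth point, that K is only defined relative to a reference machine, can be loosely illustrated with two off-the-shelf compressors standing in for two reference machines; this is only an analogy, since real compressors are far weaker than universal machines.]

```python
import bz2
import zlib

# Two compressors stand in (loosely) for two reference universal
# machines: each assigns the same highly regular string its own
# "description length", illustrating that K is only meaningful
# relative to a fixed machine, up to the translation overhead
# between machines.
data = b"0123456789" * 1000           # 10,000 bytes, very regular

len_zlib = len(zlib.compress(data, 9))
len_bz2 = len(bz2.compress(data, 9))

# Both are far below len(data), but each machine gives its own number.
print(len(data), len_zlib, len_bz2)
```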

"In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact."

Eliezer, you're making an important error here. I don't think it affects the main argument you're making in this article (that considerations of "complexity" don't rule out self-improving minds), but this error may have grave consequences elsewhere. The error is that while the environment does have to be magic, you don't need to have faith in this, just not have faith that it's impossible.

Suppose you get hold of a black box that seems to act as a halting-problem oracle. You've thrown thousands of problems at it, and have never seen an incorrect or inconsistent answer. What are the possibilities here? Either (A) the environment really is magic (i.e., there is uncomputable physics that enables implementation of actual halting-problem oracles), or (B) the box is just giving random answers that happen to be correct by chance, or (C) you're part of a simulation where the box is giving all possible combinations of answers and you happen to be in the part of the simulation where the box is giving correct answers. As long as your prior probability for (A) is not zero, as you do more and more tests and keep getting correct answers, its posterior probability will eventually dominate (B) and (C).

Why is this so important? Well, in standard Solomonoff Induction, the prior for (A) is zero, and if we program that into an AI, it won't do the right thing in this situation. This may have a large effect on expected utility (of us, people living today), because while the likelihood of us living in an uncomputable universe with halting-problem oracles is low, the utility we gain from correctly recognizing and exploiting such a universe could be huge.

Posted by: Wei Dai | November 03, 2008 at 11:15 PM
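[Editor's note: Wei's update argument can be put in numbers. A minimal sketch with an invented prior, keeping only hypotheses (A) and (B) for simplicity:]

```python
# Toy version of Wei Dai's argument: hypothesis A ("the box is a
# genuine halting oracle") predicts every answer is correct;
# hypothesis B ("the box answers at random") predicts each answer is
# correct with probability 1/2. However tiny the nonzero prior on A,
# its posterior dominates once enough correct answers accumulate.
# (The prior 1e-9 is purely illustrative.)
prior_a, prior_b = 1e-9, 1.0 - 1e-9

post_a, post_b = prior_a, prior_b
for _ in range(100):        # observe 100 correct answers in a row
    post_a *= 1.0           # P(correct | A) = 1
    post_b *= 0.5           # P(correct | B) = 1/2

total = post_a + post_b
post_a, post_b = post_a / total, post_b / total
print(post_a > 0.999)       # True: (A) dominates despite the tiny prior
```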

Wei, essentially the black box has to be such that it seems more likely for it to be a halting oracle than something only slightly smarter than you are - since when it says "doesn't halt", and you don't know if it halts or not, it's not like you can easily check.

Suppose, though, that the black box has the apparent qualities of a closed timelike curve in a universe that permits CTCs to be constructed with perfection (even if our own universe permitted CTCs they would break down well before up-arrow time). Then it would seem pretty plausible, after constructing the CTC and verifying its apparent performance on the cases you knew how to test, that the CTC wasn't a trickster SI in disguise.

But -

- in this case, you can just build a CTC inside yourself, as a closed system, and examine the results with your eyes shut. Your "complexity" won't change; it'll just have to be *defined* outside the Turing formalism.

So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn't *theoretically* do with eyes closed.

Posted by: Eliezer Yudkowsky | November 03, 2008 at 11:32 PM

Good point, but when the box says "doesn't halt", how do I know it's correct?

Posted by: Nick Tarleton | November 03, 2008 at 11:36 PM

"If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon?"

Suppose they run a BB evaluator for all of time. They would, indeed, have no way at any point of being *certain* that the current champion 100-bit program is the actual champion that produces BB(100). However, if they decide to anthropically reason that "for any time t, I am probably alive after time t, even though I have no direct evidence one way or the other once t becomes too large", then they will believe (with arbitrarily high probability) that the current champion program is the actual champion program, and an arbitrarily high percentage of them will be correct in their belief.

Posted by: Rolf Nelson | November 04, 2008 at 12:11 AM

Boris:

"There's a small amount of subtlety in actually doing step 1."

Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable.

Posted by: Douglas Knight | November 04, 2008 at 12:25 AM

Nick wrote:

"Good point, but when the box says 'doesn't halt', how do I know it's correct?"

A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example, you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.

If P!=NP, you should be able to convince yourself that the black box has at least exponentially more computational power than you do. So if you are an AI with say the computational resources of a solar system, you should be able to verify that the black box either contains exotic physics or has access to more resources than the rest of the universe put together.

Eliezer wrote:

"So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn't theoretically do with eyes closed."

It seems to me that an AI should/can never completely rule out the possibility that the universe contains physics that is mathematically more powerful than what it has already incorporated into itself, so it should always keep its eyes open. Even if it has absorbed the entire universe into itself, it might still be living inside a simulation, right?

Posted by: Wei Dai | November 04, 2008 at 01:51 AM

+1 to Anatoly Vorobey. Using K-complexity to capture the human notion of complexity seems to be even worse than using game-theoretic rationality to capture human rationality - something that's been attacked to death already.

Posted by: Vladimir Slepnev | November 04, 2008 at 03:10 AM

Boris:

"There's a small amount of subtlety in actually doing step 1."

Douglas:

"Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable."

It's impossible to bound BB(n) computably in n, but that leaves open the following question, on which Boris's argument depends.

Let BBB(n), the Busy Beaver Bound function, be the size of the smallest Turing machine that, starting from a blank tape, prints out an upper bound for BB(n). Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.

Posted by: Richard Kennaway | November 04, 2008 at 03:10 AM

Richard:

"Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false."

c is the size of a universal Turing machine which, in addition to simulating its argument, keeps track of how many steps it simulates.

BB(n) is the maximum number of steps that any halting TM of size n performs, therefore there is some particular TM of size n that performs BB(n) steps. Give that TM as input to the aforementioned UTM, and it will compute BB(n).

The uncomputability of BB prevents you from finding the right TM, and from proving it correct if it's handed to you by a magic light in the sky. But the TM still exists even if you'll never know which it is.

Posted by: Pengvado | November 04, 2008 at 04:22 AM

Eliezer, I'm not quite sure why you are obsessed with the ability to find mathematical truths. You can get different sets of mathematical truths depending on whether you take the parallel postulate as true or not. Which ones are applicable in the real world depends on where you are.

Unless you know where you are in the universe, you cannot know what is friendly for you to do (or even useful).

Posted by: Will Pearson | November 04, 2008 at 06:01 AM

"In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact."

As Wei mentioned, P≠NP is basically the conjecture that this isn't true: i.e., that you can exponentially increase your ability to solve math problems by your environment being magic and your *not* being born with faith in that fact. So for example, if your environment immediately inverted any one-way function, that would be evidence (no faith required) that your environment is not merely 'slightly' smarter than you are, but *astoundingly* smarter. In qualitative terms, I think it would be almost as astounding as if the environment solved the halting problem.

Posted by: Scott Aaronson | November 04, 2008 at 07:56 AM

"I would drop dead of shock"

Eliezer, just as it was interesting to ask what probability estimate 'Nuts!' amounted to, I think it would be very useful for the forum of Overcoming Bias to ask what your implicit probability estimate is for a 500-state TM being able to solve the halting problem for all TMs of up to 50 states.

I imagine that 'I would drop dead of shock' was intended to convey a probability of less than 1 in 10,000, or maybe 1 in 1,000,000?

Posted by: Toby Ord | November 04, 2008 at 09:20 AM

@Toby: Why, yes, I was feeling rather grateful at that point that I hadn't quantified the probability. It's fair to assume that it would have been low enough that I couldn't plausibly recover from the calibration hit, like 1% or something.

@Scott: This *entire discussion* assumes unlimited finite internal computing power, so P and NP cut no dice here. Otherwise, of course a larger environment can outsmart you mathematically.

@Will: Mathematical truths are about which axioms imply which theorems.

@Wei: A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?

Also @Wei: I don't recall if I've mentioned this before, but Solomonoff induction in the mixture form makes no mention of the *truth* of its models. It just says that *any computable* probability distribution is in the mixture somewhere, so you can do as well as any *computable form of cognitive uncertainty* up to a constant.

In other words, if there's any *computable* reaction that you have to discovering what looks like a black box halting solver - any *computable* reasoning that decides that "this looks like an uncomputable halting solver" and produces new distributions over computably related events as a result - then that's in the Solomonoff mixture.

Solomonoff is not really as bad as it sounds.

But when it comes to making use of the results to incorporate the halting oracle via self-modification - then Solomonoff blows a fuse, of course, because it was never designed for self-modification in the first place; it's a Cartesian formalism that puts the universe irrevocably on the outside.

Posted by: Eliezer Yudkowsky | November 04, 2008 at 10:24 AM

Let me expand my point. Assume you run your hyper-civilisation for a googol years, and you ask it: "Do the angles in a three-straight-sided shape add up to 180 degrees?"

This can be answered yes or no, depending upon what axiom scheme you adopt: Euclidean, hyperbolic or something more obscure. Having a system that has thought of all mathematical truths doesn't get you a Powerful system, unless it knows which ones apply to solving the problems you ask.

Posted by: Will Pearson | November 04, 2008 at 11:17 AM

Beautiful post, Eli. Seriously, one of the better --- and more useful, at least pedagogically --- ones. Should be required reading for, well, just about everyone.

Posted by: jb | November 04, 2008 at 11:32 AM

Oops, I mis-stated the result. If we fix a universal Turing machine (equivalently, a programming language), then there exists a constant C so that deciding the halting problem for programs of length less than N can be done by a program of length less than N+C. Pengvado's solution works here.

Unfortunately, as Richard observed, you can't do this with Turing machines, essentially because Turing machines cannot inspect their own finite control, unlike the stored program on a von Neumann machine. In fact, it appears that my claim---in Richard's notation, that BBB(N) is N+O(1)---is open. It *is* known that BBB(N) is O(N); moreover, a 2002 paper, "Improved Bounds for Functions Related to Busy Beavers" by Ben-Amram and Petersen, showed that BBB(N) is N+o(N). I haven't found better results, but on the off-chance that I do, I'll post another comment here.

Now, because my claim is open, we still have to check whether Eliezer's original intuition (that 500-state Turing machines can't solve the halting problem for 50-state ones) holds up. Luckily for me, it doesn't. It's known that BBB(N) is less than 3N+6 (that's the O(N) result above), so at worst, there exists a 500-state Turing machine that solves the halting problem for 100-state ones. This is less impressive than what I originally claimed, because it's off by a multiplicative constant rather than an additive one, but it still does the trick.

What's the moral here? I think it's that you really should define Kolmogorov complexity in terms of the number of *bits* in the shortest program doing whatever you're analyzing, as is standard, rather than the number of *states* in a Turing machine. Indeed, this is how Eliezer defines it in this post; furthermore, he alludes to it when he responds to my post by mentioning a "500-*bit* Turing machine" [emphasis mine]. Only in the aside beginning "I would drop dead of shock" was the number of states actually mentioned.

Posted by: Boris | November 04, 2008 at 12:02 PM

"Otherwise, of course a larger environment can outsmart you mathematically."

No, not of course. For example, suppose P were equal to PSPACE. Then even though a larger environment could fundamentally outsmart you mathematically (say by solving the halting problem), it couldn't *prove* to you that it was doing so. In other words, the situation with polynomial-time computation would be more-or-less the same as it is with unlimited computation: superintelligent machines could only prove their superintelligence to other superintelligent machines.

That the situation with efficient computation appears to be *different*---i.e., that it appears superintelligent machines can indeed prove their superintelligence to fundamentally dumber machines---is (if true) a profound fact about the world that seems worth calling attention to. Sure, of course you can nullify it by assuming away all complexity considerations, but why? :-)

Posted by: Scott Aaronson | November 04, 2008 at 12:41 PM

Scott, while I don't mean to deflate in any sense the shattering import of your wisdom...

...there's a pretty large segment of the planetary population upon whom the distinction between "polynomially more computing power" and "exponentially more computing power" would be lost.

These unenlightened fools, if they had computing power N, would be just as impressed by a superintelligence that had N^googol computing power as a superintelligence that had 2^N computing power. They'd just lump both of those together as "the environment is smarter than me, and can prove things I can't and then exhibit the proof". Poor bastards.

Posted by: Eliezer Yudkowsky | November 04, 2008 at 12:57 PM

Um, except that we *also* don't know whether there are computations that can be checked in N time but only performed in N^googol time. The situation is qualitatively the same as for N versus 2^N.

Posted by: Scott Aaronson | November 04, 2008 at 01:13 PM

(the superscripts didn't show up: that was N^googol and 2^N)

Posted by: Scott Aaronson | November 04, 2008 at 01:15 PM

Bundling the initial state into the laws of physics seems like a confused way of looking at things to me. The state and the laws are different things - if you muddle them together conceptually, your accounting is likely to get equally muddled. Also, most of the rationale for the size of the "laws of physics" being small evaporates. The initial state might be *enormous*, for all anyone knows. The laws of physics could be infinite too - but there the hope that they are small and finite seems slightly more reasonable.

Posted by: Tim Tyler | November 04, 2008 at 01:38 PM

Eliezer and Nick: I don't think "Kolmogorov complexity" and "Mandelbrot set" actually answers what I was trying to understand.

Both of those concepts seem completely apt for describing *perfectly* deterministic systems. But, in describing the "complexity" of the universe even in something as simple as the "pattern of stars that exists" one would still have to take into account potential non-deterministic factors such as human behavior. Unless you are a strict determinist, you have to allow for the potential for a sufficiently advanced human (or non-human) intelligence to eradicate a particular star, or even a galaxy, or to produce an artificial black hole that alters the pattern of stars.

How are these factors accounted for in a "500 bit" code? Or, are you saying that you are a strict determinist?

Posted by: Henry V | November 04, 2008 at 05:22 PM

Henry:

"Both of those concepts seem completely apt for describing *perfectly* deterministic systems. But, in describing the "complexity" of the universe even in something as simple as the 'pattern of stars that exists' one would still have to take into account potential non-deterministic factors such as human behavior. [...] [A]re you saying that you are a strict determinist?"

I'll take this one. Yes, we're presuming determinism here, although the determinism of the Many Worlds Interpretation is a little different from the single-world determinism we're used to thinking about. Also, I notice that in an earlier comment, you spoke of *free will* and *determinism* as if the concepts were opposed, but depending on exactly what you mean by *free will*, this does is not necessarily the case. For Eliezer's take, see, e.g., "Timeless Control," or for the philosophical mainstream, google *compatibilism*.

Posted by: Z. M. Davis | November 04, 2008 at 05:48 PM

POSTSCRIPT-- Strike that stray *does* in the penultimate sentence. And re compatibilism, the short version is that no, you don't have free will, but you shouldn't feel bad about it.

Posted by: Z. M. Davis | November 04, 2008 at 05:54 PM

"A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?"

It's easy to use such an oracle to produce proofs and halting times. The following assumes that the oracle outputs 1 if the input TM halts, and 0 otherwise.

For proofs: Write a program p which on inputs x and i, enumerates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, otherwise it loops forever. Now query the oracle with (p,x,0), (p,x,1), ..., and you get a proof for x if it has a proof.

Halting times: Write a program p which on inputs x and i, runs x for i steps. If x halts before i steps, then it halts, otherwise it loops forever. Now query the oracle with (p,x,2), (p,x,4), (p,x,8), ..., until you get an output of "1" and then use binary search to get the exact halting time.
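[Editor's note: Wei's doubling-plus-binary-search procedure for halting times can be sketched as follows. The oracle is mocked here by an ordinary predicate with a known answer, since a real halting oracle is of course not available to a Turing machine.]

```python
def find_halting_time(halts_within):
    """Wei Dai's halting-time trick: given an oracle-style predicate
    halts_within(i), answering "does the program halt within i steps?",
    find the exact halting time by doubling, then binary search."""
    i = 1
    while not halts_within(i):     # doubling phase: 1, 2, 4, 8, ...
        i *= 2
    lo, hi = i // 2 + 1, i         # exact halting time lies in (i/2, i]
    while lo < hi:                 # binary search for the least such i
        mid = (lo + hi) // 2
        if halts_within(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Mock oracle for a "program" that actually halts at step 37.
print(find_halting_time(lambda i: i >= 37))   # 37
```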

"I don't recall if I've mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant."

Eliezer, if what you say is true, then it shouldn't be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a counterexample. Suppose the input sequence consists of the unary encodings of Busy Beaver numbers BB(1), BB(2), BB(3), ..., that is, BB(1) number of 1s followed by a 0, then BB(2) number of 1s followed by a 0, and so on. Let's ask the predictor, after seeing n input symbols, what is the probability that it will eventually see a 0 again, and call this p(n). With Solomonoff Induction, p(n) will approach arbitrarily close to 0 as you increase n. A human mathematician, on the other hand, will recognize that the input sequence may not be computable and won't let p(n) fall below some non-zero bound.

Posted by: Wei Dai | November 04, 2008 at 07:33 PM
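[Editor's note: for concreteness, the input sequence Wei describes can be generated for the small Busy Beaver values that are actually known, taking BB as the maximum-steps function S, with S(1)=1, S(2)=6, S(3)=21, S(4)=107 for 2-symbol machines.]

```python
def unary_bb_sequence(bb_values):
    """Wei Dai's input sequence: BB(1) 1s followed by a 0, then
    BB(2) 1s followed by a 0, and so on, for the values supplied."""
    return "".join("1" * v + "0" for v in bb_values)

# Known maximum-step ("busy beaver") values for small 2-symbol
# Turing machines: S(1)=1, S(2)=6, S(3)=21, S(4)=107.
seq = unary_bb_sequence([1, 6, 21, 107])
print(len(seq))   # 1+6+21+107 ones, plus four 0s: 139 symbols
```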

I don't quite see this. With Solomonoff induction, as with a computable human mathematician, the probability of the *next* symbol being 0 will approach 0. I don't see why a Solomonoff inductor using mixtures (that is, evaluating computable probability distributions rather than computable sequences) will ever assign a probability arbitrarily close to 0 of seeing *another 0, ever*.

Ask the human mathematician, over and over, what their probability of the next symbol being 0 is. They're computable, so this distribution is in the mixture. What other distribution is it necessarily dominated by, in the Solomonoff mixture?

Or are we allowing the human mathematician to have an inconsistent probability distribution where he says "You'll see another 0 eventually, I'm sure of it, but I'm also pretty sure that no matter how large a number of 1s I pick, it won't be high enough." If so, to be fair, we should factor out the symbol for "see another 0 eventually" and just ask the Solomonoff inductor about that separately via some input encoding, the same way we ask the human about it separately.

Posted by: Eliezer Yudkowsky | November 04, 2008 at 08:17 PM

Good question, Eliezer. If the human mathematician is computable, why isn't it already incorporated into Solomonoff Induction? It seems to me that the human mathematician does not behave like a Bayesian. Let H be the hypothesis that the input sequence is the unary encodings of Busy Beaver numbers. The mathematician will try to estimate, as best as he can, P(next symbol is 0|H). But when the next symbol turns out to be 1, he doesn't do a Bayesian update and decrease P(H), but instead says "Ok, so I was wrong. The next Busy Beaver number is bigger than I expected."

I'm not sure I understand what you wrote after "to be fair". If you think a Solomonoff inductor can duplicate the above behavior with an alternative setup, can you elaborate how?

Posted by: Wei Dai | November 04, 2008 at 10:18 PM

@ Wei. Thanks for the response. I will look at the refs but haven't yet done so. I'm skeptical about whether they'll change my mind on the subject, but I'll take a look.

It seems to me that the belief in pure determinism is axiomatic (I'm not even sure what it means to be a Bayesian in a purely deterministic system!), so most of this discussion appears to be pure conjecture. I'm not sure that it has any meaningful application.

Posted by: Henry V | November 04, 2008 at 11:41 PM

@Wei:

"p(n) will approach arbitrarily close to 0 as you increase n."

This doesn't seem right. A sequence that requires knowledge of BB(k) has O(2^-k) probability according to our Solomonoff Inductor. If the inductor compares a BB(k)-based model with a BB(k+1)-based model, then BB(k+1) will on average be about half as probable as BB(k).

In other words, P(a *particular* model of K-complexity k is correct) goes to 0 as k goes to infinity, but the conditional probability, P(a *particular* model of K-complexity k is correct | a sub-model of that particular model with K-complexity k-1 is correct), does not go to 0 as k goes to infinity.

Posted by: Rolf Nelson | November 05, 2008 at 12:28 AM
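[Editor's note: Rolf's distinction between the vanishing unconditional probability and the constant conditional probability can be checked with a toy 2^-k prior over models; the prior is illustrative only.]

```python
# Under an illustrative 2^-k prior over models, the unconditional
# probability of the k-th model vanishes as k grows, while the
# conditional probability of model k given its (k-1) sub-model is
# a constant 1/2 -- it does not go to 0 with k.
def p(k):
    return 2.0 ** -k

for k in (5, 20, 50):
    print(k, p(k), p(k) / p(k - 1))   # middle value shrinks; ratio stays 0.5
```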