
October 27, 2008


"Intelligence" is efficient cross-domain optimization.

It is not yet clear what distinguishes this from wealth and power, which also allow people to achieve a priori unlikely goals.

And there goes Caledonian making pointless arguments again... Couldn't you pick a more frivolous objection?

(Caledonian's comment was deleted.)

Robin, that's coming up in tomorrow's post "Efficient Cross-Domain Optimization" - for today, I just wanted to be clear that I don't directly equate intelligence to some number of bits of raw optimization power in an arbitrary domain.

Caledonian, you're not disputing anything Eliezer said yet you manage to find a way to be disrespectful. I wish you wouldn't.

I would really enjoy this post more if it were in the context of cognitive neuroscience. Or at least some phenomena actually extracted from the brain. For example, how could we detect a difference in intelligence biologically? Could this inspire a different kind of intelligence measure?

It seems to me that there's an aspect to this that isn't getting much attention: the domain.

Example domains include chess and Go, certainly. But probabilistic games surely should not be excluded. There is a spectrum of domains that runs from "fair roulette" (which is not manipulable by intelligence), through blackjack (slightly manipulable), and only at one end reaches highly manipulable games like chess and Go.

I'm sure Eliezer understands this, but his presentation doesn't spend much time on it.

For example, how do the calculations change when you admit that the domain may make some desirable situations impossible?

I'm not sure if I am echoing another post by Shane Legg (can't remember where).

Consider a three dimensional space (topography) and a preference ordering given by height.

An optimizer that climbs a "hill space" would seem intuitively less powerful than one that finds the highest peak in a "multi hill space", even if both spaces are the same size and, relative to a random selection, both points are equally likely.

David, I would categorize this under the heading of "trying to measure how much effort something takes". I'm just trying to integrate over a measure, not describe the structure of the space relative to a particular way of searching for solutions. One kind of search might bog down where another would succeed immediately - the neighborhood of a problem space as seen by a human is not like the neighborhood of DNA strands seen by natural selection. One mind's cheap problem that can be solved by a short program like water running downhill is another mind's stumper - to a transhuman mind, for example, chess might appear as a pointless exercise because you can solve it by a simple Deep Blue-like program.

Indeed, there was recently developed a program that plays provably correct checkers from the canonical starting position - so it is now clear that playing checkers requires no optimization power, since you can solve it as deterministically as water running downhill. At least that's where this argument seems to me to lead.

I think you just have to construct "impressiveness" in a more complex way than "optimization power".

The quantity we're measuring tells us how improbable this event is, in the absence of optimization, relative to some prior measure that describes the unoptimized probabilities. To look at it another way, the quantity is how surprised you would be by the event, conditional on the hypothesis that there were no optimization processes around. This plugs directly into Bayesian updating

This seems to me to suggest the same fallacy as the one behind p-values... I don't want to know the tail area, I want to know the probability for the event that actually happened (and only that event) under the hypothesis of no optimization divided by the same probability under the hypothesis of optimization. Example of how they can differ: if we know in advance that any optimizer would optimize at least 100 bits, then a 10-bit-optimized outcome is evidence against optimization even though the probability given no optimization of an event at least as preferred as the one that happened is only 1/1024.

I guess it works out if any number of bits of optimization being exerted given the existence of an optimizer is as probable as any other number, but if that is the prior we're starting from then this seems worth stating (unless it follows from the rest in a way that I'm overlooking).
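A minimal sketch of the distinction drawn in this comment, under stated assumptions: the outcome space of 2**120 equally likely outcomes, the uniform "optimizer" that only ever lands in the top 100 bits, and all function names are hypothetical choices for illustration, not anything defined in the post. It compares the tail-area measure (bits) with the likelihood ratio for the specific outcome observed.

```python
from fractions import Fraction
import math

TOTAL = 2**120  # hypothetical number of equally likely outcomes absent optimization

def tail_bits(num_at_least_as_good, total=TOTAL):
    """Optimization power in bits: -log2 of the tail probability."""
    return -math.log2(Fraction(num_at_least_as_good, total))

def p_given_no_optimizer(total=TOTAL):
    """Probability of one specific outcome under the unoptimized (uniform) measure."""
    return Fraction(1, total)

def p_given_optimizer(rank, min_bits=100, total=TOTAL):
    """Assumed optimizer: uniform over the outcomes worth at least min_bits.

    rank 0 is the most preferred outcome.
    """
    top = total // 2**min_bits  # how many outcomes are at least min_bits deep
    return Fraction(1, top) if rank < top else Fraction(0)

# The observed outcome is the least preferred one that still scores 10 bits:
# exactly 2**110 outcomes are at least as good as it.
observed_rank = 2**110 - 1
bits = tail_bits(observed_rank + 1)
ratio = p_given_optimizer(observed_rank) / p_given_no_optimizer()

print(bits)   # tail-area measure for the observed outcome
print(ratio)  # likelihood ratio, optimizer vs. no optimizer
```

Under these assumptions the tail area reports 10 bits, yet the likelihood ratio for the observed outcome is zero, so the observation counts against this particular optimizer hypothesis. With a different assumed distribution for the optimizer, the two measures could agree.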

haha, a new anthropic principle for the ID folks:
the existence of a universe highly optimized for life implies the existence of an optimizing agent.

This concept of "optimisation power" was previously mentioned here. To recap on some of the objections raised at that time:

Optimisation power suggests something useful - but the proposed metric contains no reference to the number of trials, the number of trials in series on the critical path - or most of the other common ways of measuring the worth of optimisation processes. It seems to be more a function of the size of the problem space than anything else - in which case, why "power" and not, say, "factor"?

Before christening the notion, there are some basic questions: Is the proposed metric any use? What is the point of it?

it is now clear that playing checkers requires no optimization power, since you can solve it as deterministically as water running downhill.

Surely not! Just because a problem has a known, deterministic solution, that doesn't mean it doesn't require optimisation to produce that solution.

It would be extremely odd epistemological terminology to classify problems as optimisation problems only if we do not already have access to their solutions.

I give this article a Mentifax rating of 7.0 for incoherency and pointlessness.


I agree with Tim that the presence of a deterministic solution shouldn't be enough to say whether there's an optimization process going on. But then, Eliezer didn't say "Since there's a deterministic solution, no optimization is going on": it's more like "it's possible that no optimization is going on". Optimization power isn't required, but might be useful.

From The Bedrock of Morality:

For every mind that thinks that terminal value Y follows from moral argument X, there will be an equal and opposite mind who thinks that terminal value not-Y follows from moral argument X.

Does the same apply to optimisation processes? In other words, for every mind that sees you flicking the switch to save the universe, does another mind see only the photon of 'waste' brain heat and think 'photon maximiser accidentally hits switch'? Does this question have implications for impartial measurements of, say, 'impressiveness' or 'efficiency'?

Emile, that's what I thought when I read Tim's comment, but then I immediately asked myself at what point between water flowing and neurons firing does a process become simple and deterministic? As Eliezer says, to a smart enough mind, we would look pretty basic. I mean, we weren't even designed by a mind, we sprung from simple selection! But yes, it's possible that optimisation isn't involved at all in water, whereas it pretty obviously is with going to the supermarket etc.

peeper, you score 2 on the comment incoherency criterion but an unprecedented 12 for pointlessness, giving you also an average of 7.0. Congrats!

What I would suggest to begin with (besides any further technical problems) is that optimization power has to be defined relative to a given space or a given class of spaces (in addition to being relative to a preference ordering and a random selection).

This allows comparisons between optimizers with a common target space to be more meaningful. In my example above, the hill climber would be less powerful than the range climber because given a "mountain range" the former would be stuck on a local maximum. So for both these optimizers, we would define the target space as the class of NxN topographies, and the range climber's score would be higher, as an average.
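The comparison above can be sketched on two small grids. Everything here is an illustrative assumption: the 5x5 height maps, the greedy four-neighbour hill climber, and the exhaustive search standing in for the "range climber" are hypothetical, chosen only to show how the same scoring rule separates the two optimizers once a second peak exists.

```python
import math

def hill_climb(height, start):
    """Greedy ascent: step to the highest 4-neighbour until none is higher."""
    n = len(height)
    r, c = start
    while True:
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < n and 0 <= c + dc < n]
        best = max(neighbours, key=lambda p: height[p[0]][p[1]])
        if height[best[0]][best[1]] <= height[r][c]:
            return (r, c)  # local maximum reached
        r, c = best

def range_climb(height):
    """Stand-in for the 'range climber': exhaustive search of the whole grid."""
    n = len(height)
    cells = ((r, c) for r in range(n) for c in range(n))
    return max(cells, key=lambda p: height[p[0]][p[1]])

def bits(height, point):
    """log2(total cells / cells at least as high as the point reached)."""
    h = height[point[0]][point[1]]
    at_least = sum(1 for row in height for v in row if v >= h)
    return math.log2(len(height) ** 2 / at_least)

N = 5
# One hill centred on (2, 2).
single_hill = [[-((r - 2) ** 2 + (c - 2) ** 2) for c in range(N)] for r in range(N)]
# Two hills: a low peak at (0, 0) and a high peak at (4, 4).
multi_hill = [[max(3 - (r + c), 9 - 2 * ((4 - r) + (4 - c))) for c in range(N)]
              for r in range(N)]

for name, grid in (("single hill", single_hill), ("multi hill", multi_hill)):
    local = hill_climb(grid, (0, 0))
    best = range_climb(grid)
    print(name, local, round(bits(grid, local), 2), best, round(bits(grid, best), 2))
```

On the single hill both optimizers reach the same cell and earn the same score. On the two-hill grid the greedy climber started at (0, 0) stays on the low peak, so its score falls below that of the exhaustive search.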

I mean, we weren't even designed by a mind, we sprung from simple selection!

This is backwards, isn't it? Reverse engineering a system designed by a (human?) intelligence is a lot easier than reverse engineering an evolved system.

optimization power has to be defined relative to a given space or a given class of spaces.

One problem is that the search space is often unbounded. Looking for the shortest program that performs some specified task? The search space consists of all possible programs. Obviously you start with the short ones - but until you have solved the problem, you don't know how much of the space you will wind up having to search.

Another problem is that enlarging the search space doesn't necessarily make the problem any harder. Compressing the human genome? It probably doesn't make any difference if you search the space of smaller-than-1gb programs, or the space of smaller-than-10gb programs. Beyond a certain point, the size of the search space is often an irrelevance.

It is pretty common for search spaces to have these properties - so defining metrics relative to the size of the search space will often mean that your metrics may not be very useful.

In practice, if you are assessing agents, what you usually want to know is how "good" a specified agent is at solving randomly-selected members of a specified class of problems - where goodness is measured in evaluations, time, cost - or something like that.

If you are assessing problems, what you usually want to know is how easy they are to solve - either by a specified agent, or by a population of different agents.

Often, in the real world the size of a target tells you how difficult it is to hit. In optimisationverse, that isn't true at all - much depends on the lay of the surrounding land.

I mean, we weren't even designed by a mind, we sprung from simple selection!

Humans were largely built by sexual selection - which means that the selecting agents did have minds, and that the selection process was often extremely complex. Details are on http://alife.co.uk/essays/evolution_sees/.

Ben Jones: I think a process can be deterministic and (relatively) simple, yet still count as an optimization process. An AI that implements an A* algorithm to find the best path across a maze might qualify as a (specialized) optimization process. You can make more accurate predictions about its final state than about which way it will turn at a particular intersection - and you can't always do so for a stone rolling down a hill.

But I'm not sure about this, because you could say something similar about the deterministic checkers player - maybe it uses A* too!

In case it wasn't clear, I consider the provable checkers solver to be an optimization process - indeed, the maximally powerful (if not maximally efficient) optimizer for the domain "checkers from the canonical starting point". That it is deterministic or provably correct is entirely irrelevant.

Tim, when I said relative to a space I did not mean relative to its size. This is clear in my example of a hill topography, where increasing the scale of the hill does not make it a qualitatively different problem: simply moving to higher positions will still work. In fact, the whole motivation for my suggestion is the realization that the _structure_ of that space is what limits the results of a given optimizer. So it is relative to _all_ the properties of the space that the power of an optimizer should be defined, to begin with. I say "to begin with" because there are many other technical difficulties left, but I think that measures of power for optimizers that operate on different spaces do not compare meaningfully.

I'm not sure that I get this. Perhaps I understand the maths, but not the point of it. Here are two optimization problems:

1) You have to output 10 million bits. The goal is to output them so that no two consecutive bits are different.

2) You have to output 10 million bits. The goal is to output them so that when interpreted as an MP3 file, they would make a nice sounding song.

Now, the solution space for (1) consists of two possibilities (all 1s, all 0s) out of 2^10000000, for a total of 9,999,999 bits. The solution space for (2) is millions of times wider, leading to fewer bits. However, intuitively, (2) is a much harder problem, and things that optimize (2) are actually doing more of the work of intelligence. After all, (1) can be achieved in a few lines of code and very little time or space, while (2) takes much more of these resources.

I agree with David's points about the roughness of the search space being a crucial factor in a meaningful definition of optimization power.
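The bit counts for the two problems above can be worked through as a rough sketch. The count of acceptable MP3 outputs is a purely hypothetical placeholder for "millions of times wider" (here, four million times the two solutions of problem 1), and the function name is an assumption for illustration.

```python
import math

N_BITS = 10_000_000  # length of the required output in both problems

def optimization_bits(log2_num_solutions, n_bits=N_BITS):
    """Bits of optimization = log2(all outputs) - log2(acceptable outputs)."""
    return n_bits - log2_num_solutions

# Problem (1): exactly two acceptable outputs (all 0s, all 1s).
problem_1 = optimization_bits(math.log2(2))

# Problem (2): hypothetical count, four million times as many acceptable outputs.
problem_2 = optimization_bits(math.log2(2 * 4_000_000))

print(problem_1)              # 9999999.0
print(problem_1 - problem_2)  # gap between the two problems, in bits
```

Under this assumed count the two scores differ by only about 22 bits out of roughly ten million, which illustrates how little the raw measure separates a problem with a few-line solution from one that seems to demand far more work.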

Toby, if you were too dumb to see the closed-form solution to problem 1, it might take an intense effort to tweak the bit on each occasion, or perhaps you might have trouble turning the global criterion of total success or failure into a local bit-fixer; now imagine that you are also a mind that finds it very easy to sing MP3s...

The reason you think one problem is simple is that you perceive a solution in closed form; you can imagine a short program, much shorter than 10 million bits, that solves it, and the work of inventing this program was done in your mind without apparent effort. So this problem is very trivial on the meta-level because the program that solves it optimally appears very quickly in the ordering of possible programs and is moreover prominent in that ordering relative to our instinctive transformations of the problem specification.

But if you were trying random solutions and the solution tester was a black box, then the matching-bits problem (problem 1) would indeed be harder - so you can't be measuring the raw difficulty of optimization if you say that one is easier than the other.

This is why I say that the human notion of "impressiveness" is best constructed out of a more primitive notion of "optimization".

We also do, legitimately, find it more natural to talk about "optimized" performance on multiple problems than on a single problem - if we're talking about just a single problem, then it may not compress the message much to say "This is the goal" rather than just "This is the output."

1. One difference between optimization power and the folk notion of "intelligence": Suppose the Village Idiot is told the password of an enormous abandoned online bank account. The Village Idiot now has vastly more optimization power than Einstein does; this optimization power is not based on social status nor raw might, but rather on the actions that the Village Idiot can think of taking (most of which start with logging in to account X with password Y) that don't occur to Einstein. However, we wouldn't label the Village Idiot as more intelligent than Einstein.

2. Is the Principle of Least Action infinitely "intelligent" by your definition? The PLA consistently picks a physical solution to the n-body problem that *surprises* me in the same way Kasparov's brilliant moves surprise me: I can't come up with the exact path the n objects will take, but after I see the path that the PLA chose, I find (for each object) the PLA's path has a smaller action integral than the best path I could have come up with.

3. An AI whose only goal is to make sure such-and-such coin will not, the next time it's flipped, turn up heads, can apply only (slightly less than) 1 bit of optimization pressure by your definition, even if it vaporizes the coin and then builds a Dyson sphere to provide infrastructure and resources for its ongoing efforts to probe the Universe to ensure that it wasn't tricked and that the coin actually was vaporized as it appeared to be.

The comments to this entry are closed.
