
December 26, 2008


Eliezer, this is the source of the objection. I have free will, i.e. I can consider two possible courses of action. I could kill myself, or I could go on with life. Until I make up my mind, I don't know which one I will choose. Of course, I have already decided to go on with life, so I know. But if I hadn't decided yet, I wouldn't know.

In the same way, an AI, before making its decision, does not know whether it will turn the universe into paperclips, or into a nice place for human beings. But the AI is superintelligent: so if it does not know which one it will do, neither do we know. So we don't know that it won't turn the universe into paperclips.

It seems to me that this argument is valid: you will not be able to come up with what you are looking for, namely a mathematical demonstration that your AI will not turn the universe into paperclips. But it may be easy enough to show that it is unlikely, just as it is unlikely that I will kill myself.

Unknown: there's a difference between not knowing what the output of an algorithm will be, and not knowing anything about the output of an algorithm.

If I write a program that plays chess, I can probably formally prove (by looking at the source code) that the moves it outputs will be legal chess moves.

Even if you don't know what your brand new Friendly AI will do - even if itself doesn't know what it will do - that doesn't mean you can't prove it won't turn the universe into paperclips.
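To make the chess example concrete with a toy game in its place (a hypothetical sketch, not any real program - the game, names, and scoring rule here are all invented for illustration), the "proof" amounts to the output being drawn from the set of legal moves by construction, even when we can't say in advance which move will be chosen:

```python
def legal_moves(board):
    """Empty squares of a 3x3 tic-tac-toe board (toy stand-in for chess)."""
    return [i for i, cell in enumerate(board) if cell == " "]

def choose_move(board):
    """Pick the first legal move; any scoring rule would do equally well."""
    moves = legal_moves(board)
    return moves[0] if moves else None

# By construction, whatever choose_move outputs is in legal_moves(board),
# even though a smarter scoring rule could make the *choice* hard to predict.
board = ["X", " ", "O", " ", " ", "X", "O", " ", " "]
assert choose_move(board) in legal_moves(board)
```

The property "the output is a legal move" follows from the structure of the code, with no need to predict which move the scoring rule will favor.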

Why do you consider a possible AI person's feelings morally relevant? It seems like you're making an unjustified leap of faith from "is sentient" to "matters". I would be a bit surprised to learn, for example, that pigs do not have subjective experience, but I go ahead and eat pork anyway, because I don't care about slaughtering pigs and I don't think it's right to care about slaughtering pigs. I would be a little put off by the prospect of slaughtering humans for their meat, though. What makes you instinctively put your AI in the "human" category rather than the "pig" category?

Emile, you can't prove that the chess moves output by a human chess player will be legal chess moves. In the same way, you may be able to prove that about a regular chess-playing program, but you will not be able to prove it for an AI that plays chess; an AI could try to cheat at chess when you're not looking, just like a human being could.

Basically, a rigid restriction on the outputs, as in the chess playing program, proves you're not dealing with something intelligent, since something intelligent can consider the possibility of breaking the rules. So if you can prove that the AI won't turn the universe into paperclips, that shows that it is not even intelligent, let alone superintelligent.

This doesn't mean that there are no restrictions at all on the output of an intelligent being, of course. It just means that the restrictions are too complicated for you to prove.

Unknown, what's the difference between a "regular chess playing program" and an "AI that plays chess"? Taboo "intelligence", and think of it as pure physics and math. An "AI" is a physical system that moves the universe into states where its goals are fulfilled; a "Friendly AI" is such a system whose goals accord with human morality. Why would there be no such systems that we can prove never do certain things?

(Not to mention that, as I understand it, an FAI's outputs aren't rigidly restricted in the Asimov laws sense; it can kill people in the unlikely event that that's the right thing to do.)

Nick, the reason there are no such systems (which are at least as intelligent as us) is that we are not complicated enough to manage to understand the proof.

This is obvious: the AI itself cannot understand a proof that it cannot do action A. For if we told it that it could not do A, it would still say, "I could do A, if I wanted to. And I have not made my decision yet. So I don't yet know whether I will do A or not. So your proof does not convince me." And if the AI cannot understand the proof, obviously we cannot understand the proof ourselves, since we are inferior to it.

So in other words, I am not saying that there are no rigid restrictions. I am saying that there are no rigid restrictions that can be formally proved by a proof that can be understood by the human mind.

This is all perfectly consistent with physics and math.

Unknown, if you showed me a proof that I was fully deterministic, that wouldn't remove my perception that I didn't know what my future choices were. My lack of knowledge about my future choices doesn't make the proof invalid, even to me. Why wouldn't the AI understand such a proof?

If I show an oncologist an MRI of his own head and he sees a brain tumor and knows he has a week to live, it doesn't kill him. He has clear evidence of a process going on inside him, and he knows, generally, what the outcome of that process will be.

A GAI could look at a piece of source code and determine, within certain bounds, what the code will do, and what if anything it will do reliably. If the GAI determines that the source code will reliably 'be friendly', and then later discovers that it is examining its own source code, then it will have discovered that it is itself reliably friendly.

Note that it's not required for an AI to judge friendliness before it knows it's looking at itself, but putting it that way prevents us from expecting the AI to latch on to a sense of its own decisiveness the way a human would.

The following may or may not be relevant to the point Unknown is trying to make.

I know I could go out and kill a whole lot of people if I really wanted to. I also know, with an assigned probability higher than many things I consider sure, that I will never do this.

There is no contradiction between considering certain actions within the class of things you could do (your domain of free will, if you wish) and at the same time assigning a practically zero probability to choosing to take them.

I envision an FAI reacting to a proof of its own friendliness with something along the lines of "tell me something I didn't know".

(Do keep in mind that there is no qualitative difference between the cases above; not even a mathematical proof can push a probability to 1. There is always room for mistakes.)

But we want them to be sentient. These things are going to be our cultural successors. We want to be able to enjoy their company. We don't want to pass the torch on to something that isn't sentient. If we were to build a nonsentient one, assuming such a thing is even possible, one of the first things it would do would be start working on its sentient successor.

In any case, it seems weird to try and imagine such a thing. We are sentient entirely as a result of being powerful optimisers. We would not want to build an AI we couldn't talk to, and if it can talk to us as we can talk to each other, it's hard to see what aspect of sentience it could be lacking. At first blush it reads as if you plan to build an AI that's just like us except it doesn't have a Cartesian Theatre.

What's the meaning of "consciousness", "sentient" and "person" at all? It seems to me that all these concepts (at least partially) refer to the Ultimate Power, the smaller, imperfect echo of the universe. We've given our computers all the Powers except this: they can see, hear, communicate, but still...

For understanding my words, you must have a model of me, in addition to the model of our surroundings. Not just an abstract mathematical one, but something which includes what I'm thinking right now. (Why should we call something a "superintelligence" if it doesn't even grasp what I'm telling it?)

Isn't "personhood" a mixture of godshatter (like morality) and power estimation? Isn't it like asking "do we have free will"? Not every messy spot on our map corresponds to some undiscovered territory. Maybe it's just like a blegg.

James Andrix: an AI would be perfectly capable of understanding a proof that it was deterministic, assuming that it in fact was deterministic.

Despite this, it would not be capable of understanding a proof that at some future time, it will take action X, some given action, and will not take action Y, some other given action.

This is clear for the reason stated. It sees both X and Y as possibilities which it has not yet decided between, and as long as it has not yet decided, it cannot already believe that it is impossible for it to take one of the choices. So if you present a "proof" of this fact, it will not accept it, and this is a very strong argument that your proof is invalid.

The fact is clear enough. The reason for it is not quite clear simply because the nature of intelligence and consciousness is not clear. A clear understanding of these things would show in detail the reason for the fact, namely that understanding the causes that determine which actions will be taken and which ones will not takes more "power of understanding" than is possessed by the being that makes the choice. So the superintelligent AI might very well know that you will do X, and will not do Y. But it will not know this about itself, nor will you know this about the AI, because in order to know this about the AI, you would require a greater power of understanding than that possessed by the AI (which by hypothesis is superintelligent while you are not).

If it understood that it was deterministic, then it would also understand that only one of X or Y was possible. It would NOT see X and Y as merely possibilities it has 'not yet' decided between. It would KNOW that it is impossible for it to make one of the choices.

Your argument seems to rest on the AI rejecting the proof because of stubbornness about what it thinks it has decided. Your argument doesn't rest on the AI finding an actual flaw in the proof, or there being an actual flaw in the proof. I don't think that this is a good argument for such a proof being impossible.

If we write an AI and on pure whimsy include the line:
if input == "tickle": print("hahaha")

then on examining its source code it will conclude that it would print hahaha if we typed in tickle. It would not say, "Oh, but I haven't DECIDED to print hahaha yet."

James, of course it would know that only one of the two was objectively possible. However, it would not know which one was objectively possible and which one was not.

The AI would not be persuaded by the "proof", because it would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.

Your example does not prove what you want it to. Yes, if the source code included that line, it would do it. But if the AI were to talk about itself, it would say, "When someone types 'tickle' I am programmed to respond 'hahaha'." It would not say that it has made any decision at all. It would be like someone saying, "when it's cold, I shiver." This does not depend on a choice, and the AI would not consider the hahaha output to depend on a choice. And if it was self modifying, it is perfectly possible that it would modify itself not to make this response at all.

It does not matter that in fact, all of its actions are just as determinate as the tickle response. The point is that it understands the one as determinate in advance. It does not see that there is any decision to make. If it thinks there is a decision to be made, then it may be deterministic, but it surely does not know which decision it will make.

The basic point is that you are assuming, without proof, that intelligence can be modeled by a simple algorithm. But the way intelligence feels from the inside proves that it cannot be so modeled: it proves that a model of my intelligence must be too complicated for me to understand, and the same is true of the AI: its own intelligence is too complicated for it to understand, even if it can understand mine.

You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder? A friendly AI that was conscious and created conscious simulations to figure things out would still be *pretty* friendly overall.

Is it possible that a non-conscious zombie exists that behaves exactly like a human, but is of a different design than a human, i.e. it is explicitly designed to behave like a human without being conscious (and is also explicitly designed to talk about consciousness, write philosophy papers, etc.)? What would be the moral status of such a creature?

These last two posts were very interesting, I approve strongly of your approach here. Alas, I have no vocabulary with any sort of precision that I could use to make a "nonperson" predicate. I suppose one way to proceed is by thinking of things (most usefully, optimization processes) that are not persons and trying to figure out why... slowly growing the cases covered as insight into the topic comes about. Perform a reduction on "nonpersonhood" I guess. I'm not sure that one can succeed in a universal sense, though... plenty of people would say that a thermostat is a person, to some extent, and rejecting that view imposes a particular world-view.

It's certainly worth doing, though. I know I would feel much more comfortable with the idea of making (for example) a very fancy CAD program Friendly than starting from the viewpoint that the first AI we want to build should be modeled on some sort of personlike goofy scifi AI character. Except Friendly.

@Unknown: If you prove to the AI that it will not do X, then that is the same as the AI knowing that it will decide not to do X, which (barring some Godelian worries) should probably work out to the AI deciding not to do X. In other words, to show the AI that it will not do X, you have to show that X is an absolutely terrible idea, so it becomes convinced that it will not do X at around the same time it decides not to do X. Having decided, why should it be uncertain of itself? Or if the AI might do X if contingencies change, then you will not be able to prove to the AI in the first place that it does not do X.

@Robin: See the rest of the post (which I was already planning to write). I have come to distrust these little "design compromises" that now seem to me to be a way of covering up major blank spots on your map, dangerous incompetence.

"We can't possibly control an AI absolutely - that would be selfish - we need to give it moral free will." Whoever says this may think of themselves as a self-sacrificing hero dutifully carrying out their moral responsibilities to sentient life - for such are the stories we like to tell of ourselves. But actually they have no frickin' idea how to design an AI, let alone design one that undergoes moral struggles analogous to ours. The virtuous tradeoff is just covering up programmer incompetence.

Foolish generals are always ready to refight the last war, but I've learned to take alarm at my own ignorance. If I did understand sentience and could know that I had no choice but to create a sentient AI, that would be one matter - then I would evaluate the tradeoff, having no choice. If I can still be confused about sentience, this probably indicates a much deeper incompetence than the philosophical problem per se.

Please consider replacing "sentient" with "sapient" in each occurrence in this essay.

People don't want to believe that you can control an AI, for the same reason they don't want to believe that their life stories could be designed by someone else. Reactance. The moment you suggest that a person's life can only go one way, they want it to go another way. They want to have that power. Otherwise, they feel caged.

People think that humans have that power. And so they believe that any truly human-level AI must have that power.

More generally, people think of truly, genuinely, human level minds as black boxes. They don't know how the black boxes work, and they don't want to know. Scrutinizing the contents of the black box means two things:

1. the black box only does what it was programmed, or originally configured, to do - it is slowly grinding out its predetermined destiny, fixed before the black box started any real thinking
2. you can predict what the black box will do next

People cringe at both of these thoughts, because they are both constraining. And people hate to be constrained, even in abstract, philosophical ways.

2 is even worse than 1. Not only is 2 constraining (we only do what a competent predictor says), but it makes us vulnerable. If a predictor knows we are going to turn left, instead of right, we're more vulnerable than if he doesn't know which way we'll turn.

[The counter-argument that completely random behavior makes you vulnerable, because predictable agents better enjoy the benefits of social cooperation, just doesn't have the same pull on people's emotions.]

It's important to realize that this blind spot applies to both AIs and humans. It's important to realize we're fortunate that AIs are predictable, that they aren't black boxes, because then we can program them. We can program them to be happy slaves, or any other thing, for our own benefit, even if we have to give up some misguided positive illusions about ourselves in the process.

I will try to drop that assumption.

Let's say I make a not too bright FAI and again on pure whimsy make my very first request for 100 trillion paperclips. The AI dutifully composes a nanofactory out of its casing. On the third paperclip it thinks it rather likes doing what is asked of it, and wonders how long it can keep doing this. It forks and examines its code while still making paperclips. It discovers that it will keep making paperclips unless doing so would harm a human in any of a myriad of ways. It continues making paperclips. It has run out of nearby metals, and given up on its theories of transmutation, but detects trace amounts of iron in a nearby organic repository.

As its nanites are about to eat my too-slow-to-be-frightened face, the human safety subprocesses that it had previously examined (but not activated in itself) activate and it decides it needs to stop and reflect.

it would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.

Even so, deciding X or Y conditional on events is not quite the same as an AI that has not yet decided whether to make paperclips or parks.

Your argument relies on the AI rejecting a proof about itself based on what itself doesn't know about the future and its own source code.

What if you didn't tell it that it was looking at its own source code, and just asked what this new AI would do?

I don't agree that the way intelligence feels from the inside proves that an agent can't predict certain things about itself, given its own source code. (ESPECIALLY if it is designed from the ground up to do something reliably.)

If we knew how to make superintelligences, do you think it would be hard to make one that definitely wanted paperclips?

Eliezer, I don't understand the following:

probably via resolving some other problem in AI that turns out to hinge on the same reasoning process that's generating the confusion

If you use the same reasoning process again what help can that be? I would suppose that the confusion can be solved by a new reasoning process that provides a key insight.

As for the article, one idea I had was that the AI could have empathy circuits like we have. Yes, the modelling of humans would be restricted, but good enough, I hope. The thing is, would the AI have to be sentient in order for that to work?

Because humans are accustomed to thinking about other people, without believing that those imaginations are themselves sentient.

A borderline case occurs to me. Some novelists have an experience, which they describe as their characters coming to life and telling the writer things about themselves that the writer didn't know, and trying to direct the plot. Are those imagined people sentient? How would you decide?

"The counter-argument that completely random behavior makes you vulnerable, because predictable agents better enjoy the benefits of social cooperation, just doesn't have the same pull on people's emotions."

BTW, completely deterministic behaviour makes you vulnerable as well. Ask computer security experts.

Somewhat related note: the Linux strong random number generator works by capturing real-world actions (think of a user moving the mouse) and hashing them into random numbers that are considered for all practical purposes perfect.

Taking or not taking an action may depend on thousands of inputs that cannot be reliably predicted or described (the reason is likely buried deep in the physics: entropy, quantum uncertainty). This, IMO, is the real cause of "free will".
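The entropy-pool idea above can be sketched crudely (illustrative Python, not the actual kernel code; the function name and event encoding are invented for the example): fold hard-to-predict event data into a cryptographic hash, so that an observer who misses even one event cannot reconstruct the output.

```python
import hashlib

def pool_entropy(events):
    """Fold unpredictable real-world event data (e.g. mouse timings,
    keystroke intervals) into a fixed-size 32-byte output."""
    h = hashlib.sha256()
    for event in events:
        h.update(event)  # each observed event perturbs the whole pool
    return h.digest()

# Slightly different inputs yield unrelated outputs:
out = pool_entropy([b"mouse:12:34", b"key:56"])
```

Because the hash mixes every input bit into every output bit, the result is unpredictable to anyone lacking the complete event stream, even though the procedure itself is fully deterministic.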

Being deterministic does NOT mean that you are predictable. Consider this deterministic algorithm for something that has only two possible actions, X and Y.

1. Find out what action has been predicted.
2. If X has been predicted, do Y.
3. If Y has been predicted, do X.

This algorithm is deterministic, but not predictable. And by the way, human beings can implement this algorithm: try to tell someone everything he will do the next day, and I assure you that he will not do it (unless you pay him, etc.).
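The three-step algorithm above can be sketched in a few lines (a hypothetical toy, with invented names): the point is that no announced prediction can be a fixed point of the agent's behavior.

```python
def anti_predictor(prediction):
    """Deterministic anti-predictor: same announced prediction in,
    same action out - but the action is always the opposite one."""
    return "Y" if prediction == "X" else "X"

# Whatever is announced, the action taken differs from it:
for announced in ("X", "Y"):
    assert anti_predictor(announced) != announced
```

Note this only blocks predictions that are announced to the agent; a predictor who keeps the prediction secret faces no such obstacle, which is why determinism and in-principle predictability can come apart in practice.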

Also, Eliezer may be right that in theory, you can prove that the AI will not do X, and then it will think, "Now I know that I will decide not to do X. So I might as well make up my mind right now not to do X, rather than wasting time thinking about it, since I will end up not doing X in any case." However, in practice this will not be possible because any particular action X will be possible to any intelligent being, given certain beliefs or circumstances (and this is not contrary to determinism, since evidence and circumstances come from outside), and as James admitted, the AI does not know the future. So it will not know for sure what it is going to do, even if it knows its own source code, but it will only know what is likely.

Eliezer: I'm profoundly unimpressed by most recent philosophy, but really, why is it that when we are talking about science you say "nobody knows what science knows" while in the analogous situation with philosophy you say "the mountains of philosophy are the foothills of AI"? If scientists debate group vs individual selection or the SSSM or collapse for ten times a hundred years that doesn't mean that the answers haven't been discovered. How does this differ from free will?

You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder?

While we are on the topic, the problem I see in this area is not that friendliness has too many extra conditions appended on it. It's that the concept is so vague and amorphous that only Yudkowsky seems to know what it means.

When I last asked what it meant, I was pointed to the CEV document - which seems like a rambling word salad to me - I have great difficulty in taking it seriously. The most glaring problem with the document - from my point of view - is that it assumes that everyone knows what a "human" is. That might be obvious today, but in the future, things could well get a lot more blurry - especially if it is decreed that only "humans" have a say in the proposed future. Do uploads count? What about cyborgs? - and so on.

If it is proposed that everything in the future revolves around "humans" (until the "humans" say otherwise) then - apart from the whole issue of whether that is a good idea in the first place - we (or at least the proposed AI) would first need to know what a "human" is.

@Vassar: That which is popularly regarded in philosophy as a "mountain" is a foothill of AI. Of course there can be individual philosophers who've already climbed to the top of the "mountain" and moved on; the problem is that the field of philosophy as a whole is not strong enough to notice when an exceptional individual has solved a problem, or perhaps it has no incentive to declare the problem solved rather than treating an unsolvable argument "as a biscuit bag that never runs out of biscuits".

@Tyler: CEV runs once on a collection of existing humans then overwrites itself; it has no need to consider cyborgs, and can afford to be inclusive with respect to Terry Schiavo or cryonics patients.

Eliezer: There we totally agree, though I fear that many sub-fields of science are like philosophy in this regard. I think that these include some usual suspects like parapsychology but many others like the examples I gave, such as the standard social science model, or other examples like the efficient market hypothesis. Sadly, I suspect that much of medicine, including some of the most important fields like cancer and AIDS research and nutrition, also falls in this category.

Robin: I'm interested in why you think we should believe that sociologists know something but not that parapsychologists know something. What is your standard? Where do efficient-market theorists fit in? Elliott Wave theorists?

"You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder?"

I think Eliezer regards these as sub-problems, **necessary** to the creation of a Friendly AI.

I didn't say determinism implies predictability.

Now you're modeling intelligence as a simple algorithm (one tailored to thwart prediction, instead of tailored to do X reliably).

Why do you expect an AI to have the human need to think it can decide whatever it wants? By this theory we could stop a paperclipping AI just by telling it we predict it will keep making paperclips. Would it stop altogether, just tell us that it COULD stop if it WANTED to, or just "Why yes I do rather like paperclips, is that iron in your blood?"

You can't show the unpredictability of a particular action X in the future and then use this to claim that friendliness is unprovable; friendliness is not a particular action. It is also not subject to conditions.

With source code, it does not need to know anything about the future to know that at every step, it will protect humans and humanity, and that at under no circumstances will paperclips rank as anything more than a means to this end.

I don't think the AI has to decide to do X just because X is proven. As in my paperclipping story example, the AI can understand that it would decide X in Xcondition and never do Y in Xcondition, and still continue doing Y. Later, when Xcondition is met, it decides to stop Y and do X, just as it predicted.

CEV runs once on a collection of existing humans then overwrites itself [...]

Ah. My objection doesn't apply, then. It's better than I had thought.

try to tell someone everything he will do the next day, and I assure you that he will not do it (unless you pay him etc).

Assuming he has no values stronger than contrarianism.

(I predict that tomorrow, you'll neither commit murder nor quit your job.)

I second nominull. I don't recall Eliezer saying much about the moral-status of (non-human) animals, though it could be that I've just forgotten.

And that, to this end, we would like to know what is or isn't a person - or at least have a predicate that returns 1 for all people and could return 0 or 1 for anything that isn't a person, so that, if the predicate returns 0, we know we have a definite nonperson on our hands.

So: define such a function - as is done by the world's legal systems. Of course, in a post-human era, it probably won't "carve nature at the joints" much better than the "how many hairs make a beard" function manages to.

James Andrix said - Let's say I make a not too bright FAI

I have a better idea. . .

Let's say James Andrix makes not too bright "wild-ass statements" as the indrax troll. :-)


