
December 11, 2008

Comments

I understand there are various levels on which one can express one's loves. One can love Suzy, or kind pretty funny women, or the woman selected by a panel of judges, or the one selected by a judging process designed by a certain AI strategy, etc. But even very meta loves are loves. You want an AI that loves the choices made by a certain meta process that considers the wants of many, and that may well be a superior love. But it is still a love, your love, and the love you want to give the AI. You might think the world should be grateful to be placed under the control of such a superior love, but many of them will not see it that way; they will see your attempt to create an AI to take over the world as an act of war against them.

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

And I believe that if two very smart people manage to agree on where to go for lunch they have accomplished a lot for one day.

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

To clear up any confusion about the meaning of this statement, I do agree with pretty much everything here, and I do agree that FAI is critically important.

That doesn't change the fact that I think EY isn't being very useful ATM.

I'm just trying to understand the problem you're presenting. Is it that in the event of a foom, a self-improving AI always presents a threat of having its values drift far enough away from humanity's that it will endanger the human race? And your goal is to create the set of values that allow for both self-improvement and friendliness? And to do this, you must not only create the AI architecture but influence the greater system of AI creation as well? I'm not involved in AI research in any capacity, I just want to see if I understand the fundamentals of what you're discussing.

Robin, using the word "love" sounds to me distinctly like something intended to evoke object-level valuation. "Love" is an archetype of direct valuation, not an archetype of metaethics.

And I'm not so much of a mutant that, rather than liking cookies, I like everyone having their reflective equilibria implemented. Taking that step is the substance of my attempt to be fair. In the same way that someone voluntarily splitting up a pie into three shares is not on the same moral level as someone who seizes the whole pie for themselves - even if, by volunteering to do the fair thing rather than some other thing, they have shown themselves to value fairness.

My take on this was given in The Bedrock of Fairness.

But you might as well say "George Washington gave in to his desire to be a tyrant; he was just a tyrant who wanted democracy." Or "Martin Luther King declared total war on the rest of the US, since what he wanted was a nonviolent resolution."

Similarly with "I choose not to control you" being a form of controlling.

AGI Researcher:
"... I do agree that FAI is critically important."
"... EY isn't being very useful ATM."

Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?

"Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?"

That poster is taking bits of an IM conversation out of context and then paraphrasing them. Sadly any expectation of logical consistency has to be considered unwarranted optimism.

"Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?"

All that stuff was the party line back in 2004.

There has been no /visible/ progress since then.

Slightly off the main topic but nearer to Robin's response:

Eliezer, how do we know that human good-ness scales? How do we know that, even if correctly implemented, applying it to a near-infinitely capable entity won't yield something just as monstrous as a paperclipper? Perhaps our sense of good-ness is meaningful only at or near our current level of capability?

There is nothing oxymoronic about calling democracy "the tyranny of the majority". And George Washington himself was decisive in both the violent war of secession called a "revolution" that created a new Confederate government and the unlawful replacement of the Articles of Confederation with the Constitution, after which he personally crushed the Whiskey Rebellion of farmers resisting the national debt payments saddled upon them by this new government. Even MLK has been characterized as implicitly threatening more riots if his demands were not met (in that respect he followed Gandhi, who actually justified violence on the basis of nationalism though this is not as well remembered). Eliezer is mashing applause lights.

AGI Researcher: "There has been no /visible/ progress since [2004]."

What would you consider /visible/ progress? Running code?

Also, how about this: "Overcoming Bias presently gets over a quarter-million monthly pageviews"?

In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it.

"In a foom that took two years.."

The people of the future will be in a considerably better position than you to evaluate their immediate future. More importantly, they are in a position to modify their future based on that knowledge. This anticipatory reaction is what makes both of your opinions exceedingly tenuous. Everyone else who embarks on pinning down the future at least has the sense to sell books.

In light of this, the goal should be to use each other's complementary talents to find the hardest rock-solid platform, not to sell the other a castle made of sand.

Robin, we're still talking about a local foom. Keeping security for two years may be difficult but is hardly unheard-of.

"An AI that reaches a certain point in its own development becomes able to improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability."

This seems like the first problem I detected. An intelligence being able to improve itself does not necessarily lead to a recursive cascade of self-improvement - since it may only be able to improve some parts of itself - and it's quite possible that after it has done those improvements, it can't do any more.

Say that machine intelligence learns how to optimise FOR loops, eliminating unnecessary conditions, etc. Presto, it can optimise its entire codebase - and thus improve itself. However, that doesn't lead to a self-improving recursive cascade - because it only improved itself in one way, and that was a rather limited way. Of course this kind of improvement has been going on for decades - via lint tools and automatic refactoring.
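(To make this concrete, here is a minimal sketch in Python - the hoist_strlen helper and the C-like snippet are hypothetical, made up purely for illustration - of the kind of one-shot improvement described above: the rewrite can be applied across an entire codebase, but running it a second time finds nothing left to improve, so there is no recursive cascade.)

import re

# Hypothetical one-shot "self-improvement": hoist a loop-invariant strlen()
# call out of a C-style FOR loop condition. Applying it once improves the
# code; applying it again finds nothing more to change.
LOOP_PATTERN = re.compile(r"for \(int i = 0; i < strlen\((\w+)\); i\+\+\)")

def hoist_strlen(source: str) -> str:
    """Rewrite `for (int i = 0; i < strlen(s); i++)` so strlen runs once."""
    def rewrite(match):
        var = match.group(1)
        return (f"int {var}_len = strlen({var}); "
                f"for (int i = 0; i < {var}_len; i++)")
    return LOOP_PATTERN.sub(rewrite, source)

code = "for (int i = 0; i < strlen(buf); i++) { total += buf[i]; }"
once = hoist_strlen(code)
twice = hoist_strlen(once)
print(once)
print(twice == once)  # True: a second pass has nothing more to optimise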

As machines get smarter, they will gradually become able to improve more and more of themselves. Yes, eventually machines will be able to cut humans out of the loop - but before that there will have been much automated improvement of machines by machines - and after that there may still be human code reviews.

This is not the first time I have made this point here. It does not seem especially hard to understand to me - yet the conversation sails gaily onwards, with no coherent criticism, and no sign of people updating their views: it feels like talking to a wall.

Ironic, such passion directed toward bringing about a desirable singularity,
rooted in an impenetrable singularity of faith in X.
X yet to be defined, but believed to be [meaningful|definable|implementable] independent of future context.

It would be nice to see an essay attempting to explain an information or systems-theoretic basis supporting such an apparent contradiction (definition independent of context).

Or, if the one is arguing for a (meta)invariant under a stable future context, an essay on the extended implications of such stability, if the one would attempt to make sense of "stability, extended."

Or, a further essay on the wisdom of ishoukenmei, distinguishing between the standard meaning of giving one's all within a given context, and your adopted meaning of giving one's all within an unknowable context.

Eliezer, I recall that as a child you used to play with infinities. You know better now.

"In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it."

But it's clearly the best search engine available.
And here I am making an argument for peace via economics!

If it's doing anything visible, it's probably doing something at least some people want.

Regarding the 2004 comment, AGI Researcher probably was referring to the Coherent Extrapolated Volition document which was marked by Eliezer as slightly obsolete in 2004, and not a word since about any progress in the theory of Friendliness.

Robin, if you grant that a "hard takeoff" is possible, that leads to the conclusion that it will eventually be likely (humans being curious and inventive creatures). This AI would "rule the world" in the sense of having the power to do what it wants. Now, suppose you get to pick what it wants (and program that in). What would *you* pick? I can see arguing with the feasibility of hard takeoff (I don't buy it myself), but if you accept that step, Eliezer's intentions seem correct.

Oh, and Friendliness theory (to the extent it can be separated from specific AI architecture details) is like the doomsday device in Dr. Strangelove: it doesn't do any good if you keep it secret! [in this case, unless Eliezer is supremely confident of programming AI himself first]

@Tim Re: FOR loops - I made that exact point explicitly when introducing the concept of "recursion" via talking about self-optimizing compilers.

Talk about no progress in the conversation. I begin to think that this whole theory is simply too large to be communicated to casual students. Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too.

"FOOM that takes two years"

In addition to the comments by Robin and Aron, I would also point out the possibility that the longer the FOOM takes, the larger the chance that it is not local, regardless of security - somewhere else, there might be another FOOMing AI.

Now, as I understand it, some consider this situation even more dangerous, but it might also create a defence against a "take over".

Another comment to FOOM scenario and this is sort of addition to Tim's post:

"As machines get smarter, they will gradually become able to improve more and more of themselves. Yes, eventually machines will be able to cut humans out of the loop - but before that there will have been much automated improvement of machines by machines - and after that there may still be human code reviews."

Eliezer seems to spend a lot of time explaining what happens when "k > 1" - when AI intelligence surpasses human intelligence and starts self-improving. But I suspect that the phase 0.3 < k < 1 might be pretty long, maybe decades.

Moreover, by the time of the FOOM, we should be able to use vast numbers of fast 'subcritical' AIs (+ weak AIs) as guardians of the process. In fact, by that time, k < 1 AIs might play a pretty important role in the world economy and security, and it does not take too much pattern-recognition power to keep things at bay. (Well, in fact, I believe Eliezer proposes something similar in his thesis, except for the locality issue.)

Eliezer:

"Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own."

Why do you think it is a crushing objection? I believe Tim just repeats his favorite theme (which, in fact, I tend to agree with) where machine-augmented humans build better machines. If you can use automated refactoring to improve the way a compiler works (and today, you often can), that is in fact a pretty cool augmentation of human capabilities. It is a recursive FOOM. The only difference between your vision and his is that as long as k < 1 (and perhaps for some time after that point), humans are important FOOM agents. Also, humans are getting much more capable in the process. For example, a machine-augmented human (think weak AI + direct neural interface and all the cyborging whistles + mind drugs) might be quite likely to follow the FOOM.

Robin says "You might think the world should be grateful to be placed under the control of such a superior love, but many of them will not see it that way; they will see your attempt to create an AI to take over the world as an act of war against them."

Robin, do you see that CEV was created (AFAICT) to address that very possibility? That too many, feeling this too strongly, means the AI self-destructs or somesuch.

I like that someone challenged you to create your own unoffensive FAI/CEV, I hope you'll respond to that. Perhaps you believe that there simply isn't any possible fully global wish, however subtle or benign, that wouldn't also be tantamount to a declaration of war...?

"Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own."

It does not seem very likely that I am copying you - when my essay on this subject dates from February 3rd, while yours apparently dates from November 25th.

So what exactly is the counter-argument you were attempting to make?

That self-optimising compilers lack "insight" - and "insight" is some kind of boolean substance that you either have or you lack?

In my view, machines gradually accumulate understanding of themselves - and how to modify themselves. There is a long history of automated refactoring - which seems to me to clearly demonstrate that "insight" within machines into how to modify computer code comes in a vast number of little pieces, which are gradually being assembled over the decades into ever more impressive refactoring tools. I have worked on refactoring tools myself - and I see no hint of sudden gains in capability in this area - rather progress is made in thousands, or even millions of tiny steps.

" I can see arguing with the feasibility of hard takeoff (I don't buy it myself), but if you accept that step, Eliezer's intentions seem correct."

Bambi,

Robin has already said *just that.* I think Eliezer is right that this is a large discussion, and when many of the commenters haven't carefully followed it, comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool.

Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented.

"comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool."

how about using something like debatepedia?

http://wiki.idebate.org/

There are some types of knowledge that seem hard to come by (especially for singletons). The type of knowledge is knowing what destroys you. As all knowledge is just an imperfect map, there are some things a priori that you need to know to avoid. The archetypal example is in-built fear of snakes in humans/primates. If we hadn't had this while it was important we would have experimented with snakes the same way we experiment with stones/twigs etc and generally gotten ourselves killed. In a social system you can see what destroys other things like you, but the knowledge of what can kill you is still hard won.

If you don't have this type of knowledge you may step into an unsafe region, and it doesn't matter how much processing power or how much you correctly use your previous data. Examples that might threaten singletons:

1) Physics experiments, the model says you should be okay but you don't trust your model under these circumstances, which is the reason to do the experiment.
2) Self-change, your model says that the change will be better but the model is wrong. It leaves the system in a state it can't recover from, i.e. not an obvious error but something that renders it ineffectual.
3) Physical self-change. Large-scale unexpected effects from feedback loops at different levels of analysis, e.g. things like the swinging/vibrating bridge problem, but deadly.

It is true that the topic is too large for casual followers (such as myself). So rather than aiming at refining any of the points personally, I wonder in what ways Robin has convinced Eli, and vice-versa. Because certainly, if this were a productive debate, they would be able to describe how they are coming to consensus. And from my perspective, there are distinct signals that the prospect of a successful debate declines as posts become acknowledged more for their quality as satire.

Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere.

"The default case of FOOM is an unFriendly AI."

Before this, we also have: "The default case of an AI is to not FOOM at all, even if it's self-modifying (like a self-optimizing compiler)." Why not anti-predict that no AIs will FOOM at all?

"This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)."

Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

Huh? I never mentioned self-optimizing compilers, and you never mentioned FOR loops.

I usually view this particular issue in terms of refactoring - not compilation - since refactoring is more obviously a continuous iterative process operating on an evolving codebase: whereas you can't compile a compiled version of a program very many times.

Anyway, this just seems like an evasion of the point - and a digression into trivia.

If you have any kind of case to make that machines will suddenly develop the ability to reprogram and improve themselves all-at-once - with the histories of compilation, refactoring, code wizards and specification languages representing an irrelevant side issue - I'm sure I'm not the only one who would be interested to hear about it.

Eliezer:

"Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented."

Well, it certainly does:

"Today, machines already do a lot of programming. They perform refactoring tasks which would once have been delegated to junior programmers. They compile high-level languages into machine code, and generate programs from task specifications. They also also automatically detect programming errors, and automatically test existing programs."

I guess your claim is only a misunderstanding caused by not understanding CS terminology.

Finding a new way to optimize loops is an application of automated refactoring plus automated testing and benchmarking.
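(For what it's worth, here is a minimal sketch of that loop in Python - accept_rewrite, slow_sum, and fast_sum are hypothetical names made up for illustration: propose a rewrite, check it against test cases, benchmark it, and keep it only if it is both correct and faster.)

import timeit

def accept_rewrite(baseline, candidate, cases, repeats=5):
    """Accept a candidate rewrite only if it passes the tests and runs faster."""
    # Correctness gate: the candidate must agree with the baseline on every case.
    for case in cases:
        if candidate(*case) != baseline(*case):
            return False
    # Benchmark gate: the candidate must also be measurably faster on the same cases.
    def run(fn):
        return min(timeit.repeat(lambda: [fn(*c) for c in cases],
                                 number=100, repeat=repeats))
    return run(candidate) < run(baseline)

def slow_sum(xs):
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total

def fast_sum(xs):
    return sum(xs)

# Likely prints True on most machines, since the builtin sum beats the explicit loop.
print(accept_rewrite(slow_sum, fast_sum, cases=[([1, 2, 3],), (list(range(1000)),)]))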

"Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere."

My point was not that non-singletons can see it coming. But if one non-singleton tries self-modification in a certain way and it doesn't work out, then other non-singletons can learn from the mistake (or, in the worst evolutionary case, the descendants of people curious in a certain way would be outcompeted by those that instinctively didn't try the dangerous activity). Less so with the physics experiments, depending on the dispersal of non-singletons and the range of the physical destruction.

Carl, Robin's response to this post was a critical comment about the proposed content of Eliezer's AI's motivational system. I assumed he had a reason for making the comment, my bad.

Venu: Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

It seems to me like a pretty small probability that an AI not designed to self-improve will be the first AI that goes FOOM, when there are already many parties known to me who would like to deliberately cause such an event.

Why not anti-predict that no AIs will FOOM at all?

A reasonable question from the standpoint of antiprediction; here you would have to refer back to the articles on cascades, recursion, the article on hard takeoff, etcetera.

Re Tim's "suddenly develop the ability reprogram and improve themselves all-at-once" - the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI, not whether things can happen with zero resource or instantaneously. But the former position seems to be routinely distorted into the straw latter.

"For example, a machine-augmented human (think weak AI + direct neural interface and all the cyborging whistles + mind drugs) might be quite likely to follow the FOOM."

It seems unlikely to me. For one thing, see my Against Cyborgs video/essay. For another, see my Intelligence Augmentation video/essay. The moral of the latter one in this context is that Intelligence Augmentation is probably best thought of as machine intelligence's close cousin and conspirator - not really some kind of alternative, something that will happen later on, or a means to keep humans involved somehow.

Eliezer:

"Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere."

I guess there is a significant difference - for a singleton, each mistake can be fatal (and not only for it).

I believe this is the part I really dislike about the idea, apart from the part where a singleton either cannot evolve or cannot stay a singleton (because of the speed-of-light vs. locality issue).

Tim:

Well, as an off-topic response: I see only some engineering problems cited in your "Against Cyborgs" essay as a counterargument. Anyway, let me say that in my book:

"miniaturizing and refining cell phones, video displays, and other devices that feed our senses. A global-positioning-system brain implant to guide you to your destination would seem seductive only if you could not buy a miniature ear speaker to whisper you directions. Not only could you stow away this and other such gear when you wanted a break, you could upgrade without brain surgery."

is pretty much the equivalent of what I had in mind with cyborging. Brain surgery is not the point. I guess it is already pretty obvious today that to read thoughts, you will not need any surgery at all. And if information is fed back into my glasses, that is OK with me.

Still, the ability to just "think" the code (yep, I am a programmer), then see the whole procedure displayed before my eyes already refactored and tested (via weak-AI augmentation), sounds like a nice productivity booster. In fact, I believe that if thinking code is easy, one could, with the help of some nice programming language, learn to use coding to solve many more problems in normal life situations, gradually building a personal library of routines..... :)

"the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI"

Uh, that's a totally different issue from the one I was discussing.

To recap: I was pointing out that machines have been writing code and improving themselves for decades - that refactoring and lint-like programs applying their own improvements to their own codebases has a long history in the community - dating back to the early days of Smalltalk. That progress in computer ability at self-improvement (via modification of your own codebase) is, in point of fact, a long, slow and gradual process that has been going on for decades so far - and thus is not really well conceived of as being something that will happen suddenly in the future - when computers attain "insight".

Also, I notice that you have "quietly" edited the original post - in an attempt to eliminate the very point I was originally criticising. This rather makes it look as though I was misquoting you. Then you accuse me of attacking a straw man - after this clumsy attempt to conceal the original evidence. Oh well, at least you are correcting your own mistakes when they are pointed out to you - it seems like a kind of progress to me.

Eliezer: "and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)."

Why do the values freeze? Because there is no more competition? And if that's the problem, why not try to plan a transition from pre-AI to an ecology of competing AIs that will not converge to a singleton? Or spell out the problem clearly enough that we can figure whether one can achieve a singleton that doesn't have that property?

(Not that Eliezer hasn't heard me say this before. I made a bit of a speech about AI ecology at the end of the first AGI conference a few years ago.)

Robin: "In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it."

Yes. The timespan of the foom is important largely because it changes what the AI is likely to do, because it changes the level of danger that the AI is in and the urgency of its actions.

Eliezer: "When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background."

There are many sociological parallels between Eliezer's "movement", and early 20th-century communism.

Eliezer: "I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI."

I wonder if you're thinking that I meant that. You can see that I didn't in my first comment on Visions of Heritage. But I do think you're going one level too few meta. And I think that CEV would make it very hard to escape the non-meta philosophies of the programmers. It would be worse at escaping them than the current, natural system of cultural evolution is.

Numerous people have responded to some of my posts by saying that CEV doesn't restrict the development of values (or equivalently, that CEV doesn't make AIs less free). Obviously it does. That's the point of CEV. If you're not trying to restrict how values develop, you might as well go home and watch TV and let the future spin out of control. One question is where "extrapolation" fits on a scale between "value stasis" and "what a free wild-type AI would think of on its own." Is it "meta-level value stasis"?

I think that evolution and competition have been pretty good at causing value development. (That's me going one more level meta.) Having competition between different subpopulations with different values is a key part of this. Taking that away would be disastrous.

Not to mention the fact that value systems are local optima. If you're doing search, it might make sense to average together some current good solutions and test the results out, in competition with the original solutions. It is definitely a bad idea to average together your current good solutions and replace them with the average.

Eliezer: "Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too."

No; the dynamic you're thinking of is that I raise objections to things that you have already analyzed, because I think your analysis was unconvincing. E.g., the recent Attila the Hun / Al Qaeda example. The fact that you have written about something doesn't mean you've dealt with it satisfactorily.

Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system.

Your perception of lawful extrapolation of values as "stasis" seems to stem from intuitions about free will. If you look at the worldline as a 4D crystal, everything is set in stone, according to laws of physics. The future is determined by the content of the world, in particular by actors embedded in it. If you allow AI to fiddle with the development of humanity, you can view it as a change in underlying laws of physics in which humanity is embedded, not as a change on the level you'd recognize as interference in your decision-making. If it must, this change can drive the events in ways so locally insignificant you'd need to be a superintelligence yourself to tell them from chance, but it could act as a special "luck" that in the end results in the best possible outcome given the allowed level of interference.

A two year FOOM doesn't have to be obvious for one year or even half a year. If the growth rate is up-curving, it's going to spend most of its ascent looking a bit ELIZA, and then it's briefly a cute news-darling C3PO, and then it goes all ghost-in-the-shell - game over. Even if there is a window of revealed vulnerability, will you without hindsight recognize it? Can you gather the force and political will in time? How would you block the inevitable morally outraged (or furtively amoral) attempts to rebuild?

Bruce Willis is not the answer.

The problems that I see with friendly AGI are:

1) It's not well understood outside of AI researchers, so the scientists who create it will build what they think is the most friendly AI possible. I understand what Eliezer is saying about not using his personal values, so instead he uses his personal interpretation of something else. Eliezer says that making a world which works by "better rules" and then fading away would not be a "god to rule us all", but who decides on those rules (or the processes by which the AI decides on those rules)? Ultimately it's the coders who design the thing. It's a very small group of people with specialized knowledge changing the fate of the entire human race.

2) Do we have any reason to believe that a single foom will drastically increase an AI's intelligence, as opposed to making it just a bit smarter? Typically, recursive self-improvement does make significant headway, until the marginal return on investment in more improvement is eclipsed by other (generally newer) projects.

3) If an AGI could become so powerful as to rule the world in a short time span, any group which disagrees with how an AGI project is going will try to create their own before the first one is finished. This is a prisoner's dilemma arms-race scenario. Considerations about its future friendliness could be put on hold in order to get it out "before those damn commies do".

4) In order to create an AGI before the opposition, vast resources would be required. The process would almost certainly be undertaken by governments. I'm imagining the cast of characters from Dr. Strangelove sitting in the War Room and telling the programmers and scientist how to design their AI.

In short, I think the biggest hurdles are political, and so I'm not very optimistic they'll be solved. Trying to create a friendly AI in response to someone else creating a perceived unfriendly AI is a rational thing to do, but starting the first friendly AI project may not be rational.

I don't see what's so bad about a race of machines wiping us out though; we're all going to die and be replaced by our children in one way or another anyways.

It would have been better of me to reference Eliezer's Al Qaeda argument, and explain why I find it unconvincing.

Vladimir:

"Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system."

You believe evolution works, right?

You can replace randomness only once you understand the search space. Eliezer wants to replace the evolution of values, without understanding what it is that that evolution is optimizing. He wants to replace evolution that works, with a theory that has so many weak links in its long chain of logic that there is very little chance it will do what he wants it to, even supposing that what he wants it to do is the right thing to do.

Vladimir:

"Your perception of lawful extrapolation of values as 'stasis' seems to stem from intuitions about free will."

That's a funny thing to say in response to what I said, including: 'One question is where "extrapolation" fits on a scale between "value stasis" and "what a free wild-type AI would think of on its own."' It's not that I think "extrapolation" is supposed to be stasis; I think it may be incoherent to talk about an "extrapolation" that is less free than "wild-type AI", and yet doesn't keep values out of some really good areas in value-space. Any way you look at it, it's primates telling superintelligences what's good.

As I just said, clearly "extrapolation" is meant to impose restrictions on the development of values. Otherwise it would be pointless.

Vladimir:

"it could act as a special 'luck' that in the end results in the best possible outcome given the allowed level of interference."

Please remember that I am not assuming that FAI-CEV is an oracle that magically works perfectly to produce the best possible outcome. Yes, an AI could subtly change things so that we're not aware that it is RESTRICTING how our values develop. That doesn't make it good for the rest of all time to be controlled by the utility functions of primates (even at a meta level).

Here's a question whose answer could diminish my worries: Can CEV lead to the decision to abandon CEV? If smarter-than-humans "would decide" (modulo the gigantic assumption CEV makes that it makes sense to talk about what "smarter than humans would decide", as if greater intelligence made agreement more rather than less likely - and, no, they will not be perfect Bayesians) that CEV is wrong, does that mean an AI guided by CEV would then stop following CEV?

If this is so, isn't it almost probability 1 that CEV will be abandoned at some point?

Eliezer, maybe you should be writing fiction. You say you want to inspire the next generation of researchers, and you're spending a lot of time writing these essays and correcting misconceptions of people who never read or didn't understand earlier essays (fiction could tie the different parts of your argument together better than this essay style). Why not try coming up with several possible scenarios with your thinking embedded in them? It may be worth remembering that far more of the engineers working on Apollo spoke of being inspired by Robert Heinlein than by Goddard, von Braun, and the other rocket pioneers.

"If this is so, isn't it almost probability 1 that CEV will be abandoned at some point?"

Phil, if a CEV makes choices *for reasons* why would you expect it to have a significant chance of reversing that decision without any new evidence or reasons, and for this chance to be independent across periods? I can be free to cut off my hand with an axe, even if the chance that I'll do it is very low, since I have reasons not to.

Phil, I don't see the point in criticizing a flawed implementation of CEV. If we don't know how to implement it properly, if we don't understand how it's supposed to work in much more technical detail than the CEV proposal includes, it shouldn't be implemented at all, no more than a garden-variety unFriendly AI. If you can point out a genuine flaw in a specific scenario of FAI's operation, right implementation of CEV shouldn't lead to that. To answer your question, yes, CEV could decide to disappear completely, construct an unintelligent artifact, or produce an AI with some strange utility. It makes a single decision, an attempt to deliver humane values through the threshold of inability to self-reflect, and what comes of it is anyone's guess.
