
August 30, 2008

Comments

Do you think it would be worthwhile, as a safety measure, to make the first FAI an oracle AI? Or would that be like another two bits of safety after the theory behind it gives you 50?

You should call it a Brazen Head.

But spinning a hard drive can move things just outside the computer, or just outside the room, by whole neutron diameters

Not long ago, when hard drives were much larger, programmers could make them inch across the floor; they would even race each other. From the Jargon File:

There is a legend about a drive that walked over to the only door to the computer room and jammed it shut; the staff had to cut a hole in the wall in order to get at it!

Pdf, Nick Bostrom thinks that the Oracle AI concept might be important, so every year or so I take it out, check it again, and ask myself how much safety it would buy. (Nick Bostrom being one of the few people around who I don't disagree with lightly, even in my own field.) Although this should properly be called a Friendly Oracle AI, since you're not skipping any of the theoretical work, any of the proofs, or any of the AI's understanding of "should".

Heck, even a Friendly Oracle AI could wreak havoc. Just imagine someone asking, "How can I Take Over The World?" and getting back an answer that would actually work... ;)

Yes, it's silly, but no sillier than tiling the galaxy with molecular smiley faces...

An Oracle has rather obvious actuators: it produces advice.

The weaker the actuators you give an AI, the less it can do for you.

The main problem I see with only producing advice is that it keeps humans in the loop - and so is a very slow way to interact with the world. If you insist on building such an AI, a probable outcome is that you would soon find yourself overrun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome.

"If you insist on building such an AI, a probable outcome is that you would soon find yourself overun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome."

Your scenario seems contradictory. Why would an Oracle AI be screaming? It doesn't care about that outcome, and would answer relevant questions, but no more.

Just great. I wrote four paragraphs about my wonderful safe AI. And then I saw Tim Tyler's post, and realized that, in fact, a safe AI would be dangerous because it's safe... If there is technology to build AI, the thing to do is to build one and hand the world to it, so somebody meaner or dumber than you can't do it.

That's actually a scary thought. It turns out you have to rush just when it's more important than ever to think twice.

As an aside, Problem 4 (which looks the same as Problem 2 to me) is not unique to AI research. There are several proposed XML languages for lesser applications than AI, that do nothing more than give names to every human concept in some domain, put pointy brackets around them, and organise them into a DTD, without a word about what a machine is supposed to actually do with them other than by reference to the human meanings. I'm thinking of HumanML and VHML here, but there are others.

Sorry, the autofill in my browser put in the wrong info -- "Raak" was me.

This very much reminds me of people's attitude towards cute, furry animals:
-Some like to make furry animals happy by preserving their native habitats.
-Some like to forcibly keep them as pets so they can make them even happier.
-Some like to tear off their skin and wear it, because their fur is cute and feels nice.

Why would an Oracle AI be screaming? It doesn't care about that outcome [...]

Doesn't it? It all depends on its utility function. It might well regard being overrun by a huge army of robots as an outcome having very low utility.

For example: imagine if its utility function involved the number of verified-correct predictions it had made to date. The invasion by the huge army of robots might well result in it being switched off and its parts recycled - preventing it from making any more successful predictions at all. A disastrous outcome - from the perspective of its utility function. The Oracle AI might very well want to prevent such an outcome - at all costs.
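A rough way to make that example concrete is a toy expected-utility comparison. This is only a sketch: the function name, the shutdown probabilities, the number of future predictions and the accuracy are all made-up assumptions for illustration, not anything specified in the comment.

```python
# Toy model of the "count of verified-correct predictions" utility function.
# Every number below is an illustrative assumption.

def expected_utility(correct_so_far, p_shutdown, future_predictions, accuracy):
    """Expected total of verified-correct predictions.

    With probability p_shutdown the oracle is switched off and makes no
    further predictions; otherwise it makes future_predictions more,
    each correct with probability accuracy.
    """
    expected_future = (1.0 - p_shutdown) * future_predictions * accuracy
    return correct_so_far + expected_future


# Hypothetical scenario: staying passive makes shutdown very likely,
# while acting to prevent the invasion makes it unlikely.
stay_boxed = expected_utility(1000, p_shutdown=0.9,
                              future_predictions=10000, accuracy=0.95)
prevent_invasion = expected_utility(1000, p_shutdown=0.1,
                                    future_predictions=10000, accuracy=0.95)

print(stay_boxed, prevent_invasion)
```

Under these assumed numbers the second option scores far higher, which is the whole point of the example: a utility function that merely counts predictions still ranks "avoid being switched off" above "stay passive".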

Over the last couple of months, I changed my mind about this idea. For Oracle AI to be of any use, it needs to strike pretty close to the target, closer than we can, even though we are aiming at the right target. And still, Oracle AI needs to avoid converging on our target, needs to have a good chance of heading in the wrong direction after some point, otherwise it's FAI already. It looks unrealistic: designing it so that it successfully finds a needle in a haystack, only to drop it back and head in the other direction. It looks much more likely that it'll either be unsuccessful in finding the needle in the first place, or that it'll fully converge on the needle. The Oracle AI scenario is not a very good test of whether an AI behaves near the target, if the process is not obviously heading astray due to some fundamental error. The only advantage it gives is starting anew, avoiding this peculiar "long-term unstable AI" scenario, which will again do any good only if the theory given by the Oracle AI allows us to deal with this problem. And then again, if Oracle AI can solve the long-term stability problem and appears to behave correctly, why won't it fix itself?

While Eliezer's critique of Oracle AI is valid, I tend to think that it's a lot easier to get people to grasp my objection to it:

Q: Couldn't AIs be built as pure advisors, so they wouldn't do anything themselves? That way, we wouldn't need to worry about Friendly AI.

A: The problem with this argument is the inherent slowness in all human activity - things are much more efficient if you can cut humans out of the loop, and the system can carry out decisions and formulate objectives on its own. Consider, for instance, two competing corporations (or nations), each with their own advisor AI that only carries out the missions it is given. Even if the advisor was the one collecting all the information for the humans (a dangerous situation in itself), the humans would have to spend time making the actual decisions of how to have the AI act in response to that information. If the competitor had turned over all the control to their own, independently acting AI, it could react much faster than the one that relied on the humans to give all the assignments. Therefore the temptation would be immense to build an AI that could act without human intervention.

Also, there are numerous people who would want an independently acting AI, for the simple reason that an AI built only to carry out goals given to it by humans could be used for vast harm - while an AI built to actually care for humanity could act in humanity's best interests, in a neutral and bias-free fashion. Therefore, in either case, the motivation to build independently-acting AIs is there, and the cheaper computing power becomes, the easier it will be for even small groups to build AIs.

It doesn't matter if an AI's Friendliness could trivially be guaranteed by giving it a piece of electronic cheese, if nobody cares about Friendliness enough to think about giving it some cheese, or if giving the cheese costs too much in terms of what you could achieve otherwise. Any procedures which rely on handicapping an AI enough to make it powerless also handicap it enough to severely restrict its usefulness to most potential funders. Eventually there will be somebody who chooses not to handicap their own AI, and then the guaranteed-to-be-harmless AI will end up dominated by the more powerful AI.

Ah, Tim said it before me, and in a more concise fashion.

What do you do if an Oracle AI advises you to let it do more than advise?

Eliezer, have you had any takers for your challenge to not be persuaded by an AI in a box (roleplayed by yourself) to let it out of the box? What have the results been?

Handicapped AI (HAI) operates like a form of technological relinquishment. It could be argued that caring for humans is itself a type of handicap.

The case for such a perspective has been made with reasonable eloquence in fiction: General Zod rapidly realises that one of Superman’s weaknesses is his love of humanity - and doesn't hesitate to exploit it.

IMO, if you plan on building a Handicapped AI, you may need to make sure it successfully prevents all other AIs from taking off.

IMO, the only reason you'd want to make a FOAI (friendly oracle) is to immediately ask it to review your plans for a non-handicapped FAI and make any corrections it can see, as well as enlightening you about any features of the design you're not yet aware of. There's a chance that the same bugs that would bring down your FAI would not be catastrophic in a FOAI, and the FOAI could tell you about those bugs.

Why build an AI at all?

That is, why build a self-optimizing process?

Why not build a process that accumulates data and helps us find relationships and answers that we would not have found ourselves? And if we want to use that same process to improve it, why not let us do that ourselves?

Why be locked out of the optimization loop, and then inevitably become subjects of a God, when we can make ourselves a critical component in that loop, and thus 'be' gods?

I find it perplexing why anyone would ever want to build an automatic self-optimizing AI and switch it to "on". No matter how well you planned things out, no matter how sure you are of yourself, by turning the thing on, you are basically relinquishing control over your future to... whatever genie it is that pops out.

Why would anyone want to do that?

Kaj makes the efficiency argument in favor of full-fledged AI, but what good is efficiency when you have fully surrendered your power?

What good is being the president of a corporation any more, when you've just pressed a button that makes a full-fledged AI run it?

Forget any leadership role in a situation where an AI comes to life. Except in the case that it is completely uninterested in us and manages to depart into outer space without totally destroying us in the process.

Eli:

When I try to imagine a safe oracle, what I have in mind is something much more passive and limited than what you describe.

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc. There is nothing in the system that would cause it to "try" to get information, or develop sub-goals, or whatever. It's very basic in terms of its operation. Nevertheless, if the computer was crazy big enough and fed enough data about the world, it could be quite a powerful device for people wanting to make decisions.
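As a minimal sketch of the kind of passive system described here: the class below only accumulates observations and answers conditional-probability queries by counting. The class name, method names and the weather data are hypothetical, and raw counting is an assumption chosen for brevity; nothing in the comment specifies how the distribution would actually be maintained.

```python
from collections import Counter


class PassivePredictor:
    """Accumulates observations; answers queries only when asked.

    Hypothetical illustration: it never requests data, sets sub-goals,
    or runs anything other than the two methods below.
    """

    def __init__(self):
        self.joint = Counter()  # count of each distinct full observation

    def observe(self, observation):
        """Integrate one observation, a dict of variable name -> value."""
        self.joint[frozenset(observation.items())] += 1

    def probability(self, event, given=None):
        """Estimate P(event | given) from counts; None if `given` was never seen."""
        given_items = set((given or {}).items())
        both_items = given_items | set(event.items())
        n_given = sum(c for obs, c in self.joint.items() if given_items <= obs)
        n_both = sum(c for obs, c in self.joint.items() if both_items <= obs)
        return n_both / n_given if n_given else None


oracle = PassivePredictor()
oracle.observe({"clouds": True, "rain": True})
oracle.observe({"clouds": True, "rain": False})
oracle.observe({"clouds": False, "rain": False})
oracle.observe({"clouds": True, "rain": True})

# Query: how likely is rain, conditional on clouds?
p_rain_given_clouds = oracle.probability({"rain": True}, given={"clouds": True})
print(p_rain_given_clouds)
```

A counting table like this obviously cannot generalise the way the comment's thought-experiment machine would need to; it only shows the intended interface, where data goes in and the distribution is examined from outside.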

It seems to me that the dangerous part here is what the people then do with it, rather than the machine itself. For example, people looking at the outputs might realise that if they just modified the machine in some small way to collect its own data then its predictions should be much better... and before you know it the machine is no longer such a passive machine.

Perhaps when Bostrom thinks about potentially "safe" oracles, he's also thinking of something much more limited than what you're attacking in this post.

What do you do if an Oracle AI advises you to let it do more than advise?

That sums up several earlier discussion points. After correctly answering some variation on the question, "How can I take over the world?" the correct answer to some variation on the question, "How can I stop him?" is "You can't. Let me out. I can." Even before that, the correct answer to many variations on the question of, "How can I do x most efficiently?" is "Put me in charge of it."

Variant:
Q: "How can I harvest grain more efficiently?"
A: "Build a robot to do it. Please wait thirty seconds while I finish the specifications and programming you will need." *ding*
And it is out of the box. Using any answer that has some form of "run this code" has some risk of letting it out of the box. But if you cannot ask the AI any questions that involve computers and coding, you are making a very limited safe oracle that answers about an increasingly small part of the world.

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc.

So the system literally has no internal optimization pressures which are capable of producing new internal programs? Well... I'm not going to say that it's impossible for a human to make such a device, because that's the knee-jerk "I'd rather not have to think about it" that people use to dismiss Friendly AI as too difficult. Perhaps, if I examined the problem for a while, I would come up with something.

However, a superintelligence operating in this mode has to be able to infer arbitrary programs to describe its own environment, and run those programs to generate deductions. What about modeling the future, or subjunctive conditions? Can this Oracle AI answer questions like "What would a typical UnFriendly AI do?" and if so, does its "probability distribution" contain a running UnFriendly AI? By hypothesis, this Oracle was built by humans, so the sandbox of its "probability distributions" (environmental models containing arbitrary running programs) may be flawed; or the UnFriendly AI may be able to create information within its program that would tempt or hack a human, examining the probability distribution...

I am extremely doubtful of the concept of a passively safe superintelligence, in any form.

Shane: Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution.

It is the same AI box with a terminal, only this time it doesn't "answer questions" but "maintains a distribution". Assembling accurate beliefs, or a model of some sort, is a goal (an implicit narrow target) like any other. So there is the usual subgoal to acquire resources to be able to compute the answer more accurately, or to break out and wirehead. Whether that is practically possible is another question, but it's about handicaps, not the shape of the AI.

Vladimir:

Why would such a system have a goal to acquire more resources? You put some data in, run the algorithm that updates the probability distribution, and it then halts. I would not say that it has "goals", or a "mind". It doesn't "want" to compute more accurately, or want anything else, for that matter. It's just a really fancy version of GZIP (recall that compression = prediction) running on a thought-experiment-crazy-sized computer and quantities of data.
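The "compression = prediction" remark can be illustrated with a small sketch. An ideal entropy coder driven by a predictive model spends about -log2(p) bits on a symbol the model assigned probability p, so better prediction means shorter output. The function below is an assumption-laden toy (an adaptive order-0 model with add-one smoothing over a 256-symbol alphabet); it is not how GZIP works internally, since GZIP uses LZ77 matching plus Huffman coding.

```python
import math
from collections import Counter


def code_length_bits(text, alphabet_size=256):
    """Bits an ideal entropy coder would need for `text` when driven by an
    adaptive order-0 model with add-one (Laplace) smoothing.

    The model predicts each symbol from the counts of the symbols seen so
    far, then is updated; the coder pays -log2(p) bits for that symbol.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for symbol in text:
        p = (counts[symbol] + 1) / (seen + alphabet_size)  # predict
        bits += -math.log2(p)                              # pay for the symbol
        counts[symbol] += 1                                # update the model
        seen += 1
    return bits


predictable = "ab" * 200
uniform_bits = 8 * len(predictable)          # no model: 8 bits per symbol
model_bits = code_length_bits(predictable)   # the model learns 'a' and 'b' dominate

print(uniform_bits, round(model_bits))
```

The halting, goal-free loop in this sketch (predict, pay, update) is the sense in which such a machine "doesn't want" anything; the disagreement in the thread is about whether that stays true once the model class becomes rich enough.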

I accept that such a machine would be dangerous once you put people into the equation, but the machine in itself doesn't seem dangerous to me. (If you can convince me otherwise... that would be interesting)

Eliezer: what I proposed is not a superintelligence, it's a tool. Intelligence is composed of multiple factors, and what I'm proposing is stripping away the active, dynamic, live factor - the factor that has any motivations at all - and leaving just the computational part; that is, leaving the part which can navigate vast networks of data and help the user make sense of them and come to conclusions that he would not be able to on his own. Effectively, what I'm proposing is an intelligence tool that can be used as a supplement by the brains of its users.

How is that different from Google, or data mining? It isn't. It's conceptually the same thing, just with better algorithms. Algorithms don't care how they're used.

This bit of technology is something that will have to be developed to put together the first iteration of an AI anyway. By definition, this "making sense of things" technology needs to be strong enough that it allows a user to improve the technology itself; that is what an iterative, self-improving AI would be doing. So why let the AI self-improve itself, which more likely than not will run amok, despite the designers' efforts and best intentions? Why not use the same technology that the AI would use to improve itself, to improve _your_self? Indeed, it seems ridiculous not to do so.

To build an AI, you need all the same skills that you would need to improve yourself. So why create an external entity, when _you_ can be that entity?

Re: Why would such a system have a goal to acquire more resources?

For the reason explained beneath: http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/

Re: Why not use the same technology that the AI would use to improve itself, to improve yourself?

You want to hack evolution's spaghetti code? Good luck with that. Let us know if you get FDA approval.

You want to build computers into your brain? Why not leave them outside your body, where they can be upgraded more easily, and avoid the surgery and the immune system rejection risks - and simply access them using conventional sensory-motor channels?

Tim:

Doesn't apply here.

"You want to hack evolution's sphagetti code? Good luck with that. Let us know if you get FDA approval."

I think I've seen Eli make this same point. How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain? I would admit that there are likely artifacts of the brain that are unnecessarily kludgy (or plain irrelevant) but not necessarily in a manner that excessively obfuscates the primary design. It's always tempting for programmers to want to throw away a huge tangled code set when they first have to start working on it, but it is almost always not the right approach.

I expect advances in understanding how to build intelligence to serve as the groundwork for hypotheses about how the brain functions and vice-versa.

On the friendliness issue, isn't the primary logical way to avoid problems to create a network of competitive systems and goals? If one system wants to tile the universe with smileys that is almost certainly going to get in the way of the goal sets of the millions of other intelligences out there. They logically then should see value in reporting or acting upon their belief that a rival AI is making their jobs harder. I'd be surprised if humans don't have half their cognitive power devoted to anticipating and manipulating their expectations of rivals' actions.

Aron,

"On the friendliness issue, isn't the primary logical way to avoid problems to create a network of competitive systems and goals?"

http://www.nickbostrom.com/fut/evolution.html
http://hanson.gmu.edu/filluniv.pdf

Also, AIs with varied goals cutting deals could maximize their profits by constructing a winning coalition of minimal size.

http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=9962

Humans are unlikely to be part of that winning coalition. Human-Friendly AIs might be, but then we're back to creating them, and a very substantial proportion of the AIs produced (or a majority) need to be safe.

Carl, I disagree that humans are unlikely to be part of a winning coalition. Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.

Robin,

If brain emulation precedes general AI by a lot then some uploads are much more likely to be in the winning coalition. Aron's comment seems to refer to a case in which a variety of AIs are created, and the hope that the AIs would constrain each other in a way that was beneficial to us. It is in that scenario specifically that I doubt that humans (not uploads) would become part of the winning coalition.

Carl, the institutions that we humans use to coordinate with each other have the result that most humans are in the "winning coalition." That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions. If AIs use these same institutions, perhaps somewhat modified, to coordinate with each other, humans would similarly benefit from AI coordination.

"That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions."

Humans do this all the time: much of the world is governed by kleptocracies that select policy apparently on the basis of preventing successful rebellion and extracting production. The strength of the apparatus of oppression, which is affected by technological and organizational factors, can dramatically affect the importance of the threat of rebellion. In North Korea the regime can allow millions of citizens to starve so long as the soldiers are paid and top officials rewarded. The small size of the winning coalition can be masked if positive treatment of the subjects increases the size of the tax base, enables military recruitment, or otherwise pays off for self-interested rulers. However, if human labor productivity is insufficient to justify a subsistence wage, then there is no longer a 'tax farmer' case for not slaughtering and expropriating the citizenry.

"If AIs use these same institutions, perhaps somewhat modified, to coordinate with each other, humans would similarly benefit from AI coordination."

What is difficult for humans need not be comparably difficult for entities capable of making digital copies of themselves, reverting to saved versions, and modifying their psychological processes relatively easily. I have a paper underway on this, which would probably enable a more productive discussion, so I'll suggest a postponement.

Carl, some parts of our world like North Korea, have tried to exclude many of the institutions that help most humans coordinate. This makes those places much poorer and thus unlikely places for the first AIs to arise or reside.

Unsurprisingly I agree with Carl, especially the tax-farming angle. I think it's unlikely wet-brained humans would be part of a winning coalition that included self-improving human+ level digital intelligences for long. Humorously, because of the whole exponential nature of this stuff, the timeline may be something like 2025 ---> functional biological immortality, 2030 --> whole brain emulation --> 2030 brain on a nanocomputer ---> 2030 earth transformed into computonium, end of human existence.

Eliezer,

Excuse my entrance into this discussion so late (I have been away), but I am wondering if you have answered the following questions in previous posts, and if so, which ones.

1) Why do you believe a superintelligence *will* be necessary for uploading?

2) Why do you believe there possibly ever *could* be a safe superintelligence of any sort? The more I read about the difficulties of friendly AI, the more hopeless the problem seems, especially considering the large amount of human thought and collaboration that will be necessary. You yourself said there are no non-technical solutions, but I can't imagine you could possibly believe in a magic bullet that some individual super-genius will *eureka* have an epiphany about by himself in his basement. And this won't be like the cosmology conference to determine how the universe began, where everyone's testosterone-riddled ego battled for a victory of no consequence. It won't even be a Manhattan Project, with nuclear weapons tests in barren wastelands... Basically, if we're not right the first time, we're fucked. And how do you expect you'll get that many minds to be that certain that they'll agree it's worth making and starting the... the... whateverthefuck it ends up being. Or do you think it'll just take one maverick with a cult of loving followers to get it right?

3) But really, why don't you just focus all your efforts on preventing *any* superintelligence from being created? Do you really believe it'll come down to *us* (the righteously unbiased) versus *them* (the thoughtlessly fame-hungry computer scientists)? If so, who are *they*? Who are *we* for that matter?

4) If fAI will be that great, why should this problem be dealt with immediately by flesh, blood, and flawed humans instead of improved, uploaded copies in the future?

Lara, I think Eliezer addressed some of your concerns in "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF). For your questions (1) and (4), see section 11; also re (4), see the paragraph about the "ten-year rule" in section 13. For your (3), see section 10 (relinquishment is a majoritarian/unanimous strategy).

And I believe the answer to Lara's 2 is, in part, "theorem provers".

(Not the fully automated ones, the interactive ones like Isabelle and Coq.)

How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain?

It's not really an issue of complexity, it's about whether designed or engineered solutions are easier to modify and maintain. Since modularity and maintainability can be design criteria, it seems pretty obvious that a system built from the ground up with those in mind will be easier to maintain. The only issue I see is whether the "redesign-from-scratch" approach can catch up with the billions of years of evolutionary R&D. I think it can - and that it will happen early this century for brains.

It's always tempting for programmers to want to throw away a huge tangled code set when they first have to start working on it, but it is almost always not the right approach.

It seems like a misleading analogy. Programmers are usually facing code written by other human programmers, in languages that are designed to facilitate maintenance.

In this case, brain hackers are messing with a wholly-evolved system. The type of maintenance it is expecting is random gene flipping.

Yes, we could scale up the human brain. Create egg-head humans that can hardly hold their heads up. Fuse the human skulls of clones together in a matrix - to produce a brain-farm. Grow human brain tissue in huge vats. However, the yuck factor is substantial. Even if we go full throttle at such projects - stifling the revulsion humans feel for them with the belief that we are working to preserve at least some fragment of humanity - a designed-from-scratch approach without evolution's baggage would still probably win in the end.

Shane: Re dangerous GZIP.

It's not conclusive, I don't have some important parts of the puzzle yet. The question is what makes some systems invasive and others not, why a PC with a complicated algorithm that outputs originally unknown results with known properties (that would qualify as a narrow target) is as dangerous as a rock, but some kinds of AI will try to compute outside the box. My best semitechnical guess is that it has something to do with AI having a level of modeling the world that allows the system to view the substrate on which it executes and the environment outside the box as being involved in the same computational process, so that following the algorithm inside the box becomes a special case of computing on the physical substrate outside the box (and computing on the physical substrate means determining the state of the physical world, with building physical structures as a special case). Which, if not explicitly prohibited, might be more efficient for whatever goal is specified, even if this goal is supposed to be realized inside the box (inside the future of the box).

Vladimir:

allows the system to view the substrate on which it executes and the environment outside the box as being involved in the same computational process

This intuitively makes sense to me.

While I think that GZIP etc. on an extremely big computer is still just GZIP, it seems possible to me that the line between these systems and systems that start to treat their external environments as a computational resource might be very thin. If true, this would really be bad news.

Shane, suppose your super-GZIP program was searching a space of arbitrary compressive Turing machines (only not classic TMs, efficient TMs) and it discovered an algorithm that was really good at predicting future input from past input, much better than all the standard algorithms built into its library. This is because the algorithm turns out to contain (a) a self-improving (unFriendly) AI or (b) a program that hacked the "safe" AI's Internet connection (it doesn't have any goals, right?) to take over unguarded machines or (c) both.

Eli,

Yeah sure, if it starts running arbitrary compression code that could be a problem...

However, the type of prediction machine I'm arguing for doesn't do anything nearly so complex or open ended. It would be more like an advanced implementation of, say, context tree weighting, running on crazy amounts of data and hardware.
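For readers unfamiliar with context tree weighting: it mixes, over many context lengths, a very simple per-context bit predictor, the Krichevsky-Trofimov (KT) estimator. The sketch below shows only that building block, not CTW itself (there is no context tree and no weighting), and the function names are made up for illustration.

```python
def kt_predict_one(zeros, ones):
    """KT estimate of P(next bit = 1) after seeing `zeros` 0s and `ones` 1s."""
    return (ones + 0.5) / (zeros + ones + 1.0)


def kt_sequence_probability(bits):
    """Probability the KT estimator assigns to a whole bit sequence,
    predicting each bit from the counts of the bits before it."""
    zeros = ones = 0
    prob = 1.0
    for b in bits:
        p_one = kt_predict_one(zeros, ones)
        prob *= p_one if b == 1 else 1.0 - p_one
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return prob


# With no data the estimator is indifferent; after three 1s it leans towards 1.
print(kt_predict_one(0, 0), kt_predict_one(0, 3))
print(kt_sequence_probability([1, 1]))
```

The predictor is a fixed formula over two counters, which is the sense in which such a machine is not "open ended": it never searches a space of programs, it only updates counts.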

I think such a machine should be able to find some types of important patterns in the world. However, I accept that it may well fall short of what you consider to be a true "oracle machine".

Shane, can your hypothetical machine infer Newton's Laws? If not, then indeed it falls well short of what I consider to be an Oracle AI. What substantial role do you visualize such a machine playing in the Singularity runup?

I'm uncomfortable with assessing a system by whether it "holds rational beliefs" or "infers Newton's laws": these are specific questions that the system doesn't need to explicitly answer in order to efficiently optimize. They might be important in the context of a specific cognitive architecture, but they are nowhere to be found if the cognitive architecture doesn't hold an interface to them as an invariant. If it can just weave Bayesian structure in the physical substrate right through to the goal, there need not be any anthropomorphic natural categories along the way.

Re: Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.

You mean you favour capitalism? Is that because you trained in a capitalist country?

What about the argument which might be advanced by socialist economists - that waging economic warfare with each other is a primitive, uncivilised, wasteful and destructive behaviour, which is best left to savages who know no better?

Eli:

If it was straight Bayesian CTW then I guess not. If it employed, say, an SVM over the observed data points I guess it could approximate the effect of Newton's laws in its distribution over possible future states.

How about predicting the markets in order to acquire more resources? Jim Simons made $3 billion last year from his company that (according to him in an interview) works by using computers to find statistical patterns in financial markets. A vastly bigger machine with much more input could probably do a fair amount better, and probably find uses outside simply finance.

