
September 30, 2008

Comments

Yadda yadda yadda, show us the code.

Yes, I'm kidding. Small typo/missing word, end of first paragraph.

Ugh, that was ugly. Fixed.

Eliezer,

In reading your posts the past couple of days, I've had two recurring thoughts:

1. In Bayesian terms, how much have your gross past failures affected your confidence in your current thinking? On a side note - it's also interesting that someone who is as open to admitting failures as you are still writes in the style of someone who's never once before admitted a failure. I understand your desire to write with strength - but I'm not sure if it's always the most effective way to influence others.

2. It also seems that your definition of "intelligence" is narrowly tailored - yet your project of Friendly AI would appear to require a deep knowledge of multiple types of human intelligence. Perhaps I'm reading you wrong - but if your view of human intelligence is in fact this narrow, will this not be evident in the robots you one day create?

Just some thoughts.

Thanks again for taking the time to post.

Take care,

Cormac

I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you.

I'm afraid this is still unclear to me. What do you mean by "supposed to do"? Socially expected to do? Think you have to do, based on clever rationalization?

"I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you."

You finally realized inanimate objects can't be negotiated with... and then continued with your attempt to rectify this obvious flaw in the universe :)

Nick, sounds like "supposed to do" means "everything you were taught to do in order to be a good [person/scientist/transhumanist/etc]". That would include things you've never consciously contemplated, assumptions you've never questioned because they were inculcated so early or subtly.

And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.

You can do what actually is the best possible course for you to take, and reality can still kill you.
That is, you can do everything right and still get buried in shit.
All you can do is do your best and hope that cuts the odds against you enough for you to succeed.

It helps if you also work on making your best even better.

A useful, sobering reminder.

Eliezer, after you realized that attempting to build a Friendly AI is harder and more dangerous than you thought, how far did you back-track in your decision tree? Specifically, did it cause you to re-evaluate general Singularity strategies to see if AI is still the best route? You wrote the following on Dec 9 2002, but it's hard to tell whether it's before or after your "late 2002" realization.

I for one would like to see research organizations pursuing human intelligence enhancement, and would be happy to offer all the ideas I thought up for human enhancement when I was searching through general Singularity strategies before specializing in AI, if anyone were willing to cough up, oh, at least a hundred million dollars per year to get started, and if there were some way to resolve all the legal problems with the FDA.

Hence the Singularity Institute "for Artificial Intelligence". Humanity is simply not paying enough attention to support human enhancement projects at this time, and Moore's Law goes on ticking.

Aha, a light bulb just went off in my head. Eliezer did reevaluate, and this blog is his human enhancement project!

I am impressed. Finally...Growth! And in that I grow a little too...Sorry for not being patient with you, E.

Eli, sometimes I find it hard to understand what your position actually is. It seems to me that your position is:

1) Work out an extremely robust solution to the Friendly AI problem

Only once this has been done do we move on to:

2) Build a powerful AGI

Practically, I think this strategy is risky. In my opinion, if you try to solve Friendliness without having a concrete AGI design, you will probably miss some important things. Secondly, I think that solving Friendliness will take longer than building the first powerful AGI. Thus, if you do 1 before getting into 2, I think it's unlikely that you'll be first.

@Dynamically Linked: Eliezer did reevaluate, and this blog is his human enhancement project!

I suggested a similar opinion of the blog's role here 6 weeks ago, but EY subsequently denied it. Time will tell.

At the risk of sounding catty, I've just got to say that I desperately wish there were some kind of futures market robust enough for me to invest in the prospect of EY, or any group of which EY's philosophy has functional primacy, achieving AGI. The chance of this is, of course, zero. The entire approach is batty as hell, not because the idea of AGI is batty, but because the notion that you can sit around and think really, really hard, solve the problem, and then implement it -

Here's another thing that's ridiculous: "I'm going to write the Great American Novel. So I'm going to pay quiet attention my whole life, think about what novel I would write, and how I would write a novel, and then write it."

Except EY's AGI nonsense is actually far more nonsensical than that. In extremely rare cases novel writing DOES occur under such circumstances. But the idea that it is only by some great force of self-restraint that EY and co. desist from writing code, that they hold back the snarling and lunging dogs of their wisdom lest they set in motion a force that would destroy creation -

well. You can see what I think of it.

Here's a bit of advice, which perhaps you are rational enough to process: the entire field of AI researchers is not ignoring your ideas because it is, collectively, too dim to have achieved the series of revelations you have enumerated here at such length. Or because there's nothing in your thinking worth considering. And it's not because academia is somehow fundamentally incompatible with research notions so radical - this last is particularly a load of bollocks. No, it's because your methodology is misguided to the point of silliness and vague to the point of uselessness.

Fortunately for you, the great thing about occupying a position that is never put to the test, never produces anything that one can evaluate, is that one is not susceptible to public flogging, and dissent is reduced to little voices in dark sleepless hours.

And to "crackpots", of course.

Shane, unless you know that your plan leads to a good outcome, there is no point in getting there faster (and it applies to each step along the way). Outcompeting other risks only becomes relevant when you can provide a better outcome. If your plan says that you only launch an AGI when you know it's a FAI, you can't get there faster by omitting the FAI part. And if you do omit the FAI, you are just working for destruction, no point in getting there faster.

The amendment to your argument might say that you can get a crucial technical insight into FAI while working on AGI. I agree with it, but work on AGI should remain a strict subgoal: not in an "I'll fail at it anyway, but might learn something" sense, nor in an "I'll genuinely try to build an AGI" sense, but as "I'll try to think about the technical side of developing an AGI, in order to learn something". Like studying statistics, machine learning, information theory, computer science, cognitive science, evolutionary psychology, neuroscience, and so on, to develop understanding of the problem of FAI, you might study your own FAI-care-free ideas on AGI. This is dangerous, but might prove useful. I don't know how useful it is, but neither do I know how modern machine learning is useful for the same task, beyond basics. Thinking about AGI seems closer to the target than most of machine learning, but we learn machine learning anyway. The catch is that currently there is no meaningful science of AGI.

(My comment was directed to Shane Legg).

Shane [Legg], FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable. FAI research = Friendly-style AGI research. "Do the right thing" is not a module, it is the AI.

I've already worked out a handful of basic problems; noticed that AGIfolk want to go ahead without understanding even those; and they look like automatic killers to me. Meanwhile the AGIfolk say, "If you delay, someone else will take the prize!" I know reversed stupidity is not intelligence, but still, I think I can stand to learn from this.

You have to surpass that sheer blank wall, whose difficulty is not matched to your skills. An unalterable demand of Nature, which you cannot negotiate down. Though to be sure, if you try to shave off just a little (because everyone has to compromise now and then), Nature will not try to negotiate back up.

Until you can turn your back on your rivals and the ticking clock, blank them completely out of your mind, you will not be able to see what the problem itself is asking of you. In theory, you should be able to see both at the same time. In practice, you won't.

The sheer blank wall doesn't care how much time you have. It's just there. Pass-fail. You're not being graded on a curve. Don't have enough time? Too bad.

I think that solving Friendliness will take longer than building the first powerful AGI. Thus, if you do 1 before getting into 2, I think it's unlikely that you'll be first.

Who are you trying to negotiate with?

Cormac: In Bayesian terms, how much have your gross past failures affected your confidence in your current thinking?

Confidence is cheap. Pretending to unconfidence is equally cheap. Anyone can say they are certain and anyone can say they are uncertain.

My past failures have drastically affected the standards to which I hold an AI idea before I am willing to put my weight down on it. They've prevented me from writing code as yet. They've caused me to invest large amounts of time in better FAI theories, and more recently, in preparing for the possibility that someone else may have to take over from me. That's "affect". Confidence is cheap, and so is doubt.

On a side note - it's also interesting that someone who is as open to admitting failures as you are still writes in the style of someone who's never once before admitted a failure. I understand your desire to write with strength - but I'm not sure if it's always the most effective way to influence others.

...I think that's just the way I write.

Just as confidence is only a writing style, so too, it is cheap to write in a style of anguished doubt. It is just writing. If you can't see past my writing style that happens to sound confident, to ask "What is he doing?", then you will also not be able to see through writing that sounds self-doubtful, to ask "What are they doing?"

I'm going to write the Great American Novel. So I'm going to pay quiet attention my whole life, think about what novel I would write, and how I would write a novel, and then write it.

This approach sounds a lot better when you remember that writing a bad novel could destroy the world.

I second Vladimir.

I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating.
And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.
I knew I had to stop.
Halt, melt, and catch fire.
Say, "I'm not ready." Say, "I don't know how to do this yet.

I had to utter those words a few years ago, swallow my pride, drop the rat race - and inevitably my standard of living. I wasn't making progress that I could believe in, that I was willing to bet my entire future on.

An appropriate rebuttal to the "show me the code", "show me the math" folk here pestering you about your lack of visible results. The real action happens in the brain. Rarely does one get a glimpse of it as thorough as the one you provide. In fact, you are so good at communicating the state of your nervous system that I'd bet my future on Eliezer2008 being more likely than any other single individual in the field of AGI to achieve or contribute something lasting and non-lethal.

Shane E, meet Caledonian. Caledonian, Shane E.

Nick T - it's worse than that. You'd have to mathematically demonstrate that your novel was both completely American and infallibly Great before you could be sure it wouldn't destroy the world. The failure state of writing a good book is a lot bigger than the failure state of writing a good AI.

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

"This approach sounds a lot better when you remember that writing a bad novel could destroy the world."

The Bible? The Koran? The Communist Manifesto? Atlas Shrugged? A Fire Upon the Deep?

Your post reminds me of the early nuclear criticality accidents during the development of the atomic bomb. I wonder if, for those researchers, the fact that "nature is allowed to kill them" didn't really sink home until one accidentally put one brick too many on the pile.

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

From the Sometimes-Hard-Problems-Have-Simple-Solutions-Dept:
If you're so concerned... why don't you just implement a roll-back system to the AGI - if something goes wrong, you just roll back and continue as if nothing happened... or am I like missing something here?

There, perm ignore on. :)

Brandon: is there some meme or news making the rounds as we speak? Because I read about criticality accidents only yesterday, having lived 10K+ days, and now I see them mentioned again by you. I find this spookily improbable. And this isn't the first time. Once I downloaded something by accident, and decided to check it out, and found the same item in a random situation the next day or a few days after that. And a few other "coincidences".

I bet it's a sim and they're having so much fun right now as I type this with my "free will".

Oh, man... criticality accident.... blue light, heat, taste of lead... what a way to go...

An appropriate rebuttal to the "show me the code", "show me the math" -folk here pestering you about your lack of visible results.

I'm not expecting to be shown AI code. I'm not even expecting to be shown a Friendliness implementation. But a formal definition of what 'Friendly' means seems to be a reasonable minimum requirement to take Eliezer's pronouncements seriously.

Alternatively, he could provide quantitative evidence for his reasoning regarding the dangers of AI design... or a quantitative discussion of how giving power to an AI is fundamentally different than giving power to humans when it comes to optimization.

Or a quantitative anything...

We are entering into a Pascal's Wager situation.

"Pascal's wager" is the argument that you should be Christian, because if you compute the expected value of being a Christian vs. of being an atheist, then for any finite positive probability that Christianity is correct, that finite probability multiplied by (infinite +utility minus infinite -utility) outweights the other side of the equation.

The similar Yudkowsky wager is the argument that you should be an FAIer, because the negative utility of destroying the universe outweighs the other side of the equation, whatever the probabilities are. It is not exactly analogous, unless you believe that the universe can support infinite computation (if it isn't destroyed), because the negative utility isn't actually infinite.
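To make the comparison concrete (this is just my own way of writing down the argument above, with $p$ the probability that the claim in question is true, and $c_1$, $c_2$ the ordinary finite costs and benefits):

$$E[U(\text{accept})] = p \cdot (+\infty) + (1-p)\,c_1, \qquad E[U(\text{decline})] = p \cdot (-\infty) + (1-p)\,c_2.$$

For any $p > 0$ the infinite terms swamp the finite ones, so the wager "wins" no matter how small $p$ is. In the finite analogue, with $D$ the huge-but-finite disutility of a destroyed universe, the comparison becomes $p \cdot D$ versus the cost of precaution, and that only dominates when $p$ is not vanishingly small.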

I feel that Pascal's wager is not a valid argument, but have a hard time articulating a response.

Phil: isn't it obvious? The flaws in Pascal's wager are the lack of strong justification for giving Christianity a significantly greater probability than anti-Christianity (in which only non-Christians are saved), and the considerable cost of a policy that makes you vulnerable to any parasitic meme claiming high utility. Neither is a problem for FAI.

Nature sounds a bit like a version of Rory Breaker from 'Lock, Stock and Two Smoking Barrels':

"If you hold back anything, I'll kill ya. If you bend the truth or I think your bending the truth, I'll kill ya. If you forget anything I'll kill ya. In fact, you're gonna have to work very hard to stay alive, Nick. Now do you understand everything I've said? Because if you don't, I'll kill ya. "

I think there is a well-understood, rather common phrase for the approach of "thinking about AGI issues and trying to understand them, because you don't feel you know enough to build an AGI yet."

This is quite simply "theoretical AI research" and it occupies a nontrivial percentage of the academic AI research community today.

Your (Eliezer's) motivations for pursuing theoretical rather than practical AGI research are a little different from usual -- but, the basic idea of trying to understand the issues theoretically, mathematically and conceptually before messing with code, is not terribly odd....

Personally I think both theoretical and practical AGI research are valuable, and I'm glad both are being pursued.

I'm a bit of a skeptic that big AGI breakthroughs are going to occur via theory alone, but, you never know ... history shows it is very hard to predict where a big discovery is going to come from.

And, hypothetically, let's suppose someone does come up with a big AGI breakthrough from a practical direction (like, say, oh, the OpenCogPrime team... ;-); then it will be very good that there exist individuals (like yourself) who have thought very deeply about the theoretical aspects of AGI, FAI and so forth ... you and other such individuals will be extremely well positioned to help guide thinking on the next practical steps after the breakthrough...

-- Ben G

Phil,

There are fairly quantifiable risks of human extinction, e.g. from dinosaur-killer asteroid impacts, for which there are clear paths to convert dollars into reduced extinction risk. If the probability of AI existential risk (or grey goo, or some other exotic risk) were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks. The argument that "I should cut back on certain precautions because X is even more reckless/evil/confused and the marginal increase in my chance of beating X outweighs the worse expected outcome of my project succeeding first" is not wrong (arms races are nasty), but it goes wrong when it is used in a biased fashion.

Nature has rules, and Nature has conditions. Even behaving in perfect harmony with the rules doesn't guarantee you'll like the outcome, because you can never control all of the conditions.

Only theosophists imagine they can make the nature of reality bend to their will.

Eli,

FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable.

Ok, but this doesn't change my point: you're just one small group out of many around the world doing AI research, and you're trying to solve an even harder version of the problem while using fewer of the available methods. These factors alone make it unlikely that you'll be the ones to get there first. If this is correct, then your work is unlikely to affect the future of humanity.


Vladimir,

Outcompeting other risks only becomes relevant when you can provide a better outcome.

Yes, but that might not be all that hard. Most AI researchers I talk to about AGI safety think the idea is nuts -- even the ones who believe that super intelligent machines will exist in a few decades. If somebody is going to set off a super intelligent machine I'd rather it was a machine that will only *probably* kill us, rather than a machine that almost certainly will kill us because issues of safety haven't even been considered.

If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that's likely to be the one that matters. Provably safe theoretical AGI designs aren't going to matter much to us if we're already dead.

These factors alone make it unlikely that you'll be the ones to get there first. If this is correct,

then we're all doomed.

Creating a Friendly AI is similar to taking your socks off when they're wet and wiggling your toes until dry. It's the best thing to do, but looks pretty silly, especially in public.

Back in 1993 my mom used to bake a good Singularity... lost the recipe, and dementia got the best of her... damn.

"Friendly AI"? It seems that we now have hundreds of posts on O.B. discussing "Friendly AI" - and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly - to *whom*? What does the term "Friendly" actually mean, if used in a technical context?

"Friendly AI"? It seems that we now have hundreds of posts on O.B. discussing "Friendly AI" - and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly - to whom? What does the term "Friendly" actually mean, if used in a technical context?

One really does wonder whether the topical collapse of American finance, systemic underestimation of risk, and overconfidence in being able to NEGOTIATE risk in the face of enormous complexity should figure into these conversations more than just a couple of sarcastic posts about short selling.

Couldn't Pascal's Wager-type reasoning be used to justify delaying any number of powerful technologies (and relatively unpowerful ones too -- after all, there's some non-zero chance that the water-wheel somehow leads directly to our downfall) until they were provably, 100% safe? And because that latter proposition is a virtual impossibility, wouldn't that mean we'd sit around doing nothing but meta-theorizing until some other heedless party simply went ahead and developed the technology anyway? Certainly being mindful of the risks inherent in new technologies is a good thing; just not sure that devoting excessive time to thinking about it, in lieu of actually creating it, is the smartest or most productive endeavor.

Like its homie, Singularity, FriendlyAI is growing old and wrinkly, startling allegations and revelations of its shady and irresponsible past are surfacing, its old friends long gone. I propose:
The Cuddly AI. Start the SingulariPartay!

"I need to beat my competitors" could be used as a bad excuse for taking unnecessary risks. But it is pretty important. Given that an AI you coded right now with your current incomplete knowledge of Friendliness theory is already more likely to be Friendly than that of some competitor who's never really considered the matter, you only have an incentive to keep researching Friendliness until the last possible moment when you're confident that you could still beat your competitors.

The question then becomes: what is the minimum necessary amount of Friendliness research at which point going full speed ahead has a better expected result than continuing your research? Since you've been researching for several years and sound like you don't have any plans to stop until you're absolutely satisfied, you must have a lot of contempt for all your competitors who are going full-speed ahead and could therefore be expected to beat you if any were your intellectual equals. I don't know your competitors and I wouldn't know enough AI to be able to judge them if I did, but I hope you're right.
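To make that question concrete, here is a toy expected-value model (the functional forms and numbers are invented purely for illustration; they are not anyone's actual estimates): multiply the chance of finishing first, which falls the longer you delay, by the chance that the result is Friendly, which rises with research, and look for where the product peaks.

import math

def p_first(years_of_research):
    # Chance of beating the competitors, assumed (arbitrarily) to decay as you delay.
    return math.exp(-years_of_research / 10.0)

def p_friendly(years_of_research):
    # Chance the resulting AI is Friendly, assumed to rise with research time
    # and to saturate well below 1, reflecting residual risk.
    return 0.9 * (1.0 - math.exp(-years_of_research / 20.0))

def expected_value(years_of_research):
    # Payoff only if you are both first and Friendly; every other outcome counts as zero here.
    return p_first(years_of_research) * p_friendly(years_of_research)

best = max(range(0, 61), key=expected_value)
print(best, round(expected_value(best), 3))

Under these made-up curves the optimum is several years of research, neither zero nor indefinite; the real dispute is over what the curves actually look like.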

If the probability of AI existential risk (or grey goo, or some other exotic risk) were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks.
Asteroids don't lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren't quite comparable.

Still, your point only makes me wonder how we can justify not devoting 10% of GDP to deflecting asteroids. You say that we don't need to put all resources into preventing unfriendly AI, because we have other things to prevent. But why do anything productive? How do you compare the utility of preventing possible annihilation to the utility of improvements in life? Why put any effort into any of the mundane things that we put almost all of our efforts into? (Particularly if happiness is based on the derivative of, rather than absolute, quality of life. You can't really get happier, on average; but action can lead to destruction. Happiness is problematic as a value for transhumans.)

This sounds like a straw man, but it might not be. We might just not have reached (or acclimatized ourselves to) the complexity level at which the odds of self-annihilation should begin to dominate our actions. I suspect that the probability of self-annihilation increases with complexity. Rather like how the probability of an individual going mad may increase with their intelligence. (I don't think that frogs go insane as easily as humans do, though it would be hard to be sure.) Depending how this scales, it could mean that life is inherently doomed. But that would result in a universe where we were unlikely to encounter other intelligent life... uh...

It doesn't even need to scale that badly; if extinction events have a power law (they do), there are parameters for which a system can survive indefinitely, and very similar parameters for which it has a finite expected lifespan. Would be nice to know where we stand. The creation of AI is just one more point on this road of increasing complexity, which may lead inevitably to instability and destruction.
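A back-of-the-envelope way to see how knife-edged this is (my sketch, under the simplifying assumption that the per-period extinction probability decays as a power law): if the chance of an extinction event in period $t$ is roughly $p_t \approx c\,t^{-\alpha}$, then the probability of surviving forever is

$$\prod_{t=1}^{\infty} (1 - p_t),$$

which is bounded away from zero exactly when $\sum_t p_t$ converges, i.e. when $\alpha > 1$; for $\alpha \le 1$ eventual extinction is certain, however long it takes on average. Two hazard curves that look almost identical can therefore sit on opposite sides of that line.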

I suppose the only answer is to say that destruction is acceptable (and possibly inevitable); total area under the utility curve is what counts. Wanting an interesting world may be like deciding to smoke and drink and die young - and it may be the right decision. The AIs of the future may decide that dooming all life in the long run is worth it.

In short, the answer to "Eliezer's wager" may be that we have an irrational bias against destroying the universe.

But then, deciding what are acceptable risk levels in the next century depends on knowing more about cosmology, the end of the universe, and the total amount of computation that the universe is capable of.

I think that solving aging would change people's utility calculations in a way that would discount the future less, bringing them more in line with the "correct" utility computations.

Re. AI hell-worlds: SIAI should put "I have no mouth, and I must scream" by Harlan Ellison on its list of required reading.

Shane: If somebody is going to set off a super intelligent machine I'd rather it was a machine that will only *probably* kill us, rather than a machine that almost certainly will kill us because issues of safety haven't even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that's likely to be the one that matters.

If you have a plan that you know has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It's "provably" safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that, given that I do nothing, there would be a positive singularity, that would qualify as a provably Friendly plan, and this is what I would need to do, instead of thinking about AGI all day. We don't need a theory of FAI for the theory's sake, we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce a positive singularity, lobsters it is. Some of the incredulous remarks about the FAI path center on how inefficient it is. "Why do you enforce these silly restrictions on yourself, tying your hands, when you can instead do Z and get there faster/more plausibly/anyway?" Why do you believe what you believe? Why do you believe that Z has any chance of success? How do you know it's not just wishful thinking?

You can't get FAI by hacking an AGI design at the last minute, by performing "safety measures", adding a "Friendliness module"; you shouldn't expect FAI to just happen if you merely intuitively believe that there is a good chance for it to happen. Even if "issues of safety are considered", you still almost certainly die. The target is too small. It's not obvious that the target is so small, and it's not obvious that you can't cross this evidential gap by mere gut feeling, that you need stronger support, better and technical understanding of the problem, to have even a 1% chance of winning. If you do the best you can on that first AGI, if you "maximize" the chance of getting FAI out of it, you still lose. Nature doesn't care whether you "maximized your chances" or leapt into the abyss blindly; it kills you just the same. Maximizing chances of success is a ritual of cognition that doesn't matter if it doesn't make you win. It doesn't mean that you must write a million lines of FAI code, it is a question of understanding. Maybe there is a very simple solution, but you need to understand it to find its implementation. You can write down a winning lottery combination in five seconds, but you can't expect to guess it correctly. If you discovered the first 100 bits of a 150-bit key, you can't argue that you'll be able to find 10 more bits at the last minute, to maximize your chances of success; they are useless unless you find 40 more.

Provability is not about setting a standard that is too high, it is about knowing what you are doing -- like, at all. Finding a nontrivial solution that knowably has a 1% chance of being correct is a very strange situation; much more likely, you'll be able to become pretty sure, say >99%, that the solution is correct, which will be cut by real-world black swans to something lower but closer to 99% than to 1%. This translates as "provably correct", but given the absence of a mathematical formulation of this problem in the first place, at best it's "almost certainly correct". Proving that the algorithm itself, within the formal rules of evaluation on reliable hardware, does what you intended, is the part where you need to preserve your chances of success across a huge number of steps performed by the AI. If your AI isn't stable, if it wanders around back and forth, forgetting about the target you set at the start after a trillion steps, your solution isn't good for anything.

You can see that the target is so small from the complexity of human morality, which judges the solution. It specifies an unnatural category that won't just spontaneously appear in the mind of AI, much less become its target. If you miss something, your AI will at best start as a killer jinni that doesn't really understand what you want of it and thus can't be allowed to function freely, and if restrictions you placed on it are a tiny bit imperfect (which they will be), it will just break loose and destroy everything.

AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them.

For a moment, I read this as referring to Nature the Journal. "They are afraid of others solving the problem first, and they know that Nature is allowed to publish those results."

Eli, do you think you're so close to developing a fully functional AGI that one more step and you might set off a land mine? Somehow I don't believe you're that close.

There is something else to consider. An AGI will ultimately be a piece of software. If you're going to dedicate your life to talking about and ultimately writing a piece of software, then you should have superb programming skills. You should code something... anything... just to learn to code. Your brain needs to swim in code. Even if none of that code ends up being useful, the skill you gain will be. I have no doubt that you're a good philosopher and a good writer, since I have read your blog, but whether or not you're a good hacker is a complete mystery to me.

PK, I'm pretty sure Eliezer has spent hundreds, if not thousands of hours coding various things. (I've never looked at any of that code.) I don't know how much he's done in the past three years, though.

Eliezer,

How are you going to be 'sure' that there is no landmine when you decide to step?

Are you going to have many 'experts' check your work before you'll trust it? Who are these experts if you are occupying the highest intellectual orbital? How will you know they're not YesMen?

Even if you can predict the full effects of your code mathematically (something I find somewhat doubtful, given that you will be creating something more intelligent than we are, and thus its actions will be by nature unpredictable to man), how can you be certain that the hardware it will run on will perform with the integrity you need it to?

If you have something that is *changing* itself towards 'improvement,' then won't the dynamic nature of the program leave it open to errors that might have fatal consequences? I'm thinking of a digital version of genetic mutation in which your code is the DNA...

Like, let's say the superintelligence invents some sort of "code shuffling" mechanism for itself whereby it can generate many new useful functions in an expedited evolutionary manner (like we generate antibodies) but in the process accidentally does something disastrous.

The argument 'it would be too intelligent and well intentioned to do that' doesn't seem to cut it, because the machine will be evolving from something of below-human intelligence into something above, and it is not certain what types of intelligence it will evolve faster, or what trajectory this 'general' intelligence will take. If we knew that, then we could program the intelligence directly and not need to make it recursively self-improving.

For those complaining about references to terms not defined within the Overcoming Bias sequence, see:

Coherent Extrapolated Volition (what does a "Friendly" AI do?)
KnowabilityOfFAI (why it looks theoretically possible to specify the goal system of a self-modifying AI; I plan to post from this old draft document into Overcoming Bias and thereby finish it, so you needn't read the old version right now, unless you demand immediate answers).

@Vladimir Nesov: Good reply, I read it and wondered "Who's channeling me?" before I got to the byline.

@Shane Legg: After studying FAI for a few years so that you actually have some idea of what the challenges are, and how many automatic failures are built into the problem, and seeing people say "We'll just go full steam ahead and try our best", and knowing that these people are not almost at the goal but ten lightyears short of it; then you learn to blank rivals out of your mind, and concentrate on the wall. That's the only way you can see the wall, at all.

@Ben Goertzel: Who is there that says "I am working on Artificial General Intelligence?" and is doing theoretical research? AFAICT there's plenty of theoretical research on AI, but it's by people who no longer see themselves as coding an AGI at the end of it - it just means you're working on narrow AI now.

@Yvain: To first order and generalizing from one data point, figure that Eliezer_2000 is demonstrably as smart and as knowledgeable as you can possibly get while still being stupid enough to try and charge full steam ahead into Unfriendly AI. Figure that Eliezer_2002 is as high as it gets before you spontaneously stop trying to build low-precision Friendly AI. Both of these are smart enough to be dangerous and not smart enough to be helpful, but they were highly unstable in terms of how long they stayed that way; Eliezer_2002 had less than four months left on his clock when he finished "Levels of Organization in General Intelligence". I would not be intimidated by either of them into giving up, even though they're holding themselves to much lower standards. They will charge ahead taking the quick and easy and vague and imprecise and wasteful and excruciatingly frustrating path. That's going to burn up a lot of their time.

Those AGIfolk who stay in suicidal states, for years, even when I push on them externally, I find even less intimidating than the prospect of going up against an Eliezer_2002 who permanently stayed bounded at the highest suicidal level.

An AGI wannabe could theoretically have a different intellectual makeup that allows them to get farther and be more dangerous than Eliezer_2002, without passing the Schwarzschild bound and collapsing into an FAI programmer; but I see no evidence that this has ever actually happened.

To put it briefly: There really is an upper bound on how smart you can be, and still be that stupid.

So the state of the gameboard is not good, but the day is not already lost. You draw a line with all the sloppy suicides on one side, and those who slow down for precise standards on the other, and you hope that no one sufficiently intelligent + knowledgeable can stay on the wrong side of the line for long.

That's the last thread on which our doom now hangs.

I too thought Nesov's comment was written by Eliezer.

"This approach sounds a lot better when you remember that writing a bad novel could destroy the world."

"we're all doomed."

You're not doomed, so shut up. Don't buy in to the lies of these doomsayers - the first AI to be turned on is not going to destroy the world. Even the first strong AI won't be able to do that.

Eliezer's arguments make sense if you literally have an AGI trying to maximize paperclips (or smiles, etc.), one which is smarter than a few hundred million humans. Oh, and it has unlimited physical resources. Nobody who is smart enough to make an AI is dumb enough to make one like this.

Secondly, for Eliezer's arguments to make sense and be appealing, you have to be capable of a ridiculous amount of human hubris. We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.

"Asteroids don't lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren't quite comparable."

Where did you get the idea that something like this is possible? The universe was stable enough 8 billion years ago to allow for life. Human civilization has been around for about 10,000 years. The galaxy is about 100,000 light years in diameter. Consider these facts. If such a thing as AGI-gone-wrong-turning-the-entire-light-cone-into-paperclips were possible, or probable, it's overwhelmingly likely that we would already be some aliens' version of a paperclip by now.

Nobody who is smart enough to make an AI is dumb enough to make one like this.

Accidents happen.
CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters
Comment by Michael Vassar
The Hidden Complexity of Wishes
Qualitative Strategies of Friendliness
(...and many more)

We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.
You'd actually prefer it wipe us out, or marginalize us? Hmph.
CFAI: Beyond the adversarial attitude
Besides, an unFriendly AI isn't necessarily going to do anything more interesting or worthwhile than paperclipping.
Nick Bostrom: The Future of Human Evolution
Michael Wilson: Normative Reasoning: A Siren Song?
The Design Space of Minds-in-General
Anthropomorphic Optimism
If such a thing as AGI-gone-wrong-turning-the-entire-light-cone-into-paperclips were possible, or probable, it's overwhelmingly likely that we would already be some aliens' version of a paperclip by now.
Not if aliens are extremely rare.
