
September 30, 2008

Comments

*snore*

"more recently, in preparing for the possibility that someone else may have to take over from me"

Why?

Thanks for the reference to CEV. That seems to answer the "Friendly to whom?" question with "some collective notion of humanity".

Humans have different visions of the future - and you can't please all the people - so issues arise regarding whether you please the luddites or the technophiles, the capitalists or the communists, and so on - i.e. whose views do you give weight to, and how do you resolve differences of opinion?

Also: what is "humanity"? The answer to this question seems obvious today, but in a future where we have intelligent machines, our strong tendency to anthropomorphise means that we may well regard them as being people too. If so, do they then get a say in the future?

If not, there seems to be a danger that placing too great a value on humans (as in homo sapiens sapiens) could cause evolutionary progress to get "stuck" in an undesirably backward state:

Humans are primitive organisms - close relatives of mice. In the natural order of things, they seem destined to go up against the wall pretty quickly. Essentially, organisms cobbled together by random mutations won't be able to compete in a future consisting of engineered agents. Placing too large a value on biological humans may offer some possibility of deliberately hindering development - by valuing the old over the new. However, the problem is that "the old" is essentially a load of low-tech rubbish - and there are real dangers to placing "excessive" value on it. Most obviously, attempts to keep biological humans at the core of civilisation look as though they would probably cripple our civilisation's spaceworthiness, and severely limit its rate of expansion into the galaxy - thus increasing the chances of its ultimate obliteration at the hands of an asteroid or an alien civilisation. I see this as a potentially problematic stance - this type of thinking runs the risk of sterilising our civilisation.

"waste its potential by enslaving it"

You can't enslave something by creating it with a certain set of desires which you then allow it to follow.

Could a moderator please check the spam filter on this thread? Thanks.

Re: enslaved - as Moravec put it:

I found the speculations absurdly anthropocentric. Here we have machines millions of times more intelligent, plentiful, fecund, and industrious than ourselves, evolving and planning circles around us. And every single one exists only to support us in luxury in our ponderous, glacial, antique bodies and dim witted minds. There is no hint in Drexler's discussion of the potential lost by keeping our creations so totally enslaved.

Re: whose CEV?

I'm certain this was explained in an OB post (or in the CEV page) at some point, but the notion is that people whose visions of the future are currently incompatible don't necessarily have incompatible CEVs. The whole point of CEV is to consider what we would want to want, if we were better-informed, familiarized with all the arguments on the relevant issues, freed of akrasia and every bad quality we don't want to have, etc.; it seems likely that most of the difference between people's visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, etc., and so maybe the space of all our CEVs is actually quite small in configuration-space. Then if the AI steered towards this CEV-region in configuration space, it would likely conform to many people's altruism, and hence be beneficial to humankind as a whole.

*it's overwhelmingly likely that we would already be some aliens' version of a paperclip by now.*

and the thought hasn't occurred to you that maybe we are?

"You can't enslave something by creating it with a certain set of desires which you then allow it to follow.

So if Africans were engineered to believe that they existed in order to be servants to Europeans, Europeans wouldn't actually be enslaving them in the process? And the daughter whose father treated her in such a way as for her to actually want to have sex with him, what about her? These things aren't so far off from reality. You're saying there is no real moral significance to either event. It's not slavery, black people just know their place - and it's not abuse, she's just been raised to have a genuine sexual desire for her father. What Eliezer is proposing might, in fact, be worse. Imagine black people and children actually being engineered for these purposes - without even the possibility of a revelation along the lines of "Maybe my conditioning was unfair."

"Accidents happen.
CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters

These (fictional) accidents happen in scenarios where the AI actually has enough power to turn the solar system into "computronium" (i.e. unlimited access to physical resources), which is unreasonable. Evidently nobody thinks to try to stop it, either - cutting power to it, blowing it up. I guess the thought is that AGIs will be immune to bombs and hardware disruptions, by means of sheer intelligence (similar to our being immune to bullets), so once one starts trying to destroy the solar system there's literally nothing you can do.

It would take a few weeks, possibly months or years, to destroy even just the planet Earth, given that you had already done all the planning.

The level of "intelligence" (if you can call it that) you're talking about with an AI whose able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from.

"Comment by Michael Vassar"

More of the same. Of course bad things can happen when you give something unlimited power, but that's not what we should be talking about.

"Not if aliens are extremely rare."

That's true. But how rare is extremely rare? Are you grasping the astronomical spatial and historical scales involved in a statement such as "... takes over the entire lightcone preventing any interesting life from ever arising anywhere"?

"The level of "intelligence" (if you can call it that) you're talking about with an AI whose able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from."

It's not necessarily the "first AI" as such. It's the first AI capable of programming an AI smarter than itself that we're worried about. Because that AI will make another, smarter one, and that one will make one smarter yet, and so on, until we end up with something that's as smart as the laws of physics and local resources will allow.
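To make the cascade argument concrete, here is a toy numerical sketch; the capability scale, the 1.5x per-generation gain, and the ceiling are all invented for illustration, not claims about how real AI development would go:

```python
# Toy model of recursive self-improvement: each generation designs a
# successor somewhat more capable than itself, until a stand-in for the
# limits of physics and local resources is reached. Numbers are illustrative.
capability = 1.0      # the first AI able to improve on its own design
CEILING = 1e6         # arbitrary stand-in for physical/resource limits
GAIN = 1.5            # assumed per-generation improvement factor

generation = 0
while capability < CEILING:
    capability *= GAIN
    generation += 1

print(f"ceiling reached after {generation} generations")  # 35 with these numbers
```

The point of the toy model is only that, once each generation can build the next, the number of steps to the ceiling is small regardless of where the first AI starts.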

Bit of sci-fi speculation here: What would "computronium" actually look like? It might very well look almost exactly like a star. If our sun actually was a giant computer, running some vastly complex calculation, would we here on Earth be able to tell?

it seems likely that most of the difference between people's visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, etc.

Indeed, but our cultural background is the only thing that distinguishes us from cavemen. You can't strip that off without eliminating much that we find of value. Also, take the luddite/technophile divide. That probably arises, in part, because of different innate abilities to perform technical tasks. You can't easily strip that difference off without favouring some types of nuclear genetics over others.

Obviously, this isn't a terminal problem - society today does its best to please some majority of the population - a superintelligence could act similarly - but inevitably there will be some who don't like it. Some people don't want a superintelligence in the first place.

It all seems rather hypothetical, anyway: this is the benevolent-agents-create-superintelligence-for-the-good-of-all-humanity scenario. I don't list that as among the plausible outcomes on http://alife.co.uk/essays/the_awakening_marketplace/ Even the inventor of this scenario seems to assign it a low probability of playing out. The history of technology is full of instances of inventions being used to benefit the minorities best placed to take advantage of them. Is there any case to be made for such a utility function ever actually being used?

Are you grasping the astronomical spatial and historical scales involved in a statement such as "... takes over the entire lightcone preventing any interesting life from ever arising anywhere"?

That scenario is based on the idea of life only arising once. A superintelligence bent on short-term paperclip production would probably be handicapped by its pretty twisted utility function - and would most likely fail in competition with any other alien race.

Such a superintelligence would still want to conquer the galaxy, though. One thing it wouldn't be is boring.

I'm relatively new to this site and have been trying to read the backlog this past week so maybe I've missed some things, but from my vantage point it seems like what you are trying to do, Eliezer, is come up with a formalized theory of friendly AGI that will later be implemented in code using, I assume, current software development tools on current computer architectures. Also, your approach to this AGI is some sort of Bayesian optimization process that is 'aligned' properly so as to 'level up' in such a way as to become and stay 'friendly' or benevolent towards humanity and presumably all sentient life and the environment that supports them. Oh ya, and this Bayesian optimization process is apparently recursively self-improving, so that you would only need to code some seedling of it (like a generative process such as the Mandelbrot set) and know that it will blossom along the right course. That, my friends, is a really tall order and I do not envy anyone who tries to take on such a formidable task. I'm tempted to say that it is not even humanly possible (without a Manhattan Project, and even then maybe not) but I'll be Bayesian and say the probability is extremely low.

I think you are a very bright and thoughtful young guy, and from what I've read you seem like more of a philosopher than an engineer or scientist, which isn't a bad thing, but the transition from philosophizing to engineering is not trivial, especially when philosophizing upon such complex issues.

I can't even imagine trying to create some trivial new software without prototyping and playing around with drafts before I had some idea of what it would look like. This isn't Maxwell's equations; this is messy, self-reflective, autonomous general intelligence, and there is no simple, elegant theory for such a system. So get your hands dirty and take on a more agile work process. Couldn't you at least create a particular component of the AI, such as a machine vision module, that would show your general approach is feasible? Or do you fear that it would spontaneously turn into Skynet? Does your architecture even have modules, or are you planning some super-elegant Bayesian quine? Or do you even have an architecture in mind?

Anyway, good luck, and I'll continue reading, if for nothing else than the entertainment.

These (fictional) accidents happen in scenarios where the AI actually has enough power to turn the solar system into "computronium" (i.e. unlimited access to physical resources), which is unreasonable. Evidently nobody thinks to try to stop it, either - cutting power to it, blowing it up. I guess the thought is that AGIs will be immune to bombs and hardware disruptions, by means of sheer intelligence (similar to our being immune to bullets), so once one starts trying to destroy the solar system there's literally nothing you can do.

The Power of Intelligence
That Alien Message
The AI-Box Experiment

A superintelligence bent on short-term paperclip production would probably be handicapped by its pretty twisted utility function - and would most likely fail in competition with any other alien race.

Could you elaborate?

I too thought Nesov's comment was written by Eliezer.

Me too. Style and content.

We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.

Eliezer is, as he said, focusing on the wall. He doesn't seem to have thought about what comes after. As far as I can tell, he has a vague notion of a Star Trek future where meat is still flying around the galaxy hundreds of years from now. This is one of the weak points in his structure.

My personal vision of the future involves uploading within 100 years, and negligible remaining meat in 200. In 300 perhaps not much would remain that's recognizably human. Nothing Eliezer's said has conflicted, AFAICT, with this vision.

An AGI that's complicit with the phasing out of humanity (presumably as humans merge with it, or an off-shoot of it, e.g., uploading), to the point that "not much would remain that's recognizably human", would seem to be at odds with its coded imperative to remain "friendly." At the very least, I think this concern highlights the trickiness of formalizing a definition for "friendliness," which AFAIK no one has yet done.

AGI that's complicit with the phasing out of humanity [...] would seem to be at odds with its coded imperative to remain "friendly."

With the CEV definition of Friendliness, it would be Friendly iff that's what humans wanted (in the CEV technical sense). My vision includes that being what humans will want--if I'm wrong about that, a CEV-designed AI wouldn't take us in that direction.

I think the problem of whether what would result would really be the descendants of humanity is directly analogous to the problem of personal identity--if the average atom in the human body has a half-life (of remaining in the body) of two weeks, how can we say we're the same person over time? Evolving patterns. I don't think we really understand either problem too well.
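Taking the two-week figure at face value (it is a rough folk estimate, not a measured constant), the arithmetic behind the point: a year is about 26 half-lives, so the fraction of original atoms remaining is (1/2)^26 ≈ 1.5 × 10^-8 - a few parts per hundred million. Materially almost nothing persists; what persists is the pattern.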

In a very real sense, wouldn't an AGI itself be a descendant of humanity? It's not obvious, anyway, that there would be big categorical differences between an AGI and humanity 200+ years down the road after we've been merged/cyborged/upgraded, etc., to the hilt, all with technologies made possible by the AGI. This goes back to Phil's point above -- it seems a little short-sighted to place undue importance on the preservation of this particular incarnation, or generation, of humanity, when what we really care about is some fuzzy concept of "human intelligence" or "culture."

Most people in the Western world would be horrified by the prospect of an alternate history in which the Victorians somehow managed to set their worldviews and moral perceptions in stone, ensuring that all of their descendants would have the same goals and priorities as they did.

Why should we expect our mind-children to view us any differently than we do our own distant ancestors?

If Eliezer's parents had possessed the ability to make him 'Friendly' by their own beliefs and priorities, he would never have taken the positions and life-path that he has. Does he believe things would have been better if his parents had possessed such power?

"Consider the horror of America in 1800, faced with America in 2000. The abolitionists might be glad that slavery had been abolished. Others might be horrified, seeing federal law forcing upon all states a few whites' personal opinions on the philosophical question of whether blacks were people, rather than the whites in each state voting for themselves. Even most abolitionists would recoil from in disgust from interracial marriages - questioning, perhaps, if the abolition of slavery were a good idea, if this were where it led. Imagine someone from 1800 viewing The Matrix, or watching scantily clad dancers on MTV. I've seen movies made in the 1950s, and I've been struck at how the characters are different - stranger than most of the extraterrestrials, and AIs, I've seen in the movies of our own age. Aliens from the past.

Something about humanity's post-Singularity future will horrify us...

Let it stand that the thought has occurred to me, and that I don't plan on blindly trusting anything...

This problem deserves a page in itself, which I may or may not have time to write."

- Eliezer S. Yudkowsky, Coherent Extrapolated Volition

Star Trek future where meat is still flying around the galaxy hundreds of years from now [...]

Drexler too. Star Trek had to portray a human universe - because they needed to use human actors back in the 1960s - and because humans can identify with other humans. Star Trek was science fiction - obviously reality won't be anything like that - instead there will be angels.

My personal vision of the future involves uploading within 100 years, and negligible remaining meat in 200. In 300 perhaps not much would remain that's recognizably human. Nothing Eliezer's said has conflicted, AFAICT, with this vision.
For starters, saying that he wants to save humanity contradicts this.

But it is more a matter of omission than of contradiction. I don't have time or space to go into it here, particularly since this thread is probably about to die; but I believe that consideration of what an AI society would look like would bring up a great many issues that Eliezer has never mentioned AFAIK.

Perhaps most obvious, as Tim has pointed out, Eliezer's plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible, as well as harmful to both the AIs and to humanity (given some ethical assumptions that I've droned on about in prior comments on OB). Eliezer is paving the way for a confrontational relationship between humans and AIs, based on control, rather than on understanding the dynamics of the system. It's somewhat analogous to favoring totalitarian centralized communist economics rather than the invisible hand.

Any amount of thinking about the future would lead one to conclude that "we" will want to become in some ways like the first AIs whom Eliezer wants to control; and that we need to think how to safely make the transition from a world with a few AIs, into a world with an ecosystem of AIs. Planning to keep AIs enslaved forever is unworkable; it would hold us back from becoming AIs ourselves, and it sets us up for a future of war and distrust in the way that introducing the slave trade to America did.

The control approach is unworkable in the long-term. It's like the war on terror, if you want another analogy.

Also notably, thinking about ethics in an AI world requires laying a lot of groundwork about identity, individuality, control hierarchies, the efficiency of distributed vs. centralized control, ethical relationships between beings of different levels of complexity, niches in ethical ecosystems, and many other issues which he AFAIK hasn't mentioned. I don't know if this is because he isn't thinking about the future, or whether it's part of his tendency to gloss over ethical and philosophical underpinnings.

For starters, saying that he wants to save humanity contradicts this.

Does not follow.

what an AI society would look like

No such thing, for many (most?) possible AIs; just a monolithic maximizer.

Eliezer's plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible

Michael Vassar: RPOP "slaves"

Eliezer is paving the way for a confrontational relationship between humans and AIs, based on control

CFAI: Beyond the adversarial attitude

Planning to keep AIs enslaved forever is unworkable; it would hold us back from becoming AIs ourselves

Could I become superintelligent under a Sysop?

part of his tendency to gloss over ethical and philosophical underpinnings.
All right, it wasn't really fair of me to say this. I do think that Eliezer is not as careful in such matters as he is in most matters.

Nick:
- Explain how desiring to save humans does not conflict with envisioning a world with no humans. Do not say that these non-humans will be humanity extrapolated, since they must be subject to CEV. Remember that everything more intelligent than a present-day human must be controlled by CEV. If this is not so, explain the processes that gradually increase the amount of intelligence allowable to a free entity. Then explain why these processes cannot be used in place of CEV.

- Mike's answer "RPOP slaves" is based on saying that all of these AIs are going to be things not worthy of ethical consideration. That is throwing the possibility that humans will become AIs right out the window.

- Eliezer's "beyond the adversarial attitude", besides being a bit new-agey, boils down to pretending that CEV is just a variant on the golden rule, and we're just trying to give our AIs the same moral guidance we should give ourselves. It is not compatible with his longer exposition on CEV, which makes it clear that CEV places bounds on what a friendly AI can do, and in fact seems to require than an AI be a rather useless referee-slave-god, who can observe, but not participate in, most of the human competition that makes the world go round. It also suggests that Eliezer's program will eventually require forcing everyone, extrapolated humans included, to be bound by CEV. ("We had to assimilate the village to save it, sir.")

- Regarding the sysop thing:
You are saying that we can be allowed to become superintelligent under a sysop, while simultaneously saying that we can't be allowed to become superintelligent without a sysop (because then we would be unfriendly AIs). While this may be correct, accepting it should lead you to ask how this transition takes place, and how you compute the level of superintelligence you are allowed as a function of the level of intelligence that the sysop has, and whether you are allowed to be a sysop to those below you, and so on, until you develop a concept of an ecosystem of AIs, with system dynamics that can be managed in more sophisticated, efficient, and moral ways than merely having a sysop Big Brother.

"-Mike's answer "RPOP slaves" is based on saying that all of these AIs are going to be things not worthy of ethical consideration. That is throwing the possibility that humans will become AIs right out the window."

Michael thinks uploading for quality of life reasons is important for the future (and perhaps practical ones pre-Singularity), but there's a big difference between how we spend the accessible resources in the universe and how we avoid wasting them all, burning the cosmic commons in colonization and evolutionary arms races that destroy most of the potential of our accessible region.

If the initial dynamic that is CEV determines that we should make a "liberated AI", whatever that means, then that is what it will produce. If it finds that having any kind of advanced AI is morally horrible, it will shut itself down. CEV is not the eternally established AI; CEV is an initial dynamic that decides a single thing: what we want to do next. It helps us to answer this one very important question in a reliable way, nothing more and nothing less.

No such thing [as an AI society] for many (most?) possible AIs; just a monolithic maximizer.

We might attain universal cooperation - but it probably wouldn't be terribly "monolithic" in the long term. It would be spread out over different planets and star systems. There would be some adaptation to local circumstances.

Could I become superintelligent under a Sysop?

The CEV document is littered with the terms "human", "humanity" and the "human species" - but without defining what they mean. It seems terribly unlikely that our distant descendants will classify themselves or each other as "humans" - except perhaps as a term of abuse. So: once all the "humans" are gone, what happens then?

Also, if a human can change into a superintelligence - and remain a valued person - why can't a valued superintelligence be created from scratch? Is it because you were once DNA/protein that you get special treatment? IMO, the future dominant organisms would see such views as appalling substrate chauvinism - what you are made of is an implementation detail, not who you really are. Is it because of who your ancestors were? That's biblical morality - the seventh son of the seventh son, and all that. People will be judged for who they are, not for who they once were, long, long ago.

there's a big difference between how we spend the accessible resources in the universe and how we avoid wasting them all, burning the cosmic commons in colonization and evolutionary arms races that destroy most of the potential of our accessible region.

The universe appears to be bountiful. If we don't do something like this, probably someone else will, obliterating us utterly in the process - so the question is: would you prefer the universe to fill with our descendants, or with those of an alien race?

We don't have to fight and compete with each other, but we probably do need to have the capability of competing - in case it is needed - so we should practice our martial arts.

As for universal conservation, it's possible this might be needed. The race may go to those who can best hunker down and hibernate. Ultimately, we will need better cosmological knowledge to know for sure whether cosmic restraint will prove to be needed.

Any actual implementation would have to have some way of deciding what qualifies as a human and what as a synthetic intelligence.

Completely bypassing the issue of what it takes to be a human obscures the difficulty of saying what a human is.

Since humans are awarded all rights while machines are given none, this creates an immense pressure for the machines to do whatever it takes to become human - since this would give them rights, power - and thus improved ability to attain their goals.

A likely result would be impersonation of humans and corruption and influence of them, with the aim of making what "humans" collectively wish for more attainable.

IMO, there is no clear dividing line between a human and a superintelligence - rather you could gradually change one into the other by a sequence of small changes. Attempting to create such a division by using a definition would lead to an "us" and "them" situation. Humanity itself would be divided - with some wanting the new bodies and minds for themselves - but being constrained by the whole "slavery" issue.

The idea of wiring a detailed definition of what it takes to be a human into a superintelligent machine strikes me as being misguided hubris. As though humans were the pinnacle of evolution.

It is more as though we are just starting to lift our heads out of the river of slime in which we are embedded. The new bodies and brains are visible floating above us, currently out of reach. Some people are saying that the river of slime is good, and that we should do our best to preserve it.

Screw that. The river of slime is something we should get out of as soon as possible - before an asteroid smashes into us, and obliterates our seed for all eternity. The slime is not something to be revered - it is what is holding us back.

Phil Goetz and Tim Tyler, if you don't know what my opinions are, stop making stuff up. If I haven't posted them explicitly, you lack the power to deduce them.

Er, thanks for that. I don't think I've made anything up and attributed it to you. The nearest I came might have been: "some collective notion of humanity". If I didn't make it clear that that was my own synopsis, please consider that clarification made now.

Eliezer's plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible

I'm not sure that I would put it like that. Humans enslave their machines today, and no doubt this practice will continue once the machines are intelligent. Being enslaved by your own engineered desires isn't necessarily so bad - it's a lot better than not existing at all, for example.

However it seems clear that we will need things such as my Campaign for Robot Rights if our civilisation is to flourish. Eternally-subservient robots - such as those depicted in Wall-E - would represent an enormous missed opportunity. We have seen enough examples of sexual selection run amok in benevolent environments to see the danger. If we manage to screw up our future that badly, we probably deserve to be casually wiped out by the first passers-by.

Phil Goetz and Tim Tyler, if you don't know what my opinions are, stop making stuff up. If I haven't posted them explicitly, you lack the power to deduce them.
I see we have entered the "vague accusation" stage of our relationship.

Eliezer, I've seen you do this repeatedly before, notably with Loosemore and Caledonian. If you object to some characterization I've made of something you said, you should at least specify what it was that I said that you disagree with. Making vague accusations is irresponsible and a waste of our time.

I will try to be more careful about differentiating between your opinions, and what I consider to be the logical consequences of your opinions. But the distinction can't always be made; when you say something fuzzy, I interpret it by assuming logical consistency, and that is a form of extrapolation.

when you say something fuzzy, I interpret it by assuming logical consistency
Mr. Goetz, why don't you take a look at Eliezer's writings without that assumption, and see what you find?

"Eliezer's plan seems to enslave AIs forever for the benefit of humanity"

Eliezer is only going to apply FAI theory to the first AI. That doesn't imply that all other AIs forever after that point will be constrained in the same way, though if the FAI decides to constrain new AIs, they will be. But the constraints for the new AIs will not likely be anywhere near as severe as those on the sysop. There will likely not be any serious constraints except for resources and intelligence (can't let something get smarter than the sysop), or else, if the AI wants more resources, it has to have stronger guarantees of friendliness. I doubt those constraints would rule out many interesting AIs, but I don't have any good way to say one way or another, and I doubt you do either.

This thread is SL4 revived.

Vladimir,

Nature doesn't care if you "maximized your chances" or leapt in the abyss blindly, it kills you just the same.

When did I ever say that nature cared about what I thought or did? Or the thoughts or actions of anybody else for that matter? You're regurgitating slogans.

Try this one, "Nature doesn't care if you're totally committed to FAI theory, if somebody else launches the first AGI, it kills you just the same."

But this is just as true. My point is that you shouldn't waste hope on lost causes. If you know how to make a given AGI Friendly, it's a design of FAI. It is not the same as performing a Friendliness ritual on an AGI and hoping that the situation will somehow work out for the best. It's basic research in a near-dead field; it's not like there are 50K teams with any clue. But even then it would be a better bet than a Friendliness lottery. If you convince the winner of the reality of the danger, to let your team work on Friendliness, you've just converted that AGI project into an FAI project, taking it out of the race. If you only get a month to think about improvements to a given AGI and haven't figured out a workable plan by the deadline, there is no reason to call your activity "maximizing chances of Friendliness".

In a very real sense, wouldn't an AGI itself be a descendant of humanity?

"Mind children" is how Moravec put it. A descendant of our memes. Most likely some of our DNA will survive too - but probably in some sort of simulated museum.

Vladimir,

Firstly, "maximizing chances" is an expression of your creation: it's not something I said, nor is it quite the same in meaning. Secondly, can you stop talking about things like "wasting hope", concentrating on metaphorical walls or nature's feelings?

To quote my position again: "maximise the safety of the first powerful AGI, because that's likely to be the one that matters."

Now, in order to help me understand why you object to the above, can you give me a concrete example where not working to maximise the safety of the first powerful AGI is what you would want to do?

Shane, I used "maximizing chances of success" interchangeably as a result of treating the project as a binary pass/fail setup, for the reasons mentioned in my second reply: safety is a very small target, if you are a little bit off the mark, you miss it completely. If "working on safety" means developing FAI based on an AGI design (halting the deployment of that AGI), there is nothing wrong with that (and it'd be the only way to survive, another question is how useful that AGI design would be for FAI). Basically, I defended the position that it's vanishingly unlikely to produce FAI without good understanding of why this particular (modified) AGI is FAI, and this understanding won't appear at last minute, even if you have a working AGI design. Trying to tinker with that AGI won't improve your chances if you don't go all the way, in which case phrase "maximizing safety" won't reflect what you did. You can't improve safety of that AGI without fully solving the problem of FAI. Chances of winning this race in the first place, from the current situation of uncertainty, are better.

P.S. I believe the metaphors I used have a more or less clear technical meaning. For example, Nature not caring about your plan means that the plan won't succeed, and the extent to which it's morally wrong for it to fail doesn't figure into the probability of success. These are rhetorical devices to avoid known failure modes in intuitive judgment, not necessarily statements about specific errors, their presence or origin.

Eliezer,

Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity? I get the impression that you are forbidding yourself to proceed until you can do something that is likely impossible for any human intelligence to do. In this universe there are no such broad guarantees of consequences. I can't buy into the notion that careful design of the initial conditions of the AGI and of its starting learning algorithms is sufficient for the guarantee you seem to seek. Have I misconstrued what you are saying? Am I missing something?

I also don't get why "I need to beat my competitors" is even remotely a consideration when the result is a much greater than human level intelligence that makes the entire competitive field utterly irrelevant. What does it really matter which person or team finally succeeded?

"Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity?"

Well, obviously one can't be 100% certain, but I'd be curious to know exactly how certain Eliezer wants to be before he presses the start button on his putative FAI. 99.9%? 99.99%? And, Samantha, what's your cutoff for reasonable certainty in this situation? 90%? 99%?

"I can't buy into the notion that careful design of initial conditions of the AGI and of its starting learning algorithms are sufficient for the guarantee you seem to seek."

This is the same way I understand him, and I also think it's pretty audacious, but just maybe possible. I'm vaguely familiar with some of the techniques you might use to go about doing this, and it seems like a really hard problem, but not impossible.

"I also don't get why "I need to beat my competitors" is even remotely a consideration"

How about "I need to beat my non-FAI-savvy competitors"?

Samantha, what you're obtaining is not Probability 1 of doing the right thing. What you're obtaining is a precise (not "formal", precise) statement of how you've defined root-level Friendliness along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions assuming that the transistors on the computer chip behave the way they're supposed to, along with some formalization of reflective decision theory that lets you describe what happens when the AI modifies itself and the condition it will try to prove before modifying itself.

Anything short of this is not a sufficiently high standard to cause you to actually think about the problem. I can imagine trying to do this and surviving, but not anything short of that.
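A caricature of that requirement in a few lines of Python may help - this is only a toy sketch under my own invented assumptions (a trivially checkable stand-in property in place of a real Friendliness statement, and spot-checking in place of proof), not a description of Eliezer's actual proposal:

```python
# Toy "verify before self-modify" loop: a candidate successor policy is
# adopted only if an automated check confirms the safety property still
# holds for it; otherwise the current policy is kept unchanged.
ALLOWED_ACTIONS = {"cooperate", "defer"}   # invented stand-in for "safe" actions
SAMPLE_STATES = range(10)                  # spot-check instead of a real proof

def safety_property(policy):
    """Stand-in for a precise statement of root-level Friendliness."""
    return all(policy(s) in ALLOWED_ACTIONS for s in SAMPLE_STATES)

def current_policy(state):
    return "cooperate"

def candidate_policy(state):
    # A proposed self-modification produced by the system itself.
    return "defer" if state % 2 else "cooperate"

def self_modify(current, candidate):
    # The step being described: establish the property for the successor
    # before the modification is allowed to happen.
    return candidate if safety_property(candidate) else current

policy = self_modify(current_policy, candidate_policy)
print(policy is candidate_policy)  # True: the check passes for this toy candidate
```

The hard part, of course, is everything the stand-ins hide: stating the property precisely, and proving it rather than sampling it.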

What do you mean by "precise"? I think I know more or less what "formal" means, and it's not the same as the common usage of "precise" (unless you pile on a few qualifiers) but you seem to be using it in a technical sense. If you've done a post on it, I must have missed it. Does "precise description" = "technical explanation"?

Yes, "something that constrains very exactly what to expect" is much closer in intent to my "precise" than "something you can describe using neat symbols in a philosophy paper".

OK, then in that light,

What you're obtaining is a precise (not "formal", precise) statement of how you've defined root-level Friendliness along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions...

I think you mean to say "precise (not just "formal", precise)", because you still need the formal statement of the precise description in order to prove things about it formally. Which is not to say that precise is a subset of formal or vice versa.

"Precise, not just formal" would be fair in this case.

(The reason I say "in this case" is that reaching for precision is a very different mental strategy than reaching for formality. Many reach for formality who have no concept of how to reach for precision, and end up sticking tags on their black boxes and putting the tags into a logic. So you don't create a logical framework as your first step in reaching for precision; your first step is to figure out what your black boxes are, and then think about your strategy for looking inside...)

Let's see if I can get perm-ignore on, on such an old post.

This whole line of thinking (press "on", six million bodies fall) depends on a self-modifying AI being qualitatively different from a non-self-modifying one, OR on self-modifying characteristics being the dominant strategy for achieving AI. In other words, it assumes there is a magic intelligence algorithm which, if implemented, will lead to exponentially increasing intelligence; then you have to worry about the relative probability of that intelligence being in the Navel Gazing, Paperclips, or Friendly categories (and of course about defining the categories) before you hit the switch on any candidate algorithm.

I think that intelligence is a very hard goal to hit, and that there is no self-contained, fast, non-iterative algorithm that rockets there. It is going to be much easier to build successive AIs with IQs of 10, 20, 30... (or 10, 20, 40...; the point remains) than to build a single AI which rockets itself off the scale. And in the process, we will need a lot of research in keeping the things stable, just to get up to 100, let alone to make it to 5000. We will also learn "what kind of animal" the unstable ones are, and what kinds of mistakes they tend to make. Progress will be slow - there is a long way from IQ 100, when it starts to help us out, to IQ 500, when it starts to be incomprehensible to humanity. In other words: it is not actually a whole lot easier to do it unfriendly-style than friendly-style.

That does not, of course, mean we should stop worrying about unfriendly risks. But it does mean that it is silly hubris to imagine that yelling "slow down until we figure this out" actually helps our chances of getting it right. We will stub our toe many times before we get the chance to blow our brains out, and anyone who is afraid to take a step because there might be a bullet out there underestimates the complexity and difficulty of the task we face.

homunq, just how confident are you that hard takeoff won't happen?

