
July 28, 2008

Comments

Mike Blume: Do you claim that the CEV of a pygmy father would assert that his daughter's clitoris should not be sliced off? Or that the CEV of a petty thief would assert that he should not possess my iPod?

Mike, a coherent extrapolated volition is generally something you do with more than one extrapolated volition at once, though I suppose you could extrapolate a single human's volition into a spread of outcomes and look for coherence in the spread. But this level of metaethics is of interest primarily to FAIfolk, I would think.

With that said, if I were building a Friendly AI, I would probably be aiming to construe 'extrapolated volitions' across at least the same kind of gaps that separate Archimedes from the modern world. Whether you can do this on a strictly individual extrapolation - whether Archimedes, alone in a spacesuit and thinking, would eventually cross the gap on his own - is an interesting question.

At the very least, you should imagine the pygmy father having full knowledge of the alternate lives his daughter would lead, as though he had lived them himself - though that might or might not imply full empathy, it would at the least imply full knowledge.

And at the very least, imagine the petty thief reading through everything ever written in the Library of Congress, including everything ever written about morality.

This advice is hardly helpful in day-to-day moral reasoning, of course, unless you're actually building an AI with that kind of extrapolative power.

Vladimir Nesov: 'Same moral arguments as before' doesn't seem like an answer, in the same sense as 'you should continue as before' is not good advice for cavemen (who could benefit from being brought into modern civilization). If cavemen can vaguely describe what they want from their environment, this vague explanation can be used to produce an optimized environment by a sufficiently powerful optimization process that is external to the cavemen...

At this point you're working with Friendly AI. Then, indeed, you have legitimate cause to dip into metaethics and make it a part of your conversation.

Unknown: "But it is quite impossible that the complicated calculation in Eliezer's brain should be exactly the same as the one in any of us: and so by our standards, Eliezer's morality is immoral. And this opinion is subjectively objective, i.e. his morality is immoral and would be even if all of us disagreed. So we are all morally obliged to prevent him from inflicting his immoral AI on us"

Well, I would agree with this point if I thought what Eliezer was going to inflict upon us was so out of line with what I want that we would be better off without it. Since, you know, NOT dying doesn't seem like such a bad thing to me, I'm not going to complain, when he's one of the only people on Earth actually trying to make that happen...

On the other hand, Eliezer, you *are* going to have to answer to millions if not billions of people protesting your view of morality, especially this facet of it (the not dying thing), so yeah, learn to be diplomatic. You're NOT allowed to fuck this up for the rest of us!

Unknown wrote:
As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI.

As Bayesians, educated by Mr. Yudkowsky himself, I think we all know the probability of such an event is quite low. In 2004, in the most moving and intelligent eulogy I have ever read, Mr. Y stated: "When Michael Wilson heard the news, he said: "We shall have to work faster." Any similar condolences are welcome. Other condolences are not." Somewhere, some person or group is working faster, but at the Singularity Institute, all the time is being spent on somewhat brilliant and very entertaining writing. I shall continue to read and reflect, for my own enjoyment. But I hope those others I mentioned have Mr. Y's native abilities, because I agree with Woody Allen: "I don't want to achieve immortality through my work. I want to achieve it by not dying."

Any chance of a post summarizing all of the building block posts for this topic, like you did with your physics posts?
I hate to be a beggar, but that would be very helpful.

Just another point as to why important, megalomaniacal types like Eliezer need to have their motives checked:
Frank Vertosick, in his book "When the Air Hits Your Brain: Tales from Neurosurgery," about a profession I am seriously considering, describes what becomes of nearly *all* people taking such power over life and death:

"He was the master... the 'ptototypical surgical psychopath' - someone who could render a patient quadriplegic in the morning, play golf in the afternoon, and spend the evening fretting about that terrible slice off the seventh tee. At the time this seemed terrible, but I soon learned he was no different than any other experierienced neurosurgeon in this regard... I would need to learn not to cry at funerals."

I had an interesting conversation with a fellow traveler about morality in which he pointed out that 'upright' citizens will commit the worst atrocities in the name of a greater good that they *think* they understand... Maybe some *absolute* checks are required on actions, especially those of people who might actually have a lot of power over the outcome of the future. What becomes of the group led by the man who is simultaneously Achilles and Agamemnon?

About the comments on compromise: that's why I changed my mind. The functions are so complex that they are bound to be different in the complex portions, but they also have simplifying terms in favor of compromise, so it is possible that everyone's morality will end up the same when this is taken into account.

As for the probability that Eliezer will program an AI, it might not be very low, but it is extremely low that his will be the first, simply because so many other people are trying.

I'm near Unknown's position. I don't trust any human being with too much power. No matter how nice they seem at first, history indicates to me that they inevitably abuse it. We've been told that a General AI will have power beyond any despot known to history. Am I supposed to have that much reliance on the essential goodness within Eliezer's heart? And in case anyone brings this up, I certainly don't trust the tyranny of the majority either. I don't recognize any moral obligation to stop it because I don't recognize any obligations at all. Also, I might not live to see him or his followers immanentize the Eschaton.

Female circumcision is commonly carried out by women who've undergone the procedure themselves. So I don't think the Pygmy father will be convinced.

Larry: You think you can convince a paperclip maximizer to value human life? Or do you think paperclip maximizers are impossible?

I don't think that convincing arbitrary minds is the point. The point is that there does exist a canonical system of values. Just because there's an objectively true and canonical morality doesn't mean that every mind in existence can be persuaded to follow it: some minds are simply not rational.

Perhaps I should not have said "I disagree" to Eli; I should have said "what you say is trivially true, but it misses the point."

Female circumcision is commonly carried out by women who've undergone the procedure themselves.

Then they don't know the true difference between the two possible lives, do they?

I see this line of thinking coming directly out of Hume. Some of Hume's main points, as I read him:

1. Morality flows straight from humanity's values, and that's it.

2. Morality is universalizable among humans because of the psychological commonality.

3. What we are really doing in ethics is trying to find general principles which explain the values we have, then we can use the general principle to make ethical decisions. This is another label for trying to define the big abstract computation in our heads so that we can better optimize it. Hume never really questions our ethical beliefs; he just takes them as given and tries to understand them.

I'm very interested in how Eliezer gets from his meta-ethics to utilitarianism. Many a thought experiment has been devised for the sole purpose of showing how utilitarianism is in direct conflict with our moral intuitions. On the other hand, the same can be said for deontological ethics.

I don't understand why it must be a given that things like love, truth, beauty, murder, etc. are universal moral truths that are right or wrong independent of the person computing the morality function. I know you frown upon mentioning evolutionary psychology, but is it really a huge stretch to surmise that the more even-keeled, loving and peaceful tribes of our ancestors would out-survive the wilder warmongers who killed each other off? Even if their good behavior was not genetic, the more "moral" leaders would teach/impart their morality to their culture until it became a general societal truth. We find cannibalism morally repugnant, yet for some long-isolated islander tribes it was totally normal and acceptable; what does this say about the universal morality of cannibalism?

In short, I really enjoyed reading your insight on evaluating morality by looking backwards from results, and your idea of a hidden function that we all approximate is a very elegant idea, but I still don't understand how your saying "murder is wrong no matter whether I think it's right or not" does not amount to a list of universal moral postulates sitting somewhere in the sky.

TGGP:
I have great sympathy with this position. An incorrectly formatted AI is one of the biggest fears of the Singularity Institute, mainly because there are so many more ways to be way wrong than even slightly right about it... It might be that the task of making an actually friendly AI is just too difficult for *anyone,* and our efforts should be spent in preventing *anyone* from creating a generally intelligent AI, in the meantime trying to figure out, with our imperfect *human* brains and the crude tools at our disposal, how to make uploads ourselves or create other physical means of life-extension... No idea. The particulars are out of my area of expertise. I might keep your brain from dying a little longer though... (stroke research)

Matt Simpson: Many a thought experiment has been devised for the sole purpose of showing how utilitarianism is in direct conflict with our moral intuitions.

I disagree, or you're referring to something I haven't heard of. If I know what you mean here, those are a species of strawman ("act") utilitarianism that doesn't account for the long-term impact and adjustment of behavior that results.

(I'm going to stop giving the caveats; just remember that I accept the possibility you're referring to something else.)

For example, if you're thinking about cases where people would be against a doctor deciding to carve up a healthy patient against his will to save ~40 others, that's not rejection of utilitarianism. It can be recognition that once a doctor does that, people will avoid them in droves, driving up risks all around.

Or, if you're referring to the case of how people would e.g. refuse to divert a train's path so it hits one person instead of five, that's not necessarily an anti-utilitarian intuition; there are many factors at play in such a scenario. For example, the one person may be standing in a normally safe spot and so consented to a lower level of risk, and so by diverting the train, you screw up the ability of people to see what's really safe, etc.

So, what do you do about inter-morality? Suppose a paperclip maximizer with the intellect and self-improvement limits of a typical human thirteen-year-old. They are slightly less powerful than you and can be expected to stay that way, so they aren't an existential threat. They are communicative and rational but they have one terminal value, and it's paperclips.

How ought a moral human to react? Can you expect to negotiate an inter-morality? Or must the stronger party win by force and make the weaker party suffer non-fulfillment? Is this mistreatment, and immoral? Ought you to allow them a bale of wire so they can at least make *some* paperclips?

I second TGGP that the mention of wearing black etc. is ridiculous.

Lara: The portion you quote from Caledonian isn't at all well-defined itself; it's a near-pure insult hinting at, but not giving, actual arguments. I fully support its deletion. Also, Eliezer isn't saying "keep on believing what you believe", but "keep on following the process you have been"; he allows for moral error.

Lara, TGGP: The most important point is that building an AI, unlike surgery or dictatorship, doesn't give you any power to be corrupted by - any opportunity to make decisions with short-term life-or-death results - until the task is complete, and shortly after that (for most goal systems, including Friendly ones) it's out of your hands. Eliezer's obvious awareness of rationalization is encouraging wrt not committing atrocities out of good intentions. CEV's "Motivations" section, and CFAI FAQs 1.(2,3) and their references address correcting for partial programmer selfishness. Finally, I would think there would be more than one AI programmer, reducing the risk of deliberate evil.

Matt: See The "Intuitions" Behind "Utilitarianism".

Boris: See The Gift We Give To Tomorrow. Eliezer isn't saying there's some perfect function in the sky that evolution has magically led us to approximate. By "murder is wrong no matter whether I think it's right or not", he just means Eliezer-in-this-world judges that murder is still wrong in the counterfactual world where counterfactual-Eliezer judges that murder is right.

Richard: Eliezer thinks he can approximate the GHOST because the GHOST - his GHOST, more properly - is defined with respect to his own mind. Again, it's not some light in the sky. He can't, by definition, be twisted in such a way as to not be an approximation of his GHOST. And he obviously isn't suggesting that anyone is infallible.

Eliezer: It seems to me that your point would be much more clear (like to Boris and Richard) if you would treat morality as a 2-place function: "I judge that murder is wrong even if...", not "Murder is wrong even if...". (Would you say Allan is right to call your position relativism?)
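To make the 1-place/2-place distinction concrete, here is a toy sketch (my own illustration, not Eliezer's or Nick's actual formalism): the 2-place function takes both a judge and an act; fixing the judge once ("currying") yields a 1-place function whose verdicts do not change when a different mind is consulted.

```python
# Toy sketch (hypothetical; not anyone's actual formalism) of the difference
# between a 2-place morality function and its curried 1-place form.

def morality_2place(judge, act):
    """Judge-relative evaluation: what this particular judge computes about the act."""
    return act in judge["endorsed_acts"]

def make_1place(judge):
    """Fix the judge once and for all; the result takes only the act as input."""
    return lambda act: morality_2place(judge, act)

# Stipulated toy data for two very different minds:
eliezer_ish = {"endorsed_acts": {"save_child", "tell_truth"}}
clippy      = {"endorsed_acts": {"make_paperclips"}}

right = make_1place(eliezer_ish)   # the fixed, 1-place sense of "right"

print(right("save_child"))                    # True
print(morality_2place(clippy, "save_child"))  # False -- a different question entirely
```

On this toy reading, the verdicts of `right` do not vary with which mind you consult afterwards, which is roughly the sense in which "murder is wrong even if I judged otherwise" is supposed to come out true.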

Eliezer [in response to me]: This just amounts to defining should as an abstract computation, and then excluding all minds that calculate a different rule-of-action as "choosing based on something other than morality". In what sense is the morality objective, besides the several senses I've already defined, if it doesn't persuade a paperclip maximizer?

I think my position is this:

If there really was such a thing as an objective morality, it would be the case that only a subset of possible minds could actually discover or be persuaded of that fact.

Presumably, for any objective fact, there are possible minds who could never be convinced of that fact.

Caledonian: uh... he didn't say you couldn't make arguments _about_ all possible minds, he was saying you couldn't construct an argument that's so persuasive, so convincing that every possible mind, no matter how unusual its nature, would automatically be convinced by that argument.

That point is utterly trivial. You can implement any possible relationship between input and output. That even includes minds that are generally rational but will fail only in specifically-defined instances - such as a particular person making a particular argument. This does not, however, support the idea that we shouldn't bother searching for valid arguments, or that we'd need to produce arguments that would convince every possible information-processing system capable of being convinced.

"So that we can regard our present values, as an approximation to the ideal
morality that we would have if we heard all the arguments, to whatever extent
such an extrapolation is coherent."

This seems to be in the right ballpark, but the answer is dissatisfying
because I am by no means persuaded that the extrapolation would be coherent
at all (even if you only consider one person.) Why would it? It's
god-shatter, not Peano Arithmetic.

There could be nasty butterfly effects, in that the order in which you
were exposed to all the arguments, the mood you were in upon hearing them and
so forth could influence which of the arguments you came to trust.

On the other hand, viewing our values as an approximation to the ideal
morality that us would have if we heard all the
arguments, isn't looking good either: correctly predicting a bayesian port of
a massive network of sentient god-shatter looks to me like it would require a
ton of moral judgments to do at all. The subsystems in our brains sometimes
resolve things by fighting (ie. the feeling being in a moral dilemma.)
Looking at the result of the fight in your real physical brain isn't helpful
to make that judgment if it would have depended on whether you just had a
cup of coffee or not.

So, what do we do if there is more than one basin of attraction a moral
reasoner considering all the arguments can land in? What if there are no
basins?
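To make the worry about order-dependence concrete, here is a deliberately tiny toy model (my own construction, nothing from the post or the comment): a reasoner accepts each argument only if it does not conflict with arguments already accepted. With even one pair of mutually exclusive arguments, different exposure orders settle into different stable end states - two "basins" in miniature.

```python
# Toy model (hypothetical) of order-dependent extrapolation: accept each
# argument only if it doesn't conflict with what has already been accepted.
from itertools import permutations

arguments = ["A", "B", "C"]
conflicts = {frozenset({"A", "B"})}   # arguments A and B cannot both be trusted

def extrapolate(order):
    accepted = set()
    for arg in order:
        if all(frozenset({arg, prior}) not in conflicts for prior in accepted):
            accepted.add(arg)
    return frozenset(accepted)

# Every exposure order lands in one of two stable end states ("basins"):
basins = {extrapolate(order) for order in permutations(arguments)}
print(basins)   # {frozenset({'A', 'C'}), frozenset({'B', 'C'})}
```

Whether real extrapolated volitions behave like this - one basin, several, or none - is exactly the open question.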

Caledonian: I can't think of anyone EVER choosing to interpret statements as stupid rather than sensible to the degree to which you do on this blog. There is usually NO ambiguity and you still get things wrong and then blame them for being stupid.

In all honesty why do you post here? On your own blog you are articulate and intelligent. Why not stick with that and leave commenting to people who want to actually respond to what people say rather than to straw men?

We've been told that a General AI will have power beyond any despot known to history.

If that will be then we are doomed. Power corrupts. In theory an AI, not being human, might resist the corruption, but I wouldn't bet on that. I do not think it is a mere peculiarity of humanity that we are vulnerable to corruption.

We humans are kept in check by each other. We might therefore hope, and attempt to engineer, a proliferation of self-improving AIs, to form a society and to keep each other in check. With luck, cooperative AIs might be more successful at improving themselves - just as honest folk are for the most part more successful than criminals - and thus tend for the most part to out-pace the would-be despots.

As far as how a society of AIs would relate to humans, there are various possibilities. One dystopia imagines that humans will be treated like lower animals, but this is not necessarily what will happen. Animals are not merely dumb, but unable to take part in mutual respect of rights. We humans will always be able to, and so might well remain forever recipients of AI respect for our rights, however much the AIs evolve past us. We may of course be excluded from aspects of AI society which we humans are not able to handle, just as we exclude animals from rights. We may never earn hyper-rights, whatever those may be. But we might retain our rights.

Nick,

Eliezer's one-place function is exactly infallible, because he defines "right" as its output.

I misunderstood some of Eliezer's notation. I now take his function to be an extrapolation of his volition rather than anyone else's. I don't think this weakens my point: if there were a rock somewhere with a lookup table for this function written on it, Eliezer should always follow the rock rather than his own insights (and according to Eliezer everyone else should too), and this remains true even if there is no such rock.

Furthermore, the morality function is based on extrapolated volition. Someone who has only considered one point of view on various moral questions will disagree with their extrapolated (completely knowledgeable, completely wise) volition in certain predictable ways. That's exactly what I mean by a "twist."

I can't think of anyone EVER choosing to interpret statements as stupid rather than sensible to the degree to which you do on this blog.
It's not a matter of choice - there have to be sensible interpretations available.

[MISREPRESENTATION]If, as Eliezer's defenders insist, we should interpret his remarks as suggesting that there is no point to looking for convincing arguments regarding 'morality' because there are no arguments that will convince all possible minds,[/MISREPRESENTATION] how exactly can this be construed as sensible? How is this compatible with rational inquiry? It's (usually) understood that the only arguments we need to concern ourselves with are rational arguments that will convince rational minds - not in this case, it seems.

Interpreting the comments about mindspace being too big as referring to extent, rather than inclusion, still renders them stupid. But at least it's a straightforward and simple stupidity that is easily remedied. If [MISREPRESENTATION]the standard alluded to[/MISREPRESENTATION] were actually implemented, we'd have to discard all arguments about anything, because there is no topic where specific arguments will convince all possible minds.

I don't know much about the ins-and-outs of blog identification. Is it possible that someone could post diametrically, under two names, such as Caledonian and Robin Hanson, in order to maintain a friendship while minimizing the importance of efforts that one considers unimportant?

Constant: Corruptibility is a complex evolutionary adaptation. Even the best humans have hard-to-suppress semiconscious selfish motivations that tend to come out when we have power, even if we thought we were gaining that power for idealistic reasons. There's no reason an AI with an intelligently designed, clean, transparent goal system would be corrupted, or need to be kept in check. "Respect for rights" is similarly anthropomorphic. Creating a society of AIs would be very problematic due to the strong first-mover effect, and the likely outcome amounts to extinction anyway.

Richard: Of course Eliezer should follow the rock; by stipulation, it states exactly those insights he would have if he were perfectly informed, rational, etc. This is nothing like "morality as the will of God", since any such rock would have a causal dependency on his brain. It's not clear to me that he's saying everyone else should as well. Also by stipulation (AFAICS), his volition will be able to convince him through honest argument of any of its moral positions, regardless of how twisted he might be. (I share Marcello's concern here, though.)

Caledonian: Eliezer is obviously not saying there is no point to looking for convincing arguments regarding 'morality', but that there are no arguments that will sway all minds, and there don't need to be any for morality to be meaningful. You're being ridiculous. (And how do you define 'rational mind'?)

inquiringmind: I know Caledonian's IP address. He's not Robin.

Why not, I can't help myself: Caledonian = Thersites, Eliezer = Agamemnon

Thersites only clamour’d in the throng,
Loquacious, loud, and turbulent of tongue:
Awed by no shame, by no respect controll’d,
In scandal busy, in reproaches bold:
With witty malice studious to defame,
Scorn all his joy, and laughter all his aim:—
But chief he gloried with licentious style
To lash the great, and monarchs to revile.
...
Sharp was his voice; which in the shrillest tone,
Thus with injurious taunts attack’d the throne.
Whate’er our master craves submit we must,
Plagued with his pride, or punish’d for his lust.
Oh women of Achaia; men no more!
Hence let us fly, and let him waste his store
In loves and pleasures on the Phrygian shore.
We may be wanted on some busy day,
When Hector comes: so great Achilles may:
From him he forced the prize we jointly gave,
From him, the fierce, the fearless, and the brave:
And durst he, as he ought, resent that wrong,
This mighty tyrant were no tyrant long.”
...
“Peace, factious monster, born to vex the state,
With wrangling talents form’d for foul debate:
Curb that impetuous tongue, nor rashly vain,
And singly mad, asperse the sovereign reign.
Have we not known thee, slave! of all our host,
The man who acts the least, upbraids the most?
...
Expel the council where our princes meet,
And send thee scourged and howling through the fleet.”

This post is called "The Meaning of Right", but it doesn't spend much time actually defining what situations should be considered right instead of wrong, other than a bit at the end which seems to define "right" as simply "happiness". Rather, it's a lesson in describing how to take your preferred world state and causally link that to what you'd have to do to get to that state. But that world state is still ambiguously right/wrong, in any absolute sense, as of this post.

So does this post say what "right" means, other than simply "happiness" (which sounds like generic utilitarianism)? Or am I simply missing something?

Eliezer once wrote that "We can build up whole networks of beliefs that are connected only to each other - call these "floating" beliefs. It is a uniquely human flaw among animal species, a perversion of Homo sapiens's ability to build more general and flexible belief networks.

The rationalist virtue of empiricism consists of constantly asking which experiences our beliefs predict - or better yet, prohibit."

I can't see how nearly all of the beliefs expressed in this post predict or prohibit any experience.

Silas: I'm referring to all thought experiments where the intended purpose was to show that utilitarianism is inconsistent with our moral intuitions. So, yes, the examples you mention, and more. Most of them do fall short of their purpose.

Nick Tarleton: I'm not sure all seemingly anti-utilitarian intuitions can be explained away by scope insensitivity, but that does take care of the vast majority of cases.

One case I was thinking of (for both of you) is the 'utility monster': someone who receives such glee from killing, maiming, and otherwise causing havoc that the pain others endure due to him is virtually always outweighed by the happiness the monster receives.

Another case would be the difference between killing a terrorist who has 10 people hostage and murdering an innocent man to save 10 people. I would think that in general, people would be willing to do the first while hesitant to do the second, though I defer to anyone who knows the empirical literature.

Then they don't know the true difference between the two possible lives, do they?
"True difference" gets me thinking of "no true Scotsman". Has there ever been anybody who truly knew the difference between two possible lives? Even if someone could be reincarnated and retain memories, the order would likely alter their perceptions.

I'm very interested in how Eliezer gets from his meta-ethics to utilitarianism
He's not a strict utilitarian in the "happiness alone" sense. He has an aversion to wireheading, which maximizes the classic version of utility.

I know you frown upon mentioning evolutionary psychology, but is it really a huge stretch to surmise that the more even-keeled, loving and peaceful tribes of our ancestors would out-survive the wilder warmongers who killed each other out?
Yes, it is. The peaceful ones would be vulnerable to being wiped out by the more warlike ones. Or, more accurately (group selection isn't as big a factor, given that intergroup variance is smaller than intragroup variance), the members of the peaceful tribe more prone to violence would achieve dominance, as hawks among doves do. Among the Yanomamo we find high reproductive success among men who have killed. The higher the bodycount, the more children. War and murder appear to be human universals.

Eliezer's obvious awareness of rationalization is encouraging
Awareness of biases can increase errors, so it's not encouraging enough given the stakes.

Finally, I would think there would be more than one AI programmer, reducing the risk of deliberate evil
I'm not really worried about that. No one is a villain in their own story, and people we would consider deviants would likely be filtered out of the Institute and would probably be attracted to other career paths anyway. The problem exists, but I'm more concerned with well-meaning designers creating something that goes off in directions we can't anticipate.

Caledonian, Eliezer never said anything about not bothering to look for arguments. His idea is to find out how he would respond if he were confronted with all arguments. He seems to assume that he (or the simulation of him) will correctly evaluate arguments. His point about no universal arguments is that he has to start with himself rather than some ghostly ideal behind a veil of ignorance or something like that.

It seems to me you're saying that what our conscience tells us is right is right because its output is what we mean by "right" in the first place. While I agree in general that a concept is its referents, I don't agree with what you're saying here.

Those referents are not values but evaluations. And they are evaluations with respect to a standard that we can in fact change. We don't choose the output of our conscience on the spot; in that sense it is objective. But over time we can reprogram it through repetition and effort. Its evaluations are short-term objective but long-term subjective.

Just do what is, ahem, right - to the best of your ability to weigh the arguments you have heard, and ponder the arguments you may not have heard.

What's "the best of your ability"? 'Best' is a determination of quality. What constitutes quality reasoning about 'morality'?

When we talk about quality reasoning in, say, math, we don't have problems with that question. We don't permit just any old argument to be acceptable - if people's reasoning doesn't fit certain criteria, we don't accept that reasoning as valid. That they make the arguments, and that they may be able to make only those arguments, is utterly irrelevant. If those are the only arguments they're capable of making, we say they're incapable of reasoning about math; we don't redefine our concept of math to permit their arguments to be sensible.

We have a conceptual box we call 'morality'. We know that people have mutually contradictory ideas about what sorts of things go in the box. It follows that we can't resolve the question of what should go in the box by looking at the output of people's morality evaluations. Those outputs are inconsistent; they can't proceed from a common set of principles.

So we have to look at the nature of the evaluations, not the output of the evaluations, and determine which outputs are right and which aren't. Not 'right' in the 'moral' sense, whatever that is - that would be circular reasoning. We can't evaluate moral evaluations with moral evaluations. 'Right' in the sense that mathematical arguments are right.

I'm confused. I'll try to rephrase what you said, so that you can tell me whether I understood.

"You can change your morality. In fact, you do it all the time, when you are persuaded by arguments that appeal to other parts of your morality. So you may try to find the morality you really should have. But - "should"? That's judged by your current morality, which you can't expect to improve by changing it (you expect a particular change would improve it, but you can't tell in what direction). Just like you can't expect to win more by changing your probability estimate to win the lottery.

Moreover, while there is such a fact as "the number on your ticket matches the winning number", there is no ultimate source of morality out there, no way to judge Morality_5542 without appealing to another morality. So not only can you not jump to another morality, you also have no reason to want to: you're not trying to guess some true morality.

Therefore, just keep whatever morality you happen to have, including your intuitions for changing it."

Did I get this straight? If I did, it sounds a lot like a relativistic "There is no truth, so don't try to convince me" - but there is indeed no truth, as in, no objective morality.

TGGP wrote:

We've been told that a General AI will have power beyond any despot known to history.
Unknown replied:
If that will be then we are doomed. Power corrupts. In theory an AI, not being human, might resist the corruption, but I wouldn't bet on that. I do not think it is a mere peculiarity of humanity that we are vulnerable to corruption.
A tendency to become corrupt when placed into positions of power is a feature of some minds. Evolutionary psychology explains nicely why humans have evolved this tendency. It also allows you to predict that other intelligent organisms, evolved in a sufficiently similar way, would be likely to have a similar feature.
Humans having this kind of tendency is a predictable result of what their design was optimized to do, and as such, their having it doesn't imply much for minds from a completely different part of mind design space.
What makes you think a human-designed AI would be vulnerable to this kind of corruption?

A mind with access to its source code, if it doesn't want to be corrupted, won't be.

What does 'corrupted' mean in this context?

If we go by definitions, we have

6. to destroy the integrity of; cause to be dishonest, disloyal, etc., esp. by bribery.
7. to lower morally; pervert: to corrupt youth.
8. to alter (a language, text, etc.) for the worse; debase.
9. to mar; spoil.
10. to infect; taint.
11. to make putrid or putrescent.

Most of those meanings cannot apply here - and the ones that do refer to changes in morality.

A tendency to become corrupt when placed into positions of power is a feature of some minds.

Morality in the human universe is a compromise between conflicting wills. The compromise is useful because the alternative is conflict, and conflict is wasteful. Law is a specific instance of this, so let us look at property rights: property rights is a decision-making procedure for deciding between conflicting desires concerning the owned object. There really is no point in even having property rights except in the context of the potential for conflict. Remove conflict, and you remove the raison d'etre of property rights, and more generally the raison d'etre of law, and more generally the raison d'etre of morality. Give a person power, and he no longer needs to compromise with others, and so for him the raison d'etre of morality vanishes and he acts as he pleases.

The feature of human minds that renders morality necessary is the possibility that humans can have preferences that conflict with the preferences of other humans, thereby requiring a decision-making procedure for deciding whose will prevails. Preference is, furthermore, revealed in the actions taken by a mind, so a mind that acts has preferences. So all the above is applicable to an artificial intelligence if the artificial intelligence acts.

What makes you think a human-designed AI would be vulnerable to this kind of corruption?

I am assuming it acts, and therefore makes choices, and therefore has preferences, and therefore can have preferences which conflict with the preferences of other minds (including human minds).

I am assuming [the AI] acts, and therefore makes choices, and therefore has preferences, and therefore can have preferences which conflict with the preferences of other minds (including human minds).

An AI can indeed have preferences that conflict with human preferences, but if it doesn't start out with such preferences, it's unclear how it comes to have them later.

On the other hand, if it starts out with dubious preferences, we're in trouble from the outset.

Constant: "Give a person power, and he no longer needs to compromise with others, and so for him the raison d'etre of morality vanishes and he acts as he pleases."

If you could do so easily and with complete impunity, would you organize fights to death for your pleasure? Would you even want to? Moreover, humans are often tempted to do things they know they shouldn't, because they also have selfish desires. AIs don't if you don't build it into them. If they really do ultimately care about humanity's well-being, and don't take any pleasure from making people obey them, they will keep doing so.

An AI can indeed have preferences that conflict with human preferences, but if it doesn't start out with such preferences, it's unclear how it comes to have them later.

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially cannot be doubted. For example, babies do not start out preferring Bach to Beethoven or Beethoven to Bach, but adults are able to develop that preference, even if it is not clear at this point how they come to do so.

If you could do so easily and with complete impunity, would you organize fights to death for your pleasure?

Voters have the ability to vote for policies and to do so easily and with complete impunity (nobody retaliates against a voter for his vote). And, unsurprisingly, voters regularly vote to take from others to give unto themselves - which is something they would never do in person (unless they were criminals, such as muggers or burglars). Moreover humans have an awe-inspiring capacity to clothe their rapaciousness in fine-sounding rhetoric.

Moreover, humans are often tempted to do things they know they shouldn't, because they also have selfish desires. AIs don't if you don't build it into them.

Conflict does not require selfish desires. Any desire, of whatever sort, could potentially come into conflict with another person's desire, and when there are many minds each with its own set of desires then conflict is almost inevitable. So the problem does not, in fact, turn on whether the mind is "selfish" or not. Any sort of desire can create the conflict, and conflict as such creates the problem I described. In a nutshell: evil men need not be selfish. A man such as Pol Pot could indeed have wanted nothing for himself and still ended up murdering millions of his countrymen.

Larry D'Anna: The reason that we say it is too big is because there are subsets of Mindspace that do admit universally compelling arguments, such as (we hope) neurologically intact humans.

What precisely is neurological intactness? It rather seems to me that the majority agrees on some set of "self-evident" terminal values, and those few people who do not are called psychopaths. If by "human" we mean what people usually understand by this term, then there are no compelling arguments even for humans. Although I gladly admit your statement is approximately valid, I am not sure how to formulate it so that it is exactly true and not simultaneously a tautology.

Constant [sorry for getting the attribution wrong in my previous reply] wrote:

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially cannot be doubted.
I do not know whether those changes in opinion indicate changes in terminal values, but it doesn't really matter for the purposes of this discussion, since humans aren't (capital-F) Friendly. You definitely don't want an FAI to unpredictably change its terminal values. Figuring out how to reliably prevent this kind of thing from happening, even in a strongly self-modifying mind (which humans aren't), is one of the sub-problems of the FAI problem.
To create a society of AIs, hoping they'll prevent each other from doing too much damage, isn't a viable solution to the FAI problem, even in the rudimentary "doesn't kill all humans" sense. There's various problems with the idea, among them:
1. Any two AIs are likely to have a much vaster difference in effective intelligence than you could ever find between two humans (for one thing, their hardware might be much more different than any two working human brains). This likelihood increases further if (at least) some subset of them is capable of strong self-improvement. With enough difference in power, cooperation becomes a losing strategy for the more powerful party.
2. The AIs might agree that they'd all be better off if they took the matter currently in use by humans for themselves, dividing the spoils among each other.

Any two AIs are likely to have a much vaster difference in effective intelligence than you could ever find between two humans (for one thing, their hardware might be much more different than any two working human brains). This likelihood increases further if (at least) some subset of them is capable of strong self-improvement. With enough difference in power, cooperation becomes a losing strategy for the more powerful party.

I read stuff like this and immediately my mind thinks, "comparative advantage." The point is that it can be (and probably is) worthwhile for Bob and Bill to trade with each other even if Bob is better at absolutely everything than Bill. And if it is worthwhile for them to trade with each other, then it may well be in the interest of neither of them to (say) eliminate the other, and it may be a waste of resources to (say) coerce the other. It is worthwhile for the state to coerce the population because the state is few and the population are many, so the per-person cost of coercion falls below the benefit of coercion; it is much less worthwhile for an individual to coerce another (slavery generally has the backing of the state - see for example the fugitive slave laws). But this mass production of coercive fear works in part because humans are similar to each other and so can be dealt with more or less the same way. If AIs are all over the place, then this does not necessarily hold. Furthermore if one AI decides to coerce the humans (who are admittedly similar to each other) then the other AIs may oppose him in order that they themselves might retain direct access to humans.
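For readers who haven't seen the term, here is a minimal worked example of comparative advantage (textbook-style numbers chosen purely for illustration; nothing in it comes from the comment itself): Bob needs fewer hours than Bill for both goods, yet the pair still gains when Bill specializes in the good where his disadvantage is smallest.

```python
# Minimal numeric sketch of comparative advantage (illustrative numbers only).
HOURS = 12  # working hours available to each party

# Hours of labour needed per unit of output:
bob  = {"food": 1, "tools": 1}   # absolute advantage in both goods
bill = {"food": 2, "tools": 4}   # comparative advantage in food (2x slower vs. 4x slower)

# Autarky: each splits his hours evenly between the two goods.
autarky_food  = (HOURS / 2) / bob["food"]  + (HOURS / 2) / bill["food"]   # 6 + 3   = 9
autarky_tools = (HOURS / 2) / bob["tools"] + (HOURS / 2) / bill["tools"]  # 6 + 1.5 = 7.5

# Trade: Bill makes only food; Bob makes just enough food to keep the total at
# the autarky level and spends his remaining hours on tools.
bill_food = HOURS / bill["food"]                               # 6
bob_food  = autarky_food - bill_food                           # 3
bob_tools = (HOURS - bob_food * bob["food"]) / bob["tools"]    # 9

print(f"autarky: {autarky_food} food, {autarky_tools} tools")      # 9.0 food, 7.5 tools
print(f"trade:   {bill_food + bob_food} food, {bob_tools} tools")  # 9.0 food, 9.0 tools
```

Holding total food fixed at the no-trade level, specialization frees enough of Bob's hours to raise tool output from 7.5 to 9 - a gain that exists despite Bob's absolute advantage in everything.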

The AIs might agree that they'd all be better off if they took the matter currently in use by humans for themselves, dividing the spoils among each other.

Maybe but maybe not. Dividing the spoils paints a picture of the one-time destruction of the human race, and it may well be to the advantage of the AIs not to kill off the humans. After all, if the humans have something worth treating as spoils, then the humans are productive and so might be even more useful alive.

You definitely don't want an FAI to unpredictably change its terminal values. Figuring out how to reliably prevent this kind of thing from happening, even in a strongly self-modifying mind (which humans aren't), is one of the sub-problems of the FAI problem.

The FAI may be an unsolvable problem, if by FAI we mean an AI into which certain limits are baked. This has seemed dubious ever since Asimov. The idea of baking in rules of robotics has long seemed to me to fundamentally misunderstand both the nature of morality and the nature of intelligence. But time will tell.

Humans having this kind of tendency is a predictable result of what their design was optimized to do, and as such, their having it doesn't imply much for minds from a completely different part of mind design space.
Eliezer seems to be saying his FAI will emulate his own mind, assuming it was much more knowledgeable and had heard all the arguments.

Um, no. First, the last revision of the plan called for focusing the FAI on the whole human species, not just one or more programmers. Second, the extrapolation is a bit more complicated than "if you knew more". I am neither evil nor stupid.

After all, if the humans have something worth treating as spoils, then the humans are productive and so might be even more useful alive.
Humans depend on matter to survive, and increase entropy by doing so. Matter can be used for storage and computronium, negentropy for fueling computation. Both are limited and valuable (assuming physics doesn't allow for infinite-resource cheats) resources.
I read stuff like this and immediately my mind thinks, "comparative advantage." The point is that it can be (and probably is) worthwhile for Bob and Bill to trade with each other even if Bob is better at absolutely everything than Bill.
Comparative advantage doesn't matter for powerful AIs at massively different power levels. It exists between some groups of humans because humans don't differ in intelligence all that much when you consider all of mind design space, and because humans don't have the means to easily build subservient-to-them minds which are equal in power to them.
What about a situation where Bob can defeat Bill very quickly, take all its resources, and use them to implement a totally-subservient-to-Bob mind which is by itself better at everything Bob cares about than Bill was? Resolving the conflict takes some resources, but leaving Bill to use them a) inefficiently and b) for not-exactly-Bob's goals might waste (from Bob's perspective) even more of them in the long run. Also, eliminating Bill means Bob has to worry about one less potential threat that it would otherwise need to keep in check indefinitely.
The FAI may be an unsolvable problem, if by FAI we mean an AI into which certain limits are baked.
You don't want to build an AI with certain goals and then add on hard-coded rules that prevent it from fulfilling those goals with maximum efficiency. If you put your own mind against that of the AI, a sufficiently powerful AI will always win that contest. The basic idea behind FAI is to build an AI that genuinely wants good things to happen; you can't control it after it takes off, so you put in your conception of "good" (or an algorithm to compute it) into the original design, and define the AI's terminal values based on that. Doing this right is an extremely tough technical problem, but why do you believe it may be impossible?

We do not know very well how the human mind does anything at all. But that the human mind comes to have preferences that it did not have initially cannot be doubted.

I believe Eliezer is trying to create "fully recursive self-modifying agents that retain stable preferences while rewriting their source code". Like Sebastian says, getting the "stable preferences" bit right is presumably necessary for Friendly AI, as Eliezer sees it.

(This clause "as Eliezer sees it" isn't meant to indicate dissent, but merely my total incompetence to judge whether this condition is strictly necessary for friendly AI.)

I find it interesting that I found many of the posts leading up to this one intensely hard to follow as they seemed to be arguing against worldviews that I had little or no comprehension of.

So, I must say that I am very relieved to see that your take on what morality is, is what I've been assuming it is, all along: just a fascinating piece of our internal planning software.

Eliezer,

I've just reread your article and was wondering if this is a good quick summary of your position (leaving apart how you got to it):

'I should X' means that I would attempt to X were I fully informed.

Here 'fully informed' is supposed to include complete relevant empirical information and also access to all the best relevant philosophical arguments.

To cover cases where people are making judgments about what others should do, I could also extend this summary in a slightly more cumbersome way:

When X judges that Y should Z, X is judging that were she fully informed, she would want Y to Z

This allows X to be incorrect in her judgments (if she wouldn't want Y to Z when given full information). It allows for others to try to persuade X that her judgment is incorrect (it preserves a role for moral argument). It reduces 'should' to mere want (which is arguably simpler). It is, however, a conception of should that is judger-dependent: it could be the case that X correctly judges that Y should Z, while W correctly judges that Y should not Z.
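As a purely hypothetical sketch of that extended summary (my own toy encoding, not the commenter's or Eliezer's), the judger-dependence becomes visible once the judger's fully-informed wants are made an explicit argument:

```python
# Hypothetical toy encoding of: "X judges that Y should Z" iff
# "X's fully-informed self would want Y to Z". Illustration only.

def judges_should(judger_fully_informed_wants, actor, act):
    """True iff the judger's fully-informed extrapolation would want `actor` to do `act`."""
    return (actor, act) in judger_fully_informed_wants

# Stipulated extrapolations for two judgers, X and W:
x_wants = {("Y", "donate"), ("Y", "tell_truth")}
w_wants = {("Y", "tell_truth")}

print(judges_should(x_wants, "Y", "donate"))   # True:  X correctly judges that Y should donate
print(judges_should(w_wants, "Y", "donate"))   # False: W correctly judges that Y should not
```

Because the judger's extrapolation is itself an argument to the function, X and W can each judge correctly by their own lights while disagreeing about the very same act - which is the judger-dependence noted above.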

I have a newbie question... if A) quantum mechanics shows that we can't distinguish personal identity by the history of how someone's atoms got into the configuration that they are in, and B) morality (other things being equal) flows backwards from the end result, and C) it is immoral to allow a child to die on the railroad tracks, then D) why would it not also be immoral to decide not to marry and have children? Both decisions have the same consequence (a live child who otherwise would not be).

At some point we (or the machines we build) will be able to manipulate matter at the quantum level, so I think these kind of questions will be important if we want to be able to make moral decisions when we have that capability.

If I myself were given the task to program the little child life saving machine, I admit that right now I wouldn't know how to do better than a naive leads-to-child-living rule which would result in the mass of the observable universe being converted into habitat for children...
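To spell out that failure mode, here is a deliberately silly toy model (entirely made-up action set and payoffs): a rule that simply maximizes the count of children living ranks the extreme action above the intended one.

```python
# Toy model (fictional actions and payoffs) of the naive
# "maximize children living" rule described above.

actions = {
    "do_nothing": 0,
    "save_child_on_tracks": 1,
    "convert_universe_to_habitat_for_children": 10**40,   # made-up astronomical payoff
}

def naive_rule(options):
    """Pick whichever action leads to the most children living."""
    return max(options, key=options.get)

print(naive_rule(actions))   # 'convert_universe_to_habitat_for_children'
```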

Assuming that we want it-all-adds-up-to-normalcy, we would hope to find a rule consistent with quantum mechanics that would end up with saving the life of a child on the railroad tracks having a higher moral imperative than converting the available mass of the universe into children (and habitat etc. so that they have happy fulfilling lives etc...)

The it-all-adds-up-to-normalcy approach though reminds me a bit of the correspondence principle in quantum mechanics. (The correspondence principle says that for large systems quantum mechanics should give the same result as classical mechanics). The principle was very useful when quantum mechanics was first being developed, but it completely broke down once we had large systems such as superconductors which could not be described classically. Similarly, I can imagine that perhaps my moral judgments would change if I was able to integrate the reality of quantum mechanics into my moral thinking.
