December 25, 2008

Comments

"I flinched away from that thought's implications, not so much because I feared superintelligent paternalism myself, but because I feared what other people would say of that position."

This is basically THE reason I always advocate increased comfort with lying. It seems to me that this fear of having to say what they believe, if they allow themselves to believe only the truth, is the single largest seemingly removable barrier to people becoming rationalists at all, or, once past that barrier, to becoming the best rationalists they can be.

You are forgetting about "Werewolf Contracts" in the Golden Age. Under these contracts you can appoint someone who can "use force, if necessary, to keep the subscribing party away from addictions, bad nanomachines, bad dreams or other self-imposed mental alterations."

If you sign such a contract then, contrary to what you wrote, it's not true that "one moment of weakness is enough to betray you."

What is the point of trying to figure out what your friendly AI will choose in each standard difficult moral choice situation, if in each case the answer will be "how dare you disagree with it, since it is so much smarter and more moral than you"? If the point is that your design of this AI will depend on how well various proposed designs agree with your moral intuitions in specific cases, well then, the rest of us have great cause to be concerned about how much to trust your specific intuitions.

James is right; you only need one moment of "weakness" to approve a protection against all future moments of weakness, so it is not clear there is an asymmetric problem here.

In addition to what James said, I'm reminded of the mechanism for changing screen resolution in Windows XP: it automatically reverts to the original resolution after X seconds, in case the new setting leaves you unable to see the screen. This is so people can't break their computers in one moment of weakness.

A similar thing could be done with self-modification. Self-destruction would still be possible, of course, just as it is now (I could go jump off a bridge). But just as suicide is something humans build up to, failsafes could be put in place so that self-modification was equally deliberate.
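
A minimal sketch of that failsafe pattern, assuming Python and entirely made-up hook names (this is just the Windows XP trick, not anyone's actual proposal):

    import threading

    class RevertOnTimeout:
        """Apply a change, then automatically undo it unless it is
        explicitly confirmed within the window."""

        def __init__(self, apply_change, revert_change, window_seconds=15.0):
            self._apply = apply_change
            self._revert = revert_change
            self._window = window_seconds
            self._confirmed = False
            self._timer = None

        def start(self):
            self._apply()
            # If confirm() is never called, the change rolls back by itself.
            self._timer = threading.Timer(self._window, self._maybe_revert)
            self._timer.start()

        def confirm(self):
            self._confirmed = True
            self._timer.cancel()

        def _maybe_revert(self):
            if not self._confirmed:
                self._revert()

The obvious weakness: the modified self must still want, and be able, to let the revert fire. A change that damages that very willingness defeats the scheme.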

@James:

Doesn't the choice of a perfect external regulator amount to the same thing as directly imposing restrictions on yourself, thereby going back to the original problem? I suppose such a regulator, or indeed any stabilizing self-modification, could have the advantage of being publicly available and widely used, and therefore be well-tested, with thoroughly understood operations and consequences.

Another way to do it might be to create many copies of yourself (I'm assuming this scenario takes place inside a computer) and let majority rule (or a two-thirds majority, etc.) decide when it comes to "rescuing" copies that have made un-self-recoverable errors.
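
A toy version of that voting rule, just to make it concrete (the threshold and the notion of "rescue" are placeholders):

    def should_rescue(votes, threshold=2/3):
        """Rescue a damaged copy iff at least `threshold` of its
        peer copies vote for it (1 = rescue, 0 = leave alone)."""
        return sum(votes) >= threshold * len(votes)

    # Seven peer copies; five vote to rescue the damaged one:
    assert should_rescue([1, 1, 1, 1, 1, 0, 0])      # 5/7 >= 2/3
    assert not should_rescue([1, 1, 0, 0, 0, 0, 0])  # 2/7 <  2/3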

Anyway I suppose this is all somewhat beside the point since such a scenario was chosen as an example of what Eliezer expects a successful future to *not* look like.

@michael vassar:

So, are you saying that lying about your beliefs can be good because it allows you to freely believe some non-PC or otherwise unpopular idea (that your reason leads you to believe is the truth), without having to worry about the social consequences of being discovered to have such a belief?

I'm not sure whether I agree with it or not, but it's worth thinking about.

@Eli, Vassar

Before I comment further, let me clarify please: do you gentlemen understand what you are saying?

Vassar, I think we've talked about this before. Are you contrasting this with Nietzsche's deadly truths or invoking a Hansonian Socratic irony? Otherwise...

Hrm... I think, at least initially, I'd want some limiters for myself, along the lines of the system telling me "this isn't going to do what you actually want it to do, so no."

But "no mental alteration at all without being a neuroanatomical master yourself in the first place", at least initially, seems a bit too harsh. That is, to the extent that one needs a bit of an intel/etc boost to fully master it in the first place, we'd have a bit of a problem here. :)

I'd be perfectly happy with something where, if, say, I said "I'd like to lower my aggression", it came back with "uh, no. The structure of your mind is such that that is tied to ambition and get-it-doneness and so on, and even by your own measure you're overly passive as is. This is not what you want, even if you think it is."

(Note, I'm not saying that I have the knowledge to say that aggression and ambition are tied to each other like that. This is just a hypothetical, though at least from personal introspection they do seem potentially related.)

But I'd like it if it also added something like "However, here is a subtler change that would have the effects you actually wanted out of what you just asked for."

or even "And on that note, here's something to do about that passivity/laziness that seems to be something that is a much larger source of frustration on your part."

However, I don't really have any objection to it sometimes returning with "No. This is the sort of thing you really want (even if you don't know it) to do/work out for yourself in terms of what's already available to you."

And on the other other other hand, there's the issue of "do we really want it to be the sort of thing that we'd perceive as a person, rather than an abstract process?"

"A singleton might be justified in prohibiting standardized textbooks in certain fields, so that people have to do their own science [...]"

No textbooks?! CEV had better overrule you on this one, or my future selves across the many worlds are all going to scream bloody murder. It may be said that I'm missing the point: that ex hypothesi the Friendly AI knows better than me.

But I'm still going to cry.

ShardPhoenix wrote "Doesn't the choice of a perfect external regulator amount to the same thing as directly imposing restrictions on yourself, thereby going back to the original problem?"

No, because if there are many possible future states of the world, it wouldn't be practical for you to specify in advance what restrictions you would accept in every possible future state. It's much more practical to appoint a guardian who will make decisions after it has observed what state of the world has come to pass. Also, you might pick a regulator who would impose different restrictions on you than you would impose if you acted without one.

ShardPhoenix also wrote "Another way to do it might be to create many copies of yourself (I'm assuming this scenario takes place inside a computer) and let majority (or 2/3s majority or etc) rule when it comes to 'rescuing' copies that have made un-self-recoverable errors."

Good idea except in the Golden Age World these copies would become free individuals who could modify themselves. You would also be financially responsible for all of these copies until they became adults.

Robin, if people are tempted to gloss my metaethical agenda as "creating a God to rule us all", then it seems clear that there's an expected benefit from talking about my object-level guesses in order to contradict this, since talking about the meta stuff doesn't seem to grab in quite the same way.

There's also the other standard reasons to talk about Fun Theory, such as people asking too little of the future (a God to rule over us is an example of this pattern, as is expecting wonderful new video games); or further crushing religious notions of theodicy (by illustrating what a well-designed world that respected its inhabitants' free will and self-determination would look like, in contrast to this one).

Frelkins, Vassar advocates that rationalists should learn to lie, I advocate that rationalists should practice telling the truth more effectively, and we're still having that argument.

Re: Vassar advocates that rationalists should learn to lie, I advocate that rationalists should practice telling the truth more effectively, and we're still having that argument.

Uh huh. What are the goals of these hypothetical rational agents?

ShardPhoenix: Yes. This is the same principle that says that credible confidentiality within a group can sometimes improve aggregate information flow and collective epistemology.

Tim Tyler: Human goals. I definitely do NOT want alien rationalists to be able to lie, but I doubt I have much choice regarding that. Also not transhuman children. There I might have some limited choice.

Eliezer: I certainly think that rationalists should practice telling the truth more effectively as well as lying, and you admit that not lying enough makes people gullible, so it's mostly a matter of estimating the magnitude of the relevant trade-offs here.
I think that our disagreements are based on radically different models of social psychology. We disagree a great deal about the degree to which being known to sometimes lie reduces future credibility in the eyes of actual existing humans, relative to being known to sometimes mislead without lying. I believe that being known to lie increases credibility somewhat relative to a "wizard's oath," while you think it greatly decreases it. I think that I know your reasons for your belief and that you don't know mine. I'm not sure whether you think that I know your reasons, and I'm not sure whether this difference in social-psychological theory is the specific belief we disagree about. I'd like confirmation on whether you agree that this is our main point of disagreement, and possibly a poll of the audience on the social psychology question.

For many reasons I think it's better to see a superintelligence as modeling the world (including the people in it) on a level different from intentionality, using concepts unnatural to a human. The world with a superintelligence in it, if you need to understand its impact, doesn't contain any humans, any intelligent agents at all, not even the singleton itself, in the model that the singleton runs in its moments of decision. Only the singleton makes decisions, and with respect to those decisions everything else is the stuff of its mind, the material that gets optimized according to a humane utility function. The utility function is ultimately over the stuff of reality, not over transhuman people or any kind of sentient beings. This underlies the perspective of the singleton as a new, humane physics of the world.

The way we interpret the world under a singleton, and the singleton's actions on that world, is different from the way the singleton itself interprets the world and makes decisions about it, even if a simplified model agrees with reality nine times out of ten. What the singleton builds can be interpreted back, from our perspective, as containing sentient beings; and those sentient beings, which we read into the optimized stuff of reality, can in turn be seen as interpreting what's going on as multiple sentient beings going around in a new world, learning, communicating, living their lives. They can even (be interpreted to) interpret the actions of the singleton as certain adjustments to the physics, to people's minds, to objects in the world, but that is not the level at which the singleton's decisions are made. It is the level at which they make their own decisions. Their decisions are determined by their cognitive algorithms, but the outcomes of those decisions are taken into account in arranging the conditions that allow the decisions to be made, even to be thought about, down to the options for thoughts of one agent that lead, through object-level interaction, to the thoughts of other agents and to the outcome in question.

It's a perpetual worldwide Newcomb's paradox in action, with the singleton arranging everything it can to come out right, including keeping a balance with unwanted interference, and with unwanted awareness of interference, which is interference in its own right, and so on. You are the stuff of physics, and you determine what comes of your actions, but this time the physics is not at all simple, in very delicate ways, and you consist of this superintelligent physics as well. I think this perspective shows how the guiding process can be much more subtle than prohibiting things that fall into natural human or transhuman categories.

Of course, these human interpretations would apply to the optimized future only if the singleton is tuned so perfectly as to produce something that can be described by them, and maybe not even then, because a creative surprise could show a better, unexpected way.

I think that an empirical approach to self-modification would quickly become prominent: alter one variable and test it, with a self-imposed timeout clause.
The problem is that this does not apply to one sort of change: a change in utility function. An inadvertent change of utility function is extremely dangerous, because changing your utility function is of infinite negative utility by the standards of your current utility function, and vice versa.

"An inadvertent change of utility function is extremely dangerous, because changing your utility function is of infinite negative utility by the standards of your current utility function, and vice versa."

Not true at all. A change from N_paperclips to N_paperclips + 10^-100*N_staples, for instance, probably has no effect. A change to N_paperclips + .5*N_staples might result in fewer paperclips, but finitely many.
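
To make that concrete, a toy calculation (every number here is invented for illustration):

    def best_allocation(utility, material=10.0, clip_cost=1.0, staple_cost=0.25):
        """Enumerate integer paperclip counts, spend leftover material
        on staples, and return the (clips, staples) pair the given
        utility function ranks highest."""
        options = []
        for clips in range(int(material / clip_cost) + 1):
            staples = int((material - clips * clip_cost) / staple_cost)
            options.append((clips, staples))
        return max(options, key=lambda o: utility(*o))

    u_clips = lambda p, s: p                  # N_paperclips
    u_tiny  = lambda p, s: p + 1e-100 * s     # N_paperclips + 10^-100 * N_staples
    u_half  = lambda p, s: p + 0.5 * s        # N_paperclips + 0.5 * N_staples

    print(best_allocation(u_clips))   # (10, 0)
    print(best_allocation(u_tiny))    # (10, 0) -- behavior unchanged
    print(best_allocation(u_half))    # (0, 40) -- fewer paperclips, finitely fewer

The 10^-100 term is too small to ever change a choice, while the 0.5 term shifts production wholly to staples: a real loss of paperclips, but a finite, calculable one.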

I should have specified a domain change. A modification that varies your utility function by degree has a calculable negative utility.

nazgulnarsil, can you give examples? I don't understand your claim. What do you mean by "domain change" here?

Michael Vassar: maybe you chose to work in an area where you had to lie to survive. Perhaps Eli works in an area where the discovery of lying carries a higher price (in destroyed reputation) than sticking to the inconvenient truth. But unfortunately I think it is easier to discount a truth-sayer (he is, after all, an alien) than a randomized liar (he is one of us). In other words, it is easier to buy the mix of truth and untruth than the truth and nothing but the truth. But the social result seems to be the same: untruth wins.

Wright either didn't know or chose to ignore the thinking that led to Asimov's Three Laws. While the laws themselves (that robots must keep humans from coming to harm, obey human orders, and preserve themselves, in that order of priority) are impossible to codify, the underlying insight that we make knives with hilts is sound. Science fiction has a dystopian/idiot inventor streak because that makes it easier to get the plot going.

From another angle, part of sf is amplifying aspects of the real world. We can wreck our lives in a moment of passion or bad judgement, or by following a bad idea repeatedly.

Having to figure out the neuroscience by yourself is not an especially good protection against mistakes. Knowing how to make a change is different from and easier than knowing how to debug a change.

I don't think prohibiting textbooks is necessary or sufficient to give people the pleasure of making major discoveries. Some people are content to solve puzzles, but others don't just want to be right; they want to be right about something new. My feeling is that the world is always going to be more complex than what we know about it. I'm hoping that improved tools, including improved cognition, will mean that we'll never run out of new things, including new general principles, to discover.

I agree with Psy-Kosh that advice should and would be available, and also something like therapy if you suspect that you've deeply miscalibrated yourself. However, there is going to be more than one system of advice and of therapy, because there isn't going to be agreement on what constitutes an improvement.

Excuse me if it's been covered here, but in an environment like that, deciding not just what you want but what changes turn you into not-you is a hard problem.


Eliezer, this post seems to me to reinforce, not weaken, a "God to rule us all" image. Oh, and among the various clues that might indicate to me that someone would make a good choice with power, the ability to recreate that power from scratch does not seem a particularly strong clue.

In re lying when you're trying to set up a research and invention organization: it seems to me that it would make recruiting difficult. The public impression of what you're doing is going to be your lies, which makes it even harder to get the truth of what you're doing to the people you want to work with. And even if the discrepancy between your public and private versions doesn't appear in some embarrassing form on the internet, you're going to tend to attract sneaky people and repel candid people, and this will probably make it harder to have an organization that does what you want.

The fact that Michael Vassar is willing to advocate "increased comfort with lying" in a public forum suggests to me that we are not talking about a literal Secret a la intelligence work, but something more along the lines of little white lies like "You're looking good today" where the listener as well as the speaker knows to apply a discounting factor. I might be willing to tolerate that in people I associate with - in fact, I do so all the time - so long as the overall system is one where it's okay if I give only true answers when I'm questioned myself.

However, the fact that Michael Vassar can't think of a better word than "lie" for this, for the sake of PR purposes, suggests to me that he's not going to be very good at shading the truth - that he's still trying to approach things the nerd way. Non-nerds lie easily and they'd never think of calling the process "increased comfort with lying", either - at least I've never read a non-nerd using those words outright, whatever it is they're actually advocating. But now I'm getting into the details of our current strategic debates, which isn't really on-topic for this post.

I should have read Michael Vassar's original post in this thread more carefully.

I suspect that people's fear of becoming more rational has at least as much to do with the perceived consequences of being more honest with themselves about what they're doing as it does with the fear of having to tell the truth to other people.

Michael, I thought that you advocated comfort with lying because smart people marginalize themselves by compulsive truth-telling. For instance, they find it hard to raise venture capital. Or (to take an example that happened at my company), when asked "Couldn't this project of yours be used to make a horrible terrorist bioweapon?", they say, "Yes." (And they interpret questions literally instead of practically; e.g., the question actually intended, and that people actually hear, is more like, "Would this project significantly increase the ease of making a bioweapon?", which might have a different answer.)

Am I compulsively telling the truth again? Doggone it.

Is it just me, or did Wright's writing style sound very much like Eliezer's?

Surely the problem with the clipping isn't the loaded gun or the stern stoicism - it's the daft Prime Directive. Of course you should edit someone back to sanity, by force if necessary. I could play rhetorical tricks and argue from incapacity, but I won't even do that. Saving people is just obviously the right thing to do.

@James Miller:

What justifies the right of your past self to exert coercive control over your future self? There may be overlap of interests, which is one of the typical de facto criteria for coercive intervention; but can your past self have an epistemic vantage point over your future self?

Can you write a contract saying that if your future self ever converts away from Christianity, the Church has the right to convert you back? Can you write a contract saying that your mind is to be overwritten with an approximation of Richard Dawkins who will then be tortured in hell forever for his sins?

If you constrain the contracts that can be written, then clearly you have an idea of good or bad mindstates apart from the raw contract law, and someone is bound to ask why you don't outlaw the bad mindstates directly.

If children under 75 don't need Werewolf Contracts, why should children under 750?

Phaethon, in the story, refuses to sign a Werewolf Contract out of pride, just like his father. You could laugh and call him an idiot. Personally, I think that (a) many people are at least that stupid, at least right now and (b) it's cruel to inflict horrific punishments on people for no greater idiocy than that. But at any rate, why force Phaethon to sacrifice his pride, by putting him in that environment? Why make him give up on his dream of adulthood? Why force everyone to take a cautious non-heroic approach to life or else risk a fate worse than death? Phaethon is being harmed by the extra options offered him, one way or another.

Peter: if your change of utility function is of domain rather than degree, you can't calculate the negative utility. The difference in utility between making 25 paperclips a day and 500 a day is a calculable difference for a paperclip-maximizing optimization process.

However, if the paperclip optimizer self-modifies and inadvertently changes its utility function to maximizing staples... well, you can't calculate paperclips in terms of staples. This outcome is of infinite negative utility from the perspective of the paperclip maximizer, and vice versa: once the utility function has changed to maximizing staples, it would be of infinite negative utility to change back to paperclips from the perspective of the staple-maximizing utility.

This defeats the built-in timeout clause. With a modification that only affects your ability to reach your current utility, you have a measurable output. With a change that alters your utility, you are changing the very thing you were using to measure success.

I know this isn't worded very well. I'm sure one of Eliezer's posts has done this subject better at some point.

Eliezer-

“What justifies the right of your past self to exert coercive control over your future self? There may be overlap of interests, which is one of the typical de facto criteria for coercive intervention; but can your past self have an epistemic vantage point over your future self?”

In general I agree. But werewolf contracts protect against temporary lapses in rationality. My level of rationality varies. Even assuming that I remain in good health for eternity there will almost certainly exist some hour in the future in which my rationality is much lower than it is today. My current self, therefore, will almost certainly have an “epistemic vantage point over [at least a small part of my] future self.” Given that I could cause great harm to myself in a very short period of time I am willing to significantly reduce my freedom in return for protecting myself against future temporary irrationality.

Having my past self exert coercive control over my future self will reduce my future information costs. For example, when you download something from the web you must often agree to a long list of conditions. Under current law, if these terms and conditions included something like “you must give Microsoft all of your wealth,” the term wouldn’t be enforced. If the law did enforce such terms, then you would have to spend a lot of time examining the terms of everything you agreed to. You would be much better off if your past self prevented your current self from giving away too much in the fine print of agreements.

“If you constrain the contracts that can be written, then clearly you have an idea of good or bad mindstates apart from the raw contract law, and someone is bound to ask why you don't outlaw the bad mindstates directly.”

The set of possible future mindstates / world state combinations is very large. It’s too difficult to figure out in advance which combinations are bad. It’s much more practical to sign a Werewolf contract which gives your guardian the ability to look at the mindstate / worldstate you are in and then decide if you should be forced to move to a different mindstate.


“why force Phaethon to sacrifice his pride, by putting him in that environment?”

Phaethon placed greater weight on freedom than pride and your type of paternalism would reduce his freedom.

But in general I agree that if most humans alive today were put in the Golden Age world, then many would do great harm to themselves, and in such a world I would prefer that the Sophotechs exercise some paternalism. But if such paternalism didn’t exist, then Werewolf Contracts would greatly reduce the type of harm you refer to.

nazgulnarsil, I think you're confused about what a utility function is. "Maximizing paperclips" or "maximizing staples" are not utility functions, although they may describe the actions carried out by an expected utility maximizer. Try reading the Wikipedia article on expected utility.

When something is particularly dangerous or potentially destructive you must not be allowed to have a textbook telling you the safe way to implement it. Instead, you should discover such destructive powers by your own (relatively) weak skills and, presumably, trial and error. You are not permitted to learn from other people's mistakes. You must make them yourself.

I'm not feeling safe yet. Am I at least allowed to see casualty statistics for exploring particular fields of prohibited study? Perhaps a graph of success rate vs IQ and time spent in background research?

Cameron, I suppose that's a fair enough comment. I'm used to the way things work in AI, where the naive simply fail utterly and completely to accomplish anything whatsoever, rather than hurting anyone else or themselves, and you have to get pretty far to get beyond that to the realm of dangers.

Not to mention, I'm used to the idiom that the lack of any prior feedback or any second try is what makes something an adult problem - but that really is representative of the dangers faced by someone able to modify their own brain circuitry. If there's an AI that can say "No" but you're allowed to ignore the "No" then one mistake is fatal. The sort of people who think, "Gee, I'll just run myself with the modification for a while and see what happens 'cuz I can always go back" - they might ignore a "No" based on their concept of "testing", and then that would be the end of them.

You want to put the dangerous things behind a challenge/lock such that by the time you pass it you know how dangerous they really are. "Make an AI", unfortunately, may not be quite strong enough as a case of this, but "Make an AI without anyone helping you on a 100MHz computer with 1GB of RAM and 100GB of disk space" is probably strong enough.
