
November 23, 2007

Comments

Is there a safe way to wish for an unsafe genie to behave like a safe genie? That seems like a wish TOSWP should work on.

"I wish for a genie that shares all my judgment criteria" is probably the only safe way.

Sounds like we need to formalize human morality first, otherwise you aren't guaranteed consistency. Of course formalizing human morality seems like a hopeless project. Maybe we can ask an AI for help!

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because _all_ human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet). Human morality changes as our technology and understanding changes, sometimes significantly. There is no reason to believe this trend will stop. I am afraid (genuine fear, not figure of speech) that the quest to properly formalize and generalize human morality for use by a 'friendly AI' is akin to properly formalizing and generalizing Ptolemaic astronomy.

This generalises. Since you don't know everything, anything you do might wind up being counterproductive.

Like, I once knew a group of young merchants who wanted their shopping district revitalised. They worked at it and got their share of federal money that was assigned to their city, and they got the lighting improved, and the landscaping, and a beautiful fountain, and so on. It took several years and most of the improvements came in the third year. Then their landlords all raised the rents and they had to move out.

That one was predictable in hindsight, but I didn't predict it. There could always be things like that.

When anything you do could backfire, are you better off to stay in bed? No, the advantages of that are obvious but it's also obvious you can't make a living that way.

You have to make your choices and take your chances. If I had an outcome pump and my mother was trapped in a burning building and I had no other way to get her out, I hope I'd use it. The result might be worse than letting her burn to death but at least there would be a chance for a good outcome. If I can just get it to remove some of the bad outcomes the result may be an improvement.

Wonderfully provocative post (meaning no disrespect toward the poor old woman caught in the net of a rhetorical and definitional impasse). It is obviously in reference to the line of thought in the "devil's dilemma" enshrined in the original Bedazzled, and in so many magic-wish-fulfillment folk tales, in which there is always a loophole exploited by a counter-force, probably IMO in response to the motive to shortcut certain aspects of reality and its regulatory processes, known or unknown.

It would be interesting to collect real life anecdotes about people who have "gotten what they want," and end up begging for their old life back, like Dudley Moore's über-frustrated Stanley Moon trapped in a convent.

I hope this question, ultimately of the relationship of the Part and the Whole, continues to be expressed, especially as relevant to any transhuman enterprise.

It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user. Technology has to be designed and it is designed with an effect/result in mind. It is then optimized so that the end user understands how to call forth this effect. So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

I suppose the human mind has a very complex "ceteris paribus" function which holds all these background parameters equal to their previous values while not explicitly stating them, and the ironic-wish-fulfillment-Genie idea relates to the fulfillment of a wish while violating an unspoken ceteris paribus rule. Demolishing the building structure violates ceteris paribus far more than the movements of a robot-retriever would in moving aside burning material to save the woman. The material displaced from the building should be as nearly equal to the woman's body weight as possible; inducing an explosion is a horrible violation of the objective. If only the Pump could be made to sense the proper (implied) parameters.
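(To make the implied-parameters idea concrete: a minimal sketch, in Python, of an Outcome Pump that obeys a ceteris paribus rule by preferring the wish-satisfying outcome that disturbs the background least. The function names, candidate outcomes, and numbers are all invented for illustration.)

```python
# Hypothetical sketch (not from the post): an Outcome Pump that honors
# an implied ceteris paribus rule by penalizing deviation from the
# status quo. All names and numbers are invented for illustration.

def least_disruptive_outcome(candidates, satisfies_wish, disruption):
    """Among outcomes that satisfy the wish, pick the one that changes
    the background world the least."""
    feasible = [o for o in candidates if satisfies_wish(o)]
    if not feasible:
        return None  # wish too improbable: the Pump 'fails' instead
    return min(feasible, key=disruption)

# Toy candidates for "get my mother out of the burning building".
candidates = [
    {"name": "explosion hurls her out", "kg_displaced": 40000, "rescued": True},
    {"name": "robot moves burning beam", "kg_displaced": 120, "rescued": True},
    {"name": "building collapses", "kg_displaced": 900000, "rescued": False},
]

best = least_disruptive_outcome(
    candidates,
    satisfies_wish=lambda o: o["rescued"],
    disruption=lambda o: o["kg_displaced"],  # proxy for ceteris paribus violation
)
print(best["name"])  # -> robot moves burning beam
```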

If the market forces of supply and demand continue to undergird technological progress (i.e. research and development and manufacturing), then the development of a sophisticated technology not-optimized-for-use is problematic: who pays for the second round of research implementation? Surely not the customer, when you give him an Outcome Pump whose every use could result in the death and destruction of his surrounding environs and family members. Granted this is an aside and maybe impertinent in the context of this discussion.

Eric, I think he was merely attempting to point out the futility of wishes, or rather, the futility of asking for something you want from an agent that does not share your judgments on things. The Outcome Pump is merely, like the Genie, a mechanism by which to explain his intended meaning. The problem of the Outcome Pump is twofold: 1. Any theory that states that time is anything other than a constant "now" with motion and probability may work mathematically, but has yet to actually alter the thing it describes in a measurable way; and 2. The production of something such as a time machine to begin with would be so destructive as to ultimately prevent the creation of the Outcome Pump.

In fact, as rational as we would like to be, if we are so rational that we miss the forest for the trees, or in this case, the moral for the myth, we sort of undo the reason we have rationality. It's like disassembling a clock to find the time.

Anyhow, the problem of wishes is the trick of prayer: to get something that God will grant, we cannot create a God that wants what we want; it is our inherent experience in life that if God really is all-powerful and above all, he must be singular, and since men's wishes oft conflict, he cannot by any stretch of the imagination mysteriously coincide with your own capricious desires. Thus you must make the 'wish no wish', which is to change your judgment to that of God's, and in that case you cannot possibly wish something that he will NOT grant.

The mystery of it is that it is still not the same as the 'safe genie', but at the same time not altogether different. And the fact that some old Christian Mystics have said the best prayer is the one in which you make no petitions at all (and in fact say nothing!) probably attests that it is indeed the 'safe genie'.

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because _all_ human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet).

You're right. Hence, CEV.

Eliezer, you read Home on the Strange?

So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in turn wish even more powerful Outcome Pumps into existence. So once you cross a certain threshold, you get an explosion of optimization power, which mere trial and error is not sufficient to control because of the enormous change of context: in particular, the genie has gone from being less powerful than you to being more powerful than you, and what appeared to work in the former context won't work in the latter.

Which is precisely what happened to natural selection when it developed humans.
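(A toy numeric illustration of the threshold claim: below some critical capability, each generation of Pump improves the next only modestly; above it, the feedback compounds. The growth rule and numbers here are invented, not a model anyone proposed.)

```python
# Toy model of the feedback loop (growth rule and numbers invented):
# each Pump builds the next, and the improvement per step grows with
# current power, so crossing a threshold produces a rapid blow-up.

def next_power(p, threshold=10.0):
    gain = 0.1 * p if p < threshold else 0.5 * p
    return p + gain

p = 1.0
for step in range(30):
    p = next_power(p)
    print(f"step {step:2d}: power = {p:10.2f}")
# Power creeps up ~10% per step for about 25 steps, then, once past
# the threshold, grows ~50% per step: the 'explosion'.
```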

"Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in turn wish even more powerful Outcome Pumps into existence."

Yes, technology that develops itself, once a certain point of sophistication is reached.

My only acquaintance with AI up to now has been this website:
http://www.20q.net
It contains a neural network that has been learning for two decades or so, and it can "read your mind" when you're thinking of a character from the TV show The Simpsons. Pretty incredible, actually!

Eliezer, I clicked on your name in the above comment box and voilà: a whole set of resources to learn about AI. I also found out why you use the adjective "unfortunately" in reference to the Outcome Pump, as it's on the Singularity Institute website. Fascinating stuff!

"It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user."

Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases.

"Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases."

No I haven't. Could you expand on what you mean?

In the first year of law school, students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome. This is why we always need to give judges some discretion when administering the law.

James Miller, have you read The Myth of the Rule of Law? What do you think of it?

Every computer programmer, indeed anybody who uses computers extensively has been surprised by computers. Despite being deterministic, a personal computer taken as a whole (hardware, operating system, software running on top of the operating system, network protocols creating the internet, etc. etc.) is too large for a single mind to understand. We have partial theories of how computers work, but of course partial theories sometimes fail and this produces surprise.

This is not a new development. I have only a partial theory of how my car works, but in the old days people only had a partial theory of how a horse works. Even a technology as simple and old as a knife still follows non-trivial physics and so can surprise us (can you predict when a given knife will shatter?). Ultimately, most objects, man-made or not, are 'black boxes.'

TGGP,

I have not read the Myth of the Rule of Law.

Given that it's impossible for someone to know your total mind without being it, the only safe genie is yourself.

From the above it's easy to see why it's never possible to define the "best interests" of anyone but your own self. And from that it's possible to show that it's never possible to define the best interests of the public, except through their individually chosen actions. And from that you can derive libertarianism.

Just an aside :-)

"Ultimately, most objects, man-made or not are 'black boxes.'"

OK, I see what you're getting at.

Four questions about black boxes:

1) Does the input have to be fully known/observable to constitute a black box? When investigating a population of neurons, we can give stimulus to these cells, but we cannot be sure that we are aware of all the inputs they are receiving. So we effectively do not entirely understand the input being given.

2) Does the output have to be fully known/observable to constitute a black box? When we measure the output of a population of neurons, we also cannot be sure of the totality of information being sent out, due to experimental limitations.

3) If one does not understand a system one uses, does that fact alone make that system a black box? In that case there are absolute black boxes, like the human mind, about which complete information *is not known*, and relative black boxes, like the car or TCP/IP, about which complete information *is not known to the current user*.

4) What degree of understanding is sufficient for something not to be called a black box?

How we answer these questions will determine whether "black box" comes to mean:

1) Anything that is identifiable as a 'part', whose input and output is known but whose intermediate working/processing is not understood.
2) Anything that is identifiable as a 'part' whose input, output and/or processing is not understood.
3) Any 'part' that is not completely understood (i.e. presuming access to all information)
4) Anything that is not understood by the user at the time
5) Anything that is not FULLY understood by the user at the time.

We will quickly be in the realm where anything and everything on earth is considered to be a black box, if we take the latter definitions. So how can this word/metaphor be most profitably wielded?

TGGP: What did you think of it? I agree till the Socrates Universe, but thought the logic goes downhill from there.

TGGP, that paper was interesting, although I found its thesis unremarkable. You should share it with our pal Mencius.

Upon some reflection, I remembered that Robin has shown that two Bayesians who share the same priors can't disagree. So perhaps you can get your wish from an unsafe genie by wishing, "... to run a genie that perfectly shares my goals and prior probabilities."
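(A minimal sketch of the agreement point, simplifying Aumann's actual theorem: here the two agents share both the prior and the evidence, so Bayes' rule forces identical posteriors.)

```python
# Minimal illustration (a simplification of Aumann-style agreement):
# two Bayesians with the same prior who update on the same evidence
# must arrive at the same posterior.

def posterior(prior, likelihood, evidence):
    """Bayes' rule over a discrete hypothesis space."""
    unnorm = {h: prior[h] * likelihood[h](evidence) for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

prior = {"coin_fair": 0.5, "coin_biased": 0.5}
likelihood = {
    "coin_fair": lambda heads: 0.5 ** heads,    # P(run of heads | fair)
    "coin_biased": lambda heads: 0.9 ** heads,  # P(run of heads | biased)
}

evidence = 5  # both agents watch the same five heads in a row
alice = posterior(prior, likelihood, evidence)
bob = posterior(prior, likelihood, evidence)
assert alice == bob  # same priors + same evidence -> no disagreement
print(alice)
```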

As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible? I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

"As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible?"

Such a genie might already exist.

In the first year of law school, students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome.

If the rule doesn't apply, it's not relevant in the first place. I doubt very much you can establish what a 'bad' outcome would involve in such a way that everyone would agree - and I don't see why your personal opinion on the matter should be of concern when we consider legal design.

Such a genie might already exist.

You mean GOD? From the good book? It's more plausible than some stories I could mention.

GOD, I meta-wish for an ((...Emergence-y Re-get) Emergence-y Re-get) Emergency Regret Button.

Recovering Irrationalist said:

I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

Right. It's silly to wish for a genie with the same _beliefs_ as yourself, because the system consisting of you and an unsafe genie is already such a genie.

I discussed "The Myth of the Rule of Law" with Mencius Moldbug here. I recognize that politics alters the application of law and that as long as it is written in natural language there will be irresolvable differences over its meaning. At the same time I observe that different countries seem to hold different levels of respect for the "rule of law" that the state is expected to obey, and it appears to me that those more prone to do so have more livable societies. I think the norm of neutrality on the part of judges applying law with objective meaning is good to be promoted. When there is bad law it is properly the job of the legislature to fix it. This makes it easier for people to know what the law is in advance so they can avoid being smacked with it.

"You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes.... The only safe genie is a genie that shares all your judgment criteria."

Is a genie that *does* share all my judgment criteria necessarily safe?

Maybe my question is ill-formed; I am not sure what "safe" could mean besides "a predictable maximizer of my judgment criteria". But I am concerned that human judgment under ordinary circumstances increases some sort of Beauty/Value/Coolness which would not be increased if that same human judgment was used to search over a less restricted set of possibilities.

The world is full of cases where selecting for A automatically increases B when you are searching over a restricted set of possibilities but does *not* increase B when those restrictions are lifted. Overfitting is a classic example. In cases of overfitting, if we search only over a restricted set of few-parameter models, models that do well on the training set will automatically do well on the generalization set, but if we allow more parameters the correlation disappears.
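(The overfitting point in runnable form: a sketch using numpy polynomial fitting on an invented data-generating process. In the restricted model space, training error tracks test error; in the larger space, it does not.)

```python
# Sketch of the restricted-search-space point via overfitting
# (the data-generating process is invented for illustration).

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0, 1, 100)
true_f = lambda x: np.sin(2 * np.pi * x)
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # select for fit (property A)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Degree 3 (restricted space): training error tracks test error
# (property B comes along for free). Degree 9 (larger space): training
# error is nearly zero while test error blows up; the correlation is gone.
```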

Modern marketing / product development can search over a larger set of alternatives than we used to have access to. In many cases human judgments correlate with less when used on modern manufactured goods than when used on the smaller set of goods that was formerly available. Judgments of tastiness used to correlate with health but now do not. Judgments of "this is a limited resource which I should grab quickly" used to indicate resources which we really should grab quickly but now do not (because of manufactured "limited time offer only" signs and the like).

Genies or AGI's would search over an even larger space of possibilities than contemporary marketing searches over. In this larger space, many of the traditional correlates of human judgment will disappear. That is: in today's restricted search spaces, outcomes which are ranked highly according to human judgment criteria tend also to have various other properties P1, P2, ... Pk. In an AGI's search space, outcomes which are ranked highly according to human judgment criteria will not have properties P1... Pk.

I am worried that properties P1...Pk are somehow valuable. That is, I am worried that in this world human judgments pick out outcomes that are somehow valuable and that human judgments' ability to do this resides, not in our judgment criteria alone (which would be uploaded into our imagined genie) but in the conjunction of our judgment criteria with the restricted set of possibilities that has so far been available to us.

"Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs."

So, a kind of Maxwell's demon? :)
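(The reset-unless-satisfied mechanism amounts to sampling worlds conditioned on the wish coming true. A minimal rejection-sampling sketch, with invented probabilities, shows why the most probable conditioned outcome need not be a desirable one.)

```python
# The Outcome Pump as rejection sampling: reset time until the
# wished-for proposition holds, then keep that world. Probabilities
# below are invented purely to illustrate the failure mode.

import random

WORLDS = [
    ("fire burns out, she walks free", 0.001, True),
    ("gas main explodes, hurling her outward", 0.010, True),
    ("she stays inside the building", 0.989, False),
]

def sample_world():
    r, acc = random.random(), 0.0
    for name, p, leaves in WORLDS:
        acc += p
        if r < acc:
            return name, leaves
    return WORLDS[-1][0], WORLDS[-1][2]

def outcome_pump(max_resets=1_000_000):
    for _ in range(max_resets):  # each failed draw is one time-reset
        name, leaves = sample_world()
        if leaves:  # the wish: "mother ends up outside the building"
            return name
    raise RuntimeError("spontaneous mechanical failure")  # wish too unlikely

random.seed(1)
print(outcome_pump())
# P(explosion | she leaves) = 0.010 / 0.011, about 0.91: conditioning on
# the wish almost always returns the explosion, not the lucky rescue.
```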

Rather than designing a genie to exactly match your moral criteria, the simple solution would be to cheat and use *yourself* as the genie. What the Outcome Pump should solve for is your own future satisfaction. To that end, you would omit all functionality other than the "regret button", make it default-on, and make its deactivation by anything other than a satisfied-you vanishingly improbable. Say, with a lengthy password.

Of course, you could still end up in a universe where your brain has been spontaneously re-wired to hate your mother. However, I think that such an event is far less likely than a proper rescue.

You have a good point about the exhaustiveness required to ensure the best possible outcome. In that case the ability of the genie to act "safely" would depend upon the level of the genie's omniscience. For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion. Therefore it would effectively be using you as an oracle of success or failure.

A non-omniscient genie would either need complete instructions, or would only work well where there was an ideal solution. For example, if you wished for your mother to be rescued by a fireman without anyone dying or experiencing damage to more than 2% of their skin, bones or internal organs. The difficulty is when not all your criteria can be satisfied. Things suddenly become very murky.
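(A sketch of the "complete instructions" approach: the wish becomes a list of hard constraints, and the murkiness appears exactly when the feasible set is empty. The constraint names and outcomes are invented.)

```python
# Hypothetical sketch of a wish given as explicit hard constraints.
# The murkiness the comment points to: when no candidate outcome
# satisfies every constraint, the genie needs a trade-off rule that
# was never specified. All names and values are invented.

CONSTRAINTS = [
    lambda o: o["rescued_by"] == "fireman",
    lambda o: o["deaths"] == 0,
    lambda o: o["max_tissue_damage_pct"] <= 2.0,
]

def feasible(outcomes):
    return [o for o in outcomes if all(c(o) for c in CONSTRAINTS)]

outcomes = [
    {"rescued_by": "fireman", "deaths": 0, "max_tissue_damage_pct": 4.0},
    {"rescued_by": "robot", "deaths": 0, "max_tissue_damage_pct": 0.0},
    {"rescued_by": "fireman", "deaths": 1, "max_tissue_damage_pct": 0.5},
]

print(feasible(outcomes) or "no outcome satisfies every criterion -- now what?")
```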

With a safe genie, wishing is superfluous. Just run the genie.

But while most genies are terminally unsafe, there is a domain of "nearly-safe" genies, which must dwarf the space of "safe" genies (examples of a nearly-safe genie: one that picks the moral code of a random living human before deciding on an action or a safe genie + noise). This might sound like semantics, but I think the search for a totally "safe" genie/AI is a pipe-dream, and we should go for "nearly safe" (I've got a short paper on one approach to this here).

I am worried that properties P1...Pk are somehow valuable.

In what sense can they be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?

For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion.

Formalizing "coercion" is itself an exhaustive problem. Saying "don't manipulate my brain except through my senses" is a big first step, but it doesn't exclude, e.g., powerful arguments that you don't really want your mother to live.

Nick,

Are you thinking of magically strong arguments, or ones that convince because they provide good reasons?

I'd think the latter would be valuable even if it leads to a result you'd initially suppose to be bad.

The first.

"In what sense can [properties P1...Pk] be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?"

I don't know. It might be that the only sense in which something can be valuable is to look valuable according to human judgment criteria (when thoroughly implemented, and well informed, and all that). If so, my concern is ill-formed or irrelevant.

On the other hand, it seems *possible* that human judgments of value are an imperfect approximation of what is valuable in some other (external?) sense. Imagine for example if we met multiple alien races and all of them said "I see what you're getting at with this 'value/goodness/beauty/truth' thing, but you are misunderstanding it a bit; in a few thousand years, you will modify your root judgment criteria in such-and-such a way." In that case I would wonder whether my current judgment criteria were not best understood as an approximation of this other set of criteria and whether it was not value according to this other set of criteria that I should be aiming for.

If human judgment criteria *are* an approximation of some other kind of value, they would probably cease to approximate that other kind of value when used to search over the large space of genie-accessible possibilities.

By way of analogy, scientists' criteria for judging scientific truth/relevance/etc. seem to be changing usefully over time, and it may be that scientists' criteria at different times can be viewed as successive approximations of some other (external?) truth-criteria. Galilean physicists had one way of determining what to believe, Newtonians another, and contemporary physicists yet another. In the restricted set of situations considered by Galilean physicists, Galilean methods yield approximately the same predictions as the methods of contemporary physicists. In the larger space of genie-accessible situations, they do not.

Nick,

What makes you think that magically strong arguments are possible? I can imagine arguments that work better than they should because they indulge someone's unconscious inclinations or biases, but not ones that work better than their truthfulness would suggest and cut against the grain of one's inclinations.

I don't know that they are, but it's the conservative assumption, in that it carries less risk of the world being destroyed if you're wrong. Also, see the AI-box experiments.

Excellent post.
