
January 29, 2009

Comments

"Except to remark on how many different things must be known to constrain the final answer."

What would you estimate the probability of each thing being correct is?

What are human morals and metamorals?

What about "near-human" morals, like, say, Kzinti: where the best of all possible worlds contains hierarchies, duels to the death, and subsentient females; along with exploration, technology, and other human-like activities. Though I find their morality repugnant for humans, I can see that they have the moral "right" to it. Is human morality, then, in some deep sense better than theirs?

I think Eliezer is due for congratulation here. This series is nothing short of a mammoth intellectual achievement, integrating modern academic thought about ethics, evolutionary psychology and biases with the provocative questions of the transhumanist movement. I've learned a staggering amount from reading this OB series, especially about human values and my own biases and mental blank spots.

I hope we can all build on this. Really. There's a lot left to do, especially for transhumanists and those who hope for a significantly better future than the best available in today's world. For those who have more pedestrian ambitions for the future (i.e. most of the world), this series provides a stark warning as to how the well intentioned may destroy everything.

Bravo!

[crosspost from h+ goodness]

Pearson, it's not that kind of chaining. More like trying to explain to someone why their randomly chosen lottery ticket won't win (big space, small target, poor aim) when their brain manufactures argument after argument after different argument for why they'll soon be rich.

The core problem is simple: if the targeting information disappears, so does the good outcome. Knowing enough to refute every fallacious remanufacturing of the value-information from nowhere - that is the hard part.

What are the odds that every proof of God's existence is wrong, when there are so many proofs? Pretty high. A selective search for plausible-sounding excuses won't change reality itself. But knowing the specific refutations - being able to pinpoint the flaws in every supposed proof - that might take some study.

I have read and considered all of Eliezer's posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values? Or is it only in the stage after ours where such rare good values were unlikely to exist?

I imagine a distant future with just a smattering of paper clip maximizers -- having risen in different galaxies with slightly different notions of what a paperclip is -- might actually be quite interesting. But even so, so what? Screw the paperclips, even if they turn out to be more elegant and interesting than us!

Robin, I discussed this in The Gift We Give To Tomorrow as a "moral miracle" that of course isn't really a miracle at all. We're judging the winding path that evolution took to human value, and judging it as fortuitous using our human values. (See also, "Where Recursive Justification Hits Bottom", "The Ultimate Source", "Created Already In Motion", etcetera.)

RH: "I have read and considered all of Eliezer's posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values?"

- yes, it almost certainly was because of the way we evolved. There are two distinct events here:

1. A species evolves to intelligence with the particular values we have.

2. Given that a species evolves to intelligence with some particular values, it decides that it likes those values.

1 is an extremely unlikely event. 2 is essentially a certainty.

One might call this "the ethical anthropic argument".

Evolution (as an algorithm) doesn't work on the indestructible. Therefore all naturally-evolved beings must be fragile to some extent, and must have evolved to value protecting their fragility.

Yes, a designed life form can have paper clip values, but I don't think we'll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets.

Ian C.: "Yes, a designed life form can have paper clip values, but I don't think we'll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets."
Almost all life forms (especially simpler ones) are sort of paperclip maximizers: they just make copies of themselves ad infinitum. If life could leave this planet and use materials more efficiently, it would consume everything. Good for us that evolution couldn't optimize them to that extent.

Ian: some individual values of other naturally-evolved beings may be recognizable, but that doesn't mean that the value system as a whole will be.

I'd expect that carnivores, or herbivores, or non-social creatures, or hermaphrodites, or creatures with a different set of senses - would probably have some quite different values.

And there can be different brain architectures, different social/political organisation, different transwhateverism technology, etc.

Roko:

Not so fast. We like some of our evolved values at the expense of others. Ingroup-outgroup dynamics, the way we're most motivated only when we have someone to fear and hate: this too is an evolved value, and most of the people here would prefer to do away with it if we can.

The interesting part of moral progress is that the values etched into us by evolution don't really need to be consistent with each other, so as we become more reflective and our environment changes to force new situations upon us, we realize that they conflict with one another. The analysis of which values have been winning and which have been losing (in different times and places) is another fascinating one...

"Ingroup-outgroup dynamics, the way we're most motivated only when we have someone to fear and hate: this too is an evolved value, and most of the people here would prefer to do away with it if we can."

So you would want to eliminate your special care for family, friends, and lovers? Or are you really just saying that your degree of ingroup-outgroup concern is less than average and you wish everyone was as cosmopolitan as you? Or, because ingroup-concern is indexical, it results in different values for different ingroups, so you wish everyone shared your precise ingroup concerns? Or that you are in a Prisoner's Dilemma with other groups (or worse), and you think the benefit of changing the values of others would be enough for you to accept a deal in which your own ingroup-concern was eliminated?

http://www.overcomingbias.com/2008/03/unwanted-morali.html

I suspect it gets worse. Eliezer seems to lean heavily on the psychological unity of humankind, but there's a lot of room for variance within that human dot. My morality is a human morality, but that doesn't mean I'd agree with a weighted sum across all possible extrapolated human moralities. So even if you preserve human morals and metamorals, you could still end up with a future we'd find horrifying (albeit better than a paperclip galaxy). It might be said that that's only a Weirdtopia, that you're horrified at first but then you see that it's actually for the best after all. But if "the utility function [really] isn't up for grabs," then I'll be horrified for as long as I damn well please.

This post seems almost totally wrong to me. For one thing, its central claim - that without human values the future would, with high probability, be dull - is not even properly defined.

To be a little clearer, one would need to say something like: if you consider a specified enumeration over the space of possible utility functions, a random small sample from that space would be "dull" (it might help to say a bit more about what dullness means too, but that is a side issue for now).

That claim might well be true for typical "shortest-first" enumerations in sensible languages - but it is not a very interesting claim - since the dull utility functions would be those which led to an attainable goal - such as "count up to 10 and then stop".

The "open-ended" utilility functions - the ones that resulted in systems that would spread out - would almost inevitably lead to rich complexity. You can't turn the galaxy into paper-clips (or whatever) without extensively mastering science, technology, intergalactic flight, nanotechnology - and so on. So, you need scientists and engineers - and other complicated and interesting things. This conclusion seems so obvious as to hardly be worth discussing to me.

I've explained all this to Eliezer before. After reading this post I still have very little idea about what it is that he isn't getting. He seems to think that making paper clips is boring. However, it is not any more boring than making DNA sequences, and that's the current aim of most living systems.

A prime-seeking civilisation has a competitive disadvantage relative to one that doesn't have silly, arbitrary bits tacked on to its utility function. It is more likely to be wiped out in a battle with an alien race - and it's more likely to suffer from a mutiny from within. However, that is about all. It is unlikely to lack science, technology, or other interesting stuff.

Carl:

I don't think that automatic fear, suspicion and hatred of outsiders is a necessary prerequisite to a special consideration for close friends, family, etc. Also, yes, outgroup hatred makes cooperation on large-scale Prisoner's Dilemmas even harder than it generally is for humans.

But finally, I want to point out that we are currently wired so that we can't get as motivated to face a huge problem if there's no villain to focus fear and hatred on. The "fighting" circuitry can spur us to superhuman efforts and successes, but it doesn't seem to trigger without an enemy we can characterize as morally evil.

If a disease of some sort threatened the survival of humanity, governments might put up a fight, but they'd never ask (and wouldn't receive) the level of mobilization and personal sacrifice that they got during World War II— although if they were crafty enough to say that terrorists caused it, they just might. Concern for loved ones isn't powerful enough without an idea that an evil enemy threatens them.

Wouldn't you prefer to have that concern for loved ones be a sufficient motivating force?

@Eliezer: Can you expand on the "less ashamed of provincial values" part?

@Carl Shulman: I don't know about him, but for myself, HELL YES I DO. Family - they're just randomly selected by the birth lottery. Lovers - falling in love is some weird stuff that happens to you regardless of whether you want it, reaching into your brain to change your values: like, dude, ew - I want affection and tenderness and intimacy and most of the old interpersonal fun and much more new interaction, but romantic love can go right out of the window with me. Friends - I do value friendship; I'm confused; maybe I just value having friends, and it'd rock to be close friends with every existing mind; maybe I really value preferring some people to others; but I'm sure about this: I should not, and do not want to, worry more about a friend with the flu than about a stranger with cholera.

@Robin Hanson: HUH? You'd really expect natural selection to come up with minds who enjoy art, mourn dead strangers and prefer a flawed but sentient woman to a perfect catgirl on most planets?

This talk about "'right' means right" still makes me damn uneasy. I don't have more to show for it than "still feels a little forced" - when I visualize a humane mind (say, a human) and a paperclipper (a sentient, moral one) looking at each other in horror and knowing there is no way they could agree about whether to use atoms to feed babies or to make paperclips, I feel *wrong*. I think about the paperclipper in exactly the same way it thinks about me! Sure, that's also what happens when I talk to a creationist, but we're trying to approximate external truth; and if our priors were too stupid, our genetic line would be extinct (or at least that's what I think) - but morality doesn't work like probability, it's not trying to approximate anything external. So I don't feel any happier about the moral miracle that made us than about the one that makes the paperclipper.

Patrick,

Those are instrumental reasons, and could be addressed in other ways. I was trying to point out that giving up big chunks of our personality for instrumental benefits can be a real trade-off.

http://www.overcomingbias.com/2007/03/policy_debates_.html

Jordan: "I imagine a distant future with just a smattering of paper clip maximizers -- having risen in different galaxies with slightly different notions of what a paperclip is -- might actually be quite interesting."

That's exactly how I imagine the distant future. And I very much like to point to the cyclic cellular automaton (java applet) as a visualization. Actually, I speculate that we live in a small part of the space-time continuum not yet eaten by a paper clip maximizer. Now you may ask: why don't we see huge blobs of paper clip maximizers expanding in the night sky? My answer is that they are expanding at the speed of light in every direction.

Note: I abused the term paper clip maximizer somewhat. Originally I called these things Expanding Space Amoebae, but PCM is more OB.
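
For anyone who can't run the applet, here is a minimal Python sketch of the standard cyclic-CA rule (the state count, grid size, and seed below are arbitrary illustration choices, not anything from the comment): a cell advances to the next state mod k whenever one of its four neighbours is already one step ahead, and a random soup self-organizes into expanding spiral waves.

```python
import numpy as np

def cyclic_ca_step(grid, k=14):
    """One step of the cyclic cellular automaton: a cell in state s advances
    to (s + 1) % k if any of its four neighbours is already in that state;
    otherwise it keeps its state."""
    successor = (grid + 1) % k
    eaten = np.zeros(grid.shape, dtype=bool)
    for axis in (0, 1):
        for shift in (-1, 1):
            eaten |= (np.roll(grid, shift, axis=axis) == successor)
    return np.where(eaten, successor, grid)

# Usage: a random soup of k states gradually organizes into expanding waves.
rng = np.random.default_rng(0)
grid = rng.integers(0, 14, size=(200, 200))
for _ in range(300):
    grid = cyclic_ca_step(grid)
```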

Probability of an evolved alien species:

(A) Possessing analogues of pleasure and pain: HIGH. Reinforcement learning is simpler than consequentialism for natural selection to stumble across.

(B) Having a human idiom of boredom that desires a steady trickle of novelty: MEDIUM. This has to do with acclimation and adjustment as a widespread neural idiom, and the way that we try to abstract that as a moral value. It's fragile but not impossible.

(C) Having a sense of humor: LOW.

Probability of an expected paperclip maximizer having analogous properties, if it originated as a self-improving code soup (rather than by natural selection), or if it was programmed over a competence threshold by foolish humans and then exploded:

(A) MEDIUM

(B) LOW

(C) LOW

the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend most of its time exploiting the best alternative found so far, over and over and over.

I'm not convinced of that. First, "vast majority" needs to use an appropriate measure, one that is applicable to evolutionary results. If, when two equally probable mutations compete in the same environment, one of those mutations wins, making the other extinct, then the winner needs to be assigned the far greater weight. So, for example, if humans were to compete against a variant of human without the boredom instinct, who would win?

Second, it would seem easier to build (or mutate into) something that keeps going forever than it is to build something that goes for a while then stops. Cancer, for example, just keeps going and going, and it takes a lot of bodily tricks to put a stop to that.

it would seem easier to build (or mutate into) something that keeps going forever than it is to build something that goes for a while then stops.

On reflection, I realize this point might be applied to repetitive drudgery. But I was applying it to the behavior "engage in just so much efficient exploration." My point is that it may be easier to mutate into something that explores and explores and explores, than it would be to mutate into something that explores for a while then stops.

Thanks for the probability assessments. What is missing are supporting arguments. What you think is relatively clear - but why you think it is not.

...and what's the deal with mentioning a "sense of humour"? What has that to do with whether a civilization is complex and interesting? Whether our distant descendants value a sense of humour or not seems like an irrelevance to me. I am more concerned with whether they "make it" or not - factors affecting whether our descendants outlast the exploding sun - or whether the seed of human civilisation is obliterated forever.

@Jordan - agreed.

I think the big difference in expected complexity is between sampling the space of possible singletons' algorithms' results and sampling the space of competitive entities. I agree with Eliezer that an imprecisely chosen value function, if relentlessly optimized, is likely to yield a dull universe. To my mind the key is that the ability to relentlessly optimize one function only exists if a singleton gets and keeps an overwhelming advantage over everything else. If this does not happen, we get competing entities with the computationally difficult problem of outsmarting each other. Under this scenario, while I might not like the detailed results, I'd expect them to be complex to much the same extent and for much the same reasons as living organisms are complex.

What if I want a wonderful and non-mysterious universe? Your current argument seems to be that there's no such thing. I don't follow why this is so. "Fun" (defined as desire for novelty) may be the simplest way to build a strategy of exploration, but it's not obvious that it's the only one, is it?

A series on "theory of motivation" that explores other options besides novelty and fun as prime directors of optimization processes that can improve the universe (in their and maybe even our eyes).

"This talk about "'right' means right" still makes me damn uneasy. I don't have more to show for it than "still feels a little forced" - when I visualize a humane mind (say, a human) and a paperclipper (a sentient, moral one) looking at each other in horror and knowing there is no way they could agree about whether using atoms to feed babies or make paperclips, I feel *wrong*. I think about the paperclipper in exactly the same way it thinks about me! Sure, that's also what happens when I talk to a creationist, but we're trying to approximate external truth; and if our priors were too stupid, our genetic line would be extinct (or at least that's what I think) - but morality doesn't work like probability, it's not trying to approximate anything external. So I don't feel so happier about the moral miracle that made us than about the one that makes the paperclipper."

Oh my, this is so wrong. So you're postulating that the paperclipper would be extinct too due to natural selection? Somehow I don't see the mechanisms of natural selection applying to that. With it being created once by humans and then exploding, and all that.

If 25% of its "moral drive" is the result of a programming error, is it still "understandable and as much of a worthy creature/shaper of the Universe" as us? This is the cosmopolitan view that Eliezer describes; and I don't see how you're convinced that admiring static is just as good as admiring evolved structure. It might just be bias, but the latter seems much better. Order > chaos, no?

@Jotaf, "Order > chaos, no?"

Imagine God shows up tomorrow. "Everyone, hey, yeah. So I've got this other creation and they're super moral. Man, moral freaks, let me tell you. Make Mennonites look Shintoist. And, sure, I like them better than you. It's why I'm never around, sorry. Thing is, their planet is about to get eaten by a supernova. So... I'm giving them the moral green light to invade Earth. It's been real."

I'd be the first to sign up for the resistance. Who cares about moral superiority? Are we more moral than a paperclip maximizer? Are human ideals 'better'? Who cares? I don't want an OfficeMax universe, so I'll take up arms against a paperclip maximizer, whether it's blessed by God or not.

Carl:

Those are instrumental reasons, and could be addressed in other ways.

I wouldn't want to modify/delete hatred for instrumental reasons, but on behalf of the values that seem to clash almost constantly with hatred. Among those are the values I meta-value, including rationality and some wider level of altruism.

I was trying to point out that giving up big chunks of our personality for instrumental benefits can be a real trade-off.

I agree with that heuristic in general. I would be very cautious regarding the means of ending hatred-as-we-know-it in human nature, and I'm open to the possibility that hatred might be integral (in a way I cannot now see) to the rest of what I value. However, given my understanding of human psychology, I find that claim improbable right now.

My first point was that our values are often the victors of cultural/intellectual/moral combat between the drives given us by the blind idiot god; most of human civilization can be described as the attempt to make humans self-modify away from the drives that lost in the cultural clash. Right now, much of this community values (for example) altruism and rationality over hatred where they conflict, and exerts a certain willpower to keep the other drive vanquished at times. (E.g. repeating the mantra "Politics is the Mind-Killer" when tempted to characterize the other side as evil).

So far, we haven't seen disaster from this weak self-modification against hatred, and we've seen a lot of good (from the perspective of the values we privilege). I take this as some evidence that we can hope to push it farther without losing what we care about (or what we want to care about).

(E.g. repeating the mantra "Politics is the Mind-Killer" when tempted to characterize the other side as evil)

Uh, I don't mean that literally, though doing up a whole Litany of Politics might be fun.

Maybe it's the types of haunts I've been frequenting lately, but the elimination of all conscious life in the universe doesn't strike me as too terrible at the moment (provided it doesn't shorten my own lifespan).

We can sort the values evolution gave us into the following categories (not necessarily exhaustive). Note that only the first category of values is likely to be preserved without special effort, if Eliezer is right and our future is dominated by singleton FOOM scenarios. But many other values are likely to survive naturally in alternative futures.

- likely values for all intelligent beings and optimization processes
(power, resources)
- likely values for creatures with roughly human-level brain power
(boredom, knowledge)
- likely values for all creatures under evolutionary competition
(reproduction, survival, family/clan/tribe)
- likely values for creatures under evolutionary competition who cannot copy their minds
(individual identity, fear of personal death)
- likely values for creatures under evolutionary competition who cannot wirehead
(pain, pleasure)
- likely values for creatures with sexual reproduction
(beauty, status, sex)
- likely values for intelligent creatures with sexual reproduction
(music, art, literature, humor)
- likely values for intelligent creatures who cannot directly prove their beliefs
(honesty, reputation, piety)
- values caused by idiosyncratic environmental characteristics
(salt, sugar)
- values caused by random genetic/memetic drift and co-evolution
(Mozart, Britney Spears, female breasts, devotion to specific religions)

The above probably isn't controversial, rather the disagreement is mainly on the following:

- the probabilities of various future scenarios
- which values, if any, can be preserved using approaches such as FAI
- which values, if any, we should try to preserve

I agree with Roko that Eliezer has made his case in an impressive fashion, but it seems that many of us are still not convinced on these three key points.

Take the last one. I agree with those who say that human values do not form a consistent and coherent whole. Another way of saying this is that human beings are not expected utility maximizers, not as individuals and certainly not as societies. Nor do most of us desire to become expected utility maximizers. Even amongst the readership of this blog, where one might logically expect to find the world's largest collection of EU-maximizer wannabes, few have expressed this desire. But there is no principled way to derive a utility function from something that is not an expected utility maximizer!

Is there any justification for trying to create an expected utility maximizer that will forever have power over everyone else, whose utility function is derived using a more or less arbitrary method from the incoherent values of those who happen to live in the present? That is, besides the argument that it is the only feasible alternative to a null future. Many of us are not convinced of this, neither the "only" nor the "feasible".

- likely values for all intelligent beings and optimization processes
(power, resources)

Agree.

- likely values for creatures with roughly human-level brain power
(boredom, knowledge)

Disagree. Maybe we don't mean the same thing by boredom?

- likely values for all creatures under evolutionary competition
(reproduction, survival, family/clan/tribe)

Mostly agree. Depends somewhat on definition of evolution. Some evolved organisms pursue only 1 or 2 of these but all pursue at least one.

- likely values for creatures under evolutionary competition who cannot copy their minds
(individual identity, fear of personal death)

Disagree. Genome equivalents which don't generate terminally valued individual identity in the minds they describe should outperform those that do.

- likely values for creatures under evolutionary competition who cannot wirehead
(pain, pleasure)

Disagree. Why not just direct expected utility? Pain and pleasure are easy to find but don't work nearly as well.

- likely values for creatures with sexual reproduction
(beauty, status, sex)

Define sexual. Most sexual creatures are too simple to value the first two. Most plausible posthumans aren't sexual in a traditional sense.

- likely values for intelligent creatures with sexual reproduction
(music, art, literature, humor)

Disagree.

- likely values for intelligent creatures who cannot directly prove their beliefs
(honesty, reputation, piety)

Agree assuming that they aren't singletons. Even then for sub-components.

- values caused by idiosyncratic environmental characteristics
(salt, sugar)

Agree.

- values caused by random genetic/memetic drift and co-evolution
(Mozart, Britney Spears, female breasts, devotion to specific religions)

Agree. Some caveats about Mozart.

I agree with Eliezer that an imprecisely chosen value function, if relentlessly optimized, is likely to yield a dull universe.

So: you think a "paperclip maximiser" would be "dull"?

How is that remotely defensible? Do you think a "paperclip maximiser" will master molecular nanotechnology, artificial intelligence, space travel, fusion, the art of dismantling planets and stellar farming?

If so, how could that possibly be "dull"? If not, what reason do you have for thinking that those technologies would not help with the making of paper clips?

Apparently-simple processes can easily produce great complexity. That's one of the lessons of Conway's game.
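
For concreteness, a minimal sketch of that lesson, assuming the game meant is Conway's Game of Life (the wrapping-grid choice and all names here are illustrative):

```python
import numpy as np

def life_step(grid):
    """One generation of Conway's Game of Life on a wrapping grid: a live
    cell survives with 2 or 3 live neighbours; a dead cell becomes live
    with exactly 3."""
    neighbours = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    born = (grid == 0) & (neighbours == 3)
    return (survive | born).astype(grid.dtype)

# Two rules and a little arithmetic - yet the resulting patterns include
# gliders, guns, and constructions rich enough to emulate a Turing machine.
```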

Maybe we don't mean the same thing by boredom?

I'm using Eliezer's definition: a desire not to do the same thing over and over again. For a creature with roughly human-level brain power, doing the same thing over and over again likely means it's stuck in a local optimum of some sort.

Genome equivalents which don't generate terminally valued individual identity in the minds they describe should outperform those that do.

I don't understand this. Please elaborate.

Why not just direct expected utility? Pain and pleasure are easy to find but don't work nearly as well.

I suppose you mean why not value external referents directly instead of indirectly through pain and pleasure. As long as wireheading isn't possible, I don't see why the latter wouldn't work just as well as the former in many cases. Also, the ability to directly value external referents depends on a complex cognitive structure to assess external states, which may be more vulnerable in some situations to external manipulation (i.e. unfriendly persuasion or parasitic memes) than hard-wired pain and pleasure, although the reverse is probably true in other situations. It seems likely that evolution would come up with both.

Define sexual. Most sexual creatures are too simple to value the first two. Most plausible posthumans aren't sexual in a traditional sense.

I mean reproduction where more than one party contributes genetic material and/or parental resources. Even simple sexual creatures probably have some notion of beauty and/or status to help attract/select mates, but for the simplest perhaps "instinct" would be a better word than "value".

- likely values for intelligent creatures with sexual reproduction
(music, art, literature, humor)

Disagree.

These all help signal fitness and attract mates. Certainly not all intelligent creatures with sexual reproduction will value exactly music, art, literature, and humor, but it seems likely they will have values that perform the equivalent functions.

@Jotaf: No, you misunderstood - guess I got double-transparent-deluded. I'm saying this:

* Probability is subjectively objective
* Probability is about something external and real (called truth)
* Therefore you can take a belief and call it "true" or "false" without comparing it to another belief
* If you don't match truth well enough (if your beliefs are too wrong), you die
* So if you're still alive, you're not too stupid - you were born with a smart prior, so justified in having it
* So I'm happy with probability being subjectively objective, and I don't want to change my beliefs about the lottery. If the paperclipper had stupid beliefs, it would be dead - but it doesn't, it has evil morals.
* Morality is subjectively objective
* Morality is about some abstract object, a computation that exists in Formalia but nowhere in the actual universe
* Therefore, if you take a morality, you need another morality (possibly the same one) to assess it, rather than a nonmoral object
* Even if there was some light in the sky you could test morality against, it wouldn't kill you for your morality being evil
* So I don't feel on better moral ground than the paperclipper. It has human_evil morals, but I have paperclipper_evil morals - we are exactly equally horrified.

Another way of saying this is that human beings are not expected utility maximizers, not as individuals and certainly not as societies.

They are not perfect expected utility maximizers. However, no expected utility maximizer is perfect. Humans approach the ideal at least as well as other organisms. Fitness maximization is the central explanatory principle in biology - and the underlying idea is the same. The economic framework associated with utilitarianism is general, of broad applicability, and deserves considerable respect.

But there is no principled way to derive a utility function from something that is not an expected utility maximizer!

You can model any agent as an expected utility maximizer - with a few caveats about things such as uncomputability and infinitely complex functions.

You really can reverse-engineer their utility functions too - by considering them as Input-Transform-Output black boxes - and asking what expected utility maximizer would produce the observed transformation.

A utility function is like a program in a Turing-complete language. If the behaviour can be computed at all, it can be computed by a utility function.

A utility function is like a program in a Turing-complete language. If the behaviour can be computed at all, it can be computed by a utility function.

Tim, I've seen you state this before, but it's simply wrong. A utility function is not like a Turing-complete language. It imposes rather strong constraints on possible behavior.

Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.

Here's another example: When given (A,B) a program outputs "indifferent". When given (equal chance of A or B, A, B) it outputs "equal chance of A or B". This is also not allowed by EU maximization.
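
To spell out why the first example can't be captured: any fixed utility over just {A, B, C} would need u(A) > u(B) > u(C) > u(A). A brute-force check, with the dictionaries below purely illustrative:

```python
from itertools import permutations

observed = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}  # menu -> observed pick

def matches(utility):
    """True if maximizing this fixed utility over outcomes reproduces every observed pick."""
    return all(max(menu, key=utility.__getitem__) == pick
               for menu, pick in observed.items())

# Try every strict ranking of the three outcomes; none reproduces the cycle.
print(any(matches(dict(zip("ABC", ranking)))
          for ranking in permutations((1, 2, 3))))   # -> False
```

The second example falls to the independence axiom analytically: if u(A) = u(B), the equal-chance lottery has exactly that same expected utility, so it cannot be strictly preferred.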

Wei Dai: Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.

I don't know the proper rational-choice-theory terminology, but wouldn't modeling this program just be a matter of describing the "space" of choices correctly? That is, rather than making the space of choices {A, B, C}, make it the set containing

(1) = taking A when offered A and B,
(2) = taking B when offered A and B,

(3) = taking B when offered B and C,
(4) = taking C when offered B and C,

(5) = taking C when offered C and A,
(6) = taking A when offered C and A.

Then the revealed preferences (if that's the way to put it) from your experiment would be (1) > (2), (3) > (4), and (5) > (6). Viewed this way, there is no violation of transitivity by the relation >, or at least none revealed so far. I would expect that you could always "smooth over" any transitivity-violation by making an appropriate description of the space of options. In fact, I would guess that there's a standard theory about how to do this while still keeping the description-method as useful as possible for purposes such as prediction.

Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.

That is silly - the associated utility function is the one you have just explicitly given. To rephrase:

if (senses contain (A,B)) selecting A has high utility; else
if (senses contain (B,C)) selecting B has high utility; else
if (senses contain (C,A)) selecting C has high utility;

Here's another example: When given (A,B) a program outputs "indifferent". When given (equal chance of A or B, A, B) it outputs "equal chance of A or B". This is also not allowed by EU maximization.

Again, you have just given the utility function by describing it. As for "indifference" being a problem for a maximisation algorithm - it really isn't in the context of decision theory. An agent either takes some positive action, or it doesn't. Indifference is usually modelled as laziness - i.e. a preference for taking the path of least action.
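
A runnable rendering of the pseudocode above, assuming "senses contain (X,Y)" just means the offered menu is (X,Y) (the function and dictionary names are mine, not Tim's):

```python
OBSERVED = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}  # menu -> pick

def utility(senses, choice):
    """Menu-dependent utility: high exactly for the pick the behaviour calls for."""
    return 1.0 if OBSERVED.get(senses) == choice else 0.0

def act(senses):
    """Select the available choice with maximum utility, given what is sensed."""
    return max(senses, key=lambda choice: utility(senses, choice))

print([act(menu) for menu in OBSERVED])   # -> ['A', 'B', 'C']
```

Whether letting the utility function depend on the offered menu still counts as the kind of maximization the theory is about is exactly what is disputed below.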

I think Eliezer is due for congratulation here. This series is nothing short of a mammoth intellectual achievement [...]

It seems like an odd place for congratulations - since the conclusion here seems to be about 180 degrees out of whack - and hardly anyone seems to agree with it. I asked how one of the ideas here was remotely defensible. So far, there have been no takers.

If there is not even a debate, whoever is incorrect on this topic would seem to be in danger of failing to update. Of course personally, I think it is Eliezer who needs to update. I have quite a bit in common with Eliezer - and I'd like to be on the same page as him - but it is difficult to do when he insists on defending positions that I regard as poorly-conceived.

The core problem is simple: if the targeting information disappears, so does the good outcome. Knowing enough to refute every fallacious remanufacturing of the value-information from nowhere - that is the hard part.

The utility function of Deep Blue has 8,000 parts - and contained a lot of information. Throw all that information away, and all you really need to reconstruct Deep Blue is the knowledge that its aim is to win games of chess. The exact details of the information in the original utility function are not recovered - but the eventual functional outcome would be much the same - a powerful chess computer.

The "targeting information" is actually a bunch of implementation details that can be effectively recreated from the goal - if that should prove to be necessary.

It is not precious information that must be preserved. If anything, attempts to preserve the 8,000 parts of Deep Blue's utility function while improving it would actually have a crippling negative effect on its future development. Similarly with human values: those are a bunch of implementation details - not the real target.

Tim and Tyrrell, do you know the axiomatic derivation of expected utility theory? If you haven't read http://cepa.newschool.edu/het/essays/uncert/vnmaxioms.htm or something equivalent, please read it first.

Yes, if you change the spaces of states and choices, maybe you can encode every possible agent as an utility function, not just those satisfying certain axioms of "rationality" (which I put in quotes because I don't necessarily agree with them), but that would be to miss the entire point of expected utility theory, which is that it is supposed to be a theory of rationality, and is supposed to rule out irrational preferences. That means using state and choice spaces where those axiomatic constraints have real world meaning.
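
For reference, a rough paraphrase of the four axioms over lotteries L, M, N (my wording, not a quote from the linked page):

```latex
\begin{align*}
&\text{Completeness:}  && L \succeq M \ \text{or}\ M \succeq L \\
&\text{Transitivity:}  && L \succeq M \ \text{and}\ M \succeq N \ \Rightarrow\ L \succeq N \\
&\text{Continuity:}    && L \succeq M \succeq N \ \Rightarrow\ \exists\, p \in [0,1]:\ pL + (1-p)N \sim M \\
&\text{Independence:}  && L \succeq M \ \Leftrightarrow\ pL + (1-p)N \succeq pM + (1-p)N
                          \quad \text{for all } p \in (0,1] \text{ and all } N
\end{align*}
```

Preferences satisfy these four exactly when they can be represented as maximizing the expectation of some utility function over outcomes; the cyclic example above violates transitivity, and the indifference example violates independence.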

Wei: Most people in most situations would reject the idea that the set of options presented is part of the outcome - would say that (A,B,C) is a better outcome space than the richer one Tyrrell suggested - so expected utility theory is applicable. A set of preferences can never be instrumentally irrational, but it can be unreasonable as judged by another part of your morality.

Specifically, the point of utility theory is the attempt to predict the actions of complex agents by dividing them into two layers:

1. Simple list of values
2. Complex machinery for attaining those values

The idea being that if you can't know the details of the machinery, successful prediction might be possible by plugging the values into your own equivalent machinery.

Does this work in real life? In practice it works well for simple agents, or complex agents in simple/narrow contexts. It works well for Deep Blue, or for Kasparov on the chessboard. It doesn't work for Kasparov in life. If you try to predict Kasparov's actions away from the chessboard using utility theory, it ends up as epicycles; every time you see him taking a new action you can write a corresponding clause in your model of his utility function, but the model has no particular predictive power.

In hindsight we shouldn't really have expected otherwise; simple models in general have predictive power only in simple/narrow contexts.
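
A toy sketch of that two-layer split, with everything below the function signature invented purely for illustration (our own argmax stands in for the agent's unknown machinery):

```python
def predict_action(values, actions, world_model):
    """Layer 1: the agent's assumed simple values.
    Layer 2: our own machinery, standing in for the agent's."""
    return max(actions, key=lambda a: values(world_model(a)))

# Deep-Blue-style prediction: assume the agent values winning probability.
win_prob = {"sacrifice_queen": 0.7, "trade_rooks": 0.5, "resign": 0.0}
print(predict_action(values=lambda p: p,
                     actions=list(win_prob),
                     world_model=win_prob.get))   # -> 'sacrifice_queen'
```

The comment's point is that this works when the shared world model is as narrow as a chessboard and degrades into epicycles when it isn't.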

To expand on my categorization of values a bit more, it seems clear to me that at least some human values do not deserve to be forever etched into the utility function of a singleton. Those caused by idiosyncratic environmental characteristics, like taste for salt and sugar, for example. To me, these are simply accidents of history, and I wouldn't hesitate (too much) to modify them away in myself, perhaps to be replaced by more interesting and exotic tastes.

What about reproduction? It's a value that my genes programmed into me for their own purposes, so why should I be obligated to stick with it forever?

Or consider boredom. Eventually I may become so powerful that I can easily find the globally optimal course of action for any set of goals I might have, and notice that the optimal course of action often involves repetition of some kind. Why should I retain my desire not to do the same thing over and over again, which was programmed into me by evolution back when minds had a tendency to get stuck in local optimums?

And once I finally came to that realization, I felt less ashamed of values that seemed 'provincial' - but that's another matter.

Eliezer, I wonder if this actually has more to do with your current belief that rationality equals expected utility maximization. For an expected utility maximizer, there is no distinction between 'provincial' and 'universal' values, and certainly no reason to ever feel ashamed of one's values. One just optimizes according to whatever values one happens to have. But as I argued before, human beings are not expected utility maximizers, and I don't see why we should try to emulate them, especially this aspect.

In dealing with your example, I didn't "change the space of states or choices". All I did was specify a utility function. The input states and output states were exactly as you specified them to be. The agent could see what choices were available, and then it picked one of them - according to the maximum value of the utility function I specified.

The corresponding real world example is an agent that prefers Boston to Atlanta, Chicago to Boston, and Atlanta to Chicago. I simply showed how a utility maximiser could represent such preferences. Such an agent would drive in circles - but that is not necessarily irrational behaviour.

Of course much of the value of expected utility theory arises when you use short and simple utility functions - however, if you are prepared to use more complex utility functions, there really are very few limits on what behaviours can be represented.

The possibility of using complex utility functions does not in any way negate the value of the theory for providing a model of rational economic behaviour. In economics, the utility function is pretty fixed: maximise profit, with specified risk aversion and future discounting. That specifies an ideal which real economic agents approximate. Plugging in an arbitrary utility function is simply an illegal operation in that context.

Does this work in real life? In practice it works well for simple agents, or complex agents in simple/narrow contexts. It works well for Deep Blue, or for Kasparov on the chessboard. It doesn't work for Kasparov in life. If you try to predict Kasparov's actions away from the chessboard using utility theory, it ends up as epicycles; every time you see him taking a new action you can write a corresponding clause in your model of his utility function, but the model has no particular predictive power.

Biologists model organisms as maximising a utility function all the time. The utility is most frequently referred to as "inclusive fitness".

Organisms exhibit malfunctions - which means that the predictions of the theory do not always hold true in individual circumstances - but the theory is a powerful one nonetheless.

Humans can be a bit of a tricky case for the theory. You have to bear in mind that many of their "expectations" are based on the (inaccurate) premise that they are primitive hunter-gatherer cave dwellers. Also, their brains are infected by memetic symbiotes and pathogens - which manipulate human behaviour for their own ends. However, nonetheless, the theory is still valuable and powerful.

I don't really get where you folk who don't think expected utility theory applies to humans are coming from. Of course it applies! Yes, there are some wrinkles - but not enough to warrant discarding the whole theory!

I don't really get where you folk who don't think expected utility theory applies to humans are coming from. Of course it applies! Yes, there are some wrinkles - but not enough to warrant discarding the whole theory!

It applies as a good enough approximation for some purposes in some contexts. I don't advocate discarding it anymore than I advocate discarding the model of atoms as billiard balls. Both are useful enough to be worth having -- as long as you don't start mistaking them for ultimate truth.

