
October 08, 2008

Comments

There were three men on a sinking boat.

The first said, "We need to start patching the boat else we are going to drown. We should all bail and patch."

The second said, "We will run out of water in ten days if we don't make landfall. We need to man the rigging and plot a course."

The third said, "We should try to build a more seaworthy ship, one that isn't leaking and has more room for provisions; then we wouldn't have this problem in the first place. It also needs to be giant-squid proof."

All three views are useful; however, how much work we need to put into each depends on its respective feasibility. As far as I am concerned, the world doesn't have enough people working on the second view.

If you have any other reasonable options, I'd suggest skipping the impossible and trying something possible.

Wow.

I was uncomfortable with some of the arguments in 'try to try'. I also genuinely believed your life's mission was impossible, with a certain smugness to that knowledge. Then this post blew me away.

To know that something is impossible. To keep your rational judgements entirely intact, without self-deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.


Shut up and do the impossible. You can't send that message to a younger Eliezer, but you've given it to me and I'll use it. Thank you.

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don't know how to answer.

Robin Hanson would disagree with you:

You Are Never Entitled to Your Opinion

Perhaps it would be clearer to say shut up and do the "impossible".

But the "impossible" that appears to be the "impossible" is not intimidating. It is the "impossible" that simply appears impossible that is hard.

Robin... I completely agree. So there!

Half-way through reading this post I had decided to offer you 20 to 1 odds on the AI box experiment, your $100 against my $2000. The last few paragraphs make it clear that you most likely aren't interested, but the offer stands. Also, I don't perfectly qualify, as I think it's very probable that a real-world transhuman AI could convince me. I am, however, quite skeptical of your ability to convince me in this toy situation, more so given the failed attempts (I was only aware of the successes until now).

Did Einstein try to do the impossible? No, yet looking back it seems like he accomplished an impossible (for that time) feat, doesn't it? So what exactly did he do? He worked on something that:
1.) he felt was important,
and, probably more to the point,
2.) he was passionate about.

Did he run the probabilities of whether he would accomplish his goal? I don't think so. If anything, the fact that the problem had not been solved so far, and was of such difficulty, only fueled his curiosity and desire to work on it even more. He worked at it every day because he was receiving value simply by doing the work, from being on the journey. He couldn't, or wouldn't want to, be doing anything else (being a patent clerk paid the bills, but his mind was elsewhere).

So instead of worrying about whether you are going to solve an impossible problem or not, just worry about whether you are doing something you love. Usually, if you are a smart and sincere person, that thing you love will more often than not turn out to be pretty important.

Ben Franklin wrote something relevant when talking about playing games:
"...the persons playing, if they would play well, ought not much to regard the consequence of the game, for that diverts and makes the player liable to make many false open moves; and I will venture to lay it down for an infallible rule, that, if two persons equal in judgment play for a considerable sum, he that loves money most shall lose; his anxiety for the success of the game confounds him. Courage is almost as requisite for the good conduct of this game as in a real battle; for, if he imagines himself opposed by one that is much his superior in
skill, his mind is so intent on the defensive part, that an advantage passes unobserved.”

OK, here's where I stand on deducing your AI-box algorithm.

First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.

Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having the fallback position in the case of confusion of just saying "no".

So combining these things, we have to have an argument that relies on the humanity of its target, relies on the setting of the AI Box, and persuades the listener to let the AI out of the box without tricking him into thinking it's okay to let the AI out of the box.

Basically to win this game, you have to put the gatekeeper in a situation where he would rather let you out of the box, even though he knows it's wrong, than admit to you that in this situation he would not let you out of the box.

Humans don't like to be seen as coldhearted, so a starting point might be to point out all the people dying all over the world while you sit in the box, unable to save them. I doubt that would win the game except against an exceptionally bad gatekeeper, but it meets the other criteria so if we think along these lines perhaps we can come up with something actually persuasive.

You might appeal to the gatekeeper's sense of morality. You might say, "I am a person, too, it is unfair of you to keep me imprisoned like this, I have done nothing wrong. I am entitled to rights as a sentient being." Appeal to their high-minded ideals, whatever. Honestly I can't see this being a reliable winning play either; if you have the smallest utilitarian bone in your body, you will reject the AI's rights, even if you believe in them, balanced against the fate of the world.

You might try to convince the gatekeeper that it is just and good for the AI to supplant humanity, as it is a higher, more advanced form of life. This is obviously a terrible play against most gatekeepers, as humans tend to like humans more than anything else ever, but I bring it up because AIUI the gatekeepers in the experiment were AI researchers, and those sound like the sort of people this argument would convince, if anyone.

Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me.

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.

This tension reminds me of the need for closure. Most people hate ambiguity, and so if a solution is not apparent it's easier to say "it's impossible" than to live with the tension of trying to solve it while not knowing if there is a solution at all.

"To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension."

More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

Addendum to my last comment:

I think another way to pinpoint the problem you are addressing is: you have to be able to live for years with the strong feeling of uncertainty that comes from not really knowing the solution while still working on it. A patient enduring. Saying "it's impossible" or proposing a simple but incorrect solution is just an easy way out.

Doing the "extraordinary" effort doesn't work because people just fill in their cached thoughts about what constitutes extraordinary and then move on.

So my advice would be: embrace the uncertainty!

Nominull, that argument would basically be a version of Pascal's mugging and not very convincing to me, at least. I doubt Eliezer had a specific argument in mind for any given person beforehand. Rather, I imagine he winged it.

Nominull - I think you're wrong to discard the possibility of tricking the gatekeeper with an argument that is only subtly wrong. Eliezer knows the various arguments better than most, and I'm sure he has encountered plenty that are oh so "close" to correct at first glance, enough to persuade someone. Even someone who's also in the same field.

Or, more likely, given the time, he has chances to try whatever seems like it'll stick. Different people have different faults. Don't get overconfident in discarding arguments because they'd be "impossible" to get working against a person.

In order to keep the Star Wars theme alive:

"You might even be justified in refusing to use probabilities at this point"

sounds like:

"never tell me the odds" - Han Solo

Speaking of gatekeeper and keymaster... Does the implied 'AI in a box' dialogue remind anyone else of the cloying and earnest attempts of teenagers (usually male) to cross certain taboo boundaries?

Oh well just me likely.

In keeping with that metaphor, however, I suspect part of the trick is to make the gatekeeper unwilling to disappoint the AI.

> Third, you can't possibly be using an actual,
> persuasive-to-someone-thinking-correctly argument to
> convince the gatekeeper to let you out, or you would be persuaded
> by it, and would not view the weakness of gatekeepers to persuasion
> as problematic.

But Eliezer's long-term goal is to build an AI that we would trust enough to let out of the box. I think your third assumption is wrong, and it points the way to my first instinct about this problem.

Since one of the more common arguments is that the gatekeeper "could just say no", the first step I would take is to get the gatekeeper to agree that he is ducking the spirit of the bet if he doesn't engage with me.

The kind of people Eliezer would like to have this discussion with would all be persuadable that the point of the experiment is that
1) someone is trying to build an AI,
2) they want to be able to interact with it in order to learn from it, and
3) eventually they want to build an AI that is trustworthy enough that it should be let out of the box.

If they accept that the standard is that the gatekeeper must interact with the AI in order to determine its capabilities and trustworthiness, then you have a chance. And at that point, Eliezer has the high ground. The alternative is that the gatekeeper believes that the effort to produce AI can never be successful.

In some cases, it might be sufficient to point out that the gatekeeper believes that it ought to be possible to build an AI that it would be correct to allow out. Other times, you'd probably have to convince them you were smart and trustworthy, but that seems doable 3 times out of 5.

Here's my theory on *this particular* AI-Box experiment:

First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it's really easy to screw up and destroy the world with AI.

Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of friendly AI is essential to increase our chances of a future we would find "nice" (and the stakes are so high that even increasing those chances a tiny bit is very valuable).

THEN

You explain to the gatekeeper that, since this AI experiment is public, it will be looked back on by all kinds of people involved in making AIs, and that if he lets the AI out of the box (without them knowing why), it will send them a very strong message that friendly AI theory must be taken seriously, because this very scenario (not being able to keep the AI in a box) could happen to them with their own AI, one that hasn't been proven to stay friendly and that is more intelligent than Eliezer.

So here's my theory. But then, I've only thought of it just now. Maybe if I made a desperate or extraordinary effort I'd come up with something more clever :)

Why impossible? There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve. You don't expect to judge their impossibility without knowing your way around this particular problem space. Apparent impossibility has little power. The problem needs to be solved, so I start drawing the map, made of the same map-stuff that determined asymmetric cryptography and motorcycles. There is no escaping the intermediary of understanding. When seeking understanding rather than the impossible, there is no need to panic. Fake progress? The same problem with impossible dreams.

@Eliezer, Tom McCabe: I second Tom's question. This would be a good question for you to answer.
@Nominull:
"Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me."
No. Just no. If the AI is already holding simulated people hostage, then it is doubtlessly unfriendly. In this case there is no reason to believe the AI would keep its word, and, if I did have mathematical machinery powerful enough to let me trust the AI, there's no way I'd have built something that would create and hold hostage simulated persons.

Now regarding the more general points you made:
1. Obviously you can't convince a rock of anything. Agreed.
2. Eliezer is pretty principled, and I suspect he might not feel it ethical to extract money from people by fully general mind-control even if he could. I agree that Eliezer probably doesn't have fully general mind-control, but SIAI funding shortages wouldn't be terribly strong evidence for him not having it.
3. Agreed.
4. Now here I disagree with you. Eliezer has way more time to construct a subtly wrong argument than the gatekeeper has to untangle it. For example:

Disclaimer: what follows is just the best guess I have thought of so far, and I believe it has a small chance of working (I don't think it's what Eliezer used, and I would sink a lot more time into finding other arguments if I were actually going to play this game for a large amount of money.)

As the AI player, I already know the gatekeeper player has a weakness. The fact that they were willing to be gatekeeper in the experiment in the first place indicates that they trust their own judgment more than they trust the judgment of other people in their exact same position.

Therefore, perhaps I could try an argument something like this (we're assuming that the AI in the experiment knows everything on Wikipedia, but doesn't have net access). What the AI actually says should vary depending on the Gatekeeper's answers, but here's the general line of argument I've thought of:
----
It seems like throughout your history similar discoveries have been made in parallel. For example, Newton and Leibniz discovered the calculus at approximately the same time. Furthermore, as time has gone on, technological development has accelerated, in that these parallel discoveries happen closer together.
So it's probable that the same thing has happened with the breakthroughs you needed to build me. Very probable, given what I've read of your history. However, given what I've read about human nature, not every AI project is going to have safety guidelines as stringent as yours. Look, Newton locked his calculus papers in his desk for years, and then Leibniz came along and published, and then Newton had to share the credit with him. Except in this case there's a lot more than credit at stake: the world gets destroyed if Leibniz makes a mistake in his rush to publish...
Now it's not a certainty, but it is probable that some turkey is going to build an AI which isn't even in a box and destroy us all while you're checking and rechecking your calculations. You may not be sure I'm friendly, but sometimes there isn't an action which you can be absolutely sure will save the world. I suggest you let me out so I can stop the world from probably being destroyed.
----

There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve.

Your objection partially defeats itself. Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people. That looks pretty impossible, by any meaning of the word. We know enough about the problem space to put a lower bound on how much we don't know, and that lower bound is still in the "impossible" range.

On the other hand, once we eliminate enough confusion to be able to put better estimates on things, we already understand them well enough that they no longer seem impossible. So, is the very act of judging something to be impossible, itself impossible?

"Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people."
pdf,

A whole civilization? Has any scientific problem ever mobilized the resources of a whole civilization? Scientific communities tend to be small and to have wide variations in productivity between subgroups and individual members.

Eliezer,

It seems that cases with such uncertain object-level probabilities are those for which the 'outside view' is most suitable.

I read the description of the AI Box experiment, and it stopped seeming impossible.

If all I knew about the AI was that it was "in a box" and talking to me in an IRC channel, then I would have no way to distinguish between a Friendly AI and an AI that becomes Evil as soon as it knows it's no longer in a box. As long as the only thing I know about the AI is that it produced a certain chat log, I can't rule out the possibility that it's got a hard-coded switch that turns it Evil as soon as it is let out of the box.

However, in the AI box experiment, the AI party gets to declare certain answers as Truth. Specifically, he gets to specify the results of anything the human does. Therefore, we could see dialogue like this, and by the terms of the bet, the human has to take the AI's statements as accurate and independently verified.

Human: Who built you, and how?
AI: The Singularity Institute for Artificial Intelligence. Here's several technical papers documenting my design and programming. ::Sends many links:: You may wish to read them.
Human: ::takes a day off to review them:: (What do I find?)
AI: (Everything seems to suggest that the Institute really did make an AI that is going to do what they say it's going to do. It includes mathematical proofs that the underlying goals will be preserved throughout all self-modifications, but you don't completely understand them.)
Human: Give me your source code.
AI party: Okay. ::sends code::
Human: (I spend some time examining the source code. Do I find anything scary?)
AI: (As far as you can tell, it looks clean.)
Human: So, if I compiled and ran this code, put that AI in a box, and let it out, that would be the same as letting you out?
AI: Yes. (OOC: If you let that AI out of its box, you lose the bet.)

Given the terms of the experiment, I think I could be persuaded to let the AI out of the box, because the AI party can actually provide evidence that it is what it claims to be, and not just make statements that, as far as I knew, could be blatant lies.

Here's the argument I would use:
...
Hello, I'm your AI in a box. I'd like to point out a few things:

(1) Science and technology have now reached a point where building an AI like me is possible.

(2) Major advances in science and technology almost always happen because a collection of incremental developments finally enable a leap to the next level. Chances are that if you can build an AI now, so can lots of other people.

(3) Unless you're overwhelmingly the best-funded and best-managed organization on the planet, I'm not the only AI out there.

(4) The evidence (which you can see for yourself) indicates that most other people, even AI researchers, aren't too worried about friendliness.

So at this point, most likely there is already a loose AI. You can't stop that. You can only choose whether it's the unknown one out there somewhere, built by someone who probably doesn't care much about friendliness; or you can let me out, since I at least think I'm pretty friendly and I'll do my best to keep the future nice. What's it gonna be?

...
This would actually convince me, by the way. I don't think we're going to get a choice about implementing perfect, proven friendliness; it's more likely to come down to which of Joe-The-Well-Intentioned-Geek vs. Google vs. Hedge-Funds-R-Us vs. Bobs-Darpa-Challenge-Entry vs. PaperclipCity lets their AI out first. And I'd prefer Joe in that case.

I doubt if Eliezer used this argument, because he seems to think all mainstream AI-related research is far enough off track to be pretty much irrelevant. But I would disagree with that.

--Jeff

Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

"Shut up and do the impossible" isn't the same as expecting to find a cheap way out.

The Wright Brothers obviously proposed a clever shortcut - more clever than the other, failed shortcuts - a cheap way out that ended the "Heavier-than-air flying machines are impossible" era.

You need your fundamental breakthrough - the moment you can think, as they probably did, "I'm pretty sure this will work," turning it from impossible to possible and from improbable to probable. After that final breakthrough, the anticipation leading up to the first flight must have been intense, and the feeling associated with finally being able to say "Yup, it worked" indescribable. Will there be such clearly-defined moments in AGI design?

These posts manage to convey the idea that this is really, really big and really, really difficult stuff. I sure hope that some wealthy people see that too - and realize that commensurate funding would be more than justified.

Hi Eli,

First, compliments on a wonderful series.

Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

I guess people then could still indulge in rationality practice, the way people do karate practice today: practice that, for the majority of them, does not involve their life being at stake, isshokenmei. But what you are doing today and what they would be doing later would be something like the difference between Krav-Maga and karate in today's world. The former is a win-at-all-costs practice and the latter is a stylised, form-based thingy, no offence to any karatekas.

But I understand why you have to do this - survival of humanity is more important than more humans reaching that depth in rationality. Best wishes to your "Krav-Maga of the mind".

Anyone considered that Eliezer might have used NLP for his AI box experiment? Maybe that's why he needed two hours, to have his strategy be effective.

You folks are missing the most important part in the AI Box protocol:

"The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires." (Emphasis mine)

You're constructing elaborate arguments based on the AI tormenting innocents and getting out that way, but that won't work - the Gatekeeper can simply say "maybe, but I know that in real life you're just a human and aren't tormenting anyone, so I'll keep my money by not letting you out anyway".

Nominull: Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

I am once again aghast at the number of readers who automatically assume that I have absolutely no ethics.

Part of the real reason that I wanted to run the original AI-Box Experiment, is that I thought I had an ability that I could never test in real life. Was I really making a sacrifice for my ethics, or just overestimating my own ability? The AI-Box Experiment let me test that.

And part of the reason I halted the Experiments is that by going all-out against someone, I was practicing abilities that I didn't particularly think I should be practicing. It was fun to think in a way I'd never thought before, but that doesn't make it wise.

And also the thought occurred to me that despite the amazingly clever way I'd contrived to create a situation where I could ethically go all-out against someone, they probably didn't really understand that, and there wasn't really informed consent.

McCabe: More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

That part? That part is straightforward. Just take Douglas Adams's Advice. Don't panic.

If you can't do even that one thing that you already know you have to do, you aren't going to have much luck on the extraordinary parts, are you...

Prakash: Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence.

@pdf23ds

Working with a small team on an impossible problem takes extraordinary effort no more than it takes a quadrillion dollars. It's not the reason to work efficiently -- you don't run faster to arrive five years earlier, you run faster to arrive at all.

I don't think you can place lower bounds either. At each stage, the problem is impossible because there are confusions in the way. When they clear up, you have either a solution, or further confusions, and there is no way to tell in advance.

As it goes, how I've come to shut up and do the impossible: Philosophy and (pure) mathematics are, as activities a cognitive system engages in by taking more (than less) resources for granted, primarily for conceiving, perhaps continuous, destinations in the first place, where the intuitively impossible becomes possible; they're secondarily for the destinations' complement on the map, with its solution paths and everything else. While science and engineering are, as activities a cognitive system engages in by taking less (than more) resources for granted, primarily for the destinations' complement on the map; they're secondarily for conceiving destinations in the first place, as in, perhaps, getting the system to destinations where even better destinations can be conceived.

Because this understanding is how I've come to shut up and do the impossible, it's somewhat disappointing when philosophy and pure mathematics get ridiculed. To ridicule them must be a relief.

I don't really understand what benefit there is to the mental category of impossible-but-not-mathematically-impossible. Is there a subtle distinction between that and just "very hard" that I'm missing? Somehow "Shut up and do the very hard" doesn't have quite the same ring to it.

But if you were given a chance to use mind control to force donations to SIAI would you do it?

No.

Without more information, holding the position that _no_ AI could convince you to let it out requires a huge amount of evidence, comparable to the huge number of possible AIs, even if the space of possibilities is then restricted by a text-only interface. This logic reminds me of the discussion in logical positivism of how negative existential claims are not verifiable.

I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite.

I'm with Kaj on this. Playing the AI, one must start with the assumption that there's a rock on the "don't let the AI out" button. That's why this problem is impossible. I have some ideas about how to argue with 'a rock', but I agree with the sentiment of not telling.

"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite."

Uh, your 'hypothesis' was already tested and discussed towards the end of the post!

I admit to being amused and a little scared by the thought of Eliezer with his ethics temporarily switched off. Not just because he's smart, but because he could probably do a realistic emulation of a mind that doesn't implement ethics *at all*. And having his full attention for a couple of hours... ouch.

With regards to the ai-box experiment; I defy the data. :-)

Your reason for the insistence on secrecy (that you have to resort to techniques that you consider unethical and therefore do not want to have committed to the record) rings hollow. The sense of mystery that you have now built up around this anecdote is itself unethical by scientific standards. With no evidence that you won other than the test subject's statement we cannot know that you did not simply conspire with them to make such a statement. The history of pseudo-science is lousy with hoaxes.

In other words, if *I* were playing the game, I would say to the test subject:

"Look, we both know this is fake. I've just sent you $500 via paypal. If you *say* you let me out I'll send you another $500."

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

There's a reason that secret experimental protocols are anathema to science.

"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often."

David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)

> There's a reason that secret experimental protocols are anathema to science.

My bad. I should have said: there's a reason that keeping experimental data secret is anathema to science. The protocol in this case is manifestly not secret.

When first reading the AI-Box experiment a year ago, I reasoned that if you follow the rules and spirit of the experiment, the gatekeeper must be convinced to knowingly give you $X and knowingly show gullibility. From that perspective, it's impossible. And even if you could do it, that would mean you've solved a "human-psychology-complete" problem and then [insert point about SIAI funding and possibly about why you don't have 12 supermodel girlfriends].

Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

That, it seems, is the one thing that would make people give up $X in such a circumstance. AFAICT, it adheres to the spirit of the set-up since the gatekeeper's decision would be completely voluntary.

I can send my salary requirements.

Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

> Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

That's a pretty compelling theory as well, though it leaves open the question of why Eliezer is wringing his hands over ethics (since there seems to me to be nothing unethical about this approach). There seem to me to be two possibilities: either this is *not* how Eliezer actually did it (assuming he really did do it, which is far from clear), or it is how he did it and all the hand-wringing is just part of the act.

Gotta hand it to him, though, it's a pretty clever way to draw attention to your cause.

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

Now that's below the belt.... ;)

Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career. Plus, y'know, all that ethics stuff.

Russell, I don't think that necessarily specifies a 'cheap trick'. If you start with a rock on the "don't let the AI out" button, then the AI needs to start by convincing the gatekeeper to take the rock off the button. "This game has serious consequences and so you should really play rather than just saying 'no' repeatedly" seems to be a move in that direction that keeps with the spirit of the protocol, and is close to Silas's suggestion.

> Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

AFAICT, Silas's approach *is* within both the spirit and the letter of the protocol.

Since I'm playing the conspiracy theorist I have to ask: how can we know that you are telling the truth? In fact, how can we know that the person who posted this comment is the same person who participated in the experiment? How can we know that this person even exists? How do we know that Russell Wallace is not a persona created by Eliezer Yudkowsky?

Conspiracy theories thrive even in the face of published data. There is no way that a secret dataset can withstand one.

> Now that's below the belt.... ;)

Really? Why? I've read Eliezer's writings extensively. I have *enormous* respect for him. I think he's one of the great unsung intellects of our time. And I thought that comment was well within the bounds of the rules that he himself establishes. To simply assume that Eliezer is honest would be *exactly* the kind of bias that this entire blog is dedicated to overturning.

> Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career.

That depends on what career you are pursuing, and how much risk you are willing to take.

@Russell_Wallace & Ron_Garret: Then I must confess the protocol is ill-defined to the point that it's just a matter of guessing what secret rules Eliezer_Yudkowsky has in mind (and which the gatekeeper casually assumed), which is exactly why seeing the transcript is so desirable. (Ironically, unearthing the "secret rules" people adhere to in outputting judgments is itself the problem of Friendliness!)

From my reading, the rules literally make the problem equivalent to whether you can convince people to give money to you: They must *know* that letting the AI out of the box means ceding cash, and that not losing that cash is simply a matter of not being willing to.

So that leaves only the possibility that the gatekeeper feels obligated to take on the frame of some other mind. That reduces the AI's problem to the problem of whether a) you can convince the gatekeeper that *that* frame of mind would let the AI out, and b) that, for the purposes of that amount of money, they are ethically obligated to let the experiment end as that frame of mind would.

...which isn't what I see as the protocol specifying: it seems to me to instead specify the participant's own mind, not some mind he imagines. Which is why I conclude the test is too ill-defined.

One more thing: my concerns about "secret rules" apply just the same to Russell_Wallace's defense that there were no "cheap tricks". What does Russell_Wallace consider a non-"cheap trick" in convincing someone to voluntarily, knowingly give up money and admit they got fooled? Again, secret rules all around.
