
November 17, 2008

Comments

It was necessary for people doing AI to dissociate themselves from previous attempts at AI in order to get funding (see the various AI winters), since the field had come into disrepute for promising too much. Hence terms like GOFAI and the connectionist/logical dichotomy.

You are lucky not to be on that treadmill. Sadly nowadays you have to market your speculative research to be successful.

"So I'm just mentioning this little historical note about the timescale of mathematical progress, to emphasize that all the people who say "AI is 30 years away so we don't need to worry about Friendliness theory yet" have moldy jello in their skulls."

It took 17 years to go from perceptrons to back propagation...

... therefore I have moldy Jell-O in my skull for saying we won't go from manually debugging buffer overruns to superintelligent AI within 30 years...

Eliezer, your logic circuits need debugging ;-)

(Unless the comment was directed at, not claims of "not less than 30 years", but specific claims of "30 years, neither more nor less" -- in which case I have no disagreement.)

I'd be interested in an essay about "the nonobvious difficulty of doing math".

Russell, I think the point is we can't expect Friendliness theory to take less than 30 years.

"Russell, I think the point is we can't expect Friendliness theory to take less than 30 years."

If so, then fair enough -- I certainly don't claim it will take less.

>It took 17 years to go from perceptrons to back propagation...

>... therefore I have moldy Jell-O in my skull for saying we won't go from manually debugging buffer overruns to superintelligent AI within 30 years...

If you'd asked me in 1995 how many people it would take for the world to develop a fast, distributed system for moving films and TV episodes to people's homes on a 'when you want it, how you want it' basis, internationally, without ads, I'd have said hundreds of thousands. In practice it took one guy with the right algorithm, depending on whether you pick Napster or BitTorrent as the magic that solves the problem without the need for any new physical technologies.

The thing about self-improving AI is that we only need to get the algorithm right (or wrong :-() once.

We know with probability 1 that it's possible to create self-improving intelligence. After all, that's what most humans are. No doubt other solutions exist. If we can find an algorithm or heuristic to implement any one of these solutions, or if we can even find any predecessor of any one of them, then we're off - and given the right approach (be that algorithm, machine, heuristic, or whatever) it should be simply a matter of throwing computer power (or Moore's law) at it to speed up the rate of self-improvement. Heck, for all I know it could be a giant genetically engineered brain in a jar that cracks the problem.

Put it this way. Imagine you are a parasite. For x billion years you're happy, then some organism comes up with sexual reproduction and suddenly it's a nightmare. But eventually you catch up again. Then suddenly, in just 100 years, human society basically eradicates you completely out of the blue. The first 50 years of that century are bad. The next 20 are hideous. The next 10 are awful. The next 5 are disastrous... etc.

Similarly, useful power-plant-scale nuclear fusion has always been 30 years away. But at some point, I suspect, it will suddenly be only 2 years away, completely out of the blue...

"If you'd asked me in 1995 how many people it would take for the world to develop a fast, distributed system for moving films and TV episodes to people's homes on an 'when you want it, how you want it' basis, internationally, without ads, I'd have said hundreds of thousands."

And you'd have been right. (Ever try running Bit Torrent on a 9600 bps modem? Me neither. There's a reason for that.)
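To put a rough number on that point (my own back-of-the-envelope figure; the roughly 700 MB film size is an assumption, not something the commenter stated):

$$
\frac{700 \times 8 \times 10^{6}\ \text{bits}}{9600\ \text{bits/s}} \approx 5.8 \times 10^{5}\ \text{s} \approx 6.8\ \text{days}
$$

of continuous, error-free transfer for a single film, before any protocol overhead.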

> And you'd have been right. (Ever try running Bit Torrent on a 9600 bps modem? Me neither. There's a reason for that.)

Not sure I see your point. All the high speed connections were built long before bittorrent came along, and they were being used for idiotic point-to-point centralised transfers.

All that potential was achieving not much before the existence of the right algorithm or approach to exploit it. I suspect a strong analogy here with future AI.

"Not sure I see your point. All the high speed connections were built long before bittorrent came along, and they were being used for idiotic point-to-point centralised transfers."

No they weren't. The days of Napster and Bit Torrent were, by no coincidence, also the days when Internet speed was in the process of ramping up enough to make them useful.

But of course, the reason we all heard of Napster wasn't that it was the first peer-to-peer data sharing system. On the contrary, we heard of it because it came so late that by the time it arrived, the infrastructure to make it useful was actually being built. Ever heard of UUCP? Few have. That's because in its day -- the 70s and 80s -- the infrastructure was by and large not there yet.

A clever algorithm, or even a clever implementation thereof, is only one small piece of a real-world solution. If we want to build useful AGI systems -- or so much as a useful Sunday market stall -- our plans must be built around that fact.

On the one hand, Eliezer is right in terms of historical and technical specifics.

On the other hand, neural networks are for many a metonym for continuous computations vs. the discrete computations of logic. This was my reaction when the two PDP volumes came out in the 80s. It wasn't "Here's the Way." It was "Here's an example of how to do things differently that will certainly work better."

Note also that the GOFAI folks were not trying to use just one point in logic space. In the 70s we already knew that monotonic logic was not good enough (due to the frame problem among other things) so there was an active exploration of different types of non-monotonic logic. That's in addition to all the modal logics, etc.

So the dichotomy Eliezer refers to should be viewed as more of a hyperplane separator in intelligence model space. From that point of view I think it is fairly valid -- the subspace of logical approaches is pretty separate from the subspace of continuous approaches, though Detlef and maybe others have shown you can build bridges.

The two approaches were even more separate culturally at the time. AI researchers didn't learn or use continuous mathematics, and didn't want to see it in their papers. That probably has something to do with the 17 years. Human brains and human social groups aren't very good vehicles for this kind of search.

So yes, treating this as a distinction between sharp points is wrong. But treating it as a description of a big cultural transition is right.

[comment deleted]

Perhaps Eliezer goes to too many cocktail parties:

X: "Do you build neural networks or expert systems?"
E: "I don't build anything. Mostly I whine about people who do."
X: "Hmm. Does that pay well?"

Perhaps Bayesian Networks are the hot new delicious lemon glazing. Of course they have been around for 23 years.

Well, if the AGI field had real proof that AGI was possible, sure. The problem is that the proof for AGI is in the doing, and the fact that you think it's possible is baseless belief. Just because a person can do it does not mean a computer can.

Reduction to QED

The question of AGI is an open question, and there is no way to silence the opposition logically until an AGI is created, something you won't be doing.

Aside: I know the Quantum Physics Sequence was sort of about this, and the inside vs. outside view argument is closely related, but I wouldn't mind seeing more discussion of the specific phenomenon of demanding experimental evidence while ignoring rational argument. Also, this is at least the second time I've seen someone arguing against AGI, not on the object level or a standard meta level, but by saying what sounds like "I won't believe you until you can convince everyone else." What does it matter that the opposition can't be silenced, except insofar as the opposition has good arguments?

[comment deleted - all, please don't respond to obvious trolls]

IgnoranceNeverPays: "This is a common thing among people who know enough that they think they know something but don't actually know enough to really know something."

Can you say that really really fast?

It *wasn't* seventeen years. It was five years. See http://www.citeulike.com/user/napvasconcelos/article/3396716

The reason NNs aren't considered lemon glazing is that they are an approach toward at least one of the known models that does produce intelligence: you. Of course, backprop and self-organizing maps are far less complicated than the events in a single rat neuron. Of course, computer simulations of neurons and neural networks are based upon the purely logical framework of the CPU and RAM. Of course, the recognized logical states of all of those hardware components are abstractions laid upon a complex physical system. I don't quite say irreducible complexity, but, at the least, immense complexity at some level is required to produce what we recognize as intelligence.

Mr. Art, I get 1974 - 1957 = 17. Does the referenced book (in contrast to its abstract) give an invention date other than 1974? Who and when?

Ah, I see. I assumed you meant 17 years from 'Perceptrons' - the 1969 book that pointed out the problem. Failed pedantry on my part!

Umm, it looks like he did not read the book "Perceptrons," because he repeats a lot of misinformation from others who also did not read it.

1. First, none of the theorems in that book are changed or 'refuted' by the use of back-propagation. This is because almost all of the book is about whether, in various kinds of connectionist networks, there exist *any* sets of coefficients to enable the net to recognize various kinds of patterns.
2. Anyway, because BP is essentially a gradient climbing process, it has all the consequent problems -- such as getting stuck on local peaks (a toy numeric sketch of this appears after the list).
3. Those who read the book will see (on page 56) that we did not simply show that the "parity function" is not linearly separable. What we showed is that for a perceptron (with one layer of weighted-threshold neurons that are all connected to a single weighted-threshold output cell), there must be many neurons, each of which has inputs from every point in the retina!
4. That result is fairly trivial. However, chapter 9 proves a much deeper limitation: such networks cannot recognize *any topological features* of a pattern unless either there is one all-seeing neuron that does it, or exponentially many cells with smaller input sets.

A good example of this is: try to make a neural network that looks at a large two-dimensional retina and decides whether the image contains more than one connected set. That is, whether it is seeing just one object, or more than one object. I don't yet have a decent proof of this (and I'd very much like to see one), but it is clear from the methods in the book that even a multilayer neural network cannot recognize such patterns -- unless the number of layers is of the order of the number of points in the retina! This is because a loop-free network cannot do the needed recursion.

5. The popular rumor is that these limitations are overcome by making networks with more layers. And in fact, networks with more layers can recognize more patterns, but at an exponentially high price in complexity. (One can make networks with loops that can compute some topological features. However, there is no reason to suspect that back-propagation will work on such networks.)
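As a minimal numeric sketch of the local-optimum point in item 2 (my own illustration, not part of the comment above; the toy objective, learning rate, and starting points are arbitrary choices): plain gradient descent converges to whichever basin it starts in, so an unlucky starting point leaves it stuck at a poorer local minimum.

```python
# Toy illustration: gradient descent is a local, greedy process.
# f(x) = x^4 - 3x^2 + x has its global minimum near x = -1.30
# and a shallower local minimum near x = +1.13.

def f(x):
    return x**4 - 3*x**2 + x

def grad(x):
    return 4*x**3 - 6*x + 1

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-2.0))  # ~ -1.30: finds the global minimum
print(descend(+2.0))  # ~ +1.13: stuck in the local minimum
```

Back-propagation inherits exactly this behaviour, just in a much higher-dimensional weight space.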

The writer has been sucked in by propaganda. Yes, neural nets with back-propagation can recognize many useful patterns, indeed, but cannot learn to recognize many other important ones—such as whether two different things in a picture share various common features, etc.

Now, you readers should ask why you have not heard about such problems! Here is the simple, incredible answer: In physics, if you show that a certain popular theory cannot explain an important phenomenon, you're likely to win a Nobel Prize, as when Yang and Lee showed that the standard theory could not explain a certain violation of parity.
Whereas, in the connectionist community, if your network cannot recognize a certain type of pattern, you'll simply refrain from announcing this fact and pretend that nothing has happened -- perhaps because you fear that your investors will withdraw their support.
So yes, you can indeed build connectionist networks that learn which movies a citizen is likely to like, and yes, that can make you some money. And if your connectionist robot can't count, so what! Just find a different customer!

But the real cleverness is in how neural networks were marketed. They left out the math.

Not entirely true; my recollection is that the PDP book had lots of maths in it.

I didn't say Perceptrons (the book) was in any way invalidated by backprop. Perceptrons cannot, in fact, learn to recognize XOR. The proof of this is both correct and obvious; and moreover, does not need to be extended to multilayer Perceptrons because multilayer linear = linear.
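A short sketch of both halves of that point (my own illustration, not Eliezer's; the network size, learning rate, iteration count, and random seed are arbitrary choices): composing linear layers without a nonlinearity collapses to a single linear map, while one hidden layer with a sigmoid nonlinearity trained by plain back-propagation does typically fit XOR.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) "Multilayer linear = linear": stacked linear layers are one linear map.
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
x = rng.standard_normal((2, 5))
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)  # same function, one matrix

# 2) One hidden layer + a nonlinearity, trained by backprop, can learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_h = rng.standard_normal((2, 4)); b_h = np.zeros(4)   # 4 hidden units
W_o = rng.standard_normal((4, 1)); b_o = np.zeros(1)
lr = 0.5

for _ in range(10000):
    h = sigmoid(X @ W_h + b_h)            # forward pass
    out = sigmoid(h @ W_o + b_o)
    d_out = (out - y) * out * (1 - out)   # backprop of squared error
    d_h = (d_out @ W_o.T) * h * (1 - h)
    W_o -= lr * (h.T @ d_out); b_o -= lr * d_out.sum(axis=0)
    W_h -= lr * (X.T @ d_h);   b_h -= lr * d_h.sum(axis=0)

print(np.round(out.ravel()))  # usually [0. 1. 1. 0.] after training
```

Two hidden units suffice in principle, but a few extra make plain gradient descent much less likely to stall in a poor basin -- itself a small instance of the local-optimum issue raised earlier in the thread.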

That no one's trained a backprop-type system to distinguish connected from unconnected surfaces (in general) is, if true, not too surprising; the space of "connected" versus "unconnected" would cover an incredible number and variety of possible figures, and offhand there doesn't seem to be a very good match between that global property and the kind of local features detected by most neural nets.

I'm no fan of neurons; this may be clearer from other posts.

