
December 04, 2008


Don't be seduced by the fantasies of the formalists.

Cryptography has some of the hardest-math formalized guarantees that we know of, but it hasn't solved the problem of data breaches. That's because of the human factor.

Software might be "guaranteed" not to lose data, but the hardware running the software is incredibly leaky: the RF emissions of the CPU bus, the display, the keyboard. The abstractions are leaky too: processor and hard-disk access contention, and memory pressure, all betray the operations of a running process. These channels can be used to snoop on machines, either from a distance or from another process running on the same machine, even if the software itself is as sound as it can ever be.
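A software-level analogue of these leaks can be sketched in a few lines of Python. The `leaky_equals` function below is a hypothetical example (not from any real system) of a comparison whose running time depends on the secret; `hmac.compare_digest` is the standard-library countermeasure:

```python
import hmac

def leaky_equals(secret: str, guess: str) -> bool:
    """Early-exit comparison: running time depends on how many leading
    characters of the guess are correct, so repeated timing measurements
    can recover the secret one character at a time."""
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False  # bails out at the first mismatch
    return True

# Even with functionally "correct" software, the timing channel remains.
# The standard countermeasure is a constant-time comparison:
safe_equals = hmac.compare_digest
```

The point is that both functions compute the same boolean, so no specification phrased purely in terms of input/output behavior distinguishes them.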

Speaking more economically, the human and business costs of rigour, and the opportunities foregone, are too large for any but the most naive to think that software will even begin to approach full correctness. There aren't enough highly trained, mathematically oriented solution implementors in the world to solve all the business problems that need solving, and there never will be. Likewise, it would be highly inefficient for businesses to employ them when they could be far more productive investing that capital elsewhere and living with only moderately correct software.

The problem in writing software is in specification, not implementation, since the implementation is the specification at the minimally sufficient level of detail. Formal methods focus on proving implementations correct with respect to specifications, but those specifications are necessarily more abstract than the only real specification, the source code, and thus they are focusing on the wrong problem.


I haven't yet read the whole Drexler post, but I'll just free-jazz it for now:

Isn't the point that if you can *prove* that it is possible, then a superior intelligence could potentially do it? Even if *we* can't do it now, it makes a difference to know that it is possible, no?

Michael, formal methods is a computer science discipline that is composed of a number of parts: specification, development and verification.

The problem is that verification requires a specification, and a specification that is complete enough to be correct is also complete enough to be automatically implemented through stepwise transformation (though it may be inefficient).

The costs are in the specification, but these same costs are in systems that don't use formal methods - implementation is almost trivial given a complete specification. As they say, the devil is in the details; if the specification is to be sufficient, it must include all necessary details.
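As a toy illustration of that claim (the names here are hypothetical, not from any formal-methods tool), a complete specification of sorting - "the output is an ordered arrangement of the input's elements" - can be executed directly by stepwise search, which is correct by construction and hopelessly inefficient:

```python
from itertools import permutations

def is_sorted(xs) -> bool:
    """The specification's ordering condition."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def sort_from_spec(xs):
    """Executes the specification directly: enumerate rearrangements of
    the input until one satisfies the ordering condition.  Correct by
    construction, but O(n!) - the spec is complete enough to be an
    implementation, just not an efficient one."""
    return next(list(p) for p in permutations(xs) if is_sorted(p))
```

An efficient implementation (quicksort, mergesort) adds no new information about *what* is wanted, only about *how* to get it quickly.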

This is related to my biggest problem with Eliezer's FAI program. Modern computer programs are too complex to be proven correct. I have read a lot of Dijkstra's less technical work; while he advocated (continually) proving programs correct, he never actually came up with a formal system that would work on a real computer system. And if current systems are too complex, how much more so is it likely to be with AI. I do think that formalizing analysis and design has promise for streamlining and improving systems that are increasingly well understood, but only as they become better understood - large and newer systems, I think, will always lag in being provable.

Barry Kelly doesn't go far enough. I'd sidestep the difficulties in providing formal specifications for a wide variety of methods for a moment. Formal methods rely on assumptions that aren't valid.

Consider Peano's Axiom numbers 6 and 7:
6: For every natural number n, S(n) is a natural number.
7: For every natural number n, S(n) ≠ 0. That is, there is no natural number whose successor is 0.

At least one of these is provably false in every programming language. When n is the maximum representable value, S(n) either throws an error, wraps around to 0 or a negative value, or otherwise returns something that is not the successor of n. No computer can represent every natural number; at the boundary it will break and violate either #6 or #7 in the process.
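The wrap-around case can be demonstrated concretely with a fixed-width integer type (a minimal sketch using Python's `ctypes`; the successor function `S` here is just the Peano successor applied to a 32-bit unsigned value):

```python
import ctypes

def S(n: ctypes.c_uint32) -> ctypes.c_uint32:
    """Peano successor on 32-bit unsigned ints: wraps modulo 2**32."""
    return ctypes.c_uint32(n.value + 1)

N_MAX = ctypes.c_uint32(2**32 - 1)

# Axiom 7 says there is no natural number whose successor is 0 -- but:
print(S(N_MAX).value)  # 0
```

So either S(N_MAX) is "not a natural number" (violating axiom 6) or its successor is 0 (violating axiom 7); the hardware gives us integers modulo 2^32, not the naturals.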

Formal methods would go a long way to ensuring stability and security, but they're not a magic bullet.

Kelly Barry, your example is very poorly chosen. No programmer worthy of the name believes that a 32-bit unsigned int in C (for example) is supposed to be a natural number obeying the Peano axioms, so this is hardly a case of failing to implement a specification correctly. If you don't want integers modulo 2^32 or whatever, most languages have some form of memory-bounded integral type, but nobody thinks those are supposed to represent the natural numbers either, since memory is exhaustible and the natural numbers are not.

Barry Kelly:

Formal methods focus on proving implementations correct with respect to specifications, but those specifications are necessarily more abstract than the only real specification, the source code, and thus they are focusing on the wrong problem.

Except that a "more abstract" specification is preferable, as long as it can be reliably translated into a correct implementation. The reverse direction (implementation -> specification) isn't the point. I doubt many people find it productive to translate assembly language into Python.

If, given a formal specification of a problem, you can produce a correct implementation, then you are a successful compiler. Do you believe there exist specifications for which humans can produce a correct encoding, but computers cannot? If not, that appears inconsistent with your view that formal methods are focusing on the wrong problem.

I agree with Barry Kelly that this is likely fantasy.

The mud that we're building on isn't the mud of bad software. It's of incomplete and evolving understanding, both of what is and what we want to do.

Drexler mentions compilers and microprocessors as something that he hopes we can formally prove. That would make no effective difference to average users, as compilers and microprocessors are already sufficiently reliable. My computer already doesn't crash and doesn't get infected, because I'm not using a for-profit operating system. Windows is shaky because Microsoft is not there to make software that they're happy with; they're just making the software that they can get you to buy.

As far as I've seen, formal proof of software is hoping to polish a few very shiny pebbles that they've picked off the continent-spanning mountain range of the software that people actually need and use. They're welcome to dream their dreams, but I don't expect them to make much practical difference this century.

It is worth noting that the most recent method to overcome the software correctness problem is UNIT TESTING.

In it, you basically write the specification as a testing procedure, then write the final code and use the testing procedure to verify it.

There are many advantages to this process, but the best part IMO is that the test is bidirectional. You can have bugs in the testing procedure or in the final code, but it is very unlikely to be the SAME bug. So both tend to get debugged at once.

Also, you can have more than one testing procedure and, most importantly, testing procedure can be written by somebody else.
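The spec-as-test idea can be sketched with the standard `unittest` module (the `median` function and its spec are hypothetical examples, not from the thread):

```python
import unittest

def median(xs):
    """Implementation under test."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

class MedianSpec(unittest.TestCase):
    """The specification, written as an executable testing procedure.
    A second, independently written spec class could check the same code."""
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)
    def test_even_length(self):
        self.assertEqual(median([4, 1, 2, 3]), 2.5)

# A bug in the spec and a bug in the code are unlikely to be the same
# bug, so a failure points at one or the other and both get debugged.
```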

This BTW might have interesting implications for AI as well. One thing that comes to mind is to create only the testing procedure and develop some sort of algorithm capable of building an algorithm to match it :) Or use Cyc to create the testing procedure, and create some sort of network of values plus an algorithm to process them in a feedback loop to match Cyc's database.

Last time I paid any attention to formal verification of computer programs was in the 1980s. This was basically what I saw:

1. Person decides to implement a way of proving the correctness of computer programs.

2. Person develops grammar for re-expressing code written in Pascal (since these were academics in the 1980s) in their formalism.

3. Person expands their formalism until it covers nearly all of Pascal.

4. Person has then re-developed Prolog without extralogicals.

5. People trying to use Person's formalism rediscover why Prolog is unusable without extralogicals.

If you want formally verifiable programs, the only solution is to program using some variation on Prolog without extralogicals. Have fun.

loqi is completely correct. Formal software verification is in widespread use already, under the name of "type safety": most type systems in common use can only prevent certain limited classes of implementation errors, but others are sophisticated enough to encode a formal specification and verify that its implementation is correct.

There is even a well-known correspondence between strongly typed algorithms and constructive formal proofs: correctly implementing a formal spec is equivalent to providing a constructive proof of the corresponding theorem.
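A minimal sketch of that correspondence, using Python's `typing` module (the names are illustrative): a total function of type A ∧ B → A is a constructive proof of the proposition "A and B implies A", because the only way to produce an A from an arbitrary pair is to project it out:

```python
from typing import Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Proposition: A and B implies A.
# A total, polymorphic program of this type IS the constructive proof:
# given evidence for A-and-B (a pair), it produces evidence for A.
def proj1(pair: Tuple[A, B]) -> A:
    return pair[0]
```

In a language with a sound type system the checker verifies this "proof" mechanically; Python's annotations only gesture at the idea, which is why the serious work happens in languages like Haskell, ML, or proof assistants.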

anon raises a good point. The irony, of course, is that current trends in programming languages are often moving away from strong/static/safe type systems, towards weaker type systems in languages such as Python.

Another factor that can make programs dramatically easier to formally reason about is maintaining referential transparency, but most current languages pay little attention to this issue.

A language and environment supporting high-level abstractions, strong type safety, and absolute referential transparency would allow for a lot more formal verification than languages like Java, which do mediocre to poorly on all three...
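The referential-transparency point can be sketched in a few lines (hypothetical example functions, not from any library): a pure expression can be replaced by its value anywhere, which is exactly the substitution step that equational proofs, memoization, and reordering rely on.

```python
import math

counter = 0

def impure_next() -> int:
    """Not referentially transparent: identical calls return different
    values, so impure_next() + impure_next() != 2 * impure_next(),
    and no equational reasoning about it is valid."""
    global counter
    counter += 1
    return counter

def pure_area(r: float) -> float:
    """Referentially transparent: pure_area(2.0) may be replaced by its
    value (or cached, or computed in any order) without changing the
    program's meaning."""
    return math.pi * r * r
```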


In the research community, there is still lots of work and progress going on with regard to type systems, and much of that work will definitely make its way into production languages eventually. The advantages are too great to ignore, and a dynamically typed language can never be as safe or as fast as a statically typed language. I think the future of programming languages lies in making languages with improved type systems derived from things like Hindley-Milner (Haskell, ML) as easy to use as languages with type systems like Python.

For example, Philip Wadler and Robby Findler are working on being able to rigorously and safely combine typed and untyped sections of code in a single program, which would give you the advantages of a good type system (type safety, security, performance, etc.) and still allow you to use untyped code when that is needed. Additionally, languages like Haskell that provide a static type system with type inference (which infers types for you rather than requiring you to specify them yourself as in Java...) provide many of the advantages of a dynamically typed language like Python while still being type-safe, more secure, and faster. (See 38:17 and on in Faith, Evolution, and Programming Languages for information about the work on "evolutionary types".)

In the long term, I think we'll see stronger and richer type systems than anything that exists today, but with better inference and much better tool support. The main reason that dynamic languages are so popular at present is because of their ease of use and convenience, which translates into productivity and happy programmers. Better inference on the part of the type system combined with good IDE support could make a statically typed language almost as convenient while still providing huge advantages in terms of reliability, security, and performance.

The irony, of course, is that current trends in programming languages are often moving away from strong/static/safe type systems, towards weaker type systems in languages such as Python.

Off topic - but: languages should not force programmers to use one or the other paradigm: static typing is just dynamic typing with a contract imposed on the types.

Programming is the art of figuring out what you want so precisely that even a machine could do it.

As others have pointed out, once you know enough that you can tell the difference between a correct program and an incorrect program, you've already constrained your design enough that actually writing the code becomes trivial by comparison. If you have a specification that's precise enough to use formal methods on, the hard work has already been done.

Tim Tyler: That's how a lot of programmers feel, but you lose some of the value of strong static typing when you make it optional. Note that for the purpose of this post any language where you can force invalid conversions (such as C or Java) doesn't really count as "strong static typing".

Dynamic languages have their place, and I've written plenty of Python. Java is just horrible, because the type system is strong enough to get in your way but doesn't protect you from doing fishy stuff like casting things to and from Object.

In a language that strictly requires strong, static typing across the board you have more guarantees about the behavior of code you aren't familiar with, compilers can produce faster code, and the code is made more tractable to static analysis. Ideally, errors such as possible null references and invalid type casts should be caught at compile time, not runtime. The main issue is to make the type system simultaneously as strong as possible and as unobtrusive as possible.
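The null-reference point can be sketched even in Python, where a static checker such as mypy (rather than a compiler) does the analysis; the functions below are hypothetical examples:

```python
from typing import Dict, Optional

def find_user(users: Dict[str, int], name: str) -> Optional[int]:
    """The Optional return type makes 'may be absent' part of the type."""
    return users.get(name)

def get_age(users: Dict[str, int], name: str) -> int:
    age = find_user(users, name)
    # A static checker rejects returning `age` directly, because
    # Optional[int] is not int -- the None case must be handled first.
    if age is None:
        raise KeyError(name)
    return age
```

The error is thereby surfaced before the program runs, instead of as a NullPointerException in production.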

Peanut Gallery: I agree completely. In my previous post feel free to translate mention of "high-level abstractions, strong type safety, and absolute referential transparency" as "I think Haskell is pretty nifty". It may be worth noting that Microsoft has hired a lot of Haskell guys and C# has moved in that direction with lambdas, some limited type inference, lazy sequences, &c. So that's at least one mainstream, "respectable" language taking direction from Haskell...

Unit testing has improved the quality of software immensely. Test-driven development (or behavior-driven development, which is really only slightly different) consists of writing tests (or specifications that are automatically tested) before the actual tested code is written.

IMO, the big improvements are coming with more microlanguages on VMs. Programming languages (like Java) become popular for some good reasons, then get locked in due to network effects. It's hard to displace them, and it's hard to improve upon them significantly without breaking backwards compatibility. But nowadays, with the JVM and CLR, you have multiple languages running in the same VM, and many can talk to each other. Think Java's type system sucks? You can code in Scala and still call your old Java code, use Java deployment systems (Tomcat, Glassfish), and all that. Like dynamic languages? Then you can code in JRuby, Groovy, or Jython and talk to your old Java code too. Lock-in is no longer as big a hurdle as it was.

As some other people have mentioned, functional languages seem to be becoming more popular. I'd really like to see Scala and F# take off, but I wouldn't bet that they will.

On the subject of formal proofs, I pretty much agree with Doug S.

A lot of people are saying that incremental progress gets made (especially in Haskell), but others on this thread seem to deny that incremental progress is possible. I really like Barry Kelly's opening point about crypto, but his last two paragraphs seemed basically wrong. Yes, there aren't enough mathematically oriented people to write all the business code, so we need tools for everyone else - and there is incremental progress toward more usable tools.

Phil Goetz, just because a lot of people went down the same wrong road doesn't mean there aren't other routes. It's a sign that it's premature to design languages for provable correctness, but progress is being made on fragments, like type safety.

Re: You lose some of the value of strong static typing when you make it optional.

Static typing sucks for most things - languages at that level should be prototype-based - but there are some cases where it is useful. Therefore, it should be optional - provided the syntactic cost is not too great. IMO, the syntactic cost is negligible - if you treat type safety as just another sort of contract.
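That "type safety as just another contract" view can be sketched with a hypothetical runtime-contract decorator: it enforces whatever annotations happen to be present and leaves unannotated parameters fully dynamic, so typing is opt-in per parameter.

```python
import functools
import inspect

def typechecked(fn):
    """Treat optional type annotations as a runtime contract: annotated
    parameters are checked, unannotated ones are left dynamic."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = sig.parameters[name].annotation
            if ann is not inspect.Parameter.empty and not isinstance(value, ann):
                raise TypeError(f"{name} must be {ann.__name__}")
        return fn(*args, **kwargs)

    return wrapper

@typechecked
def scale(x: int, factor):  # x is under contract, factor stays dynamic
    return x * factor
```

This gives the contract semantics at runtime only; a static checker would catch the same violations before the program runs, which is the trade-off the thread is arguing about.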

Did Robin post earlier that he was going to cut back on blogging so that it's not a surprise to anybody?
