Wednesday, March 07, 2007

"Sorry, you're not my Type!"

Before we explore the future potential of blue OOP, I thought it only fair to address the perceived advantages of pink OOP first. After all, I have labelled pink OOP as just an extension of the "old thing", but who says that the old thing was all that bad? Was there anything about the "old thing" worth holding onto?

The old thing I am referring to is C. C was one of the first third-generation languages used to write an operating system that went on to run on microcomputers (I think?). That operating system was Unix, which was first developed on DEC minicomputers. Before C, most microprocessor operating systems were written in assembly. I mention microcomputers because this is/was the name for computers built using microprocessors. Before the microprocessor, computers were huge boxes of electronics built from discrete components.

Early microelectronics placed considerable constraints on computer software. Many of the computer languages used on big "mainframe" computers just weren't suitable for microcomputers, especially personal computers: outside research organisations, personal computers had very little processing power and very little memory.

The success of C was largely due to the success of Unix, which was ported to a wide range of computer systems. Also, with C you could get very close to the efficiency of assembly language, and, unlike with assembly language, your code was portable.

This is a longer introduction than I had hoped, but a lot of people have forgotten this history and it is useful to remind ourselves of it. So by the early '80s, C was the personal computer language of choice.

Then along came objects. The challenge was how to bring OOP to PCs and still retain the efficiency of C. There were two candidate languages, both derivatives of C: C++ and Objective-C. C++ replaced the message passing of Smalltalk with a virtual function call, which made method dispatch as efficient as possible. The downside is that C++ is early bound: each call is bound at compile time to a method declared in the receiver's static type (a slot in a virtual function table), so only messages known to the compiler can be sent, even though the concrete method is looked up at run time. Objective-C, however, chose to retain message sends; this means that Objective-C is late bound, but as a consequence its method dispatch is less efficient than that of C++.
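As a loose analogy only (Java, not C++ or Objective-C), the sketch below contrasts the two styles of dispatch. `viaVirtualCall` is checked against the declared type at compile time and then dispatched to the concrete method at run time; the hypothetical `send` helper mimics a Smalltalk/Objective-C message send by looking the selector up by name via reflection, so an unknown message is only detected at run time. `Shape`, `Square` and `send` are illustrative names, not part of any library.

```java
import java.lang.reflect.Method;

interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

public class DispatchDemo {
    // Statically checked call: the compiler verifies that Shape declares area(),
    // and the concrete implementation is chosen at run time by virtual dispatch.
    static double viaVirtualCall(Shape s) {
        return s.area();
    }

    // Hypothetical "message send": the selector is just a string, so nothing is
    // checked until run time, when the method is looked up by name.
    static Object send(Object receiver, String selector) throws Exception {
        Method m = receiver.getClass().getMethod(selector);
        return m.invoke(receiver);
    }

    public static void main(String[] args) throws Exception {
        Shape sq = new Square(3.0);
        System.out.println(viaVirtualCall(sq)); // 9.0
        System.out.println(send(sq, "area"));   // 9.0
        try {
            send(sq, "perimeter");              // message not understood
        } catch (NoSuchMethodException e) {
            System.out.println("message not understood: perimeter");
        }
    }
}
```

The reflective version trades compile-time checking and speed for the ability to send any selector, which is roughly the trade-off described above.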

Given the hardware constraints at the time, the majority of the industry went with C++. The only PC company I know of that went with Objective-C was Steve Jobs' NeXT, with its NeXTSTEP OS.

So the big advantage of pink OOP is efficiency. As time has moved on however, some in the industry have tried to re-write history and claim that the big advantage of pink OOP is type safety. Now I must admit, I do not know exactly what type safety means. There are a few things that I do know however:

* A Class is not a Type
* Late/early binding and static type checking are orthogonal concerns
* Static typing is usually associated with early binding
* Static typing can be applied to a late-bound dynamic language like Smalltalk.

The first bullet points to a conceptual flaw in C++, which Java attempts to solve by introducing interfaces. The problem with interfaces, though, is that their use is optional: sometimes in Java you bind to a type (an interface) and at other times you bind to an implementation (a class), an unsatisfactory compromise IMO.
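A minimal Java sketch of the two kinds of binding, using only `java.util` collections: one method is bound to a class (`ArrayList`), the other to a type (the `List` interface). The method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class TypeVsClassDemo {
    // Bound to an implementation (a class): only ArrayList and its subclasses are accepted.
    static int countArrayList(ArrayList<String> items) {
        return items.size();
    }

    // Bound to a type (the List interface): any List implementation is accepted.
    static int countList(List<String> items) {
        return items.size();
    }

    public static void main(String[] args) {
        ArrayList<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        LinkedList<String> l = new LinkedList<>(Arrays.asList("x", "y", "z"));

        System.out.println(countArrayList(a)); // 2
        System.out.println(countList(a));      // 2
        System.out.println(countList(l));      // 3
        // countArrayList(l) would not compile: LinkedList offers the same List
        // protocol, but it is not an ArrayList.
    }
}
```

Note that Java interfaces are nominal: a class that happens to have the right methods but does not declare `implements List` would still be rejected.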

I'm going to get myself up to speed on "type safety". My experience has shown that static typing as used in languages like C++ and Java can greatly reduce the expressiveness of the language. So instead of being my friend, the compiler ends up being a straitjacket, stopping me from doing what I know would work best, if only I were allowed.

This is just opinion, of course. I have come across one static type system that I believe will allow me full flexibility: Strongtalk, a type-checking system for Smalltalk. Here is a link to a paper on the Strongtalk type-checking system. The current Strongtalk is slightly different from the description in this paper; if you are interested in the differences, you will need to look in the Strongtalk documentation in the Strongtalk download bundle. I believe Scala is an attempt to bring more expressiveness to static typing on the JVM, so I will be taking a more detailed look at Scala too.

Two static OO type systems, one targeting a late-bound language (Smalltalk), the other targeting the JVM, whose primary language, Java, is early bound: it should make a neat comparison.

BTW, if there is anyone out there who can answer the question "What is type safety?", I would be more than happy to hear from you.

Revised 07/03/2007: Modified to acknowledge the role of Unix in the rise in popularity of C - Thanx Steve.

11 comments:

steve said...

First, I think that static typing can in some circumstances greatly increase expressiveness. (I am currently coding some math in Java, and I am finding that code like:

Matrix m = b.inverse();

Gives me a clear understanding of what is going on).

...I think you may have got the history of why the industry went with C++ a bit wrong. It was not to do with hardware constraints (after all, Pascal and even some versions of LISP were very fast even in the early 80s). It was because C++ was a descendant of C, which was used in Unix. Other languages had been used to write operating systems before C (such as IBM's PL/1), but implementations of C were portable, allowing Unix to be put onto all sorts of hardware very rapidly. All the tools of Unix were written in C, and the APIs were written in C. This made C the 'natural language' for development at the time because of the ease of integration with the system. This popularity of C spread to other systems (such as CP/M and DOS, and then Windows). C++ was widely used because you could use most of your old C code with it, and you were not forced entirely into a new paradigm. I happen to think the decision to use C++ for major applications was a very bad one, but I can see why it happened.

By the way - as to 'what is type safety'. My current situation is that I really am not sure! This seems to be a very confusing area, which I am putting quite a bit of time into researching right now, motivated by your blog. I am certainly going to look into Strongtalk's and Scala's mechanisms.

Paul said...

Hi Steve,

Thanks for reminding me about the influence of Unix. I agree Unix had a tremendous impact on the popularity of C/C++. I guess efficiency came to mind for me because of my background in embedded systems.

Unix can’t explain the dominance of C++ over Objective C though. Objective C integrated as well (if not better) with C/Unix as C++ did. In fact with Objective C you can include both inline C and C++ code.

I guess this one was efficiency, but who knows why these things happen :^)

….As for your code example, I couldn't resist; I think you left a little out:

Matrix b = new Matrix();
Matrix m = b.inverse();

Which is a little more verbose. Omitting the type declarations, and using a naming convention (all matrix variables start with m), you would have:

mb = Matrix.new()
mm = mb.inverse()

Which is a little more concise. Granted this is not a good example, but once you take method declarations into account, you will end up typing 'Matrix' an awful lot of times. I’m sure that the purpose of most variables can be inferred through good naming.
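For what it's worth, later versions of Java (10 onwards, well after this exchange) added local-variable type inference with `var`, which keeps static checking while dropping the repeated `Matrix`. The 2x2 `Matrix` class below is a hypothetical stand-in for the one in Steve's code, written only so the sketch runs.

```java
public class InferenceDemo {
    // Hypothetical 2x2 matrix [[a, b], [c, d]], just enough to support inverse().
    static final class Matrix {
        final double a, b, c, d;

        Matrix(double a, double b, double c, double d) {
            this.a = a; this.b = b; this.c = c; this.d = d;
        }

        Matrix inverse() {
            double det = a * d - b * c;
            if (det == 0.0) throw new ArithmeticException("singular matrix");
            return new Matrix(d / det, -b / det, -c / det, a / det);
        }
    }

    public static void main(String[] args) {
        var b = new Matrix(4, 7, 2, 6); // inferred static type: Matrix
        var m = b.inverse();            // still Matrix; the compiler checks inverse() exists
        System.out.println(m.a + " " + m.b + " " + m.c + " " + m.d);
    }
}
```

The types are still there and still checked; they are simply inferred rather than written out.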

….The term 'type safety' seems to be as ill-defined as OOP :^) Writing this series of posts and trying to get solid definitions for things has shown me just how ill-defined most things are. Once a label becomes a byword for "goodness", its meaning becomes stretched until the term becomes meaningless.

For an industry that claims to be scientific, we don't seem to be very scientific in defining terms!


We may need to invent new terms of our own that are more precise to aid comprehension. This is why I came up with the terms pink and blue OOP.

….I'm researching type safety too. It will be interesting to compare notes later.

steve said...

Good point about Objective C. I think the reason why C++ took off instead was because you did not need a C++ compiler. C++ was introduced as a 'translator' - called 'cfront', which produced pure C code. It was a far simpler transition.

"I’m sure that the purpose of most variables can be inferred through good naming."

Actually, that does not help so much with math work, as by its nature one tends to want to express things as if they were equations, often with single letters.

Isaac Gouy said...

What is type-safety?
See page 263, chapter 28 in this textbook Programming Languages: Application and Interpretation

me22 said...

I'm not convinced I could formalize it, but I'd say that type safety is a guarantee that functions (including operators and such) will only be given arguments whose bit patterns meet the invariants (implicit or explicit) of the parameter types.

That's general enough that it doesn't prevent dynamic languages from being type-safe and doesn't force it to any particular paradigm...
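One hedged illustration of this run-time side of type safety, in Java: downcasts are checked by the JVM, so even when static checking is sidestepped via `Object`, a `String` can never be handed to code expecting an `Integer`; the bad cast fails with `ClassCastException` instead of reinterpreting the bits. The `doubled` method is hypothetical.

```java
public class CheckedCastDemo {
    // Returns the value doubled, provided the object really is an Integer.
    static int doubled(Object o) {
        Integer i = (Integer) o; // checked at run time by the JVM
        return i * 2;
    }

    public static void main(String[] args) {
        System.out.println(doubled(21)); // 42
        try {
            doubled("not a number");
        } catch (ClassCastException e) {
            // The wrong "bit pattern" never reaches the arithmetic.
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```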

Paul said...

Hi Isaac,
I've read the reference you supplied. I'm not sure it is as definitive as it tries to sound. It all depends on your starting assumptions. All I've read on types thus far implies that static typing is a mechanism to help detect program errors early.

The fundamental problem with all type systems, I think, is that it is very difficult, if not impossible, for a type checker to tell you whether your program is correct (early or late).

In fact, for an imperative program it is mathematically impossible to prove correctness (in a predicate calculus sense), hence why some use pure functional languages like Caml (no side effects, tail recursion etc).

So any static type system for an imperative language is a compromise between providing a mechanism for the programmer to describe 'type intent' that can be checked statically, whilst still allowing the program the flexibility to express behaviour that can only be shown to be type safe at runtime.

As an obvious example of what I mean, no type checker can check the semantic equivalence of two types in a given context (e.g. "Type A or Type B is acceptable here") unless the programmer indicates this. The 'indication' provided by the programmer becomes a declarative statement which could itself be in error.

So your program could be self consistently wrong!

So semantics (what is meant by a message protocol) is difficult to check without restricting the program structure to what can be statically expressed using the type system.
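To make the "self consistently wrong" point concrete, here is a small hypothetical Java example: `averageBuggy` and `average` have the same type signature, so the type checker accepts both, but only a check against the intended meaning (here a plain hand-rolled test, no framework) tells them apart.

```java
import java.util.function.ToDoubleFunction;

public class TypedButWrongDemo {
    // Well typed, but wrong: forgets to divide by the number of elements.
    static double averageBuggy(int[] xs) {
        double sum = 0;
        for (int x : xs) sum += x;
        return sum;
    }

    // Well typed and correct for non-empty input.
    static double average(int[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty input");
        double sum = 0;
        for (int x : xs) sum += x;
        return sum / xs.length;
    }

    // A tiny test expressing intent that the type int[] -> double cannot.
    static boolean meetsSpec(ToDoubleFunction<int[]> f) {
        return f.applyAsDouble(new int[] {2, 4, 6}) == 4.0;
    }

    public static void main(String[] args) {
        System.out.println(meetsSpec(TypedButWrongDemo::averageBuggy)); // false
        System.out.println(meetsSpec(TypedButWrongDemo::average));      // true
    }
}
```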

So for late-bound expressive languages like Smalltalk, where, for example, several return types may be valid from a single method call or where a collection may validly contain a number of types, the declarative static typing 'language' needs to be equally expressive.

The more I read of this stuff, the more convinced I become that the effort entailed in declaring types using a flexible declarative typing 'language' would be better spent declaring tests.

After all no 'type language' will be as expressive as your tests and as capable of expressing intent.

So my view so far is that 'static typing' is a well-intended idea, but ultimately misguided. Tests, whether as a form of assertion like in Eiffel or as programs in their own right as with xUnit, are the best way to infer correctness. So to detect errors early, you should test early and run those tests all the time. This approach is what Kent Beck has labelled TDD. After all, there is no earlier time to express intent (your tests) than before you write the code. Hence the reason to write your tests first!
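Java has no built-in equivalent of Eiffel's `require`/`ensure` clauses, but a rough sketch of the assertion style is to check preconditions and postconditions explicitly at run time. The `Account` class is hypothetical; explicit exceptions are used rather than Java's `assert`, which is disabled unless the JVM runs with `-ea`.

```java
public class ContractDemo {
    static final class Account {
        private long balance;

        long balance() { return balance; }

        void deposit(long amount) {
            // require: amount is positive
            if (amount <= 0) throw new IllegalArgumentException("precondition violated");
            balance += amount;
        }

        void withdraw(long amount) {
            // require: amount is positive and covered by the balance
            if (amount <= 0 || amount > balance)
                throw new IllegalArgumentException("precondition violated");
            long old = balance;
            balance -= amount;
            // ensure: balance decreased by exactly amount (trivially true here,
            // but it would catch a later faulty edit to the line above)
            if (balance != old - amount)
                throw new IllegalStateException("postcondition violated");
        }
    }

    public static void main(String[] args) {
        Account acc = new Account();
        acc.deposit(100);
        acc.withdraw(30);
        System.out.println(acc.balance()); // 70
        try {
            acc.withdraw(500);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```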

I'm still reading though...

steve said...

I think there is an interesting issue here, in that I often see the argument that because static typing can't protect you in all cases, you might as well abandon it. That just does not make sense to me, and I will try and explain why.

Increasingly, software is getting more complex, and with that complexity is a move towards automating much of what used to involve hand-coding. Garbage collection is finally starting to replace manual memory management, and run-time optimisation (as in Hotspot) is able to achieve more than manual coding for efficiency. So why buck the trend, and dump automated checking of types? Tests are great, but why not use them as well as automated checks?

Paul said...

Hi Steve,

I'm not talking about abandoning type checking. There is the whole area of dynamic type checks at runtime which we haven't discussed.

I also believe that static type declarations have a role to play. To me their primary role is helping out with tooling and comprehension of large systems. Method name completion in Eclipse is a great example of this.

Where I take issue is when the stated purpose of static type checking becomes program correctness: the elimination of runtime errors (bugs).

"Tests are great, but why not use them as well as automated checks?"


I would reverse this sentence:

"Static type checking is great, but why not use them as well as automated tests?"

To me when it comes to testing program correctness, then that is what you should do, test program correctness. In a belt and braces approach, static type checks help too, but the tests should come first.

The problem with type checks is they tell you little about intent, and they restrict your code to what can be expressed in the typing language as I said before.

I like the optional static type checking approach, where I can reason about my program behaviour in my tests and, in addition, document my code using type declarations. Where type declarations deliver diminishing returns, I can choose to omit them and rely on my tests.

For example, I would rather write a test than rely on a complex type declaration (see some of the more complex examples in the Strongtalk paper).

Paul said...

Hi Steve,

As an example of what I mean, in Java if you have a complex type situation (multiple allowed types, for example), what you tend to do is abandon the use of the static type system altogether (use Object) and rely on dynamic casts. Here your static type system isn't helping and you need tests. So in a sense you can use the optional static typing approach with Java.
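A small hypothetical Java example of this fallback: the list below may legitimately hold `Integer`s and `String`s, but `List<Object>` cannot say so, so correctness rests on `instanceof` checks, casts and, ultimately, tests. The `score` method and its rules are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HeterogeneousDemo {
    // Integers count by value, Strings by length. List<Object> says nothing about
    // which element types are allowed, so the compiler cannot help here.
    static int score(List<Object> items) {
        int total = 0;
        for (Object o : items) {
            if (o instanceof Integer) {
                total += (Integer) o;              // dynamic check, then cast
            } else if (o instanceof String) {
                total += ((String) o).length();
            } else {
                throw new IllegalArgumentException("unexpected element: " + o);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Object> items = new ArrayList<>();
        items.add(3);
        items.add("abcd");
        System.out.println(score(items)); // 7
        items.add(2.5); // compiles fine; only a run-time check (or a test) catches it
        try {
            score(items);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```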

In the simple cases you declare types, but as you know from your Smalltalk experience, in the simple cases you seldom introduce typing errors anyway and if you do they are easily recognised. Yet having the type declarations there is useful as a form of documentation.

So the bottom line for me is that for the kind of errors we tend to find in programs (e.g. null pointer exceptions in Java), static typing doesn't help much and you need tests. But as a means of documenting code and providing greater possibilities for tooling like completion and refactoring, type declarations are indeed useful.
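A minimal Java illustration of the null pointer point: the hypothetical `nameLength` method is fully type-checked, yet a call passing `null` compiles and then fails at run time, which only a test (or a run-time check) would reveal.

```java
public class NullDemo {
    // The signature promises a String, but in Java null is a valid value of every
    // reference type, so the compiler accepts a call that fails at run time.
    static int nameLength(String name) {
        return name.length();
    }

    public static void main(String[] args) {
        System.out.println(nameLength("Strongtalk")); // 10
        try {
            nameLength(null); // type-checks without complaint
        } catch (NullPointerException e) {
            System.out.println("NullPointerException despite the static types");
        }
    }
}
```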

I would like the chance to experiment with a flexible optional static type system like Strongtalk's using advanced tooling like you find in Eclipse. Sadly, unless I write it myself, this isn't going to happen any time soon :^)

Paul said...

Hi Steve,

"Increasingly, software is getting more complex, and with that complexity is a move towards automating much of what used to involve hand-coding. Garbage collection is finally starting to replace manual memory management, and run-time optimisation (as in Hotspot) is able to achieve more than manual coding for efficiency. So why buck the trend, and dump automated checking of types?"

I think you're right here. Automation has little to do with safety, however. Good programmers write safe software, period.

Tooling has its place though, and type declarations do help, but programming is hard and relies on highly skilled people.

Guaranteed automated type safety is a bit of a myth IMO. I've made this the theme of my latest post.

Isaac Gouy said...

Paul wrote: "In fact, for an imperative program it is mathematically impossible to prove correctness (in a predicate calculus sense), hence why some use pure functional languages like Caml (no side effects, tail recursion etc)."

Paul, you've got it wrong again - maybe you could check the videos and websites before you comment - the MLs are not side-effect-free, pure functional languages.