Thursday, April 10, 2008

Generics - An OO Anti-Pattern

I'm obliged to use Java 1.5 at my latest client. One of my gripes with Java is that it doesn't encourage an OO programming style. As a Coach, I tend to find that most Java programmers lack a full understanding of OO design principles. In fact I can count on one hand the number of Java programmers I've come across who have an understanding of OO which is at least as good as mine.

In contrast, all the Smalltalk programmers I've met understand Objects very well, and I'm sure most could teach me a thing or two. So Sun decided to revamp Java, a supposedly OO language. You would have thought they would borrow even more from Smalltalk, but no. Instead we get Generics. So why?

Before answering this question, I should spell out why Generics are not compatible with OO design principles. Objects are meant to be loosely coupled runtime entities. Objects do not exist at compile time; they come into existence when you run your program (or, with Smalltalk, the moment you load your image). Objects should hide both their state and their implementation from others; this is how they achieve low coupling. All that is exposed is their message interface. In Smalltalk, this interface is known as the Object's Protocol. So to communicate with an object, and get it to do something useful, you need to know its protocol and nothing else.

OK, let's compare this with Generics. Firstly, in the Java view of the world, message sends are replaced with virtual function calls. A function call as a way of sending a message to an Object is only possible in Java if you know the type of the receiver. In Java, the receiver's Type is either a reference to its implementation (its Class) or to one of its implemented Interfaces. So straight away the idea of hiding knowledge of the implementation behind message sends is lost. So along comes Generics. Does it improve matters any? Well no; in fact it makes things worse, a lot worse.

If the answer (return value) to the message sent to the receiver is a collection, then the Type of the Objects it may contain becomes part of the message interface, in addition to the Type of the Collection (container) itself. So Generics leak a lot more information about containers. It gets worse when you consider subclassing and method overriding with generic containers. The complexity of what represents a valid answer to the same message is mind-boggling.
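To make the container point concrete, here is a minimal Java sketch (the Shape, Circle and store names are hypothetical, invented for illustration). The element type leaks into every signature, and overriding with a more specific element type is simply rejected:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, purely for illustration.
class Shape {}
class Circle extends Shape {}

class ShapeStore {
    // The element type Shape is now part of the public interface,
    // over and above the Type of the container itself.
    List<Shape> shapes() { return new ArrayList<Shape>(); }
}

class CircleStore extends ShapeStore {
    // An override answering List<Circle> will not compile:
    // List<Circle> is not a subtype of List<Shape>, because
    // Java generics are invariant.
    //
    // @Override List<Circle> shapes() { ... }  // compile error

    // Callers wanting a covariant answer must instead be coupled
    // to wildcard machinery:
    List<? extends Shape> circles() { return new ArrayList<Circle>(); }
}
```

So a subclass that naturally answers circles cannot simply refine the inherited answer; both sides of the call must agree on wildcard types instead.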

So Generics leak information like a sieve, and break the basic OO tenets of information hiding (encapsulation) and low coupling. Information hiding is useful because it allows you to substitute objects at runtime. The substitute could extend the protocol of the original, or implement the same protocol in new ways with different side effects. This idea is what is commonly known as polymorphism, and it gave birth to the idea that OO programming could lead to components and re-use.
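The substitution idea can be sketched in a few lines of Java (the Greeter names here are hypothetical): the caller knows only the protocol, so either implementation can stand in at runtime.

```java
// A caller coupled only to this protocol cannot tell which
// implementation it is talking to.
interface Greeter {
    String greet(String name);
}

class PlainGreeter implements Greeter {
    public String greet(String name) { return "Hello, " + name; }
}

class ShoutingGreeter implements Greeter {
    // Same protocol, different behaviour: a valid runtime substitute.
    public String greet(String name) {
        return ("Hello, " + name).toUpperCase();
    }
}

class GreeterDemo {
    // The receiver is late-bound; only the protocol is known here.
    static String run(Greeter g) { return g.greet("world"); }
}
```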

So back to the question: why? Information hiding is powerful when it comes to malleability and extensibility, but for some perhaps it is too powerful. In Smalltalk, to know the Type of an object you need to send it a message. There is no other way. The Type is not manifest in the code. So the Smalltalk IDE is a running Smalltalk program containing a bunch of Objects. Classes themselves are Objects, and to know what Type an Object is, the Class Browser Object sends it a message, to which the answer is the Object's Class. So with Smalltalk you only get to know anything about an object once it is running. Because of this, the Smalltalk environment is always alive, always running, all the way through the programming cycle. The Smalltalk image, when loaded, contains both IDE objects, such as the Class Browser, and developer Objects, such as application Classes. The Browser sends messages to Class Objects to reveal their methods, and to method objects to reveal their source code. Programmers edit the code, and then send a message to the Compiler Object to compile the method. None of this is possible without running the image.
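Interestingly, even Java has a vestige of this style: reflection. The only way to know an arbitrary object's exact type is to ask it at runtime. A minimal sketch (the demo class name is hypothetical):

```java
// getClass() is, in effect, a message send: the object answers
// its own Class at runtime, regardless of the static type here.
class ReflectDemo {
    static String classNameOf(Object o) {
        return o.getClass().getSimpleName();
    }
}
```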

This approach doesn't help you if you don't like running the code to find out what it does. If you want to perform a static analysis, you need more information at compile time. Using Generics is a way of providing this information, removing the need for dynamic casts. So why would you want to get rid of casts? Casts are one of the gaps in your ability to fully analyse your program statically. A cast is an explicit admission that static analysis can't help in some scenarios, and that you still need to defer some type checking until runtime. Generics bridge this gap, in an attempt to provide complete type safety at compile time.
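The cast-elimination point can be seen in a small before/after sketch (the method names are hypothetical):

```java
import java.util.List;

class CastDemo {
    // Pre-1.5 style: the element type is unknown statically,
    // so the check is deferred to a runtime cast.
    @SuppressWarnings("rawtypes")
    static String firstRaw(List items) {
        return (String) items.get(0);
    }

    // Generic style: the same check moves to compile time
    // and the cast disappears.
    static String firstTyped(List<String> items) {
        return items.get(0);
    }
}
```

Both methods do the same thing at runtime; the difference is purely where the type check happens.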

For programmers who like their code to tell them what it will do before they run it, this meta-data-laden declarative approach is viewed as a benefit, but as Steve Yegge points out, programmers should not need such "training wheels". To know what a program does, you should run it (test it). This unnecessary meta-data obfuscates the code and limits the degree to which the code can be deemed fully Object Oriented.

True Object Orientation relies on late binding, which occurs at runtime. The whole point is that "you don't know for sure" what it is you are sending a message to, which allows the receiver to be substituted. Manifestly stating that you do know limits polymorphism and artificially restricts the computational model.

6 comments:

Yardena said...

Hi Paul,

I have enjoyed reading your blog for some time now.

Regarding this post, I think it's really about static versus dynamic typing rather than OO. Smalltalk is a dynamic language; Java is not. Generics were an enhancement of the compiler with the goal of making type checking even stricter, so if you advocate dynamic typing, Generics are obviously a step in the opposite direction.

I'm not necessarily disagreeing on the advantages of dynamic systems, but I find the terminology in this post somewhat confusing.

Keep up blogging!

Paul said...

Hi Yardena,

I understand. I'm somewhat biased :) Artima ran an article recently that makes the same point:

http://www.artima.com/cppsource/type_erasure.html

BTW, Alan Kay, who coined the term "Object Orientation", is quoted as saying that he didn't have C++ in mind when he coined it.

The meaning of the term OO has been stretched and re-interpreted in many ways. I guess I should make it clear that I am referring to the original concept as pioneered at Xerox PARC in the 1970s by Alan Kay and his team when designing the Smalltalk language.

Thanks for the interest.

Yardena said...

Hi Paul,

Yes, I came across Alan Kay's quote on OO and C++ in the past :-) If you don't already know it, the "Scandinavian vs. American" approaches to object orientation may be of interest to you - another thing I came across quite recently after hearing the terms several times.

Paul said...

Hi Yardena,

No, I wasn't aware of this. I'm not sure there is a big gap between Alan Kay and Smalltalk, or David Ungar and Randall Smith with Self, and the Scandinavian view as stated here.

The gap between the Scandinavian view and Bjarne Stroustrup's view and C++ seems much larger to me. Stroustrup's approach is very pragmatic :). And you know Alan Kay's attitude to C++ :)

So is it the Americans versus the Scandinavians..? I don't know.

Thanks for the link, interesting...

Paul Beckford said...

Hi Yardena,

I'm reviewing my blog on its 10th anniversary, and looking back there is a lot I got wrong :) I think this post was about the worst, though :) Yes, you were absolutely correct that OO and Generics are orthogonal concerns and can work together well if implemented properly. Bertrand Meyer describes how:

http://c2.com/cgi/wiki?GenericsVsSubtyping

Looking back, I wonder why I was so hostile to the idea? Well, I'm guessing that I was still shell-shocked by prior experiences with C++ and the STL :) Plus, Java 1.5 doesn't implement Generics very well; the Scala type system, with its ability to express covariance and contravariance, does a much better job (not to mention Java's type erasure at runtime :))
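The erasure point is easy to demonstrate with a minimal sketch (the demo class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Under Java's type erasure, the element type is gone at runtime:
    // a List<String> and a List<Integer> share one Class object.
    static boolean sameRuntimeClass() {
        List<String> a = new ArrayList<String>();
        List<Integer> b = new ArrayList<Integer>();
        return a.getClass() == b.getClass();
    }
}
```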

Now, having been exposed to much better type systems (like the one in Haskell), I see that type systems can be really useful as a way of reasoning about the correctness of your code.

At the time of writing this, I hadn't done any functional programming, and had only experienced type systems as a pain that mostly got in the way of the computational (OO) model. With dynamic languages that pain just wasn't there.

Paul.

Lars Olson said...

You said "To know what a program does, what you should do is run it (test it)."

This is a complete misunderstanding of computing science, I'm afraid. When you write x := 1 + 1 in your program, do you need to test it to figure out that x will be assigned a value of 2? Testing this would be a complete waste of time, since you can formally verify it in your brain in 2 seconds or less...

I'm afraid you've fallen into the same trap as the Royal Surgeons did back in the day when sterilizing your surgery utensils was considered a waste of time... You don't want to sterilize your code with formal type systems because it's a waste of time... Hint hint: writing millions upon millions of unit tests is a waste of time, when one quick formal verification takes only a few seconds.

True, complex programs are much harder to verify, but you can verify a lot of a program ahead of time, literally running the code inside your head and verifying it will mostly work, before even hitting the compile button (or the run button). If you don't know what your code should end up doing and you have to run/test it to find out, you should seriously consider leaving the programming industry for another career like trial-and-error engineering. But even engineers use formal methods, such as mathematical calculations with well-known proven formulas, to construct their bridges, buildings, traffic systems, chemical plants... No, it's not all a bunch of trial-and-error buffoonery.

Are you seriously saying that one cannot understand his own program until he runs it and sees what magic happens? Hint: determinism, deterministic, Turing machine. Input, output. Predeterminism.

It's not magic.