Monday, February 12, 2007

Java, Objects and Static Types

I seem to be on a roll myth busting. This one is prompted by a comment by Peter Kriens, in response to my previous post on Java and component models. The sentences that drew my attention were "interestingly, the type information in the language allows us to provide quite a few guarantees." and "for large systems where you get legacy parts from all over the place there is something to say for type information ..."

Now it would be only fair to allow Peter to explain himself further and not to read too much into what he has said, but his words did bring to mind a common myth, namely that dynamic languages have no type information and perform no type checking. Well, they do!

I'm old enough to remember programming in C, with no stack trace and the dreaded "Segmentation fault - Core dump" Unix message on program failure. C allowed no type information at runtime. You could declare pointers of type (void *), which meant that they could point to anything you liked. You could cast to an assumed type and the language would not check for you - in the end you would try to access a segment of memory not allocated to you by the System and bang!

Here is where my memory gets a bit hazy, but I believe things improved with C++. There were still void* pointers, but I believe C++ introduced a dynamic cast that did some checking at runtime.

So a long intro, but I wanted to get everyone on to the same page. Dynamic languages like Smalltalk do their type checking at runtime. If a type mismatch is detected, the program doesn't go bang as it does with C; instead the language notifies you of the error and will even launch a debugger at the appropriate line of code. So Smalltalk is a strongly typed language. The difference between it and, say, C++ is that the type checking occurs at runtime, not compile time (Smalltalk also gets rid of void* pointers; Java does likewise).
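To make the contrast with C concrete, here is a minimal sketch in Java, which also checks casts at runtime. The class and method names (CheckedCastDemo, describe) are made up for illustration. A bad cast does not read stray memory; the runtime raises a ClassCastException that the program can report or handle.

```java
class CheckedCastDemo {
    // Tries to treat an arbitrary object as a String and reports what happened.
    static String describe(Object value) {
        try {
            String text = (String) value; // the JVM checks this cast at runtime
            return "ok: " + text.length();
        } catch (ClassCastException e) {
            // The mismatch is reported as an error instead of corrupting memory.
            return "type mismatch: " + value.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hello"));             // ok: 5
        System.out.println(describe(Integer.valueOf(42))); // type mismatch: java.lang.Integer
    }
}
```

The equivalent unchecked cast through a void* in C would compile and then misbehave in whatever way the memory layout happened to dictate.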

So what are the consequences of these differences?

* Well, with Smalltalk, type mismatch errors can only be detected by running your program. So it is no coincidence that test-driven development and SUnit came from the Smalltalk community. With C++ the compiler performs these checks statically at compile time, so some believe that unit testing is less critical (I disagree).

* With Smalltalk, the overhead of dynamic message dispatch and the runtime checks make the language inherently slow. With C++ there are few runtime checks, and function calls are either bound directly at compile time or routed through a VTable whose layout is fixed at compile time. This allows for the use of an optimising static compiler, which inherently produces faster runtime code.

* Smalltalk is fully polymorphic at runtime. What this means is that objects can take many forms (implementations) at runtime, so any class of object that satisfies a caller’s request can be substituted in at runtime. This is known as late binding, and it has significant implications. C++ is only partially polymorphic, and only at compile time: common interfaces must share a common declaration (a common base class or abstract base class). At runtime the object's type is fixed (hard-coded), so there is no polymorphism in the Smalltalk sense.
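As a rough illustration of late binding, the sketch below uses Java reflection to approximate a Smalltalk message send. It is only an analogy: the names LateBindingDemo, Duck, Robot and send are hypothetical, and reflection is far clumsier than a real message send. The point is that the two receiver classes share no base class or interface, yet either can answer the same request at runtime.

```java
import java.lang.reflect.Method;

class LateBindingDemo {
    // Two unrelated classes: no shared interface or base class.
    public static class Duck {
        public String speak() { return "Quack"; }
    }

    public static class Robot {
        public String speak() { return "Beep"; }
    }

    // Looks the method up by name at runtime, loosely like a message send.
    static String send(Object receiver, String selector) {
        try {
            Method method = receiver.getClass().getMethod(selector);
            return String.valueOf(method.invoke(receiver));
        } catch (NoSuchMethodException e) {
            // Rough stand-in for Smalltalk's doesNotUnderstand: handling.
            return "doesNotUnderstand: " + selector;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(send(new Duck(), "speak"));  // Quack
        System.out.println(send(new Robot(), "speak")); // Beep
        System.out.println(send("text", "speak"));      // doesNotUnderstand: speak
    }
}
```

With ordinary statically typed calls, Duck and Robot would first have to declare a common interface before a caller could treat them interchangeably.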

So along comes Java. Without justification, I will assert that Sun produced Java in a hurry. Oak was aimed at low powered, low memory home devices, so the static optimising compiler route was the obvious one to take. So Java has inherited many of the properties of C++. Namely, highly performant and static (fixed).

For Objects that come 'alive', you require different properties. Sun did a lot of research in this area. The Self Project determined that Objects needed to possess a number of properties:

* Directness (you manipulate objects directly)
* Uniformity (everything is an object)
* Liveliness (modeless, no run/edit mode, fully dynamic)

These properties are inherent in Smalltalk, and the Self Project hoped to build on them. Self explored a number of important OO concepts that are still relevant today, but the implementation was a memory hog, and you needed a super fast computer to run Self.

Here is a good video on Self.

So the reason for saying all of this is that "compile-time" type safety was always an afterthought with Java. The real reason why Java is static is performance. Also, from a marketing point of view the static label was useful, as the bulk of the developer community out there were C++ programmers and comfortable with the static programming approach. To them liveliness meant nothing.

In 1995 some of the original Self team were ready to launch a new Smalltalk implementation known as Strongtalk. The idea was to address what was seen by many as the shortfalls of Smalltalk - slow performance and no compile-time type checking. They addressed the performance problem by optimising out dynamic method dispatch at runtime using a dynamic compiler. The approach was similar to that used in the JVM JIT today, but their approach also allowed for de-optimisation on the fly back to interpreted code, allowing them to satisfy the semantics of dynamic dispatch needed for 'liveliness'.

They also looked at the runtime type checking issue. At first they made the same mistake as C++/Oak etc. and assumed that compile-time type checking had to do with the runtime implementation. It doesn't. Type declarations in a static language can be thought of as annotations. They annotate the code and tell the compiler how best to optimise method calls. They also allow the compiler to perform checks. What the Strongtalk team did was to delegate the optimising role to a dynamic runtime and retain the static type checking at compile time. To do this they needed to add type declarations to the Smalltalk source code, which is done by way of annotation. So Strongtalk can compile both type-annotated code and un-annotated code. At runtime Strongtalk maintains full dynamic messaging semantics - so you end up with the best of both worlds.

As the startup company who built Strongtalk in secret were about to launch it on the world, Sun came along and bought them up. The Strongtalk developers were moved over to work on Java and the JVM and are responsible for the Java HotSpot JIT technology we have today. Type-feedback and type annotations sat on the shelf for over 10 years.

Fortunately Sun has recently released the Strongtalk code as Open Source:
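Java itself offers a loose analogy for type declarations acting as compile-time annotations: generic type parameters. This is not how Strongtalk works, only a small sketch of the same separation, and the class name ErasureDemo is made up. The compiler uses the declared element types to reject bad calls, but the parameters are erased, so both lists share one class at runtime.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true if two lists with different declared element types
    // are backed by the same class at runtime.
    static boolean sameRuntimeClass() {
        List<String> names = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        // names.add(42); // would be rejected by the compiler's static check
        return names.getClass() == counts.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```

The static check and the runtime representation are separate concerns here, which is the separation the Strongtalk team applied to the whole language.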

Strongtalk Home Page

But imagine how things would have been different if they had released Strongtalk as a product in 1995? Or better still if they had ported Java to Strongtalk using Type annotations?

* There would be no OSGi (no need)
* Ruby probably would not have blossomed beyond the Perl community (no need).
* Smalltalk would perhaps still be thriving today.

Paul.

6 comments:

Ricky Clarkson said...

C++ does actually support polymorphism, if you use pointers or references rather than stack objects. Otherwise it'd be fairly hard to say that you can write OO code in it.

"C allowed no type information at runtime."

Not really true, it just didn't give you any. You could add your own in, e.g., make the first byte of a struct be some kind of type id.

You speak of C and C++ like they are dead languages, but they are massively used.

It would be possible to make a runtime for C or C++ that provides a debugger on crashes. I think Visual Studio can do that, for example.

"This allows for the use of an optimising static compiler"

There's nothing to stop that from being done with Smalltalk as far as I know. Of course, the optimisation wouldn't apply all the time, because the language is so dynamic, but if you optimised the bits that you could statically tell, you'd be ok.

Another approach would be to analyse the running program to see what bits can be optimised.

Paul Beckford said...

Hi Ricky,

Polymorphism means different things to different people. The literal translation from Greek is "many forms". In a dynamic OO language an object can take many more forms at runtime than it can in a static OO language like C++. This is due to a separation of concerns between the messaging interface (message sends) and the implementation (class or even prototype for a language like Self).

As for "C allowed no type information at runtime". This is 100% correct. It "allowed" no type information. In Smalltalk all objects have an associated class which can be accessed through a meta-API. I agree you can write dynamic code in C, but that isn't the same as it being built into the language.

My goal wasn't to knock C/C++ - these languages both have their uses. For example OpenGL is a great library written in C++.

My goal was to identify the constraints of the Java language and identify where they came from. Java was pitched as a "good enough" Smalltalk. A VM based language with garbage collection that can run anywhere.

Over the years though Java programmers have hit limitations. Most do not realise that these limitations are inherent in the language and have been solved elsewhere.

IMO C/C++ are ideal for low level, highly performant APIs like device drivers and OpenGL (see Croquet, a Smalltalk application that wraps the C++ OpenGL library). Dynamic languages are ideal for high level user-centric interfaces like web applications. See Ruby and Rails (which wraps C).

My point is the right tool for the right job.

Peter Kriens said...

As the muse of this blog I'd like to make clear I was talking about static type information. I am a smalltalker since the early eighties.

I always liked dynamic languages and have written a few myself. I never liked all the typing that goes with static type safety. However, I am impressed with the completion features in Eclipse that do require the static type information that Java provides and most dynamic languages lack. I had to do some JavaScript work lately and I really missed that help. With libraries exploding in number, that kind of help is crucial. Actually, I feel that Eclipse is making Java almost not a pain to write.

I have also seen that when you get your modules from different people/groups then the static type safety helps to detect many problems early.

That said, programming for fun still requires a more dynamic programming language.

Paul Beckford said...

Hi Peter,

When you first mentioned Smalltalk, I guessed that you understood where I'm coming from. I like the completion features of Eclipse too.

My point though is that we really do not need to choose (IDE completion support and static versus no completion and dynamic), we can have both. Strongtalk is testimony to this fact. Also the Smalltalk IDEs are getting better at completion, even without type declarations.

Eclipse was based on Smalltalk (VisualAge) and creates an "image" from the Java source. If the source was Smalltalk, and annotated with type information like with Strongtalk, then the IDE would be both dynamic and have completion.

I think the real issue here is that we have set down a given path for expediency (short-term gain). I am suggesting that perhaps in 1995, in hindsight, we took the wrong fork in the road!

Dynamic languages are making a comeback, and I can see Java getting squeezed.

me22 said...

C++ does allow void pointers (since it's nearly a superset of C), but they're strongly discouraged. Templates allow you to program generically while still keeping type information with no speed cost.

dynamic_cast is used for checked casts from base classes to derived classes. It only works on things with virtual functions though, so it doesn't help with void*s.