Tuesday, December 02, 2008

Object 101 - What is an Object?

Judging from the responses to my last post on uniformity it looks as though I started off with the bar too high. So lets lower it a bit. Lets go right back to basics and try and define what I mean by Object.

Concept


Firstly, the idea of Objects is a conceptual one. Alan Kay and his team did research for over a decade trying to understand how best to structure computer programs and settled on the idea of Objects. The concept of Objects can be explained by taking a biological analogy. Alan Kay speaks of the idea of an identifiable cell, which has a membrane encapsulating its insides from the outside world. Each cell is autonomous and goes about its job independently from other cells, but cells do collaborate in a loosely coupled way by sending (chemical?) messages to each other.

So Objects are analogous to cells and the key defining characteristics of an object are identity, encapsulation and messaging. Notice I haven't mentioned classes or inheritance. These things are merely just one approach to implementing objects and defining what goes on inside the membrane. There are object orientated languages like Self which eschew classes all together.

Lets explore these fundamental characteristics in more detail:

Encapsulation

The power of encapsulation is that it hides the implementation from the outside world. In the same way that a cell membrane hides the inside of a cell. The outside world need only know the message interface of an object (known in Smalltalk speak as the message protocol). So when I send a message to an object, I have no idea how that object will process that message. The object is free to process the message in anyway it likes. This leads to the idea of polymorphism, which is a Greek word meaning "many forms". Since an objects implementation is encapsulated behind its message interface, the implementation can take any form it likes. This means that message implementations are free to perform any side effect they wish as long as they satisfy the message interface. Notice again I have not mentioned classes or subclassing. Again subclassing is just one approach to achieving polymorphism. Any object that satisfies the message protocol is viewed as a polymorphic variant, and can substitute any other object that shares the same protocol.

Messaging

During their research Alan's team looked at a number of ways of getting their objects to communicate with each other in a loosely coupled fashion. If you take the biological analogy to its extreme, then each object should be totally autonomous and share nothing with the others. This means that each object should have its own cpu, its own memory, it own code, Alan Kay as even argued that each object could/should have its own IP address as a global identity. So an object in this view of the world is a networked computer. OK how to hook these objects together. Well the natural solution is asynchronous messaging, just like you get with e-mail. Since each object has its own cpu it doesn't want to block and wait until the receiving object processes the message. So an object can send a message without blocking and the receiver will send back an answer into the objects inbox its own good time. This approach is what we call the Actor model today, and as someone kindly pointed out, Alan Kay first explored this approach in an early Smalltalk back in the 1970's. Interestingly the Actor model has come back en vogue with Erlang which has adopted it to implement "share nothing" concurrency. This is why some people say that Erlang is Object orientated.

The share nothing approach to objects could be considered to be a bit inefficient. I'm not sure of the history, but Alan Kay and his team decided to move on from asynchronous messaging to a more restricted synchronous approach. With synchronous messaging all objects share a common processor (or thread) and message sends block until the receiving object has completed processing the message and has responded with an answer. This is the messaging approach that was settled on in Smalltalk-80 and released to the world in 1983.

State and Behaviour

So we now have objects sharing cpu, but each object still encapsulating its own memory (program state) and its own code (behaviour). The state and behaviour of an object is private (encapsulated). So what happens when two objects share common behaviour, but have different state? As an example what if I have two bouncing balls, one red and one blue? Do I implement two objects separately, duplicating the code? Obviously there is an opportunity here for these two objects to share common code.

As part of the private implementation of these two objects, sharing code is desirable ( the DRY principle). One approach is to create a new object to encapsulate the common behaviour. In Self they call this a trait object. This leaves the two original objects just containing state (red for one and blue for the other). Common messages sent to the red ball and the blue ball (like the message 'bounce') are delegated to the shared trait object (through something called a parent slot) which encapsulates the common code. The red ball however may decide that in addition to bouncing, it can blink too. Blink meaning changing colour from red to white and back again repeatedly. So in addition to 'bounce' the red ball adds the message 'blink' to its message interface. This behaviour is not shared with the blue ball, so the red ball will need to have its own blink code which is not shared. So in Self objects can choose to share some behaviour or may choose not to share any behaviour at all.

In Smalltalk-80, the idea of not sharing implementation was relaxed, although conceptually the idea is still useful. So in Smalltalk-80 all objects share behaviour with other objects of the same kind, leading to the idea of classes of objects. Again I'm not sure of the history, but I believe this is a more efficient approach then the approach adopted by Self (Self came about in the 1990s many years after Smalltalk when memory was cheaper and CPUs faster). So in Smalltalk all objects share common behaviour with objects of the same kind through a common Class object.

Classes

So we finally get to the idea of a Class. A class is merely an implementation convenience, and unlike what the C++ proponents would have you think, the idea of class is not central to OO. In the same way that objects can share behaviour through a common class object, so can class objects share behaviour through a common superclass, leading to the class hierarchies we are all familiar with and tend to associate with OO programming.

Notice I use the term Class object. In Smalltalk a class is a factory object. Incidentally a lot of the so called OO patterns in the Gang of four book are really C++ patterns. So for example there is no factory object pattern needed in Smalltalk where you get factory objects for free.

A factory object is an object that creates other objects. In Smalltalk such objects are stored in a global hashmap called Smalltalk and are available throughout your program. Global objects in Smalltalk are identified with a capitalised first letter in their name by convention. Classes in Smalltalk are just global factory objects and hence all have a capital first letter. This convention has been carried forward into C++ and Java.

So how do you create a class? Well you send a message to another class you wish to subclass. It will come as no surprise to know that the message is called 'subclass'. This message answers a new class. To this class you can add class methods and instance methods again by sending messages. Instance methods are methods that you want to appear on objects that are created by this class (remember a class is a factory), class methods are methods that belong to this class itself and defines the class's behaviour. Also you may want to add class and instance variables to your class to encapsulate state.

So how do you create an object? Well you send a message to the factory responsible for generating the kind of object you want. This means that you send the message 'new' to the class object.

This is where languages like Java and C++ take a cop out. 'new' in such languages is not a message send on an object, instead it is a keyword in the language. This means that you cannot override new and hence the need for the gang of four 'factory object pattern' in these languages.

Back to Smalltalk. In response to the message 'new 'the class will answer a new instance object. So now we have a new object. Incidentally we skipped over how objects are created in Self. In Self any object can act as a factory and is able to create a copy of itself. So in Self you send the message 'copy' to the object you want to copy. The copied object is now a prototypical instance which is why Self is called a prototype based OO language (rather than a class based OO language).

MetaClass

Back again to Smalltalk. We have factory objects (classes), and we have instance objects (objects), but where do the class methods live? Where for example is the code for 'new'? As I said before a class has two roles, one to define it own behaviour (such as defining new) and the other to define the behaviour of its instance objects. I called its own behaviour class methods. This behaviour belongs in another object, the classes class (classes have a class too). I think we need an example:

aTranscriptStream := TrancriptStream new.

The 'new' message implementation is defined in the class of the TranscriptStream Class object. The class of 'TranscriptStream' is called 'TranscriptStream class' which is also known as the meta-class. 'TranscriptStream class' also has a class: 'TranscriptStream class class', and so it continues. 'TranscriptStream class' and 'TranscriptStream class class' etc are all implemented as the same object, the Meta-class object (otherwise we could go on for ever). This circularity is one of the beauties of Smalltalk. Meta-classes do not exist in C++ and Java, and hence why 'new' is a keyword in these languages.

So the 'new' message is sent to the TranscriptStream object (which is a class) and the implementation is defined in the 'TranscriptStream class' object (which is a meta-class). Now how do we end up with 'Transcript'? Remember that in Smalltalk globals all start with a capital letter. To make something global I need to add it to a global hashmap called Smalltalk along with a global identifier (symbol):

Smalltalk at: #Transcript put: aTranscriptStream.

Then I can do this:

Transcript show: '3 is less than 4'.

Uniformity

The message 'show:' in the previous example is sent to the Transcript object (which is an instance) whose implementation is defined in the TranscriptStream object (which is a class). Interestingly you cannot tell whether Transcript is a global instance or a class. I initially mistook it for a class in my discussion with Stephan until I looked it up. The thing is with Smalltalk is that it doesn't matter. Instances, classes and meta-classes are all the same thing, they are all objects. Smalltalk is uniform and everything is an object, which is where I started.

Updated 4/12
Replaced Transcripter with TranscriptStream. Instances of both these classes satisfy the 'Transcript' protocol, but in Squeak Smalltalk the global 'Transcript' object is an instance of TranscriptStream. I found this out by printing 'Transcript class' in a workspace. Another approach is to print 'Smalltalk at: #Transcript'.

8 comments:

Andy said...

Paul, good article. The more I read, the less I realize I really know about objects. This is fascinating stuff. It is such a shame that in many respects software development seems to be moving backwards.

Any recommendations for further reading would be much appreciated.

Paul Beckford said...

Hi Andrew,

the Smalltalk team first announced Smalltalk to the world in an article in Byte magazine in 1981:

http://users.ipa.net/~dwighth/smalltalk/byte_aug81/design_principles_behind_smalltalk.html

It wasn't until a couple of years later that a book on Smalltalk was released:

http://stephane.ducasse.free.fr/FreeBooks/BlueBook/

One of the reasons why I believe that there is so much confusion over OO is that no one bothers to reads primary sources anymore. No one bothers with original papers and everyone is comfortable taking their information second hand.

Bjarne Stroustrup felt that OO could be boiled down to data abstraction with inheritance, hence his 'C with Classes' approach, leading to C++. This is the OO that most people are familiar with.

Very few people have gone back to the original research at Xerox Parc. The Self research team at Sun Microsystems did in the 1990's and now Gilad Bracha had done the same with Newspeak, which he plans to release in January.

But for most people their understanding of Objects is as defined in C++.

Paul.

Unknown said...

Hi Paul,

Having taught the Smalltalk object model to students, I think I can appreciate the difficulties in writing this (excellent) post.

Have you thought about identities? I think that a key property of objects is that they have distinct identities so you can tell them apart even if their states are the same.

Paul Beckford said...

Hi Itay,

Thanks. I knew I'd missed something :) Yes, objects have an unique identity. I guess thats where Alan Kay was going with the IP address thing.

Have you been teaching Smalltalk recently? To be honest I'm no expert and was self taught. At least I've got it down on paper now, so next time I sense that someone doesn't share the same definition of an object as I do, I can point them to this.

Its nice to hear that this stuff is being taught in schools, which makes it even more bizarre why no body gets it.

Paul.

Unknown said...

Paul,

I taught Smalltalk until year and a half ago. The course is still being taught by a colleague of mine.

If you're interested, here's the link to the object-model presentation:
http://webcourse.cs.technion.ac.il/236703/Spring2008/ho/WCFiles/object-model.pdf

Paul Beckford said...

Itay,

Thanks for the course material. There is stuff in there that is new to me. Like lazy meta-classes, and languages that I haven't heard of before.

Thanks Paul.

Akbar Pasha said...

Thanks for this wonderfully lucid post. I have never used Smalltalk but my intro to OOP was through Java. Then when I moved on to Python & Ruby there was always this lingering doubt about as to where does this new reside also how in this world are the methods of the classes are stored. I never understood no matter how hard I tried.

Finally your article explains it much much better way. I am gonna re-read it again and again to digest it deep inside my mind.

One suggestion though - if you can add some images - like bunch of boxes and circles with arrows showing relationships between objects and class object and meta classes - I think it would be more lucid and clear.

But I loved the post. Thanks again.

::akbar

Paul Beckford said...

Akabar,

Thanks. Your enthusiasm makes it all worth while. If you take a look at the course material provided by Itay it covers all the object relationships I've described and a few more besides.

Paul.