Saturday, February 09, 2008

IS ORM a Dead End?

I have never been over the moon with ORM. It solves the need to write SQL in your code, and to iterate through database results sets to form data structures, but it never really addressed the mismatch between true Objects and Data in my opinion.

Before going on to explain why, I would also like to say that ORMs also facilitate Domain Driven Design. To gain this benefit though, your application is no longer relational database centric. Instead your database becomes a pseudo object repository, storing object state and object graphs. This is fine for OO programmers but can look strange to those from a pure relational DBMS background. What DDD is really saying IMO is that OO driven application design calls for an OO database. ORM and a RDMS will do, but your DDD model is still OO not Relational.

So where does this leave us? Well hopefully acknowledging that the OO Model and the Relational Data Model are just two different models. Two different ways of modeling the world around us. Which model should we choose? Well it depends on what we are trying to model and why.

Bob Martin wrote an interesting blog post on the OO Model, ORM and the Active Record pattern. According to uncle Bob, the OO Model tries to model the world in a way that provides immunity to data changes. The idea is that data is hidden (encapsulated) inside objects, and that externally no one knows or depends on the objects data type. Instead you communicate with objects through messages. The data types can change as long as the messages and the objects answer to those messages (behavior) remains the same. This is the basic idea behind polymorphisms, which provides for immunity to implementation change, including changes to the encapsulated data type.

The relational model is very different however. Its goal is to allow you to find the data you want quickly. It does this by recognising the relationships between data types and using set based maths to "join" record sets. To do this, the relational model chooses to expose all entity data. The exact opposite of encapsulation. Exposed object (entity) data is used to join records and filter results sets during querying.

So how do we square the circle? Have a model where data types are hidden, but also where we can perform powerful queries? I don't think you can, and this is why IMO almost all ORM solutions end up exposing the underlying data (all those getters and setters). In a DDD application you learn to live with this ignoring the exposed data and augmenting the data mutation methods with true OO "business" methods that provide data encapsulating, "business domain" behaviour. But given that the data is exposed, there is nothing stopping others accessing the data themselves, breaking encapsulation and bypassing business rules. In fact if you intend to do queries you need exposed data to perform joins, filters etc.

So there is no squaring the circle, and your DDD isn't truely OO. What you have are data records that can also behave like objects, but due to the lack of encapsulation afforded to true objects and the opportunity this provides to violate OO semantics, you cannot say that your design is immune to data type changes. I believe that this is Uncle Bobs main complaint with the Active Record Pattern.

I turn the argument on its head. You are producing an application where you are interested in data types, where you want to display those data types to your users and where you want to explore relationships between data types. This is what we would call a classic database application. In these applications immunity to data type changes is an impossibility. You cannot hide data types, because data is what your user is interested in. Your user is also interested in some behaviour which express business rules, but most of those rules have to do with maintaining data integrity and enforcing relationships between data entities. Your user wants to view his data.

In such an application OO encapsulation amongst domain objects serves very little purpose. Polymorphism is only useful as a means of grouping data types with common attributes, but not as a means of grouping 'objects' with common behavior, and data encapsulation becomes meaningless. So why not forget about objects and data encapsulation and use exposed mutable data types instead? Well functional languages have been using this approach for years, an hashmap (Dictionary) with name/value pairs is a mutable data type. You can represent any data type you like by nesting hashmaps. Accepting that all data will be exposed, and that data types are likely to change is a much better fit for database applications where users want to store, navigate and query data.

Given the hashmap as the primary abstraction, where does behaviour fit in? Well there is data agnostic behaviour such as "Create", "Retrieve", "Update", and "Delete" which applies to all data types. In addition to basic CRUD is querying behaviour like "Select From", "Join" and "Where condition". These are all data type agnostic and could be provided by a framework like LINQ or Rails ActiveRecord. Then there is data type specific behaviour like "Age" which calculates the age of any data type that contains the attribute "date of birth". This would need to be provided by the application programmer and associated with a set of data types.

This is why I think that Bob got it wrong in his conclusion. Data is King in data centric applications (in contrast to behaviour being King). The Active Record pattern as implemented in Rails acknowledges this fact and treats domain entities as pseudo Objects and doesn't try to pretend that they are proper objects with encapsulation. LINQ takes the same approach too.

For database applications where set based data queries are important, then ORM has always been a misnomer in my view. What we have really been doing is data-structure relational mapping. With Rails and LINQ we are now moving into "dictionary relational mapping", which in my opinion is a more natural way to model data centric applications then "ORM".