Saturday, February 09, 2008

Is ORM a Dead End?

I have never been over the moon with ORM. It removes the need to write SQL in your code and to iterate through database result sets to build data structures, but in my opinion it never really addressed the mismatch between true Objects and Data.

Before going on to explain why, I would also like to say that ORMs facilitate Domain Driven Design. To gain this benefit, though, your application can no longer be relational-database centric. Instead your database becomes a pseudo object repository, storing object state and object graphs. This is fine for OO programmers but can look strange to those from a pure relational DBMS background. What DDD is really saying, IMO, is that OO-driven application design calls for an OO database. An ORM and an RDBMS will do, but your DDD model is still OO, not relational.

So where does this leave us? Well, hopefully acknowledging that the OO Model and the Relational Data Model are just two different models: two different ways of modeling the world around us. Which model should we choose? Well, it depends on what we are trying to model and why.

Bob Martin wrote an interesting blog post on the OO Model, ORM and the Active Record pattern. According to Uncle Bob, the OO Model tries to model the world in a way that provides immunity to data changes. The idea is that data is hidden (encapsulated) inside objects, and that externally no one knows or depends on the object's data types. Instead you communicate with objects through messages. The data types can change as long as the messages, and the object's answers to those messages (its behaviour), remain the same. This is the basic idea behind polymorphism, which provides immunity to implementation change, including changes to the encapsulated data types.
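To make the encapsulation argument concrete, here is a minimal Python sketch. The class names and the `age_on` message are hypothetical, invented purely for illustration. Both classes answer the same message, yet one stores a date and the other stores text, and the caller cannot tell the difference.

```python
from datetime import date


def _years_between(born: date, today: date) -> int:
    """Whole years elapsed between two dates."""
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - (1 if before_birthday else 0)


class PersonStoringDate:
    """Keeps the date of birth as a date object (hidden behind the message)."""

    def __init__(self, born: date):
        self._born = born

    def age_on(self, today: date) -> int:
        return _years_between(self._born, today)


class PersonStoringText:
    """Keeps the date of birth as ISO text, a different hidden data type."""

    def __init__(self, born_iso: str):
        self._born_iso = born_iso

    def age_on(self, today: date) -> int:
        return _years_between(date.fromisoformat(self._born_iso), today)


# The caller only sends the message; the stored data type never leaks out.
today = date(2008, 2, 9)
people = [PersonStoringDate(date(1970, 2, 10)), PersonStoringText("1970-02-10")]
ages = [person.age_on(today) for person in people]
```

Swapping one class for the other changes nothing for the caller, which is the immunity to data type change that the OO Model is after.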

The relational model is very different, however. Its goal is to allow you to find the data you want quickly. It does this by recognising the relationships between data types and using set-based maths to "join" record sets. To do this, the relational model chooses to expose all entity data: the exact opposite of encapsulation. Exposed object (entity) data is used to join records and filter result sets during querying.
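As a rough illustration of that point, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `customer` and `purchase` tables and their columns are made up for the example. Notice that the join and the filter only work because the key and the total are exposed to the query.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ann', 'Leeds'), (2, 'Bob', 'York');
    INSERT INTO purchase VALUES (10, 1, 40), (11, 1, 15), (12, 2, 99);
    """
)

# The join depends on exposed key columns, the filter on an exposed value.
rows = conn.execute(
    """
    SELECT c.name, p.total
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    WHERE p.total > 30
    ORDER BY p.total
    """
).fetchall()
conn.close()
```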


So how do we square the circle? How do we have a model where data types are hidden, but where we can also perform powerful queries? I don't think you can, and this is why, IMO, almost all ORM solutions end up exposing the underlying data (all those getters and setters). In a DDD application you learn to live with this, ignoring the exposed data and augmenting the data mutation methods with true OO "business" methods that provide data-encapsulating "business domain" behaviour. But given that the data is exposed, there is nothing stopping others from accessing the data themselves, breaking encapsulation and bypassing business rules. In fact, if you intend to do queries you need exposed data to perform joins, filters etc.
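A small Python sketch of the problem, using a hypothetical `Account` entity of the kind an ORM might map. It is not taken from any particular ORM. The business method enforces a rule, but because the data has to stay exposed, any other code can walk straight past it.

```python
class Account:
    """Hypothetical mapped entity: data exposed, plus one 'business' method."""

    def __init__(self, owner: str, balance: int):
        self.owner = owner
        self.balance = balance  # exposed so mappers and queries can reach it

    def withdraw(self, amount: int) -> None:
        # Business rule: an account may not go overdrawn.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


account = Account("Ann", 100)

# Going through the business method, the rule holds.
try:
    account.withdraw(150)
    rule_enforced = False
except ValueError:
    rule_enforced = True

# Going through the exposed data, the rule is silently bypassed.
account.balance -= 150
```

After the last line the account is overdrawn even though `withdraw` would never have allowed it.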

So there is no squaring the circle, and your DDD isn't truly OO. What you have are data records that can also behave like objects, but because they lack the encapsulation afforded to true objects, and because this opens the door to violating OO semantics, you cannot say that your design is immune to data type changes. I believe that this is Uncle Bob's main complaint with the Active Record pattern.

I turn the argument on its head. You are producing an application where you are interested in data types, where you want to display those data types to your users and where you want to explore relationships between data types. This is what we would call a classic database application. In these applications immunity to data type changes is an impossibility. You cannot hide data types, because data is what your user is interested in. Your user is also interested in some behaviour which expresses business rules, but most of those rules have to do with maintaining data integrity and enforcing relationships between data entities. Your user wants to view his data.

In such an application OO encapsulation amongst domain objects serves very little purpose. Polymorphism is only useful as a means of grouping data types with common attributes, not as a means of grouping 'objects' with common behaviour, and data encapsulation becomes meaningless. So why not forget about objects and data encapsulation and use exposed mutable data types instead? Well, functional languages have been using this approach for years: a hashmap (Dictionary) of name/value pairs is a mutable data type. You can represent any data type you like by nesting hashmaps. Accepting that all data will be exposed, and that data types are likely to change, is a much better fit for database applications where users want to store, navigate and query data.
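For instance, here is a sketch in Python of a "person" represented purely as nested hashmaps. The field names are made up for illustration. Everything is exposed, and the shape of the data can change without touching a class definition.

```python
# A record is just a dictionary; nested dictionaries and lists give it structure.
person = {
    "name": "Ann",
    "date_of_birth": "1970-02-10",
    "address": {"city": "Leeds", "postcode": "LS1"},
    "orders": [{"id": 10, "total": 40}, {"id": 11, "total": 15}],
}

# All data is exposed and mutable.
person["address"]["city"] = "York"

# The "data type" can grow a new attribute at any time.
person["email"] = "ann@example.com"

order_total = sum(order["total"] for order in person["orders"])
```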

Given the hashmap as the primary abstraction, where does behaviour fit in? Well, there is data-agnostic behaviour such as "Create", "Retrieve", "Update" and "Delete" which applies to all data types. In addition to basic CRUD there is querying behaviour like "Select From", "Join" and "Where condition". These are all data type agnostic and could be provided by a framework like LINQ or Rails ActiveRecord. Then there is data type specific behaviour like "Age", which calculates the age of any data type that contains the attribute "date of birth". This would need to be provided by the application programmer and associated with a set of data types.
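The split between generic and type-specific behaviour can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory `store`, the function names and the field names are all hypothetical, and a real framework such as LINQ or ActiveRecord would run the generic part against a database.

```python
from datetime import date

store = {}  # table name -> {id -> record}; a stand-in for a real database


# Data type agnostic behaviour: works for any kind of record.
def create(table, record):
    rows = store.setdefault(table, {})
    new_id = max(rows, default=0) + 1
    rows[new_id] = {**record, "id": new_id}
    return new_id


def retrieve(table, record_id):
    return store[table][record_id]


def update(table, record_id, **changes):
    store[table][record_id].update(changes)


def delete(table, record_id):
    del store[table][record_id]


def where(table, condition):
    return [row for row in store.get(table, {}).values() if condition(row)]


# Data type specific behaviour: applies to any record with a "date_of_birth".
def age(record, today):
    born = date.fromisoformat(record["date_of_birth"])
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - (1 if before_birthday else 0)


today = date(2008, 2, 9)
ann = create("person", {"name": "Ann", "date_of_birth": "1970-02-10"})
rex = create("pet", {"name": "Rex", "date_of_birth": "2005-01-01"})
update("person", ann, city="York")

ann_age = age(retrieve("person", ann), today)
rex_age = age(retrieve("pet", rex), today)
in_york = where("person", lambda row: row.get("city") == "York")
```

Note that `age` is associated with a set of data types only by convention: anything carrying a "date_of_birth" attribute qualifies, whether it is a person or a pet.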

This is why I think that Bob got it wrong in his conclusion. Data is King in data centric applications (in contrast to behaviour being King). The Active Record pattern as implemented in Rails acknowledges this fact and treats domain entities as pseudo Objects and doesn't try to pretend that they are proper objects with encapsulation. LINQ takes the same approach too.

For database applications where set-based data queries are important, ORM has always been a misnomer in my view. What we have really been doing is data-structure relational mapping. With Rails and LINQ we are now moving into "dictionary relational mapping", which in my opinion is a more natural way to model data-centric applications than "ORM".

13 comments:

Greg said...

Perhaps you can elaborate on how LINQ does this as in my mind it doesn't.

LINQ is monads; LINQ is just as valid in, say, a domain model as it is in an AR model.

Perhaps you are thinking about LINQ to SQL (http://msdn2.microsoft.com/en-us/library/bb425822.aspx), which is another product?

Cheers,

Greg

Paul said...

Hi Greg,

Yes. I mean LINQ to SQL. You could do the same with Lisp, which doesn't encapsulate object state either. But of course with Lisp you would have to write the functions yourself.

What do you mean when you say LINQ is monads? I've never used LINQ. I read the article you linked to at around the same time I was looking into Common Lisp and a light bulb switched on for me.

ReverseBlade said...

I think LINQ to SQL is just pointless. It goes from database to domain. With NHibernate, for example, "YOU FIRST FORGET ABOUT THE DB AND DESIGN YOUR DOMAIN MODEL CLASSES REGARDLESS OF DB, INCLUDING INTERFACES AND INHERITANCE", then let NHibernate do the job for you (creating the tables and relations).

I have been using NHibernate extensively for more than a year and it really helps me to forget about the relational database.

David said...

@ReverseBlade

The problem is that, in 80% of the cases, you CANNOT forget about databases. In more than 15 years doing development, I have seldom started a program from scratch. Usually, the databases I'm using have already been designed. And I have to live with them.

Mega said...

I couldn't agree more with Paul when he mentions "Data is King in data centric applications (in contrast to behaviour being King)".

In most applications serving business, I think that data is king rather than behaviour. But when does behaviour come into play? In my opinion, behaviour is king when we are preparing the presentation layer. More precisely, it is the user interface part of an application that really cares about behaviour.

In terms of MVC architecture, the model part cares more about business rules than about the behaviour of those artificial data objects. Think about CRM, ERP, core banking, payment systems and the other applications for business. The business itself cares about data. It doesn't care about data objects. Data objects are invented by software designers. After racking our brains, we software workers work hard to model data as objects. At the end, we have to work even harder to translate objects back to data so that the users can understand. Is it worth fooling around like this?

I have been doing development on mainframes for over 15 years, too, and am now playing with Java, Ruby, and Rails. I like what David said "Usually, the databases I'm using have already been designed. And I have to live with them."

Paul said...

@reverseblade
1 year with NHibernate is still early days. Over time you are likely to find that ORM solves some problems, but also creates problems of its own. If I were to take your suggestion of ignoring the relational model completely, then an OODBMS would be the logical solution. For one reason or another, OODBMSs just haven't taken off.

@mega
"Usually, the databases I'm using have already been designed. And I have to live with them."

I've been in this situation too. My take is it often leads to procedural code. My view is that in such situations refactoring the DB schema to make it more OO is perhaps a good idea. Of course this could break other DB clients, but this raises the issue of whether applications should be coupled through a shared database anyway or whether applications should interact solely through an SOA? You can also use database views to decouple your applications of course.

Not refactoring the DB when needed is a cop-out IMO and will probably lead to short-lived, hard-to-maintain applications. Having said this, faced with deadlines, you often have no choice other than to live with the database you've got.

William said...

Hello Paul.
I’m back from a long silence (several months).

I was recently reading an article at InfoQ, where Mike Keith from TopLink and JPA, Ted Neward, who is an evangelist for OO databases, Carl Rosenberger from db4objects, and Craig Russell from JDO discussed the current state of Java persistence. The responses clearly indicate there is a problem in ORM, since we are working our business logic in one model and the data is in another.

My take is different, actually taking into account that the requirements for both models are precisely different. Here I have some points I think are fundamental about the discussion:

1. We have a problem space set in a domain, and a solution usually set into a technological domain. OO is all about setting the solution as a model in the problem space.

2. Applications are there to solve problems. Thus, there is not such thing as a Data-Centric Application. Maybe developers think of Information-Centric Applications, where users look at information and not raw data. The cooking of the data is out of sight, forcing the user to deal with detailed data is not solving a problem at all.

3. Data storage and processing are a very different kind of animal from business logic implementation.

Thus, the main idea is: OO is not for handling massive raw data; it tries to group data into the problem domain’s concepts and treat them as a whole. On the other hand, relational modeling is for modeling data, not real-world business logic. The mathematical model handles relations as abstractions and allows applying those computations to one or to an infinite number of records just the same.

What should I do then? I should separate the data processing logic from the business logic. Data processing needs to be done in a specialized environment. I model the solution of the problem in the problem space, using the same concepts in the domain as objects. I don’t care about types or how a date is composed of 7 subfields. Inside the object, away from my eyes, the behavior asks someone else (a stored procedure?) to handle the data processing, the cooking of raw data, to obtain coarse-grained information. Don’t want to handle SQL? You can use ActiveRecord to run queries “natively” in your language (note that language and object modeling are two different things). But those AR things are skeletons and belong in your closet: no one in the solution (I mean, the other objects) should know about them, since they are not part of the modeling of the solution.

Yes, this requires a mind shift. I will probably blog more about this (and surely answer a couple of blogs out there ).

William Martinez Pomares
http://blog.acoscomp.com

Paul said...

Hi William,

Nice to hear from you.

"2. Applications are there to solve problems. Thus, there is not such thing as a Data-Centric Application. Maybe developers think of Information-Centric Applications, where users look at information and not raw data. The cooking of the data is out of sight, forcing the user to deal with detailed data is not solving a problem at all."

No such thing as a data centric application? This statement has got me thinking. You may be right. I guess I'm focusing too much on how IT systems are traditionally used. My users see our application as a database, albeit a database with a number of data integrity constraints. Perhaps they shouldn't?

"Yes, this requires a mind shift. I will probably blog more about this (and surely answer a couple of blogs out there )."

I agree with the mind shift. In my experience most CRUD applications are behaviour light, seldom straying beyond data validation. I agree that OO is a means of modeling the problem. The issue, I think, is how the problem is envisioned by the users.

If you see your problem as a bunch of data that needs to be processed using set based maths (MS Excel style), then the relational model works. If you see your problem as a bunch of interacting agents that display behaviour and hide state then pure OO is a better fit I guess.

It seems to me that how the users envision themselves interacting with the system is key here. My users are very comfortable with MS Excel and tend to think (specify) that way. But they could possibly be helped to think in a better way I guess. I look forward to your blog post.


Paul.

Paul said...

Hi William,

I just had a thought. In my problem space there are agents that display behaviour, but these agents fall outside the system scope. My agents are people. So the people are the Objects. UI Forms are the way these Objects interact with their state (not all users have access to all forms and hence all state) and a database is how state is persisted.

Voilà! - a data-centric system.

My application is basically a communication mechanism. Much like a notice board. The people exhibit behaviour and the system just stores state.

Is this a good idea? My gut instinct is that it depends on the business context. Deciding what the system should do is a people question. I've seen people frustrated by systems that tried to do too much (like workflow systems that made hardwired business decisions where human judgment would have provided more flexibility). I've also seen people frustrated by systems that did too little (like leaving the user to compute stuff himself with a calculator).

In my experience just getting people sharing common data is a massive process improvement for most organisations. So CRUD, although technically primitive, can deliver huge business benefits.

I may have missed your point. But I thought this idea of system context may be something you want to address in your post.

William said...

What a spicy discussion! :)

Sure, the idea of modeling in the problem space is to keep the user happy, since he will work with a system that talks in the same concepts he is talking about!

If the problem is how to persist detailed data, and the user knows what a record is and what each little data field is about, then your system should model archives and documents! Probably the user may not need to know about foreign keys or codes, or IDs, so those may be hidden. Actually, if documents are the main concept, we may be talking not relational but XML native databases?

So, we are on the right track! Actually, I totally agree: systems are for people, not for languages or technologies, nor even developers. So, we shouldn't make the user learn about the implementation to use it! That is information hiding, the great concept from old Parnas.

Let me work a little on that.

William.
