Friday, October 03, 2008

Comprehending Large Software Systems

William Martinez has started blogging again. He has an interesting post outlining the roots of emergent design. It is worth a read. One factor that limits people's ability to apply emergent design effectively in a team context is the high degree of design communication needed across the team. Since the design is always changing, maintaining a shared comprehension of the entire system design across the whole team can be a challenge.

The XP Solution
Emergent design is the design approach adopted by XP, and like other XP practices it relies on the presence of complementary practices which are cleverly woven together to reinforce each other, the whole being greater than the sum of the parts. One practice that comes to mind is common code ownership, where every developer on the team "owns" all the code and hence is responsible for comprehending all the design. Another is pair programming, where the quality of the code is policed by continuous peer review and design knowledge is communicated across the team by rotating pairs. TDD helps by documenting the design specification as executable tests; the specification is then kept up to date by refactoring the tests along with the code. Another XP practice is small teams working in a "bull pen" or some other type of congenial space. The small size reduces the number of paths of communication, so if one developer makes a design change he only needs to broadcast that change across a small number of paths to ensure that a fully connected network of design knowledge is maintained. The congenial space facilitates 'osmotic' communication, where the communication of small day-to-day design changes and refactors occurs coincidentally, merely by people overhearing design discussions amongst pairs of programmers, or perhaps by the team breaking out and quickly discussing a design choice facing someone, coming to a joint decision, and going back to work with a new shared understanding.
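The point about team size can be made concrete with a little arithmetic. In a fully connected team of n people there are n(n-1)/2 pairwise communication paths, so the number of paths grows roughly with the square of the team size. The sketch below is only an illustration: the function name and the team sizes are my own choices, not something taken from the XP literature.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairwise paths in a fully connected team."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    # Each of the n people can talk to n - 1 others; halve to avoid
    # counting the path A-B and the path B-A twice.
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for size in (4, 8, 16, 32):
        print(f"{size:>2} people -> {communication_paths(size):>3} paths")
```

Doubling a team from 8 to 16 people takes the number of paths from 28 to 120, which is one way of seeing why osmotic communication gets harder as teams grow.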

It's All about People
With small teams seeded with people skilled in these XP practices, the XP approach to maintaining system comprehension amongst a team works very well. It does require a number of subtle skills though, many of which are right-brained 'soft skills'. I usually recommend a ratio of 1 to 3, skilled practitioners to novices. With such a ratio I find that the novices pick up the idea pretty quickly. Interestingly, William's discussion on emergent design also identifies the need for teachers and learners when it comes to mastering the skills needed, and of course we are all learners when it comes to discovering the emerging design of a new system.

Code as a Communication Medium
In my experience, once you have experienced this informal means of maintaining system comprehension, you never really see the need for the types of traditional system documentation we have all gotten used to. Novices in these informal communication tools soon become masters themselves and go on to seed other teams. Unfortunately, in many environments we do not have the luxury of creating the right conditions for informal osmotic communication. So what then?

William made the following comment, which is thought-provoking:
Finally: Documents per se are not evil, but I have come to realize that the documentation tools may not serve the goals. I mean, I need to be able to take a 10000 feet look to a system, to see the big picture. But, I will not be able to see it looking at just code. A word document is of no help. I need something else.

The code is not sufficient? Why? I guess it depends on the prior context of the reader, and also on the quality of the code. In many organisations, people not directly responsible for the code are called upon to comment on and review 'other people's' code. Now in this scenario the reader has very little context. The XPer would say, well, you've broken the tenet of common code ownership. If the reader was part of the team, then he would also be part of the osmotic process and would have a great deal of context to draw upon when reading any given section of code. Yes, but let's assume that the environment is not conducive to "common code ownership". What then?

Readable Code
The other issue is the readability of the code itself. This can be a self-fulfilling prophecy: people look somewhere other than the code to gain system comprehension, so nobody feels the code itself needs to be comprehensible. For XPers, their code is the main means of design communication, so they are very fussy about their code. For XPers, production code and test code together encapsulate a body of knowledge that describes the design choices that they have made along the way. Kent Beck chooses to make a distinction between 'quality' code and 'healthy' code. 'Quality' code is code which has desirable external attributes, like a low number of bugs. 'Healthy' code is code with desirable internal attributes that allow the code to be maintained, changed and extended even when the team loses people and new people come on board. Healthy code is easy to comprehend and well designed. I don't make this distinction myself. I see health as part of quality, but it is interesting that Kent does make a distinction, perhaps in an attempt to raise the profile of internal code quality, which in some environments is readily overlooked.

Domain-driven design, and the idea of a ubiquitous domain language which is carried through into the code, is yet another method to improve both the comprehension and the design of code. Cryptic names in code have long been seen as a barrier to comprehension, and lots has been written about 'self-documenting' code and how to write code which is more comprehensible. So we do know how to write readable code. The real issue is whether readable code is truly valued by the team and seen as a necessity.
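To make the contrast concrete, here is a minimal sketch of the same rule written twice, once with cryptic names and once in the language of the domain, together with a test that acts as an executable specification. The lending-library domain, the 21-day loan period and every name in it are assumptions made up for the illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Cryptic version: it works, but the reader has to reverse-engineer the intent.
def chk(d1, d2, n):
    return (d2 - d1).days > n


# The same rule in the language of a (hypothetical) lending library.
LOAN_PERIOD = timedelta(days=21)  # assumed business rule, for illustration


@dataclass(frozen=True)
class Loan:
    borrowed_on: date

    def due_date(self) -> date:
        return self.borrowed_on + LOAN_PERIOD

    def is_overdue(self, today: date) -> bool:
        return today > self.due_date()


# A test that reads as a statement of the design decision.
def test_loan_is_overdue_the_day_after_its_due_date():
    loan = Loan(borrowed_on=date(2008, 10, 3))
    assert not loan.is_overdue(today=date(2008, 10, 24))
    assert loan.is_overdue(today=date(2008, 10, 25))


if __name__ == "__main__":
    test_loan_is_overdue_the_day_after_its_due_date()
    print("specification holds")
```

Both versions compute the same thing; only the second tells a newcomer what the system is for, and the test name records the decision about what happens on the due date itself.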

My experience is that after being given an overview of a system, perhaps at a white board, if the code is healthy, with well-chosen names for packages, classes, methods and so on, and if the code has a good accompanying test suite, then with the help of an IDE, comprehending a system just from the code is possible. For me it normally takes an afternoon or two, and in the end I usually end up with a list of questions to be answered by someone who "owns" the code. Questions answered, I normally have a reasonable comprehension and, more importantly, an understanding of the 'true' design: not what was originally envisioned by the architect at the beginning of the project, which has long since been superseded, but the design that emerged during 'implementation'.

Other Documents
Having said this, two days of my time, and perhaps a day of someone else's, is a lot to spend on comprehending a system, especially if I do not intend to work on it. If the code is not very readable it could take me a lot longer. It is ironic that the kind of organisation that does not value readable code is also likely to expect people external to the team to comprehend and evaluate the system in an instant purely from documents, forgetting that the code is the system and hence the most important document of them all. This scenario exists in many organisations, so what then? I have found a Software Maintenance Handbook very useful in the past: a guide for the explorer who needs to comprehend the system. A user guide can be most informative too, especially when it comes to answering the question "what problem is this system trying to solve?" But even with these documents, assuming that they are kept up to date, there will still be gaps in understanding, so what should you do?

Magical Tools
Let's assume that I had a magic wand that would produce the 10,000 foot view of a system in an instant, just by tapping my wand on the hard disk containing the code :) What then? Would I really comprehend the system without speaking to anyone else? In my experience the answer is no. I would understand the structure of the system, the 'what', but I wouldn't know the 'why'. To know the why, I need to know about the problem domain. In a complex domain space, my lack of understanding of the domain would be a barrier to system comprehension in itself. So even with a magic wand there are no easy answers, and in the end you need to speak to someone, perhaps face-to-face.

Better Languages
If your organisation doesn't allow people to take the time to explain the system to you, then perhaps the left side of our brain may be able to help out a bit. Domain-driven design and object orientation both promise to help system comprehension through good naming and the use of modularity. Whilst some OO languages are turtles (objects) all the way down, most are not modular when it comes to high-level 'architectural' abstractions. Newspeak is a new OO language that has borrowed ideas from Beta to allow you to define abstractions larger than a class. Once you have a language like Beta or Newspeak that allows you to capture high-level design decisions in code, you can use a code browser to project views of just the high-level abstractions, filtering out all the details and providing the 10,000 foot view in an instant. Martin Fowler has been talking about such browsers under the label 'language workbenches'. Martin's focus has been DSLs and their comprehension and use, separate from the host language in which they reside, but the same idea can be applied to architectural layers, modules, components, libraries, packages and so on: any abstraction that can be captured by your language and projected independently as a view. The Beta language already shows how this can be done, by allowing a graphical view of system components gleaned from the code. This idea should be familiar to anyone who has used TogetherJ, the Java/UML documentation tool.

Fix the Root Cause
As you can tell, my view is that system comprehension is often a people problem, born out of the ways people choose to organise themselves and the values and principles they choose to adhere to. The whys and wherefores for any given system should be tacit knowledge within the team that created that system. Tacit knowledge is born out of daily work and socialised informally across the team. It is only when we restrict the flow of tacit knowledge by creating artificial barriers within the team, or geographically splitting the team, or creating teams that are too large, that we create the need for formal communication. Formal intermediate work products are then handed off from one part of the team to the next. These hand-offs are points of weakness and are best avoided.

Most software organisations are dysfunctional in this regard. By this I mean that their organisation is not well suited to their intended purpose, namely the production of high quality software. The solution? Change the organisation. I accept that most of us aren't empowered to make this type of change, but let's not forget that the root cause of poor comprehension is often self-imposed.

Conclusion
So in summary, system comprehension is about communication. Ultimately the code is the design and hence the best medium for communication. With advances in language design the code can be made a better medium for expressing system structure at the 10,000 foot level, but comprehension is not all about structure. It is about meaning and purpose too, and to comprehend these requires a grasp of the domain. Gaining this knowledge often means talking to people.

As human beings competent in applying both sides of our brains, talking and listening as a way of comprehending a system should come naturally. When it doesn't, then perhaps it's a sign of a deeper problem, such as organisational dysfunction.

9th October 2008 - Expanded the section on 'Fixing the root cause' to clarify what I feel is the main barrier to system comprehension, poor organisation; in response to comments made by William Martinez.

5 comments:

wmartinez said...

Hello Paul.
I'm glad my comment triggered this long study of the communication needs in team development.

I guess you ask two questions related to one of my lines. I hope I can help with an answer:

"The code is not sufficient? Why? I guess it depends on the prior context of the reader, and also on the quality of the code".

Actually, it is enough, but not helpful. You see, you can draw a map of New York City by walking all its streets, but it will take a long time. From a satellite photo, it may be easier! So, you may want to explain to someone new how components interact, and you may draw boxes on a whiteboard rather than give him thousands of lines of code and let him figure it out.

Now: "Lets assume that I had a magic wand that would produce the 10,000 foot view of a system at an instant, just by tapping my wand on the hard disk containing the code :) What then? would I really comprehend the system without speaking to anyone else? In my experience the answer is no."

There are actually some of those wands: tools that draw the classes and dependencies from the code. But still, they are not helpful when there are thousands of classes. And my need is not to understand a system I've never seen (that is software archaeology). I need to see the picture from a level where I can understand the problems I can't see at code level, in a system I know. I know that you say you may be able to see the problems at code level, but in my experience I'm able to see different problems at different levels, so I do need that. Maybe my brain is different :0D

Lastly, "My experience is that after being given an overview of a system, perhaps at a white board, if the code is healthy and good package, class, method etc names are chosen and if the code as a good accompanying test suite, then with the help of an IDE, comprehending a system just from the code is possible. For me it normally takes an after noon or two and in the end I usually end up with a list of questions to be answered by someone who "owns" the code. Questions answered, I normally have a reasonable comprehension, and more importantly an understanding of the 'true' design , not what was originally envisioned by the architect at the beginning of the project, but has long since been superseded as the true design emerges during 'implementation'."
The time it takes depends on how good the code is. And as you mention, the result is what the code is and not what it was meant to be. So, opportunity is important. But where you are wrong and right at the same time is on the subject of architecture. There are many examples of an architecture treated as an initial structure, set in stone from the beginning of time. Those are useless. But not all architects work that way. I need to see the true architecture, as you call it, and that is why a Word document is not enough. I need something live that depicts it in the correct way, not lines of text that I will read just like code.

Hope this helps clarify.
William Martinez.

Paul Beckford said...

Hi William,
All good stuff. When I mentioned architects, it was gratuitous and had little to do with my main point. I apologise. What I mean is that designs change over time, whoever creates them, and documents are often an out-of-date snapshot in time.


My central theme is communication amongst people. I would not expect to have a deep understanding of the McLaren F1 racing car just from a casual glance. In fact, perhaps the only people in the world who have a deep comprehension of that complex technical system are the members of the McLaren F1 racing team itself.

This makes sense right? So why should software be any different? Why would we expect to gain an understanding without immersing ourselves into the team and the project?

What is the purpose of this casual knowing at a glance? If the team are the experts and are trusted as such, then comprehension is their preserve and rightly so. If I want to join them in that understanding, then I must be willing to make the necessary investment.

This was my main point, and why I mentioned root causes and dysfunctional organisation.

Paul.

Paul Beckford said...

I know that you say you may able to see the problems at code level, but in my experience I'm able to see different problems at different levels, so I do need that. Maybe my brain is different :0D

Hi William,

Now this is an interesting one. Here I guess we are starting with the assumption that you (or I) are the authors of the code, and we are trying to comprehend our own code.

For this I agree that models are important, but I would also say that models are ephemeral. For me I have a model in my mind, but as the design evolves so does my mental model.

I agree that it helps to take the model out of my mind at times, and to stand back and look at it. For this a white board is great, or even the back of a napkin. I reason, and perhaps share and discuss the model with a colleague and come up with improvements.

I then get back to writing software based on a new mental model. Once I have captured my new model in code, I wipe the white board clean, or throw away the napkin. These temporary, ephemeral models have served their purpose and are no longer needed.

This is how it works for me. I'm interested in finding out how it works for you.

(BTW - I do sometimes keep models, maybe placing them up on the wall or on a wiki, as a reminder to myself and others, but I never forget that anything other than the code is likely to be ephemeral and a momentary snapshot in time)

Paul.

wmartinez said...

Hi Paul.
Let me explain using your examples!

1. Imagine a document with all the lines of code, printed. That document becomes obsolete the next second, as soon as one developer changes a line. And who needs to have the code printed anyway?
But now, imagine the document is not printed. Then, the document is the code itself! (See, I didn't say the document was a Word document, it could be the code file).
But now imagine we are reading the code, and I want to review my thoughts about how communication takes place. I may need to go to 15 different places in the code. I can open 15 editor pages to see them all at the same time, or I may draw an image on the whiteboard with 15 boxes. I can call that a document, representing the actual code. Changing a line in the 15 places may or may not change the 15 boxes. If one line changes one box, then I can go and change it and I have another snapshot.
The issue may be that we do not share the same idea of what a document is. A document for me is something I can read, that tells me something about my system that is valuable. The old promised boxes that turned out to be new, very different boxes in the current snapshot may not help.
You can send the whole million lines of code to the company president to let him answer one question he has. I would rather send him a line in business language he can understand.

2. The F1 car. Makes some sense, but not all. I don't need to know the car on the inside, nor every electrical line or physical component. But I may need to know the circuit it will race on. If it is slippery, snowy, hot, levelling, all that may indicate a change I need to suggest in tires, direction, stabilization. So, I'm embedded in the team, working on the tires system, and suddenly I need to stand back and see the road, make my decision and go back to the tires.
Again, you think I'm talking about a totally new guy that only wants to see the car on the outside and then suddenly decides we need a new engine. That makes no sense and of course that is not the idea. (Still, sometimes people come to me for help, I draw a 10,000 foot view, and from there I can go drilling down to the line of code I suggest changing).

3. The ephemeral and persistent model is another good one. We can model realities. One is the problem, another one a proposed solution, and another one the final solution. So, you work with ephemeral models of the real solution. The problem model will not stay the same; it will change, since clients change their minds every minute. The proposed model... you know. The real model changes every time a line changes the essential element that is modeled.
Following that, I would create a process to create the model of the proposed solution and morph it as we go developing. Why? Simply because it takes less time than producing the model from scratch every time I have to discuss or make up my mind about something.
Of course, creating a model does not mean documenting it. A model written down is a model in a document. That is where the Word document is of no help. It is static, and I need something that adjusts with the code, and that will not require me to redraw everything again when I need it.
Again, I may have an idea about a parallel solution, and my friend, an Erlang developer, is the one that implements it. Maybe I need to match that with Java, which I know better. Thus, the model is a team solution to understand the problem.


Cheers!
William.

Paul Beckford said...

Hi William,

I agree with your discussion. Let's define some terms. We create all sorts of models for all sorts of reasons. The term 'model' is ambiguous, so I am going to take a concrete example.

When I create a use-case model what am I doing? I am modelling a chosen aspect of the world. This is the thing about models, they are selective. I select an aspect of the world to model depending on what I feel is important and what I want to communicate.

What is important is fluid and can change depending on a number of things, including my audience, which is why I question the worth of investing a lot of time in models that are solely there for communication. For this purpose I find that a white board is fine.

Code on the other hand is concrete and executable and hence is the Software, the end product of my work. It makes a lot of sense investing time here, because high quality Software is what my customer ultimately cares about.

It is not black and white of course. We are talking values and I am saying I value working software over intermediate work products.

OK. I think we agree here, but I just wanted to be clear. So in an imperfect world how do I communicate and deliver high quality code/software at the same time? The thing about code is that it already serves these two purposes:

1. Execute on a system and 'work'.
2. Communicate intent to other programmers.

I believe what you are suggesting is that code can be made a much better communication medium. I agree. Take a close look at Newspeak, a language created by Gilad Bracha. He has removed static state, so there is no implicit coupling between software modules (no global names). If you want to use a module within another module then you must explicitly plug them together, passing one as a parameter to the other. Not unlike how you would assemble a car.

Modules themselves are explicit. A module is an object that acts as a container for other objects. Classes are objects and classes can be contained and nested within modules. So you can think of a module as a name space. A class defined within a module is not visible elsewhere within the system unless you explicitly pass the containing module as a parameter. No global state.

Gilad has an interesting paper about this:

http://dyla2007.unibe.ch/?download=dyla07-Gilad.pdf

So Newspeak is like Lego blocks. You can build abstractions of any size, label them explicitly and use them as new building blocks.

It is this explicit labeling and linking of modules that lends itself to producing a 10,000 foot view directly from code.
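As a rough sketch of the idea, written in Python rather than Newspeak and making no claim to reproduce Newspeak's actual semantics, imagine modules as objects that only know about the other modules passed to them as parameters. Because every link is explicit, a high-level view can be derived by walking the wiring. The `Module` class, the `high_level_view` function and the shop example are all hypothetical names invented for the illustration.

```python
class Module:
    """A named building block whose dependencies are passed in explicitly."""

    def __init__(self, name, *dependencies):
        self.name = name
        self.dependencies = dependencies  # other Module objects, no globals


def high_level_view(root):
    """Return sorted 'A -> B' lines for every explicit link reachable from root."""
    lines, seen, stack = [], set(), [root]
    while stack:
        module = stack.pop()
        if module.name in seen:
            continue  # already visited via another path
        seen.add(module.name)
        for dep in module.dependencies:
            lines.append(f"{module.name} -> {dep.name}")
            stack.append(dep)
    return sorted(lines)


# A hypothetical system, assembled by plugging modules together.
storage = Module("Storage")
billing = Module("Billing", storage)
catalogue = Module("Catalogue", storage)
shop = Module("Shop", billing, catalogue)

if __name__ == "__main__":
    print("\n".join(high_level_view(shop)))
```

The view here is just a list of links, but since nothing reaches another module through a global name, the list is complete: what the wiring shows is everything the modules can see of each other.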

Perhaps we should take this offline. I am interested in the approach you are taking. My e-mail is beckfordp at btinternet dot com.

Looking forward to hearing from you.