Friday, October 03, 2008

Comprehending Large Software Systems

William Martinez has started blogging again. He has an interesting post outlining the roots of emergent design. It is worth a read. One factor that limits people's ability to apply emergent design effectively in a team context is the high degree of design communication needed across the team. Since the design is always changing, maintaining a shared comprehension of the entire system design across the whole team can be a challenge.

The XP Solution
Emergent design is the design approach adopted by XP, and like other XP practices, emergent design relies on the presence of other complementary practices which are cleverly woven together to reinforce each other, the whole being greater than the sum of the parts. The practices that come to mind include common code ownership, where every developer on the team "owns" all the code and hence is responsible for comprehending all the design. Another is pair programming, where the quality of the code is policed by continuous peer review and design knowledge is communicated across the team by rotating pairs. TDD helps by documenting the design specification as executable tests; the design specification is then kept up to date by refactoring the tests along with the code. Another XP practice is small teams working in a "bull pen" or some other type of congenial space. The small size reduces the number of paths of communication, so if one developer makes a design change he only needs to broadcast that change across a small number of paths to ensure that a fully connected network of design knowledge is maintained. The congenial space facilitates 'osmotic' communication, where the communication of small day-to-day design changes and refactors occurs coincidentally, merely by people overhearing design discussions amongst pairs of programmers, or perhaps by the team breaking out and quickly discussing a design choice facing someone, coming to a joint decision, and going back to work with a new shared understanding.
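To put a rough number on the "paths of communication" point: if every team member needs a direct link to every other member, the number of links is n(n-1)/2. The sketch below is only that arithmetic; the function name and the team sizes are my own choices for illustration.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairwise links in a fully connected team."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    # Each of the n members links to the other n - 1; halve to avoid
    # counting each pair twice.
    return team_size * (team_size - 1) // 2


# Illustrative team sizes (assumed, not taken from any real project).
for size in (4, 8, 16, 32):
    print(f"{size} people -> {communication_paths(size)} paths")
```

Doubling the team roughly quadruples the number of paths, which is one way to see why keeping a shared picture of the design gets harder so quickly as teams grow.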

It's All about People
With small teams seeded with people skilled in these XP practices, the XP approach to maintaining system comprehension amongst a team works very well. It does require a number of subtle skills though, many of which are right-brained 'soft skills'. I usually recommend a ratio of one skilled practitioner to three novices. With such a ratio I find that the novices pick up the idea pretty quickly. Interestingly, William's discussion on emergent design also identifies the need for teachers and learners when it comes to mastering the skills needed, and of course we are all learners when it comes to discovering the emerging design of a new system.

Code as a Communication Medium
In my experience, once you have experienced this informal means of maintaining system comprehension, you never really see the need for the types of traditional system documentation we have all gotten used to. Novices in these informal communication tools soon become masters themselves and go on to seed other teams. Unfortunately, in many environments we do not have the luxury of creating the right conditions for informal osmotic communication. So what then?

William made the following thought-provoking comment:
Finally: Documents per se are not evil, but I have come to realize that the documentation tools may not serve the goals. I mean, I need to be able to take a 10000 feet look to a system, to see the big picture. But, I will not be able to see it looking at just code. A word document is of no help. I need something else.

The code is not sufficient? Why? I guess it depends on the prior context of the reader, and also on the quality of the code. In many organisations, people not directly responsible for the code are called upon to comment on and review 'other people's' code. In this scenario the reader has very little context. The XPer would say, well, you've broken the tenet of common code ownership. If the reader was part of the team, then he would also be part of the osmotic process and would have a great deal of context to draw upon when reading any given section of code. Yes, but let's assume that the environment is not conducive to "common code ownership". What then?

Readable Code
The other issue is the readability of the code itself. This can be a self-fulfilling prophecy: people look somewhere other than the code to gain system comprehension, so the code itself need not be comprehensible. For XPers, their code is the main means of design communication, so they are very fussy about their code. For XPers, production code and test code together encapsulate a body of knowledge that describes the design choices that they have made along the way. Kent Beck chooses to make a distinction between 'quality' code and 'healthy' code. 'Quality' code is code which has desirable external attributes, like a low number of bugs. 'Healthy' code is code with desirable internal attributes that allow the code to be maintained, changed and extended even when the team loses people and new people come on board. Healthy code is easy to comprehend and well designed. I don't make this distinction myself. I see health as part of quality, but it is interesting that Kent does make a distinction, perhaps in an attempt to raise the profile of internal code quality, which in some environments is readily overlooked.

Domain driven design, and the idea of a ubiquitous domain language which is carried through into the code, is yet another method to improve both the comprehension and the design of code. Cryptic names in code have long been seen as a barrier to comprehension, and a lot has been written about 'self documenting' code and how to write code which is more comprehensible. So we do know how to write readable code. The real issue is whether readable code is truly valued by the team and seen as a necessity.
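A small sketch of what carrying the domain language into the code might look like. The invoicing domain, the names Invoice, is_overdue and payment_terms_days, and the dates are all invented for illustration; the point is only the contrast between the cryptic version and the named one, with a test acting as an executable statement of the rule.

```python
from dataclasses import dataclass
from datetime import date


# Cryptic version: it works, but the reader has to reverse-engineer the intent.
def chk(d, t, n):
    return (t - d).days > n


# Same rule expressed in the (hypothetical) language of the domain.
@dataclass(frozen=True)
class Invoice:
    issued_on: date
    payment_terms_days: int

    def is_overdue(self, today: date) -> bool:
        """An invoice is overdue once its payment terms have elapsed."""
        return (today - self.issued_on).days > self.payment_terms_days


# The test doubles as a readable specification of the design decision.
def test_invoice_is_overdue_only_after_payment_terms_elapse():
    invoice = Invoice(issued_on=date(2008, 10, 3), payment_terms_days=30)
    assert not invoice.is_overdue(today=date(2008, 11, 2))  # day 30: within terms
    assert invoice.is_overdue(today=date(2008, 11, 3))      # day 31: overdue


test_invoice_is_overdue_only_after_payment_terms_elapse()
```

Both versions compute the same thing. Only the second one tells a newcomer what problem it is solving without a conversation.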

My experience is that after being given an overview of a system, perhaps at a whiteboard, if the code is healthy, if good package, class and method names are chosen, and if the code has a good accompanying test suite, then with the help of an IDE, comprehending a system just from the code is possible. For me it normally takes an afternoon or two, and in the end I usually end up with a list of questions to be answered by someone who "owns" the code. Questions answered, I normally have a reasonable comprehension and, more importantly, an understanding of the 'true' design: not what was originally envisioned by the architect at the beginning of the project, which has long since been superseded, but the design that emerged during 'implementation'.

Other Documents
Having said this, two days of my time, and perhaps a day of someone else's, is a lot to spend on comprehending a system, especially if I do not intend to work on it. If the code is not very readable it could take me a lot longer. It is ironic that the kind of organisation that does not value readable code is also likely to expect people external to the team to comprehend and evaluate the system in an instant purely from documents, forgetting that the code is the system and hence the most important document of them all. In many organisations this scenario exists, so what then? I have found a Software Maintenance Handbook very useful in the past: a guide for the explorer who needs to comprehend the system. A user guide can be most informative too, especially when it comes to answering the question "what problem is this system trying to solve?" But even with these documents, assuming that they are kept up to date, there will still be gaps in understanding, so what should you do?

Magical Tools
Let's assume that I had a magic wand that would produce the 10,000 foot view of a system in an instant, just by tapping my wand on the hard disk containing the code :) What then? Would I really comprehend the system without speaking to anyone else? In my experience the answer is no. I would understand the structure of the system, the 'what', but I wouldn't know the 'why'. To know the why, I need to know about the problem domain. In a complex domain space, my lack of understanding of the domain would be a barrier to system comprehension in itself. So even with a magic wand there are no easy answers, and in the end you need to speak to someone, perhaps face-to-face.

Better Languages
If your organisation doesn't allow people to take the time to explain the system to you, then perhaps the left side of our brain may be able to help out a bit. Domain driven design and object orientation both promise to help system comprehension through good naming and the use of modularity. Whilst some OO languages are turtles (objects) all the way down, most are not modular when it comes to high level 'architectural' abstractions. Newspeak is a new OO language that has borrowed ideas from Beta to allow you to define abstractions larger than a class. Once you have a language like Beta or Newspeak that allows you to capture high level design decisions in code, then you can use a code browser to project views of just the high level abstractions, filtering out all the details and providing the 10,000 foot view in an instant. Martin Fowler has been talking about such browsers under the label 'language workbenches'. Martin's focus has been DSLs and their comprehension and use, separate from the host language in which they reside, but the same idea can be applied to architectural layers, modules, components, libraries, packages etc.: any abstraction that can be captured by your language and projected independently as a view. The Beta language already shows how this can be done, by allowing a graphical view of system components gleaned from the code. This idea should be familiar to anyone who has used TogetherJ, the Java/UML documentation tool.
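As a very modest illustration of projecting a view from code, here is a sketch that uses Python's standard ast module to list only the top-level classes, their public methods and the top-level functions, filtering out everything else. This is not how Beta, Newspeak or a language workbench does it, and Python has no abstraction larger than a class beyond modules and packages; the function name high_level_view and the sample source are assumptions made up for the example.

```python
import ast
from typing import List


def high_level_view(source: str) -> List[str]:
    """Return an outline of top-level classes and functions, hiding the bodies."""
    outline = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            outline.append(f"class {node.name}")
            for member in node.body:
                # Show only public methods; names starting with "_" are treated as detail.
                if isinstance(member, ast.FunctionDef) and not member.name.startswith("_"):
                    outline.append(f"  method {member.name}")
        elif isinstance(node, ast.FunctionDef):
            outline.append(f"function {node.name}")
    return outline


# Hypothetical source text to project a view of.
SAMPLE = '''
class OrderBook:
    def __init__(self):
        self._orders = []

    def place(self, order):
        self._orders.append(order)

    def _audit(self):
        pass

def settle(book):
    return len(book._orders)
'''

for line in high_level_view(SAMPLE):
    print(line)
```

The view is only as good as the abstractions the language lets you write down. Here that stops at classes and functions, which is the limitation the languages above are trying to lift.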

Fix the Root Cause
As you can tell, my view is that system comprehension is often a people problem, born out of the ways people choose to organise themselves and the values and principles they choose to adhere to. The whys and wherefores for any given system should be tacit knowledge within the team that created that system. Tacit knowledge is born out of daily work and socialised informally across the team. It is only when we restrict the flow of tacit knowledge by creating artificial barriers within the team, or geographically splitting the team, or creating teams that are too large, that we create the need for formal communication. Formal intermediate work products are then handed off from one part of the team to the next. These hand-offs are points of weakness and are best avoided.

Most software organisations are dysfunctional in this regard; by this I mean that their organisation is not well suited to their intended purpose, namely the production of high quality software. The solution? Change the organisation. I accept that most of us aren't empowered to make this type of change, but let's not forget that the root cause of poor comprehension is often self-imposed.

Conclusion
So in summary, system comprehension is about communication. Ultimately the code is the design and hence the best medium for communication. With advances in language design the code can be made a better medium for expressing system structure at the 10,000 foot view, but comprehension is not all about structure. It is about meaning and purpose too, and to comprehend these requires a grasp of the domain. Gaining this knowledge often means talking to people.

As human beings competent in applying both sides of our brains, talking and listening as a way of comprehending a system should come naturally. When it doesn't, then perhaps it's a sign of a deeper problem, such as organisational dysfunction.

9th October 2008 - Expanded the section on 'Fixing the root cause' to clarify what I feel is the main barrier to system comprehension, poor organisation; in response to comments made by William Martinez.