In the keynote at Code Generation, I mentioned that empirical research shows that using UML does not improve software development productivity: depending on the study, reports ranged from -15% to +10% compared to just coding. I guess most people these days know those results from their own experience, but as the reports I was aware of were from the 1990s, it was interesting to see a more up-to-date article recently:
W. J. Dzidek, E. Arisholm, L. C. Briand: "A Realistic Empirical Evaluation of the Costs and Benefits of UML in Software Maintenance", IEEE Transactions on Software Engineering, vol. 34, no. 3, May/June 2008
Unlike many earlier studies, this one used professional developers and reasonably large tasks. The tasks all extended the same Java Struts web application, totalling about 30 hours per developer. 10 developers performed the tasks with Java alone, and another 10 performed the same tasks with Java plus Borland's Together UML tool. The developers using UML were somewhat more experienced -- 256 kLOC of Java under their belts rather than 187 kLOC, and 44% more Struts experience -- but otherwise the groups were similar. Time was measured until submission of a correct solution, giving a reasonably sound basis for comparison. Here are the results:
Compared to just coding, using UML took 15% longer to reach a correct solution (the green bar). In addition, it looks like even using UML to help you understand the code gives no benefit over just reading the code: the blue and red bars are the same length as the purple bar. As the tasks only looked at extending an existing system with existing models, we can't say for sure whether the story is the same for initial implementation, but other studies suggest it is.
One bad thing about the article is that it tries to obfuscate this clear result by subtracting the time spent on updating the models: the full times are there, but the abstract, introduction and conclusions concentrate on the doctored numbers, trying to show that UML is no slower. Worse, the authors try to give the impression that the results without UML contained more errors -- even though they clearly state that they measured time to a correct submission. They claim a "54% increase in functional correctness", which sounds impressive. However, alarm bells started ringing when I saw that the actual data even shows a 100% increase in correctness for one task. That would mean all the UML solutions were totally correct and all the non-UML solutions totally wrong, wouldn't it? But not in their world. What it actually meant was this: out of 10 non-UML developers, every submission was correct apart from one mistake made by one developer in an early submission, which he later corrected. Since none of the UML developers made a mistake in their initial submissions of that particular task, the authors calculated a 100% difference and try to claim that as a 100% improvement in correctness -- ludicrous!
To calculate correctness they should really have counted a number of things that had to be correct, e.g. 20 function points. Calculated like that, the value for 1 mistake drops by a factor of 20, from 100% down to just 5% for that developer, and to 0.5% over all non-UML developers. I'm pretty sure that calculated like that there would be no statistically significant difference left. Even if there were, times were measured until all mistakes were corrected, so all it would mean is that the non-UML developers were more likely to submit a code change for testing before it was completely correct. Quite possibly the extra 15% of time spent on updating the models gave the developer time to notice a mistake, perhaps when updating that part of the model, so he went straight back to making a fix rather than first submitting his code for testing. In any case, reaching the same eventual level of quality took 15% longer with UML than without: if you have a quality standard to meet, using UML won't get you there any more certainly, it will just slow you down.
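The arithmetic here is easy to check for yourself. As a back-of-the-envelope sketch -- the 20 function points per task are my hypothetical example, not data from the paper, and the mistake counts are the ones described above -- the two ways of measuring the same single mistake look like this:

```python
# Hypothetical numbers: 10 developers per group, an assumed 20 function
# points per task, one early (later-corrected) mistake by one non-UML
# developer, zero first-submission mistakes in the UML group.
DEVELOPERS = 10
FUNCTION_POINTS = 20  # assumption for illustration, not from the paper

uml_mistakes = 0
non_uml_mistakes = 1

# The paper's style of headline number: relative reduction in mistakes,
# (1 - 0) / 1 = 1.0, i.e. a "100%" difference from a single slip.
relative_difference = (non_uml_mistakes - uml_mistakes) / non_uml_mistakes

# Per-function-point view: the one mistake is 1 of that developer's 20
# function points (5%) ...
per_developer_error = non_uml_mistakes / FUNCTION_POINTS

# ... and 1 of the group's 200 function points overall (0.5%).
group_error = non_uml_mistakes / (DEVELOPERS * FUNCTION_POINTS)

print(relative_difference)   # 1.0
print(per_developer_error)   # 0.05
print(group_error)           # 0.005
```

Same data, but 100% versus 0.5% depending on how you choose to divide -- which is exactly why the granular measure is the honest one.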
To their credit, the authors point out two similar experiments as related work. One showed UML taking 27% longer, the other 48% longer. The percentage of time spent updating models was also larger, at 30-35% (possibly because those studies only measured time until the first submission of a solution: correcting bugs was probably mostly coding, so measured to a correct solution the UML time would only increase a little and hence the percentages would drop).
So what do we learn from all this? Probably nothing new about UML, but at least a confirmation that earlier results still apply, even for real developers on realistic projects using today's UML tools. Maybe more importantly, we can see that empirical research, properly written up, is valuable in helping us decide whether something really improves productivity or not. Ignore the conclusions (they probably existed in the minds of the authors before the paper was written), but look at the data and the analysis. Throw out the chaff, and draw your own conclusions from what is left. Above all, don't blindly accept or reject what they say, just because it agrees or disagrees with your existing prejudice. There's at least a chance that you might learn something!