
DSM

DSM for full code generation

October 04, 2005 00:45:40 +0300 (EEST)

Alan Cameron Wills wrote about the big Domain-Specific Languages track at JAOO, remarking that DSM is "really gathering steam". He went on to say that Microsoft differs from us in generally expecting to generate only part of the code from models, whereas:

MetaCase (usually represented by Juha-Pekka or Steve Kelly) always emphasise how you can build whole systems from DSLs, without any (or much) extra 'glue' code. I guess this is because in general they're working with mature product lines - often in embedded systems like phones.

Alan's right that we have had good success with embedded systems customers throughout the last ten years, but my experience doesn't suggest that either embedded systems or mature product lines are necessary for full code generation.

Indeed, one of the success stories on our web pages is for a start-up company building J2EE web applications in the finance domain. I remember being given a model in the newly-created modeling language, plus 50 pages of hand-coded Java, and being asked to create a code generator from one to the other.

After plastering the wall and floor with the Java, and liberal use of highlighters, sticky tape, scissors and diff, I got the generator as far as I could, which was about 99%. I sent off the results to the customer, along with questions about the information presumably missing from the models or generator that would explain the differences in the last 1%.

The answer was short and sweet: the generated code was 100% right - the problems were in the hand-written code :-).

I've seen that same pattern repeated time and time again over the years. Of course, just generating reams of code which used to be hand-written doesn't mean you've finished that project. Often that is a good time to step back and take stock of what the code looks like. If there's lots of it, more likely than not there are large elements that are duplicated. Those should be abstracted out and replaced by either reuse of the relevant model elements, or new functions that the generated code calls.

This set of new functions and components is what we refer to as the Domain Framework: a set of code which is needed for the majority of products in the domain, and thus should be reused rather than copied. Of course, the better your lead developers have been, the less they will have allowed such code to be hand-written: pretty much everyone agrees that "copy-paste reuse" is even worse than GOTO. Indeed, one main use of GOTO is to make explicit that a piece of code could be used by more than one path through the code. Which reminds me: did I mention that I was taught by The Man Who Invented The Subroutine? But I digress... :-)

So, Alan's right insofar as you often end up moving commonly used blobs of code out of the generated code and into a library. It's clear that it's going to be easier to maintain those bits of code as honest-to-goodness functions/methods/whatever, rather than as long inline boilerplate sections in the middle of a code generator. It's also much better from any number of software development perspectives, and should have been done regardless of the use of DSM. Having a mature product line, such as embedded systems companies often have, just means that they've probably gone a way down that road already: no magic there.
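To make the point about moving boilerplate out of the generator concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the validate_amount framework function, the MODEL structure and the generate function do not come from any real MetaCase tool or customer project. The idea it shows is simply that the generator emits one short call per model element, while the shared logic lives in an ordinary hand-maintained function.

```python
# Hypothetical domain framework: written once by hand, shared by every product.
def validate_amount(value, minimum, maximum):
    """Return value if it lies in [minimum, maximum], else raise ValueError."""
    if not minimum <= value <= maximum:
        raise ValueError(f"{value} outside {minimum}..{maximum}")
    return value


# Hypothetical model: one dict per element, as a modeling tool might export it.
MODEL = [
    {"name": "deposit", "min": 1, "max": 10000},
    {"name": "withdrawal", "min": 1, "max": 500},
]


def generate(model):
    """Emit one short function per model element; each only calls the framework."""
    lines = []
    for element in model:
        lines.append(f"def check_{element['name']}(value):")
        lines.append(
            f"    return validate_amount(value, {element['min']}, {element['max']})"
        )
        lines.append("")
    return "\n".join(lines)


generated_source = generate(MODEL)

# Load the generated code with the framework function in scope,
# which stands in for linking the domain framework with the generated code.
namespace = {"validate_amount": validate_amount}
exec(generated_source, namespace)
```

If the validation rule changes, only validate_amount is edited; the generator and the models stay as they are, which is the maintenance benefit described above.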

I'd certainly say that in these cases you are still generating 100% of the code you need for a new product. Since you already have the domain framework, that's not some extra coding that needs to be done by hand for a new product: it's simply linked in with the generated code, forming a thin layer between it and whatever components you use, on top of your language library, OS, hardware etc.

For the rest then sure, you should aim for 100%, and more often than not you can achieve it. When you can't, make sure that you have a good separation between the generated code and hand-written code. A good way to do this is to put hand-written code in separate files that call or are called by parts of the generated code. Less aesthetically pleasing ways include putting code fragments into models or using protected sections in the generated files. And never, ever, change the generated code. But then you figured that out anyway, didn't you?
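One possible shape for the separate-files approach, sketched in Python under stated assumptions: the class names, the file names in the comments and the fee rule are all hypothetical, and the two "files" are shown in one block only so the example runs on its own. The generated class calls a hook method; the hand-written class overrides that hook without touching the generated code.

```python
# --- generated file (hypothetical name: account_generated.py); never edited ---
class AccountBase:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Generated control flow calls a hook that hand-written code may override.
        fee = self.withdrawal_fee(amount)
        self.balance -= amount + fee
        return self.balance

    def withdrawal_fee(self, amount):
        # Default generated behaviour: no fee.
        return 0


# --- hand-written file (hypothetical name: account_custom.py) ---
class Account(AccountBase):
    def withdrawal_fee(self, amount):
        # Hand-written rule assumed to be inexpressible in the model.
        return 2 if amount > 100 else 0
```

Regenerating the first file leaves the second untouched, so the hand-written rule survives every model change.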