
DSM-tech

Code generation performance comparison

April 17, 2008 19:55:21 +0300 (EEST)

I've just finished booking our trip to Code Generation 2008, whose program is now published. One talk I'm particularly looking forward to is Bran Selic's keynote on how DSM can meet the highest standards of performance needed for generated code in Quality-of-Service-constrained applications. Our experience, too, is that while generated code cannot outperform the best handwritten code (how could it?), with DSM and domain-specific generators it does outperform the average handwritten code, so the overall speed of the whole system is better.

Thinking back to Code Generation 2007 reminded me that the performance of the code generator itself can also be an issue. One company I talked to had a modeling tool generating code as part of a nightly build. The problem was, they kept running out of "night". They were already up to a four-processor machine dedicated to running the generation, and even that couldn't finish the job overnight. What really surprised me was that they only had a few hundred diagrams. I've seen organizations with many gigabytes of models - orders of magnitude more than this case - managing just fine.

An article in IEEE Software (Cuadrado & Molina, Sep-Oct 2007) examined the performance of the Eclipse MDD tools for code generation, comparing them with DSLs built in Ruby. They took the same UML model as a starting point, and followed the common Eclipse practice of first running a model-to-model transformation to produce a "Java model", then a model-to-text transformation to produce the .java code files. For the former they used ATL, and for the latter MOFScript; in Ruby there were two corresponding DSLs.

The model was very simple: 40 classes and 50 inheritance relationships, with each class defining around 6 attributes. The code generated was what you'd expect: a one-to-one mapping to .java files, with accessor methods for each attribute, giving a total of 2550 LOC.
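
To make that concrete, here is a minimal sketch of the two-phase pipeline in plain Ruby - not the article's actual transformation DSLs, just an illustration with made-up names. The M2M step builds a "Java model" by adding a getter/setter pair per attribute; the M2T step then renders each class one-to-one into a .java file.

    # Plain-Ruby sketch of the M2M + M2T pipeline; all names are made up.
    UmlClass  = Struct.new(:name, :attributes)           # source model element
    Attribute = Struct.new(:name, :type)
    JavaClass = Struct.new(:name, :fields, :operations)  # "Java model" element
    Operation = Struct.new(:signature, :body)

    # M2M: the only real work is adding an accessor pair per attribute.
    def uml_to_java(uml_classes)
      uml_classes.map do |c|
        ops = c.attributes.flat_map do |a|
          [Operation.new("public #{a.type} get#{a.name.capitalize}()",
                         "return #{a.name};"),
           Operation.new("public void set#{a.name.capitalize}(#{a.type} #{a.name})",
                         "this.#{a.name} = #{a.name};")]
        end
        JavaClass.new(c.name, c.attributes, ops)
      end
    end

    # M2T: render each "Java model" class one-to-one into a .java file.
    def java_to_text(java_classes)
      java_classes.each do |jc|
        fields  = jc.fields.map { |f| "  private #{f.type} #{f.name};" }.join("\n")
        methods = jc.operations.map { |o| "  #{o.signature} { #{o.body} }" }.join("\n")
        File.write("#{jc.name}.java",
                   "public class #{jc.name} {\n#{fields}\n\n#{methods}\n}\n")
      end
    end

    model = [UmlClass.new("Customer", [Attribute.new("name", "String"),
                                       Attribute.new("id",   "int")])]
    java_to_text(uml_to_java(model))   # writes Customer.java

For brevity the sketch emits one-line accessors; the article's 2550 LOC figure presumably reflects conventionally formatted accessors of a few lines each, which comes to roughly 64 LOC per class.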

Since I'm inherently incapable of resisting a competitive challenge, I imported the UML model from the Eclipse format into MetaEdit+, and wrote a code generator in MERL to output the same code. Here, then, are the results: the times for Eclipse and Ruby are from the article; my MetaEdit+ time was measured on comparable hardware.

Time to generate Java code for a UML model: Eclipse 5.423s, Ruby 3.557s, MetaEdit+ 0.176s

I guess the graph speaks for itself: MetaEdit+ is over 30 times faster than Eclipse, and over 20 times faster than Ruby. Even if we ignore all the reading and writing that Eclipse has to do, MetaEdit+ is still over 20 times faster. Since I imagine some people won't be too happy with those results, let's make some things clear:

  • Having two phases, M2M then M2T, roughly doubles the times for Eclipse and Ruby. All the M2M phase really does is add a pair of accessor operations for each attribute, which in my opinion belongs in the code generation phase anyway (see the first sketch after this list). The MetaEdit+ generator is both faster and simpler than the combination of the ATL and MOFScript generators, and IMHO this would continue to be true even for much more complicated generators.
  • A modeler in MetaEdit+ can just run the generator, which executes on his model and the corresponding metamodel in memory. In Eclipse, he must first save the model (in the figures above I'm assuming that takes zero time), then the generator must parse that XML file and the corresponding metamodel (see the second sketch after this list). For M2M stages (Eclipse and MDA proponents often envisage many), the generator must also read the metamodel for the output format, and perhaps serialize the result into XML as a temporary model for input to the next stage. I believe the MetaEdit+ approach is better suited to what developers actually need.
  • For a nightly build, or other occasions when the model is not already open in MetaEdit+, it would have to be read first. This adds 0.276s, although that figure may be rather unfair to MetaEdit+: we are loading a full UML class with all its information, as opposed to just the class name and attribute names and types in the Eclipse XML file. If all the extra class and attribute information were filled in, MetaEdit+ would be hardly any slower, but the Eclipse and Ruby tools' time to parse the XML models would increase considerably.
  • I could include the time to import the Eclipse XML file into MetaEdit+, but that seems unfair: it's the native format for the Eclipse tools and the Ruby DSLs here, so MetaEdit+ too should start from its native format, as a modeler would. If the Eclipse guys build an importer that reads MetaEdit+ repositories, we can include and compare the times for "import from other tool". For the record: reading the XML file took 5ms, executing the translation to MXM, which MetaEdit+ can import, took 146ms, and importing the MXM file to the MetaEdit+ repository took 1.72s. Building the translator from XMI to MXM took a little under an hour, and used MERL's reverse engineering features, new in 4.5 SR1.
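
To illustrate the first point: a single-pass generator can derive the accessors while emitting text, with no intermediate "Java model" at all. Again a plain-Ruby sketch with made-up names - not MERL, and not the article's DSLs - reusing the UmlClass and Attribute structs from the earlier sketch.

    # Single pass: synthesize each accessor pair during text emission,
    # skipping the M2M phase and its intermediate model entirely.
    def generate_java(uml_classes)
      uml_classes.each do |c|
        body = c.attributes.map do |a|
          cap = a.name.capitalize
          ["  private #{a.type} #{a.name};",
           "  public #{a.type} get#{cap}() { return #{a.name}; }",
           "  public void set#{cap}(#{a.type} #{a.name}) { this.#{a.name} = #{a.name}; }"
          ].join("\n")
        end.join("\n\n")
        File.write("#{c.name}.java", "public class #{c.name} {\n#{body}\n}\n")
      end
    end

The accessor logic is the same few lines either way; a separate M2M phase just means building, holding and traversing an extra model to get there.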

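And the second point: a file-based toolchain repeats work on every run that an in-memory repository never does - parsing the serialized model back into objects before generation can even start. Here is a hedged sketch using Ruby's bundled REXML, assuming a simplified XMI-like layout (real Eclipse UML files carry much more), again reusing the structs and generate_java from the sketches above:

    require 'rexml/document'

    # Re-load the model from XML on every generator run; this is the
    # step an in-memory tool skips. Assumes a simplified XMI-like file.
    def load_model(path)
      doc = REXML::Document.new(File.read(path))
      classes = []
      doc.root.each_element('//packagedElement') do |el|
        next unless el.attributes['xmi:type'] == 'uml:Class'
        attrs = []
        el.each_element('ownedAttribute') do |a|
          attrs << Attribute.new(a.attributes['name'], a.attributes['type'])
        end
        classes << UmlClass.new(el.attributes['name'], attrs)
      end
      classes
    end

    generate_java(load_model('model.uml'))   # parse first, then generate
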
Of course, the Eclipse tools will get faster -- as will MetaEdit+. I think the main difference, though, is one of architecture: the internal data structures and algorithms. Changing some of those should be possible, but some -- like EMF -- would be hard to rip out of Eclipse modeling without breaking absolutely everything else.

It would be interesting to see the results for other tools like oAW or Microsoft's DSL Tools' T4. Any competitive natures in those teams? :-) Finally, many thanks are due to Jesús Cuadrado, who provided me with the models and generators used in his article, as well as the details of their test environment, so I could make mine as comparable as possible.