
DSM-tech

Code generation performance comparison

April 17, 2008 19:55:21 +0300 (EEST)

I've just finished booking our trip to Code Generation 2008, whose program is now published. One talk I'm particularly looking forward to is Bran Selic's keynote on how DSM can meet the highest standards of performance needed for generated code in Quality of Service constrained applications. Our experience too is that while generated code cannot outperform the best handwritten code (how could it?), with DSM and domain-specific generators it does outperform the average handwritten code, so the overall speed of the whole system is better.

Thinking back to Code Generation 2007 reminded me that the performance of the code generator itself can also be an issue. One company I talked to had a modeling tool generating code as part of a nightly build. The problem was, they kept running out of "night". They were already up to a four-processor machine dedicated to running the generation, and even that couldn't finish the job overnight. What really surprised me was that they only had a few hundred diagrams. I've seen organizations with many gigabytes of models, orders of magnitude more than this case, managing just fine.

An article in IEEE Software (Cuadrado & Molina, Sep-Oct 2007) examined the code generation performance of the Eclipse MDD tools, comparing them with DSLs built in Ruby. They took the same UML model as a starting point and followed the common Eclipse practice of first running a model-to-model transformation to make a "Java model", then a model-to-text transformation to produce the .java code files. For the former they used ATL, and for the latter MOFScript; in Ruby there were two corresponding DSLs.

The model was very simple: 40 classes and 50 inheritance relationships, with each class defining around 6 attributes. The code generated was what you'd expect: a one-to-one mapping to .java files, with accessor methods for each attribute, giving a total of 2550 LOC.
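To show how little a generator for this mapping needs to do, here is a minimal single-pass sketch in Python (not MERL, ATL or MOFScript). The `UmlClass` and `Attribute` types, the two-class sample model and the exact Java layout are assumptions made for illustration; they are not the article's model or generated code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attribute:
    name: str
    java_type: str  # e.g. "String" or "int"


@dataclass
class UmlClass:
    name: str
    superclass: Optional[str] = None
    attributes: List[Attribute] = field(default_factory=list)


def cap(name: str) -> str:
    """Capitalize the first letter for get/set method names."""
    return name[:1].upper() + name[1:]


def generate_java(cls: UmlClass) -> str:
    """Emit one class: a private field plus a getter/setter pair per attribute."""
    extends = f" extends {cls.superclass}" if cls.superclass else ""
    lines = [f"public class {cls.name}{extends} {{"]
    lines += [f"    private {a.java_type} {a.name};" for a in cls.attributes]
    for a in cls.attributes:
        lines.append(f"    public {a.java_type} get{cap(a.name)}() {{ return {a.name}; }}")
        lines.append(f"    public void set{cap(a.name)}({a.java_type} {a.name}) "
                     f"{{ this.{a.name} = {a.name}; }}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def generate_all(classes: List[UmlClass]) -> Dict[str, str]:
    """One-to-one mapping: each model class becomes its own <Name>.java file."""
    return {f"{c.name}.java": generate_java(c) for c in classes}


# Hypothetical two-class model, just to exercise the generator.
model = [
    UmlClass("Person", attributes=[Attribute("name", "String"), Attribute("age", "int")]),
    UmlClass("Employee", "Person", [Attribute("salary", "int")]),
]
files = generate_all(model)
print(files["Employee.java"])
```

The accessors are produced directly from the attributes while writing text, with no intermediate "Java model".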

Since I'm inherently incapable of resisting a competitive challenge, I imported the UML model from the Eclipse format into MetaEdit+ and made a code generator in MERL to output the same code. Here, then, are the results: the times for Eclipse and Ruby are from the article, and my MetaEdit+ time was measured on comparable hardware.

Time to generate Java code for a UML model: Eclipse 5.423s, Ruby 3.557s, MetaEdit+ 0.176s

I guess the graph speaks for itself: MetaEdit+ is over 30 times faster than Eclipse, and over 20 times faster than Ruby. Even if we ignore all the reading and writing that Eclipse has to do, MetaEdit+ is still over 20 times faster. Since I imagine some people won't be too happy with those results, let's make some things clear:
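The speed-up factors follow directly from the three timings quoted above; this small Python check just divides each tool's time by the MetaEdit+ time.

```python
# Timings in seconds as quoted above (Eclipse and Ruby from the article,
# MetaEdit+ measured on comparable hardware).
timings = {"Eclipse": 5.423, "Ruby": 3.557, "MetaEdit+": 0.176}

baseline = timings["MetaEdit+"]
speedups = {tool: t / baseline for tool, t in timings.items() if tool != "MetaEdit+"}

for tool, factor in speedups.items():
    print(f"MetaEdit+ vs {tool}: {factor:.1f}x faster")
# MetaEdit+ vs Eclipse: 30.8x faster
# MetaEdit+ vs Ruby: 20.2x faster
```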

  • Having two phases, M2M then M2T, roughly doubles the times for Eclipse and Ruby. All the M2M phase really does is add a pair of accessor operations for each attribute, which in my opinion belongs in the code generation phase anyway. The MetaEdit+ generator is both faster and simpler than the combination of the ATL and MOFScript generators, and IMHO this would continue to be true even for much more complicated generators.
  • A modeler in MetaEdit+ can just run the generator, which is executed on his model and corresponding metamodel in memory. In Eclipse, he must first save the model (I've assumed above that this takes no time), then the generator must parse that XML file, and the corresponding metamodel. For M2M stages (Eclipse and MDA proponents often envisage many), the generator must also read the metamodel for the output format, and perhaps serialize the result into XML to write a temporary model for input to the next stage. I believe the MetaEdit+ approach is better suited to what developers actually need.
  • For a nightly build, or other occasions when the model is not already open in MetaEdit+, it would have to be read first. This adds 0.276ms, although that figure may be rather unfair to MetaEdit+. We are loading a full UML class with all its information, as opposed to just the class name and attribute names and types in the Eclipse XML file. If all the extra class and attribute information was filled in, MetaEdit+ would be hardly any slower, but the Eclipse and Ruby tools' time to parse the XML models would increase considerably.
  • I could include the time to import the Eclipse XML file into MetaEdit+, but that seems unfair: it's the native format for the Eclipse tools and the Ruby DSLs here, so MetaEdit+ too should start from its native format, as a modeler would. If the Eclipse guys build an importer that reads MetaEdit+ repositories, we can include and compare the times for "import from other tool". For the record: reading the XML file took 5ms, executing the translation to MXM, which MetaEdit+ can import, took 146ms, and importing the MXM file to the MetaEdit+ repository took 1.72s. Building the translator from XMI to MXM took a little under an hour, and used MERL's reverse engineering features, new in 4.5 SR1.
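To make the first point concrete, here is a toy Python sketch contrasting the two-phase pipeline (an M2M step that only adds a get/set operation pair per attribute, then an M2T step that prints) with a single pass that produces the accessors directly. The tuple-based model and all function names are hypothetical stand-ins, not ATL, MOFScript or MERL, and the sketch ignores the XML parsing and serialization costs discussed above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each class is (name, [(attribute, java_type), ...]).
Model = List[Tuple[str, List[Tuple[str, str]]]]


def cap(s: str) -> str:
    return s[:1].upper() + s[1:]


def accessor_text(kind: str, attr: str, jtype: str) -> str:
    """Render one getter or setter as a line of Java."""
    if kind == "get":
        return f"    public {jtype} get{cap(attr)}() {{ return {attr}; }}"
    return f"    public void set{cap(attr)}({jtype} {attr}) {{ this.{attr} = {attr}; }}"


def render(name: str, attrs, accessor_lines) -> str:
    head = [f"public class {name} {{"] + [f"    private {t} {a};" for a, t in attrs]
    return "\n".join(head + list(accessor_lines) + ["}"]) + "\n"


# Two phases: M2M builds an intermediate "Java model", M2T prints it.
def m2m(model: Model) -> List[dict]:
    """The only new information is a get/set operation pair per attribute."""
    return [
        {"name": name, "fields": attrs,
         "operations": [(kind, a, t) for a, t in attrs for kind in ("get", "set")]}
        for name, attrs in model
    ]


def m2t(java_model: List[dict]) -> Dict[str, str]:
    return {
        f"{c['name']}.java": render(c["name"], c["fields"],
                                    (accessor_text(*op) for op in c["operations"]))
        for c in java_model
    }


# One phase: the accessors are produced while generating the text.
def one_pass(model: Model) -> Dict[str, str]:
    return {
        f"{name}.java": render(name, attrs,
                               (accessor_text(kind, a, t)
                                for a, t in attrs for kind in ("get", "set")))
        for name, attrs in model
    }


model: Model = [("Person", [("name", "String"), ("age", "int")])]
assert m2t(m2m(model)) == one_pass(model)  # same output, one less model to build
```

Both routes produce identical text; the intermediate model adds a pass (and, in the real tools, reading and writing) without adding information the generator couldn't compute itself.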

Of course, the Eclipse tools will get faster -- as will MetaEdit+. I think the main difference, though, lies in architecture, internal data structures and algorithms. Changing some of those should be possible, but some, like EMF, will be hard to rip out of Eclipse modeling without breaking absolutely everything else.

It would be interesting to see the results for other tools like oAW or Microsoft's DSL Tools' T4. Any competitive natures in those teams? :-). Finally, many thanks are due to Jesús Cuadrado, who provided me with the models and generators used in his article, as well as the details of the environment from their tests, to make mine as comparable as possible.

DSM

Domain-Specific Modeling in universities

April 15, 2008 14:47:59 +0300 (EEST)

Alfonso Pierantonio contacted me about using our DSM book on an MDD course he is running at the University of L'Aquila, Italy. He is also researching topics that are important for DSM, namely model differencing, evolution and synchronization, and he asked about our academic pricing for MetaEdit+. For some reason, it was only his message that made me realize we should have a section on the DSM Forum site listing universities that teach or research DSM. I can find a lot of them from our customer list and my emails, but if you want to make sure you are included, please add a comment below or contact me.

Coincidentally the very next day I saw the following message in the comp.lang.smalltalk newsgroup, advertising for a PhD position in DSM:

"Supporting the Concept of Early Warning Analysis" (SCEWA) is a 5 year research project that began in January 2008. It is funded by the Irish Environmental Protection Agency.
The research will be focused on the development of methods and tools which are aimed at supporting the analysis, design, and development of early warning systems in engineering facilities and in critical infrastructures whose undisturbed operation is important for maintaining and improving the quality of our everyday life.
Currently, we are seeking to recruit one PhD student to work in the area of Domain Specific Modeling. The objective of the research is to define a domain specific modeling language for early warning systems.
More information about the project and the PhD position: SCEWA web site
PDF version of the announcement
Please forward this information to students who may be interested in this position.
Thank you in advance,
Dr. Ioannis Dokas, jdokas@gmail.com

DSM

XMI still a failure

April 09, 2008 13:58:50 +0300 (EEST)

Three years ago I posted about the lack of adoption and lack of tool interoperability for XMI:

The OMG has XMI versions 1.0, 1.1, 1.2 and 2.0, with 2.1 under development. Looking on Google, I note that there are 865 XMI files on the web using 1.0, 78 for 1.1, 64 for 1.2, and 34 for 2.0 (released in 2003). That gives some indication of the adoption of XMI as a format, and tallies with my own impression. Everybody was interested when it first came out, but most who actually tried to use it found it lacking. One can always hope the situation improves with newer versions...

So have things improved? In a word, no. Google finds 40 files for 2.1 (released in 2005), and the figures for 1.0 have dropped by over 90%. Now even the most used version, 1.2 from over five years ago, only has 136 files, and the figures decline from there. Yet still people I meet believe that XMI is a useful standard that will solve their problems.

What about tool interoperability then? In an article from the MODELS conference, Lundell et al. tested XMI with 14 UML tools. Obviously, the older tools can't load from the newer tools, but can the newer tools read models from the older tools? That's the important question after all, if you want to use XMI as insurance against your tool being discontinued: will this year's tools load last year's models? Here's the table, see the article for full details:

From \ To   Borland     Eclipse  Rational  MagicDraw   UModel
ArgoUML     Failed      Failed   Failed    Failed      Failed
Fujaba      Successful  Failed   Failed    Failed      Failed
Umbrello    Failed      Failed   Failed    Failed      Failed
Artisan     Failed      Failed   Failed    Failed      Failed
Poseidon    Failed      Failed   Failed    Successful  Failed
Rhapsody    Failed      Failed   Failed    Failed      Failed
Rose 1.0    Failed      Failed   Failed    Failed      Failed
Rose 1.1    Failed      Failed   Failed    Failed      Failed
Visio       Failed      Failed   Failed    Failed      Failed
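Tallying the table as data makes the scale of the problem explicit. This Python snippet transcribes the results (rows are the exporting tools, columns the importing tools) and counts the successful imports.

```python
IMPORTERS = ["Borland", "Eclipse", "Rational", "MagicDraw", "UModel"]
EXPORTERS = ["ArgoUML", "Fujaba", "Umbrello", "Artisan", "Poseidon",
             "Rhapsody", "Rose 1.0", "Rose 1.1", "Visio"]
# The only (exporter, importer) pairs marked "Successful" in the table.
SUCCESSES = {("Fujaba", "Borland"), ("Poseidon", "MagicDraw")}

total = len(EXPORTERS) * len(IMPORTERS)
ok = sum((e, i) in SUCCESSES for e in EXPORTERS for i in IMPORTERS)
never = [i for i in IMPORTERS if not any((e, i) in SUCCESSES for e in EXPORTERS)]

print(f"{ok} of {total} imports succeeded ({ok / total:.0%})")  # 2 of 45 imports succeeded (4%)
print("Importers that read nothing:", ", ".join(never))  # Eclipse, Rational, UModel
```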

Looking at the tools supporting the latest version, 2.1, the picture looks darker still. None of these tools were able to import even 2.0. If XMI were really being implemented by these tools to offer interoperability and insurance, why would they drop support for the previous version? That leaves you rather open to claims that the intention is to pretend there is a standard, and then spread FUD by claiming that other tools don't support it. If the only tools that can interoperate are jointly developing the same code base (Eclipse, IBM, Borland), it's hardly impressive if their parallel versions work well together.

In the DSM world, of course, all this is somewhat academic: the tests were of the simplest possible UML Class Diagrams. None of the tools support working with DSM languages from another tool. If you want to move your DSM models from one tool to another, the main cost is the same as the cost of building support for that DSM language in the new tool -- translating the models is easy in comparison. In that area I'd say MetaEdit+ wins hands down: nothing comes close in terms of ease and speed for the metamodeler or features provided automatically for the modeler. But don't listen to me -- listen to our customers, industry gurus and even competitors all saying the same thing.

DSM

Re: A framework for cross platform DSL development

April 04, 2008 12:09:36 +0300 (EEST)

Now that "domain-specific" has become something of a buzzword, people are eager to claim that what they do is domain-specific. For some, just naming a variable or XML tag "person" instead of "x" seems to be enough. For a great take on this from April 1st, see Anders' blog: "A framework for cross platform DSL development".

There's a serious side to it as well: if your DSL or DSM solution locks you into a certain programming language, IDE, or operating system, that's plain wrong. A Domain-Specific Modeling language should fit tightly with a certain problem domain, but your tools for it should allow you to change the solution domain later, just by building a new generator that operates on the same models. If the tools are tightly coupled to a certain language, IDE, or OS -- or even worse and sadly all too common, to a particular version of a given IDE or OS -- that's hardly giving you the freedom of choice that DSM is meant to bring.
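The idea of changing the solution domain by writing another generator over unchanged models can be sketched in a few lines of Python. The door state-machine model and both generators below are hypothetical examples: the model says nothing about the target language, and each target is just another function over the same data.

```python
from typing import Callable, Dict

# Hypothetical domain model: named states of a tiny state machine.
model = {"name": "Door", "states": ["Open", "Closed", "Locked"]}


def to_java(m: dict) -> str:
    consts = ", ".join(s.upper() for s in m["states"])
    return f"public enum {m['name']}State {{ {consts} }}\n"


def to_c(m: dict) -> str:
    consts = ", ".join(f"{m['name'].upper()}_{s.upper()}" for s in m["states"])
    return f"typedef enum {{ {consts} }} {m['name']}State;\n"


# Supporting a new platform means adding an entry here; the model is untouched.
GENERATORS: Dict[str, Callable[[dict], str]] = {"java": to_java, "c": to_c}

for target, generate in GENERATORS.items():
    print(generate(model), end="")
# public enum DoorState { OPEN, CLOSED, LOCKED }
# typedef enum { DOOR_OPEN, DOOR_CLOSED, DOOR_LOCKED } DoorState;
```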

Yes, there is a cost to supporting multiple platforms, and I sometimes have to justify why MetaEdit+ supports all the common platforms, but in the long run it pays off. If you focus on just one platform, it's all too easy for the tool to become highly coupled to implementation details in the current OS or IDE version. Not only does that make the tool vendor's life harder when a new version comes out, but for tools that require metamodelers to program, the metamodelers too will find themselves stuck. Their modelers will want the latest version of the IDE for coding in, but the modeling tool will require them to stick with the old version -- or continually pay the cost of this tight coupling at every version upgrade.

DSM

JP's on the road again

March 20, 2008 16:08:18 +0200 (EET)

Our intrepid CEO, Juha-Pekka Tolvanen, has a speaking engagement every week this month. I already mentioned how we enjoyed hosting a group from Japan for the first week: top figures from our customers, distributors, academia and related tool vendors. That was the easy one, since it was mostly on home turf -- just a little domestic flight or two. Next up was DevWeek in the UK, where his session was so full it was "standing room only" -- or actually since DevWeek is mostly developers, the latecomers simply sat on the floor.

The start of this week was one I'm particularly pleased about: SPA 2008. We'd tried a couple of times before to speak there, but weren't accepted. In 2002 the answer was "DSM is old stuff, we've heard it all before". In 2007 the answer was "never heard of these guys". I guess it's partly just the luck of the draw -- which program committee members review your proposal -- and partly who you know. The UK seems to have a relatively, err, tight-knit community in that respect :-). Fortunately, the other English stereotypes of friendliness and modesty also hold true, when you get to know them (or rather "us" - I hope!).

Next week there's a couple of days at a MoSiS workshop, and then he's off to give a keynote at GT-VMT 2008 in Hungary -- a more academic workshop on graphical modeling for generation, simulation and analysis. I'd feel more guilty about not sharing the travelling load this month if it wasn't for that keynote :-). They're always an honour, and of course a nice little ego boost -- hopefully with no long term damage if taken only once a year!

* For the tiny minority of you who aren't into ancient progressive rock, this blog entry's title is homage to the 1978 Manfred Mann's Earth Band song, "Davy's on the Road Again" (MP3 clip).

DSM

Domain-Specific Modeling book now available

March 14, 2008 15:15:00 +0200 (EET)

A couple of years ago Juha-Pekka and I decided it was time to distill our experience of DSM over the last dozen years or so, and publish it in a book so others could benefit from the lessons we've learned (some of them the hard way!). That book is now finally released, and last week I had the pleasure of being able to give the first advance copy off the press to Mr. Yoshio Asano from Fujisetsubi, our MetaEdit+ distributors in Japan:

Steve and Juha-Pekka with their book and Yoshio-san

I'm not normally one for having my photo taken -- a policy I think our marketing department would agree with -- but I wanted some evidence that I had held the real finished article in my hands, to keep me going while I wait for our authors' copies to arrive from Wiley.

The first 40 pages introduce DSM and its characteristic benefits, followed by 50 pages defining an architecture for the various parts of a DSM solution. We're especially happy to have over 100 pages covering five in-depth examples of whole DSM solutions, including the modeling language, generators and domain framework, with the background of the cases and how the solution was designed and built. Those cases provide concrete examples used where necessary in the second half of the book, which describes how to build a DSM solution for your own situation: 200 pages of practical advice backed up by solid theory and real-world experience.

There's more information on the book's web site, dsmbook.com, including links to Amazon, where you can order or search inside. And if you want someone else's opinion, I'll blush most impressively as you read the Foreword which Dave Thomas very kindly wrote for us.

Blog-tech

Added CAPTCHA to prevent spam comments

March 14, 2008 01:16:59 +0200 (EET)

I finally caved in to the spammers, and added a CAPTCHA test to the "Add Comment" page. I hate having to inconvenience you to prevent the idiots messing up the commons, but the truth is I don't have time to be cleaning out the spam by hand, so it's either CAPTCHA for the commenters or a mess for all readers. Sorry.

The CAPTCHA system I chose is reCAPTCHA, from Carnegie Mellon:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. ...But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
Currently, we are helping to digitize books from the Internet Archive. In order to achieve our goal of digitizing books, we need your help. If you run a website that suffers from problems with spam, you can put reCAPTCHA on your site.
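The voting scheme described in the quote can be modeled in a few lines of Python. This is a toy sketch of the idea only, not reCAPTCHA's actual implementation: the class and method names, and the thresholds (at least 3 valid readings, 75% agreement), are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class WordDigitizer:
    """Each challenge pairs a control word (answer known) with an unknown word."""

    def __init__(self, min_votes: int = 3, agreement: float = 0.75):
        self.min_votes = min_votes    # assumed: readings needed before deciding
        self.agreement = agreement    # assumed: share that must agree
        self.votes: Dict[str, Counter] = defaultdict(Counter)

    def submit(self, control_answer: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Pass the user only if the control word is right; only then count
        their reading of the unknown word."""
        passed = control_guess.strip().lower() == control_answer.lower()
        if passed:
            self.votes[unknown_id][unknown_guess.strip().lower()] += 1
        return passed

    def transcription(self, unknown_id: str) -> Optional[str]:
        """Accept a reading once enough independent users agree on it."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= self.min_votes and n / total >= self.agreement:
            return word
        return None


d = WordDigitizer()
d.submit("upon", "upon", "img-17", "morning")
d.submit("upon", "upan", "img-17", "mourning")  # control word wrong: reading ignored
d.submit("river", "river", "img-17", "morning")
print(d.transcription("img-17"))  # None: only two valid readings so far
d.submit("stone", "stone", "img-17", "morning")
print(d.transcription("img-17"))  # morning
```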

If you can't read the CAPTCHA image, just click the link to get another one -- neither clicking the link nor entering the wrong words will lose your comment (assuming the web works; I always write my comments elsewhere and paste them in, just in case!).

Since there wasn't a Smalltalk plug-in for the reCAPTCHA API, I made my own. It only took about 30 minutes for the client and server sides combined, and most of that was rejigging some bits to avoid adding an extra dependency on an HTTP helper client. Predictably, the result worked -- almost. This is the web, after all. For some reason, the field to enter the words disappeared if the cursor strayed into the TinyMCE JavaScript rich text editor toolbar. Add 6 hours of testing and hacking with newer JavaScript editor versions, IE, different <div> and CSS layouts etc. In the end I dumped the pretty reCAPTCHA frame and went with the longer-winded custom layout. Simple, boring, works perfectly.

If anybody wants the Smalltalk code for the client and server sides, take a look at Blog-ServletsExtensions from the Cincom public repository. This adds reCAPTCHA to the Silt blog server (Silt-Core 1.139), but it should be easy enough to extract the code for use elsewhere: see the package comment for instructions.

DSM-tech

Comparing tools, plus spatial relations in MetaEdit+

January 15, 2008 17:54:37 +0200 (EET)

Steffen Mazanek writes about an interesting metamodeling practical that he's conducting: students have to implement a cut-down UML Class Diagram editor using MetaEdit+, Eclipse GMF or Microsoft DSL Tools. It reminds me of the "Use Case cartoon" experiment where MetaCase and Microsoft both built very simple Use Case diagram support in their respective tools. Doing it with MetaEdit+ was 6 times faster back then, but hopefully Microsoft have caught up somewhat since. Mind you, those figures were for the tools used by their own developers: when used by students, I'd expect MetaEdit+ to fare better.

Steffen also wrote a nice mini-review of MetaEdit+. He especially liked the ease of use, the Symbol Editor and the high level of integration (as opposed to the multiple mapping languages of GMF). He said MetaEdit+ would find it hard to support languages using spatial relations, e.g. VEX, which uses visual containment rather like Venn diagrams. I'm not sure I'd agree with that. Here's a picture of something like VEX: each circular object has its name in bold at the top, and at the bottom a list of the objects that it contains (recursively).

VEX-like diagram in MetaEdit+

Building the metamodel took a couple of minutes. MetaEdit+ already understands containment via the 'do contents' structure in its MERL generator language. However, that calculates contents based on the enclosing rectangle of the symbol, whereas for VEX it should be based on circles. Otherwise 4 above would be considered completely contained in 1: true if you think of their enclosing rectangles, but not for the circles. The little bit of generator script that produces the text at the bottom of the circles therefore needed to be a bit longer than just "do contents { id }". Here's what it took:

Report '_contents'
  /* report all contents of the current object, flattening any nesting */
  do contents
  { subreport '_calcMargin' run
    if $margin >= '0' NUM then :Name endif
  }
endreport

We go through all the objects contained in this object, using the standard "do contents" rectangular definition of containment. For each little object we calculate the margin between it and this circle. If the margin is zero or positive, the little object lies entirely inside the circle and we print its name.
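The same two-stage test (a rectangle prefilter like "do contents", then the circle check) can be sketched in Python for readers who don't know MERL. The `Circle` class, its bounding-box fields and the coordinates in the example are hypothetical; the example circle "4" is tucked into the corner of circle "1"'s bounding box, like the case described above.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Circle:
    name: str
    x: float      # left edge of the bounding box
    y: float      # top edge of the bounding box
    width: float  # circles sit in square boxes, so width == height

    @property
    def center_x(self) -> float:
        return self.x + self.width / 2

    @property
    def center_y(self) -> float:
        return self.y + self.width / 2


def rect_contains(outer: Circle, inner: Circle) -> bool:
    """Bounding-rectangle containment, analogous to MERL's 'do contents'."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.width <= outer.x + outer.width
            and inner.y + inner.width <= outer.y + outer.width)


def margin(outer: Circle, inner: Circle) -> float:
    """Big radius - centre distance - little radius, as in _calcMargin."""
    d = math.hypot(inner.center_x - outer.center_x, inner.center_y - outer.center_y)
    return outer.width / 2 - d - inner.width / 2


def contents(outer: Circle, others: List[Circle]) -> List[str]:
    """Names of circles really inside 'outer': rectangle prefilter, then circle test."""
    return [c.name for c in others
            if c is not outer and rect_contains(outer, c) and margin(outer, c) >= 0]


one = Circle("1", 0, 0, 100)
two = Circle("2", 40, 40, 20)   # centred inside 1
four = Circle("4", 0, 0, 20)    # in the corner of 1's bounding box
print(contents(one, [one, two, four]))  # ['2']
```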

Calculating the margin is done in the _calcMargin sub-generator, which saves it in a variable called margin. The formula is simple enough, but might take a moment's thought if your geometry is as rusty as mine:

Report '_calcMargin'
  variable 'margin' write
    /* big object radius - center difference - little obj radius */
    math 
      width;1 '/2 - '
      '((' centerX '-' centerX;1 ')^2 + (' centerY '-' centerY;1 ')^2)^(1/2)'
      ' - ' width '/2'  
    evaluate
  close
endreport

Basically we want to check that the big object's radius is bigger than the distance from the centre of the big object to the outer edge of the little object. The distance to the outer edge is the distance between the centres, plus the radius of the little circle. The distance between the centres is calculated with Pythagoras' theorem, and the radii are just half of the width of the objects. CenterX and width here refer to the little object, whereas the ;1 suffix in width;1 makes it return the width of the big object, one level further out on the element stack -- i.e. from outside the "do contents" loop.
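Written out as a formula, the script above computes the following, where (x_b, y_b) and w_b are the centre and width of the big circle and (x_s, y_s) and w_s those of the little one (symbol names chosen here for illustration). The last line works through a hypothetical corner case: a circle of width 20 in the top-left corner of a circle of width 100, so its centre is at (10, 10) and the big centre at (50, 50).

```latex
\begin{aligned}
d &= \sqrt{(x_s - x_b)^2 + (y_s - y_b)^2}, \qquad R = \tfrac{w_b}{2}, \qquad r = \tfrac{w_s}{2},\\
\text{margin} &= R - d - r, \qquad \text{contained} \iff \text{margin} \ge 0 \iff d + r \le R,\\
\text{e.g. } d &= \sqrt{40^2 + 40^2} \approx 56.6, \quad \text{margin} \approx 50 - 56.6 - 10 = -16.6 < 0.
\end{aligned}
```

So the corner circle lies inside the big circle's bounding rectangle but not inside the circle itself.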

When drawing the symbols, MetaEdit+ thus calculates and displays this list of contained objects. It's even updated on the fly as you drag and scale objects. This lets you do cool things like show big red error signs if someone drags an object into the wrong kind of container.

Putting _calcMargin in its own sub-generator allows us to reuse it from other generators, e.g. to produce an indented "tree" listing showing the containment hierarchy of all the objects (like the default "Object Nesting" generator in MetaEdit+).
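Here is a Python sketch of that reuse: the same margin test drives an indented containment tree. It assumes circles given by centre and radius, takes the immediate container to be the smallest circle that fully contains an object, and requires a container to be strictly larger (which also avoids cycles between identical circles). These are illustrative assumptions, not how MetaEdit+'s built-in "Object Nesting" generator works.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Circle:
    name: str
    cx: float
    cy: float
    r: float


def margin(outer: Circle, inner: Circle) -> float:
    """Reused containment test: big radius - centre distance - little radius."""
    return outer.r - math.hypot(inner.cx - outer.cx, inner.cy - outer.cy) - inner.r


def parent_of(c: Circle, circles: List[Circle]) -> Optional[Circle]:
    """Immediate container: the smallest strictly larger circle enclosing c."""
    containers = [o for o in circles if o.r > c.r and margin(o, c) >= 0]
    return min(containers, key=lambda o: o.r, default=None)


def nesting_tree(circles: List[Circle]) -> str:
    parents = {id(c): parent_of(c, circles) for c in circles}
    lines: List[str] = []

    def walk(node: Circle, depth: int) -> None:
        lines.append("  " * depth + node.name)
        for child in circles:
            if parents[id(child)] is node:
                walk(child, depth + 1)

    for root in circles:
        if parents[id(root)] is None:
            walk(root, 0)
    return "\n".join(lines)


shapes = [Circle("A", 0, 0, 100), Circle("B", -40, 0, 40),
          Circle("C", -40, 0, 10), Circle("D", 50, 0, 30)]
print(nesting_tree(shapes))
```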