DSM

Code Generation 2009 round-up

June 24, 2009 01:25:20 +0300 (EEST)

Once again, Code Generation proved itself as the best European conference on Model-Driven Development. Lots of smart people, lots of experience, lots of enthusiasm, lots of willingness to listen and learn from others. Even though having to prepare and run some sessions kept me from seeing as much of the rest as I'd have liked, there's still too much to write for one blog post. I'll post about things I'm certain of first, and come back to things like Xtext and MPS after further investigation.

Keynotes

The two keynotes, both presented as a double act by me and Markus Völter, seemed to go down well. Mark Dalgarno had a surprise up his sleeve, presenting us with a blind choice of weapons from a black bag. We then had to duel it out, graphical DSM against textual DSLs, with the plastic gun and dagger we picked. Since I got the gun, I think the result was a foregone conclusion :-). The dagger may be a "weapon from an earlier, more civilized age", but it's only useful if you can get in close to your adversary. Similarly, text may be more familiar, but it does often tie you closer to the code; problem domain DSLs in text seem as rare as accurate knife throwers. Markus successfully stabbed me in the back later on, so that evened things up and emphasized the point from our slides: both text and graphics are useful in the right place. Choose, but choose wisely.

It was fun to see the keynote get picked up on Twitter:

EelcoVisser: keynote by @markusvoelter and Steven Kelly at #cg2009: great overview of issues in model-driven development
HBehrens: Steven Kelly at #cg2009 keynote: "wizard based generators create a large legacy application you've never seen before"

The latter was picked up by several people. The reference was to vendor-supplied wizards, often found in IDEs or SDKs, that create skeleton applications for you based on your input. Since the vendors take pride in just how much boilerplate they can spew out, you're left with a mass of generated code that you've never seen before, but must extend with your own code. Worse, you're responsible for maintaining the whole ensuing mixture, and there's no chance of re-running the wizard to change some of the choices -- at least not without losing or invalidating the code you've added. That's in sharp contrast with generation in DSM, where your input is in the form of a model which you can edit at any time. You get the speed of generation but can remain at a high level of abstraction throughout.
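To make the contrast concrete, here is a minimal sketch in Python of generating from an editable model. It is not MetaEdit+'s generator language: the dict-based model format, the `generate` function and the thermostat example are all invented for illustration. The point is only that the model stays the single editable input, so the generator can be re-run after every change.

```python
# Hypothetical model: a plain dict standing in for a DSM model.
model = {"app": "Thermostat", "screens": ["Home", "Schedule"]}


def generate(model):
    """Produce source text from the model; re-run whenever the model changes."""
    lines = [f"class {model['app']}App:"]
    for screen in model["screens"]:
        lines.append(f"    def show_{screen.lower()}(self):")
        lines.append(f"        print('{screen} screen')")
    return "\n".join(lines) + "\n"


first = generate(model)

# Change of mind later on: edit the model, not the generated code.
model["screens"].append("Settings")
second = generate(model)
```

With a wizard, the equivalent of `first` would already have been hand-edited by the time the extra screen was needed, and there would be no `model` left to regenerate from.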

MetaEdit+ Hands-on

We'd decided to try something special in the hands-on: building 5 different graphical modeling languages from scratch in under 3 hours. Rather than being random exercises, the languages were increasingly good ways of modeling the same domain. We started with something that was basically just the current code turned into graphics, and ended up with a language that reduced the modeling work to a third of what it was at its worst, with many possible errors ruled out by the language design and rules, and with much better scope for reuse. We showed how to make generators for all the languages, and actually built them for two. And of course since this was MetaEdit+, simply defining the metamodel already gave you a full graphical modeling environment -- we just tweaked the symbols to taste.

Never having run the session before, we were rather nervous about how much we could achieve in the time available. In the end, thanks to great slides from Risto Pohjonen and testing from Janne Luoma, it seems we pretty much hit our target. Only at the very end of the last language did we have some people only just starting the last section (the generator) while others were finishing it and going on to beautify the symbols or play around with other fun features of MetaEdit+. Hopefully people learned not just about MetaEdit+ as a tool, but also how to make better languages and improve existing ones. Feedback online was encouraging:

PeterBell: Great metaedit hands on - built and refactored language and generator in just a couple of hours at #cg2009
elsvene: been to a great hands-on session for MetaEdit+. Really interesting tool! #cg2009
HBehrens: for me MetaEdit is the most sophisticated graphical modeling tool currently available #cg2009. Thanks for this session!

Dinner

The conference dinner was of the high standard you'd expect from a Cambridge college. The airy hall and contemporary art lent a friendly ambience. The large round tables weren't particularly conducive to conversation: you could only really talk to the people either side of you without shouting or craning your neck. On long tables you can reach 5 people for the same effort. I was fortunate to be sitting between Scott Finnie and Jon Hurwitz, so I certainly didn't suffer.

The "suffering" started later, when there was a raffle in aid of Bletchley Park, the home of Allied code-breaking work in World War II. I ended up winning a prize donated by Microsoft: a screwdriver toolkit and MSDN T-shirt, causing much hilarity and bad jokes about finally getting Microsoft tools that didn't crash. The irony continued when Alan Cameron Wills won a signed copy of our Domain-Specific Modeling book -- despite having received one from us last year. Either the older British segment of the audience were most inclined to support Bletchley Park by buying raffle tickets, or else the draw was rigged to encourage vendor co-operation. The people on my table were having none of that, and encouraged me to cover up the Microsoft logos :-). All in all a good laugh, and in a good cause.

DSM-tech

Oslo Quadrant reviews

June 12, 2009 19:17:12 +0300 (EEST)

The May 26th CTP of Oslo includes the first public version of Quadrant, Microsoft's visual model editor. I've had my head down on other topics, so haven't had a chance to play with it yet, but here are some reviews from others.

Charles Young, Initial experiments

Microsoft is publically committed to providing strong UML and XMI support in 'Oslo' and this is our first glimpse of what they intend. ... My initial experiments with LoadUML suggest that the tool is not yet fully functional. For example, it fell over the use of the xmi:type attribute on the uml:Model element. It failed to handle a type element of an ownedAttribute, and it didn't recognise the packageImport element. The error messages were not always very helpful and the tool is slow...
Initial experiments with LoadAssembly went a little more smoothly. Again, the tool is very slow, and can take several minutes to complete imports...
This early version of Quadrant has big problems with big models. It could, in some cases, take several minutes of 100% CPU usage to display the contents of a folder. Memory usage can also grow to monumental proportions...
All in all, don’t expect Quadrant or the new loaders to behave very well. This is very early preview code.

Charles did manage to get an XMI file and .NET assembly imported after some messing around, so it wasn't all bad. But those speed and memory problems aren't going to go away just by optimising code: scalability is something that must be architected in from the start.

Frank Lillehagen, Quadrant - First Impressions (I had the pleasure of meeting Frank in May 2001, when he was VP at Computas and responsible for the Metis modeling tool - first released in 1991!)

Quadrant's user interface is novel, uniform, and functional, but a bit cumbersome, and as an early preview it exposes a lot of the underlying wiring, nuts and bolts. Some functionality is well supported, such as customizing views and interacting with large models in multiple workpads. On the other hand, services for e.g. relationship modeling are poor. ... Visualization is the focus, more than modeling.
The layout of diagrams is partially automated, however when you close and reopen a diagram, it will revert to an automatic layout, not keeping the location changes you made manually the last time.
The support for key visual modeling concepts like relationships is not native, and limited. Quadrant does not recognize many-to-many relationships from entities, leading to diagrams ... where [half] the shapes are really relationships that ... should be shown as links.

From the pictures Frank posted, the existing models in Oslo break many principles of good modeling design. Having automatic layout that loses your manual layout changes pretty much rules out the chance of getting to know your way around your models, for any diagram more complex than a simple tree. And having no n-ary relationships is going to mean unwelcome hacking for both metamodelers and modelers: many relationships are binary, but certainly not all.

I'll continue to follow the progress of Quadrant with interest, but there seems little point getting my hands dirty with it yet. It's a shame that it seems to be back to square one for modeling at Microsoft - this is like the early versions of DSL Tools, and you'd think they'd have moved on in the 5 years since we first saw that. When we did a complete rewrite of MetaEdit (released 1993) to get the first version of MetaEdit+ (1995), there was rather a lot more that worked, and the scalability was already in place. The UI wasn't pretty, so we'll give Quadrant the thumbs up on that score, but the real worth of an application like this lies between the UI and the database. If Quadrant only works for binary relationships, autolayout, and small models, there's some major rework needed before it becomes a serious contender. Let's hope their bosses give them the chance to do it!

DSM

Getting ready for Code Generation

May 19, 2009 20:31:11 +0300 (EEST)

Markus Voelter and I are having fun at the moment preparing our keynotes for Code Generation. The descriptions on the web page are deliberately vague, but the important fact is there: we'll be giving both keynotes together.

As frequent conference attendees will know, Markus and I are both quiet, meek guys who would never presume to disagree, so the talks will most likely be boring consensus... NOT! I did suggest mud wrestling would be an easier way to settle our differences, but my imposing physical presence must have convinced Markus he'd have a better chance with PowerPoints at twenty paces.

In related news, Mark Dalgarno has finally realized that the concepts of "early bird" and "software developer" make uneasy bed-fellows, and the way to get people to sign up some reasonable time before conferences is to use the stick not the carrot. Yes, there's now a special not-very-early-bird price increase of 10% extra heading your way if you don't go to the site NOW and register.

It's not all stick though: if you were there at either previous conference, you get 5% off. Canny forward thinker that he is, and with CodeGen 2026 clearly in mind, Mark isn't offering 10% off if you were there both years (darn!).

However you cut it, Code Generation is simply the best conference on DSM in Europe. Even without the mud wrestling.

DSM

Playing with Martin Fowler's DSM language

March 18, 2009 13:08:40 +0200 (EET)

The roadmap for Martin Fowler's forthcoming book on DSLs indicates that he will focus on textual DSLs. The online draft of the intro does however briefly show a graphical language for a home security system: the model in Figure 6 is implemented with MetaEdit+, based on the original textual requirements:

Miss Grant has a secret compartment in her bedroom that is normally locked and concealed. To open it she has to close the door, open the second draw in her chest, turn her bedside light on - and then the secret panel is unlocked for her to open.

Juha-Pekka has been using Martin's example as a way of showing how to implement a DSM language in MetaEdit+ (Parts 1 and 2), and in Part 3 he points out some problems with the original language: too broad a focus, unclear usage process, and too low a level of abstraction. Juha-Pekka correctly suggests going back to the basics of the domain to discover the necessary language concepts, rather than trying to shoehorn this domain into a generic state model.

As an exercise, however, I thought it might be interesting to try to improve Martin's language as it is, rather than starting from scratch. How much of DSM is "you just have to know how to do it", and how much can be reduced to simple steps that anyone could apply? Obviously, the more of the latter that we can find, the easier it is for somebody to get started. Our DSM book aimed at just this kind of practical approach; let's take a few hints from there and apply them to Martin's language. We'll show the model in the current state of the language as we evolve it: click the pictures to see the full size screenshot.

Use meaningful symbols

Miss Grant's model with meaningful symbols

Martin's language uses just black and white shapes, the kind you might see in a standard flow chart palette. Only the text within the shapes gives a clue as to the actual domain: words like "door", "drawer", "light" and "panel" occur many times. However, the brain takes a lot longer to find all occurrences of a word in a picture than it does to find all occurrences of a symbol. Try it yourself: how many times does the door symbol appear in the picture on the right, and how many times does the word "door" appear in Martin's Figure 6? (You'll actually notice a slight discrepancy: Martin's diagram omits the "reset all / return to start" event caused by the door opening, shown at the extra door at the top left in our diagram; he mentions this elsewhere in the draft.)

Reuse objects

Miss Grant's model with objects reused

Having four door objects like this obviously isn't ideal: Martin had them, but the problem wasn't so visible there because they a) couldn't be distinguished from other objects, and b) didn't so clearly represent something in the physical world -- the problem domain. Now that we have them visible, it would be nicer if we could show that there's really only one door in this model, and it is involved in four different events or actions. So let's merge the four doors into one, and similarly for the panel.

The light bulbs and drawers are harder: if we merge them, we end up with either lots of crossing lines, or objects on top of each other -- ugly. Maybe there's something else we could do for them?

Consider n-ary relationships

N-ary relationships -- relationships involving more than two objects -- are everywhere: a "family" relationship links a father, mother and children; an inheritance relationship links a superclass with several subclasses. When people draw a diagram on paper, they're happy drawing lines that split. However, implementers of modeling tools have often misanalyzed the simplest and most common case of a binary relationship, and ended up thinking relationships can only connect two objects. They end up having to represent n-ary relationships with a fake object in place of the relationship. Such fake relationship objects leave the modeling language inconsistent, as the user can draw a "relationship object" on its own without connecting it to anything. They also make checking model correctness much harder, as the rule for what can be connected in a certain kind of relationship must be split over several relationships, all cobbled back together through the fake relationship object. (For more details, see Welke's article from my previous entry on The Model Repository.)
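As a rough illustration of why a first-class n-ary relationship keeps rule checking in one place, here is a small Python sketch. The `Obj` and `Relationship` classes, the `RULES` table and the role names are all hypothetical; this is not how MetaEdit+ or any other tool actually stores models. The whole connection rule for a "Family" relationship sits in a single entry and is checked against a single relationship, rather than being spread over several binary links that meet at a fake object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Obj:
    kind: str
    name: str


@dataclass
class Relationship:
    kind: str
    roles: list = field(default_factory=list)  # (role name, Obj) pairs

    def bind(self, role, obj):
        """Attach an object in the given role; returns self to allow chaining."""
        self.roles.append((role, obj))
        return self


# Hypothetical rule table: per relationship kind, each role's allowed
# object kind and its (min, max) cardinality; max of None means unbounded.
RULES = {
    "Family": {
        "father": ("Person", 1, 1),
        "mother": ("Person", 1, 1),
        "child": ("Person", 1, None),
    }
}


def check(rel):
    """Return a list of rule violations for one relationship."""
    errors = []
    rule = RULES[rel.kind]
    for role, (kind, lo, hi) in rule.items():
        players = [o for r, o in rel.roles if r == role]
        if len(players) < lo or (hi is not None and len(players) > hi):
            errors.append(f"{role}: expected {lo}..{hi or 'n'}, got {len(players)}")
        errors += [f"{role}: {o.name} is not a {kind}" for o in players if o.kind != kind]
    errors += [f"unknown role {r}" for r, _ in rel.roles if r not in rule]
    return errors


family = (Relationship("Family")
          .bind("father", Obj("Person", "Al"))
          .bind("mother", Obj("Person", "Bea"))
          .bind("child", Obj("Person", "Cy"))
          .bind("child", Obj("Person", "Di")))
incomplete = Relationship("Family").bind("father", Obj("Person", "Al"))
```

Here `check(family)` reports no violations, while `check(incomplete)` reports the missing mother and child. With a fake relationship object, the same check would have to chase separate father, mother and child links and reassemble them first.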

Miss Grant's model with several events allowed for a transition

If you're lucky enough to have a tool that supports n-ary relationships properly, take a look at your modeling language and see if you can make a more complex structure of objects into a simpler one by connecting several objects with a single relationship. In this case, Martin already shows excellent taste by using n-ary relationships for transitions :-) -- but maybe we can go a bit further still. On the left path between Active and Unlocked panel we can see the sequence "Drawer opens", "Waiting for light", "Light on"; on the right path we have "Light on", "Waiting for drawer", "Drawer opens". If we go back to the original text, we can see that all this means is that we wait for the drawer to open and the light to come on: both must happen, but in either order. So why not just have a single transition with two events to trigger it? We can make that the default semantics of a transition: it waits for all attached events to happen, in any order. To remind ourselves that such transitions will wait for all events, we show a little block on them where the lines meet.

If we wanted to support the case where either one event or the other could happen, but not both (XOR), we could have a property in the transition to specify whether it is AND or XOR, or else simply require the user to draw two transitions between the two states. In either case the amount of extra work for the XOR case is much less than is needed in a generic state model for the AND case, which requires the insertion of extra "Waiting for..." states -- only two here, but imagine covering all possibilities if there were 5 events that could happen in any order. (Exercise for the reader: how many "Waiting for" states would that require? Hint: we can do better than 5 factorial.)
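For readers who want to check their answer to the exercise (spoiler ahead), here is a short Python sketch. It assumes one "Waiting for" state per distinct set of events already seen, leaving out the empty set (the source state) and the full set (the target state); the function and class names are invented for illustration. The `AndTransition` class shows the alternative: a single transition that remembers which of its events are still pending.

```python
from itertools import combinations


def waiting_states(events):
    """Intermediate states a generic state machine needs so that all
    `events` may arrive in any order: one per non-empty proper subset
    of the events already seen."""
    events = sorted(events)
    return [frozenset(c)
            for k in range(1, len(events))
            for c in combinations(events, k)]


class AndTransition:
    """Hypothetical transition that fires once all attached events have occurred."""

    def __init__(self, source, target, events):
        self.source, self.target = source, target
        self.pending = set(events)

    def on_event(self, event):
        """Record the event; return the state the machine is in afterwards."""
        self.pending.discard(event)
        return self.target if not self.pending else self.source


two = waiting_states(["drawer opens", "light on"])
five = waiting_states(["e1", "e2", "e3", "e4", "e5"])
print(len(two), len(five))
```

Under that counting, n events need 2^n - 2 intermediate states: two for Miss Grant's drawer and light, thirty for five events, whereas the single AND transition needs none.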

Rule out corner cases

Top level model for Miss Grant

An important feature of good DSM languages is that they make the job of the modeler easier. In Martin's language, one of the hard things to see is in which states the panels, doors etc. are unlocked. We can see the actions that unlock them, but to know the state of a panel in a given state, we need to play through all possible routes that can get to that state. As the whole point of this language is to describe when things are locked or unlocked, this is quite a serious problem. Is there a way that we can make things clearer to the modeler?

If we look at a few of these models, we see a common pattern emerge. On the right here a panel is unlocked by a state "Panel unlocked", and there is a transition from that state when the panel is closed, to a state that locks the panel again. This unlock->close->lock sequence appears in many models, and makes sense in the problem domain. So why not allow a shortcut syntax, where the panel itself plays the role of a state, as at the bottom in this picture: on entering the panel state, the panel is unlocked; we leave the panel state when the panel is closed, and on leaving it we lock the panel. Since we can specify the semantics like this, we can obviously make the generators produce the required code: we can do this by extra steps in the generator, by a model-to-model transformation that produces a more generic state machine, or by a more powerful state machine engine in the framework. The last would be my choice, as that way the code generated per model has the closest resemblance to the models, stays on a higher level of abstraction, and keeps the overall size of the application down.
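The third option, a state machine engine that understands entry and exit actions, might look roughly like the Python sketch below. Everything in it (`State`, `Machine`, `device_state`, the command strings and the event names) is an assumption made for illustration, not code from Martin's draft or from any MetaEdit+ generator. The generated per-model code would then shrink to the last few lines that build the states and wire the transitions.

```python
class State:
    def __init__(self, name, on_entry=None, on_exit=None):
        self.name = name
        self.on_entry = on_entry or []   # commands issued when entering
        self.on_exit = on_exit or []     # commands issued when leaving
        self.transitions = {}            # event name -> target State


class Machine:
    def __init__(self, start):
        self.state = start
        self.log = []                    # commands "sent to the hardware"
        self.log.extend(start.on_entry)

    def handle(self, event):
        target = self.state.transitions.get(event)
        if target is None:
            return                       # event is not relevant in this state
        self.log.extend(self.state.on_exit)
        self.state = target
        self.log.extend(target.on_entry)


def device_state(device):
    """Shortcut syntax: the device itself plays the role of a state that
    unlocks on entry and locks again on exit."""
    return State(device, on_entry=[f"unlock {device}"], on_exit=[f"lock {device}"])


# What generated code for a tiny model might reduce to:
idle = State("idle")
panel = device_state("panel")
idle.transitions["light on"] = panel
panel.transitions["panel closed"] = idle

machine = Machine(idle)
machine.handle("light on")
machine.handle("panel closed")
```

After those two events `machine.log` holds the unlock command followed by the lock command, without the model ever containing a separate "lock the panel again" state.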

Lower level model for Miss Grant

That takes care of the case where the panel unlock-close-lock sequence can be considered as an atomic element in the model, with nothing else happening during it. What about cases like the door being unlocked, during which there is a sequence of other events needed before it is again locked? In this case we can use a sub-model: in the figure above, the green padlock relationship connecting the "Door unlocked" state to the door means "during this state, the door is unlocked" -- i.e. on entry the door is unlocked, and on exit it is locked; as before, closing the door exits that state. The little blue star in the "Door unlocked" state indicates that it has a submodel, shown in this figure. The contents of the submodel are of course just the set of states during which the door is always unlocked. Now it's easy for the modeler to know whether the various secret compartments are locked or unlocked at each stage -- and of course thus to ensure that the system he designs has no holes in its security. And since the code is generated, we'll never forget those pesky bounds checks, so there'll be no buffer overruns to exploit :-).

Keep models compact

Miss Grant's model merged back into a single diagram

Sub-models are great for hiding complexity and making the modeling language scale better. If each model becomes too small, however, many people find it harder to understand. Those with a Lisp or Smalltalk background are used to methods being only 2-4 lines of code; in more commonly used languages several such methods tend to be grouped together into a single larger method. Obviously extremes in either direction are bad; providing we stay within the bounds of what is sensible, we can choose the option that the modeler feels more comfortable with. In this diagram we have combined the two small models back into one larger one, stretching the "Door unlocked" state to enclose its substates. We can still see during which states the door and panel are unlocked, and maybe the overall picture is clearer -- or maybe not. In any case, this slightly reduces the number of model elements compared to the previous step.

Metrics

If we count each object as 1 element, each binary relationship as 1 element, and each additional role or property as 1 element, plus 1 for each model, we get the following size metrics for the models above:

41  Initial
41  Use meaningful symbols
36  Reuse objects
27  Consider n-ary relationships
23  Rule out corner cases (includes submodel)
19  Keep models compact

As can be seen, we've reduced the size of the model by over 50%. Since the effort needed for a given project increases more than linearly with size, we can estimate that productivity increases compared to the original language by a factor of at least 2. Improving symbols and cutting out corner cases weren't aimed at reducing the size of the model, but will have significant improvements in usability, so I'd guess overall a factor of around 3 is reasonable. Note that this is on top of whatever improvement is gained by Martin in moving from a straight hand-coding solution to a DSL, and from a textual DSL to a graphical DSL. More importantly, though, these are the kinds of steps that anyone can see how to apply to their own modeling language, and any team of developers would benefit from at the modeling level. Interestingly, with MetaEdit+ all of these changes could be applied to the modeling language without throwing away the initial model: the initial model and all intermediate models remain valid throughout, as the language evolves.
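The arithmetic behind the "over 50%" and "factor of at least 2" figures can be checked with a few lines of Python. The sizes are the ones listed above; the variable names are just for illustration, and the size ratio is only a lower bound on the productivity gain under the stated assumption that effort grows more than linearly with size.

```python
sizes = [
    ("Initial", 41),
    ("Use meaningful symbols", 41),
    ("Reuse objects", 36),
    ("Consider n-ary relationships", 27),
    ("Rule out corner cases", 23),
    ("Keep models compact", 19),
]

initial = sizes[0][1]
final = sizes[-1][1]

# Percentage reduction in model size relative to the initial language.
reductions = {step: round(100 * (initial - n) / initial, 1) for step, n in sizes}
ratio = round(initial / final, 2)

for step, pct in reductions.items():
    print(f"{pct:5.1f}%  {step}")
print(f"size ratio initial/final: {ratio}")
```

The final step comes out at a 53.7% reduction, a size ratio of about 2.16.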

DSM-tech

The Model Repository (was: The CASE Repository)

March 16, 2009 17:19:43 +0200 (EET)

At last year's OOPSLA Workshop on Domain-Specific Modeling I had the pleasure and privilege of giving the keynote. One nice thing about keynotes is that you are given more freedom than for normal talks. I decided to take that to its limit by giving as my keynote a paper that was written 20 years earlier. As far as I could tell, nobody noticed :-). Actually, not wanting the audience to feel they had been fooled, I came clean near the start of the talk. All the same, the message in the talk was news to the lion's share of the audience.

In 1988 Dr. Richard J. Welke, with 26 years of computing experience and two CASE tool companies behind him, wrote a white paper on how model data should be structured, stored and manipulated -- irrespective of the modeling language. In a series of four tiny example model fragments he shows the problems we get into if we try to represent models using just binary or Entity-Relationship-Attribute concepts, or to store models using just files or relational databases.

With today's users of Microsoft or Eclipse modeling tools only just finding out these problems through their own painful experience, now seemed a good time to revisit that paper. Prof. Welke has kindly allowed me to make it available here: The CASE Repository: More than another database application.

The sad truth is that for my keynote, the starting position was worse than for this article 20 years ago. Back then, people knew that storing models in files didn't work, and most were trying to store them in relational databases. They knew that by default, things just existed on their own, and had association links to other things, either directed or undirected. Nowadays, people are trying to store model data in files again, and worse in XML files -- with the in-built assumption that the world can be shoehorned into a tree structure, a hierarchy of strong containment aggregation.

Another difference between then and now is version control: back then it was obvious from databases that you couldn't talk about versions of individual pieces of data or tables, only of the whole set of inter-related data. The loss of fine granularity of versioning was a small price to pay for the gain in being able to support multiple simultaneous users working in the same set of data. Now, version control's "check out, edit, merge" has become the de facto poor man's multi-user capability -- so much so that few realise there could even be an alternative.

So, the only things I had to add to my keynote on top of the original paper were actually steps backward, hence the two titles: "The Model Repository: More than just XML under version control", or: "Domain-Specific Modeling: 20 years of progress?". Of course there has been progress, at least in the tools like MetaEdit+ or GME that have been around for a decade or more. For the others, all I can do is refer them to Welke's paper, and to the quote from the start of the tools chapter in our book on DSM :-)

"Those who cannot remember the past are condemned to repeat it."
- George Santayana, The Life of Reason (1905)

General

Google on Google: This site may harm your computer

January 31, 2009 17:26:42 +0200 (EET)

That was weird: searching for anything on Google was returning all results marked as "This site may harm your computer". Even searching for Google:

All Google results claim: This site may harm your computer

I submitted a report to Google, and in a few minutes Google.com was corrected. Google.co.uk showed bad results for one more search, but now that too is corrected. I couldn't find any mentions elsewhere of this yet, but it occurred both from home and via my work PC.

E: looks like the culprit is StopBadware.org: they seem to provide this information for Google, and their site is currently down.

E2: StopBadware.org put the blame back on Google in their blog entry: they say their site went down because of millions of people clicking through to it from the warnings, which were falsely generated because of a glitch at Google's end.

DSM

Podcast on Domain-Specific Modeling

January 26, 2009 11:42:45 +0200 (EET)

Jim Robertson and Michael Lucas-Smith of Cincom Smalltalk put out a podcast a week on software development. This week they kindly asked me to join them in their virtual studio -- good old Skype and Audacity! We talked about Domain-Specific Modeling, the history of MetaEdit+, and why we use Smalltalk.

The DSM podcast page has the download and some links to background and further information. You can also grab the 15MB MP3 directly or via its iTunes link. One piece of background info for a short section early on: Envy and Store are version control systems for Smalltalk.

Trying to explain DSM in a purely audio medium is something of a challenge, particularly in an unscripted interview where the participants can't see each other: the interviewer's expression is normally a good indication of whether you need to explain something further. I imagine that Juha-Pekka's interview with Markus Voelter did a better job partly because they knew the questions beforehand and could sit down together and look at the same screen. Of course that's also a risk: the listeners cannot see the screen. Mind you, for all its difficulties the audio medium has two major benefits: I don't need to get in front of a camera, and you don't need to look at me!

DSM

Earliest use of Domain-Specific Modeling name

January 19, 2009 14:37:03 +0200 (EET)

Jeff Gray asked a good question in response to my "Domain-Specific Modeling: what's in a name?":

Can any readers of Steve's blog suggest what they consider as the earliest reference where the explicit phrase "domain-specific modeling" occurs? I am not asking about where general concepts are defined under other names, but where the specific name is first used.

Let's make it more precise in that we're looking for cases where the phrase is used to mean the same thing that we mean today: creating a new graphical modeling language with a set of symbols, concepts and rules for connecting them to build models of systems in a particular domain. We're not talking about modeling in the more abstract sense, e.g. for textual DSLs or for mathematical models of how a physical system behaves.

I opened the bidding in the comments with Bran Selic's work on ROOM, later seen in ObjecTime and UML/RT:

1992: ROOM: an object-oriented methodology for developing real-time systems, B. Selic, G. Gullekson, J. McGee, I. Engelberg, in: Proceedings of the Fifth International Workshop on Computer-Aided Software Engineering, 6-10 July 1992

That was from Google Scholar; Google Books might help us go back even further. A search for DSM from 1950-1980 turns up the following:

1975: Government reports announcements & index by United States National Technical Information Service - "The proposed tool will include an interactive intelligent graphical interface and a high-level domain-specific modeling language"
1961: International Abstracts in Operations Research by International Federation of Operational Research Societies, Operations Research Society of America - "... environment for domain-specific modeling via the use of user-defined modeling elements..."

The 1961 reference looked particularly fascinating, because it would also be the first reference to DSM where a tool allows users to create their own modeling language, as opposed to just using a tool that contains a fixed DSM language. Presumably not a graphical modeling tool -- it was 2 years before Ivan Sutherland's incredible Sketchpad -- and most likely more on the mathematical modeling side (at least the quote is found verbatim in a paper on queuing theory). The 1975 quote may well also be more mathematical, as it is found in a paper on planetary atmospheric modeling.

Any other suggestions, or confirmation/refutation of those two early occurrences?