Playing with Martin Fowler's DSM language

March 18, 2009 13:08:40 +0200 (EET)

The roadmap for Martin Fowler's forthcoming book on DSLs indicates that he will focus on textual DSLs. The online draft of the intro does however briefly show a graphical language for a home security system: the model in Figure 6 is implemented with MetaEdit+, based on the original textual requirements:

Miss Grant has a secret compartment in her bedroom that is normally locked and concealed. To open it she has to close the door, open the second draw in her chest, turn her bedside light on - and then the secret panel is unlocked for her to open.

Juha-Pekka has been using Martin's example as a way of showing how to implement a DSM language in MetaEdit+ (Parts 1 and 2), and in Part 3 he points out some problems with the original language: too broad a focus, an unclear usage process, and too low a level of abstraction. Juha-Pekka correctly suggests going back to the basics of the domain to discover the necessary language concepts, rather than trying to shoehorn this domain into a generic state model.

As an exercise, however, I thought it might be interesting to try to improve Martin's language as it is, rather than starting from scratch. How much of DSM is "you just have to know how to do it", and how much can be reduced to simple steps that anyone could apply? Obviously, the more of the latter that we can find, the easier it is for somebody to get started. Our DSM book aimed at just this kind of practical approach; let's take a few hints from there and apply them to Martin's language. We'll show the model in the current state of the language as we evolve it.

Use meaningful symbols

Martin's language uses just black and white shapes, the kind you might see in a standard flow chart palette. Only the text within the shapes gives a clue as to the actual domain: words like "door", "drawer", "light" and "panel" occur many times. However, the brain takes a lot longer to find all occurrences of a word in a picture than it does to find all occurrences of a symbol. Try it yourself: how many times does the door symbol appear in the picture on the right, and how many times does the word "door" appear in Martin's Figure 6? (You'll actually notice a slight discrepancy: Martin's diagram omits the "reset all / return to start" event caused by the door opening, shown at the extra door at the top left in our diagram; he mentions this elsewhere in the draft.)

Reuse objects

Having four door objects like this obviously isn't ideal: Martin had them, but the problem wasn't so visible there because they a) couldn't be distinguished from other objects, and b) didn't so clearly represent something in the physical world -- the problem domain. Now that we have them visible, it would be nicer if we could show that there's really only one door in this model, and it is involved in four different events or actions. So let's merge the four doors into one, and similarly for the panel.

The light bulbs and drawers are harder: if we merge them, we end up with either lots of crossing lines, or objects on top of each other -- ugly. Maybe there's something else we could do for them?

Consider n-ary relationships

N-ary relationships -- relationships involving more than two objects -- are everywhere: a "family" relationship links a father, mother and children; an inheritance relationship links a superclass with several subclasses. When people draw a diagram on paper, they're happy drawing lines that split. However, implementers of modeling tools have often misanalyzed the simplest and most common case of a binary relationship, and ended up thinking relationships can only connect two objects. They end up having to represent n-ary relationships with a fake object in place of the relationship. Such fake relationship objects leave the modeling language inconsistent, as the user can draw a "relationship object" on its own without connecting it to anything. They also make checking model correctness much harder, as the rule for what can be connected in a certain kind of relationship must be split over several relationships, all cobbled back together through the fake relationship object. (For more details, see Welke's article from my previous entry on The Model Repository.)
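To make the difference concrete, here is a minimal sketch in Python of a relationship kept as a first-class element with named roles, so that the rule for what may be connected lives in one place. The names `ModelObject`, `Relationship`, `RULES` and `check` are invented for this illustration; they are not the API of MetaEdit+ or any other tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelObject:
    """An object in a model, e.g. a class (illustrative only)."""
    kind: str
    name: str


@dataclass
class Relationship:
    """A first-class n-ary relationship: any number of (role, object) pairs."""
    kind: str
    roles: list[tuple[str, ModelObject]] = field(default_factory=list)

    def connect(self, role: str, obj: ModelObject) -> None:
        self.roles.append((role, obj))


# Hypothetical rule table: per relationship kind, each role's allowed object
# kind and its minimum and maximum number of occurrences (None = unbounded).
RULES = {
    "Inheritance": {
        "superclass": ("Class", 1, 1),
        "subclass": ("Class", 1, None),
    },
}


def check(rel: Relationship) -> list[str]:
    """Return the rule violations for one relationship (empty list = valid)."""
    errors = []
    rule = RULES[rel.kind]
    for role, obj in rel.roles:
        if role not in rule:
            errors.append(f"unknown role {role!r}")
        elif obj.kind != rule[role][0]:
            errors.append(f"{role} must be a {rule[role][0]}, got {obj.kind}")
    for role, (_kind, low, high) in rule.items():
        count = sum(1 for used_role, _obj in rel.roles if used_role == role)
        if count < low or (high is not None and count > high):
            errors.append(f"{role} occurs {count} times")
    return errors


# One inheritance relationship linking a superclass with two subclasses.
inheritance = Relationship("Inheritance")
inheritance.connect("superclass", ModelObject("Class", "Shape"))
inheritance.connect("subclass", ModelObject("Class", "Circle"))
inheritance.connect("subclass", ModelObject("Class", "Square"))
print(check(inheritance))  # an empty list: the relationship is valid
```

With a fake relationship object, the same rule would have to be reassembled from several separate binary relationships; here a single `check` call sees all the roles at once.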

If you're lucky enough to have a tool that supports n-ary relationships properly, take a look at your modeling language and see if you can make a more complex structure of objects into a simpler one by connecting several objects with a single relationship. In this case, Martin already shows excellent taste by using n-ary relationships for transitions :-) -- but maybe we can go a bit further still. On the left path between Active and Unlocked panel we can see the sequence "Drawer opens", "Waiting for light", "Light on"; on the right path we have "Light on", "Waiting for drawer", "Drawer opens". If we go back to the original text, we can see that all this means is that we wait for the drawer to open and the light to come on: both must happen, but in either order.

So why not just have a single transition with two events to trigger it? We can make that the default semantics of a transition: it waits for all attached events to happen, in any order. To remind ourselves that such transitions will wait for all events, we show a little block on them where the lines meet. If we wanted to support the case where either one event or the other could happen, but not both (XOR), we could have a property in the transition to specify whether it is AND or XOR, or simply require the user to draw two transitions between the two states.

In either case the amount of extra work for the XOR case is much less than is needed in a generic state model for the AND case, which requires the insertion of extra "Waiting for..." states -- only two here, but imagine covering all possibilities if there were 5 events that could happen in any order. (Exercise for the reader: how many "Waiting for" states would that require? Hint: we can do better than 5 factorial.)
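The AND semantics can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Transition` class and the `waiting_states` helper are hypothetical names, not generated code or part of any framework, and the reset caused by the door opening is left out. The helper assumes that a generic state machine needs one "Waiting for..." state per set of events already seen, excluding the empty set (the source state) and the full set (the target state). Be warned that it gives away the exercise above.

```python
from itertools import combinations


class Transition:
    """Fires once every attached event has occurred, in any order (AND)."""

    def __init__(self, source: str, target: str, events: set[str]):
        self.source = source
        self.target = target
        self.events = frozenset(events)
        self.seen: set[str] = set()

    def notify(self, event: str) -> bool:
        """Record an event; return True once all attached events were seen."""
        if event in self.events:
            self.seen.add(event)
        return self.seen == self.events

    def reset(self) -> None:
        """Forget the events seen so far."""
        self.seen.clear()


def waiting_states(events: set[str]) -> list[frozenset[str]]:
    """Intermediate states a generic state machine needs for the same effect.

    Assumes one state per non-empty proper subset of the events seen so far.
    """
    items = sorted(events)
    return [
        frozenset(subset)
        for size in range(1, len(items))
        for subset in combinations(items, size)
    ]


unlock = Transition("Active", "Unlocked panel", {"Drawer opens", "Light on"})
print(unlock.notify("Light on"))      # False: still waiting for the drawer
print(unlock.notify("Drawer opens"))  # True: both events have happened
print(len(waiting_states({"Drawer opens", "Light on"})))  # 2
```

For two events the helper returns the two waiting states seen in the original model; for n events it returns 2^n - 2 of them, while the single AND transition stays a single element however many events are attached.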

Rule out corner cases

An important feature of good DSM languages is that they make the job of the modeler easier. In Martin's language, one of the hard things to see is in which states the panels, doors etc. are unlocked. We can see the actions that unlock them, but to know the state of a panel in a given state, we need to play through all possible routes that can get to that state. As the whole point of this language is to describe when things are locked or unlocked, this is quite a serious problem. Is there a way that we can make things clearer to the modeler?

If we look at a few of these models, we see a common pattern emerge. On the right here a panel is unlocked by a state "Panel unlocked", and there is a transition from that state when the panel is closed, to a state that locks the panel again. This unlock->close->lock sequence appears in many models, and makes sense in the problem domain. So why not allow a shortcut syntax, where the panel itself plays the role of a state, as at the bottom in this picture: on entering the panel state, the panel is unlocked; we leave the panel state when the panel is closed, and on leaving it we lock the panel.

Since we can specify the semantics like this, we can obviously make the generators produce the required code: we can do this by extra steps in the generator, by a model-to-model transformation that produces a more generic state machine, or by a more powerful state machine engine in the framework. The last would be my choice, as that way the code generated per model has the closest resemblance to the models, stays on a higher level of abstraction, and keeps the overall size of the application down.
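As a rough idea of what the "more powerful state machine engine" option could look like, here is a minimal Python sketch of a state that plays the role of the panel: it unlocks on entry, leaves when the panel is closed, and locks on exit. The class names, the `"<device> closed"` event naming and the `"Idle"` state are assumptions made for this illustration, not taken from the actual framework.

```python
from typing import Optional


class Lockable:
    """A lockable device such as the secret panel (illustrative stand-in)."""

    def __init__(self, name: str):
        self.name = name
        self.locked = True


class DeviceState:
    """Shortcut state: unlock on entry, lock on exit, exit when closed."""

    def __init__(self, device: Lockable, next_state: str):
        self.device = device
        self.next_state = next_state

    def enter(self) -> None:
        self.device.locked = False

    def exit(self) -> None:
        self.device.locked = True

    def handle(self, event: str) -> Optional[str]:
        """Return the next state's name if this event leaves the state."""
        if event == f"{self.device.name} closed":
            self.exit()
            return self.next_state
        return None  # any other event is ignored in this state


panel = Lockable("Panel")
state = DeviceState(panel, next_state="Idle")
state.enter()
print(panel.locked)                  # False: unlocked on entry
print(state.handle("Panel closed"))  # Idle
print(panel.locked)                  # True: locked again on exit
```

Because the unlock and lock actions live in the engine, the code generated per model only has to say which device the state stands for and where to go next.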

That takes care of the case where the panel unlock-close-lock sequence can be considered as an atomic element in the model, with nothing else happening during it. What about cases like the door being unlocked, during which there is a sequence of other events needed before it is again locked? In this case we can use a submodel: in the figure above, the green padlock relationship connecting the "Door unlocked" state to the door means "during this state, the door is unlocked" -- i.e. on entry the door is unlocked, and on exit it is locked; as before, closing the door exits that state. The little blue star in the "Door unlocked" state indicates that it has a submodel, shown in this figure. The contents of the submodel are of course just the set of states during which the door is always unlocked. Now it's easy for the modeler to know whether the various secret compartments are locked or unlocked at each stage -- and of course thus to ensure that the system he designs has no holes in its security. And since the code is generated, we'll never forget those pesky bounds checks, so there'll be no buffer overruns to exploit :-).
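The payoff for the modeler can be sketched as a simple lookup: with padlock relationships and submodels, finding out what is unlocked in a state means walking up the containment chain rather than playing through every route. The Python below is only an illustration; the containment shown, including the "Waiting for drawer and light" state name, is a hypothetical stand-in, since the actual contents of the submodel are in the figure.

```python
from typing import Optional

# Hypothetical containment: state -> enclosing state (None = top level).
PARENT: dict[str, Optional[str]] = {
    "Door unlocked": None,
    "Active": "Door unlocked",
    "Waiting for drawer and light": "Door unlocked",
    "Unlocked panel": None,
}

# Padlock relationships: state -> device that is unlocked during that state.
UNLOCKS: dict[str, str] = {
    "Door unlocked": "door",
    "Unlocked panel": "panel",
}


def unlocked_in(state: str) -> set[str]:
    """Devices unlocked in a state, including those of enclosing states."""
    devices: set[str] = set()
    current: Optional[str] = state
    while current is not None:
        if current in UNLOCKS:
            devices.add(UNLOCKS[current])
        current = PARENT[current]
    return devices


print(unlocked_in("Active"))          # {'door'}
print(unlocked_in("Unlocked panel"))  # {'panel'}
```

In the original language the same question could only be answered by tracing the unlock and lock actions along every path into the state.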

Keep models compact

Submodels are great for hiding complexity and making the modeling language scale better. If each model becomes too small, however, many people find it harder to understand. Those with a Lisp or Smalltalk background are used to methods being only 2-4 lines of code; in more commonly used languages several such methods tend to be grouped together into a single larger method. Obviously extremes in either direction are bad; provided we stay within the bounds of what is sensible, we can choose the option that the modeler feels more comfortable with. In this diagram we have combined the two small models back into one larger one, stretching the "Door unlocked" state to enclose its substates. We can still see during which states the door and panel are unlocked, and maybe the overall picture is clearer -- or maybe not. In any case, this slightly reduces the number of model elements compared to the previous step.

Metrics

If we count each object as 1 element, each binary relationship as 1 element, and each additional role or property as 1 element, plus 1 for each model, we get the following size metrics for the models above:

Elements	Language version
41	Initial
41	Use meaningful symbols
36	Reuse objects
27	Consider n-ary relationships
23	Rule out corner cases (includes submodel)
19	Keep models compact
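The counting rule can be written down as a small Python function. This is a sketch of one reading of the rule, not the script used to produce the figures above: I assume that "each additional role" means each role beyond the two of a binary relationship, and that properties are counted wherever the caller says there are extra ones. The `Model` and `Relationship` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    roles: int           # number of objects the relationship connects
    properties: int = 0  # extra properties counted on the relationship


@dataclass
class Model:
    objects: int
    relationships: list[Relationship] = field(default_factory=list)
    object_properties: int = 0  # extra properties counted on objects


def size(models: list[Model]) -> int:
    """Total element count over a set of models (assumed reading of the rule)."""
    total = 0
    for model in models:
        total += 1                                # 1 for the model itself
        total += model.objects                    # 1 per object
        total += model.object_properties          # 1 per extra property
        for rel in model.relationships:
            total += 1                            # a binary relationship is 1
            total += max(rel.roles - 2, 0)        # 1 per role beyond the second
            total += rel.properties               # 1 per extra property
    return total


# A toy model: three objects joined by one ternary relationship.
print(size([Model(objects=3, relationships=[Relationship(roles=3)])]))  # 6
```

Under this reading an n-ary relationship costs one element per extra role, whereas the equivalent chain of extra states and binary transitions costs an element for every state and every transition.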

As can be seen, we've reduced the size of the model by over 50%. Since the effort needed for a given project increases more than linearly with size, we can estimate that productivity increases compared to the original language by a factor of at least 2. Improving symbols and cutting out corner cases weren't aimed at reducing the size of the model, but should significantly improve usability, so I'd guess overall a factor of around 3 is reasonable. Note that this is on top of whatever improvement is gained by Martin in moving from a straight hand-coding solution to a DSL, and from a textual DSL to a graphical DSL.

More importantly, though, these are the kinds of steps that anyone can see how to apply to their own modeling language, and that any team of developers would benefit from at the modeling level. Interestingly, with MetaEdit+ all of these changes could be applied to the modeling language without throwing away the initial model: the initial model and all intermediate models remain valid throughout, as the language evolves.
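For anyone who wants to check the arithmetic, a few lines of Python reproduce the reduction from the element counts in the table. The function names are mine; the numbers are the ones listed above.

```python
def reduction_percent(initial: int, final: int) -> float:
    """Percentage by which the element count shrank."""
    return 100 * (initial - final) / initial


def size_ratio(initial: int, final: int) -> float:
    """How many times larger the initial model is than the final one."""
    return initial / final


# Element counts per step, as listed in the table.
SIZES = [
    ("Initial", 41),
    ("Use meaningful symbols", 41),
    ("Reuse objects", 36),
    ("Consider n-ary relationships", 27),
    ("Rule out corner cases", 23),
    ("Keep models compact", 19),
]

initial = SIZES[0][1]
for step, elements in SIZES:
    print(f"{elements:>3}  {reduction_percent(initial, elements):5.1f}%  {step}")

print(round(size_ratio(41, 19), 2))  # 2.16
```

The final model is about 53.7% smaller, and the initial one is about 2.16 times its size. If effort grows at least linearly with size, that ratio is a lower bound on the effort saved, which is where the "factor of at least 2" comes from.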

:: Steven Kelly
:: MetaCase
:: DSM Forum
