|
Good question. If you don't mind, I'll first explain why this is generally a bad idea, and then tell you how to do it :-).
Why reverse engineering isn't as good an idea as it sounds
What you're talking about is generally called reverse engineering (http://www.metacase.com/faq/search.asp?Search=reverse): turning textual code into models. The options on the model_import page (http://www.metacase.com/mwb/model_import.html) you mention are more about model-to-model transformations. Some people want to call textual code a model too, but IMHO that confuses an important distinction. The relationship of model to code is like the relationship of code (in a third generation language) to assembler (a second generation language):
- The first is a whole level of abstraction higher than the second.
- Working on the higher level is significantly more productive.
- The higher level is more immediately understandable by humans, the lower level by computers.
- The higher level can be automatically transformed to the lower level.
- There are legal statements at the lower level that cannot be produced by any legal statement at the higher level.
- There are multiple valid ways to transform the higher level to the lower level: the choice depends on non-functional requirements such as speed, memory usage, clarity, etc.
- There is generally no satisfactory way to automatically transform hand-written statements from the lower level up to the higher level.
- The work needed to make even an unsatisfactory automatic reverse transformation to the higher level is significantly greater than to make a satisfactory transformation to the lower level.
- The manual work needed to make the results of an automatic reverse transformation satisfactory is generally larger than the work needed to enter the same information at the higher level.
- Trying to work on the same part of a program on both higher and lower levels introduces significant friction: successful moves to higher levels have always involved the complete hiding of the lower level in general use.
Of course, during any industry move from a lower level to a higher level, there has always been an interim period where people have wanted the security of the familiar old way but with the productivity of the new way. And there is a mass of programs written on the lower level that it would be great to be able to look at on the higher level. Ironically, if the programs on the lower level had originally been generated from the higher level, they would be sufficiently regular that it would be feasible to make a reverse transformation. However, since they are hand-written and vary massively, making that reverse transformation isn't economically sensible.
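To make the point about regularity concrete, here is a minimal sketch in Python. The generator template, the regular expressions, and the names generate and reverse are all made up for illustration; they are not part of MetaEdit+ or any real tool. Text produced by the template can be read back with two trivial patterns, while a hand-written class that says exactly the same thing in a different layout slips straight past them.

```python
import re

# Hypothetical generator: one fixed template per model element.
def generate(class_name, attributes):
    lines = [f"public class {class_name} {{"]
    for attr_type, attr_name in attributes:
        lines.append(f"    private {attr_type} {attr_name};")
    lines.append("}")
    return "\n".join(lines)

# Patterns that mirror the template exactly (assumed layout, nothing more).
CLASS_RE = re.compile(r"^public class (\w+) \{$")
ATTR_RE = re.compile(r"^    private (\w+) (\w+);$")

def reverse(text):
    """Recover (class_name, attributes) from text that follows the template."""
    class_name, attributes = None, []
    for line in text.splitlines():
        if m := CLASS_RE.match(line):
            class_name = m.group(1)
        elif m := ATTR_RE.match(line):
            attributes.append((m.group(1), m.group(2)))
    return class_name, attributes

model = ("Account", [("String", "owner"), ("int", "balance")])
generated = generate(*model)
print(reverse(generated) == model)  # generated text round-trips

# Same meaning, different hand-written layout: the patterns find nothing.
handwritten = "public class Account\n{\n  private String owner; private int balance;\n}"
print(reverse(handwritten))
```

Coping with the second input means coping with every layout, modifier and idiom a human might write, which is where the real cost of reverse engineering lies.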
The only cases where a reverse transformation might be relevant are where the same transformation can be used for all programs, and the results are expressed at a level of abstraction only a little higher than that of the original code - more a case of omitting details than of raising the level of abstraction. That's the case in the reverse engineering facilities of UML tools for Java code etc., which everybody knows don't really work all that well. And that's the best case scenario!
How to turn text into models
That's the theory and collected experience and wisdom, as I see it. Of course there will be cases where you still want to try turning text into models - whether to perform the reverse transformation of your generator, or to get some text in as part of interfacing with existing information outside the modeling tool.
There are two tasks when turning text into models. First you need to parse the text, to turn it into an in-memory representation of the original program. That in-memory representation can use the concepts of the programming language, or of something between that and the concepts used in the models. Second you need to turn that in-memory representation into the format used for models. There may be several possible formats, e.g. the in-memory native representation of models, the native disk representation of models, or a supported import/export format for models (often some kind of XML these days).
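As a rough sketch of those two tasks kept as separate phases, the Python below parses a very small subset of Java-like text into an in-memory structure and then serializes that structure as XML. Everything here is an assumption for illustration: the ParsedClass type, the regular expressions, and the element and attribute names in the output are invented, and the XML is not the real MXM schema.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class ParsedClass:
    """In-memory representation using the source language's concepts."""
    name: str
    attributes: list = field(default_factory=list)  # list of (type, name)

CLASS_RE = re.compile(r"\s*(?:public\s+)?class\s+(\w+)")
FIELD_RE = re.compile(r"\s*(?:private|protected|public)\s+(\w+)\s+(\w+)\s*;")

def parse(text):
    """Phase 1: text -> in-memory representation."""
    classes, current = [], None
    for line in text.splitlines():
        if m := CLASS_RE.match(line):
            current = ParsedClass(m.group(1))
            classes.append(current)
        elif current and (m := FIELD_RE.match(line)):
            current.attributes.append((m.group(1), m.group(2)))
    return classes

def to_xml(classes):
    """Phase 2: in-memory representation -> illustrative XML (not real MXM)."""
    root = ET.Element("model")
    for c in classes:
        obj = ET.SubElement(root, "object", typeName="Class", name=c.name)
        for attr_type, attr_name in c.attributes:
            ET.SubElement(obj, "attribute", name=attr_name, type=attr_type)
    return ET.tostring(root, encoding="unicode")

source = "class Person {\n  private String name;\n  private int age;\n}"
print(to_xml(parse(source)))
```

Keeping the phases apart means the second one can be swapped for a different output format without touching the parser.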
Those two tasks can be performed as separate phases, or combined into a more interpreter-like approach: read a line of text, output a bit of model. Depending on the input and output formats of the tasks, you may need to do some processing between the tasks, or even split things into multiple stages of transformation. (My own experience is that people get excited about building multiple stages, when a single stage would work perfectly well.)
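For comparison, the interpreter-like approach might look something like this in Python: read a line, split it into tokens, and emit a fragment of output straight away, without building any intermediate structure. Again this is only a sketch under assumptions: the stream_to_xml name, the tokenizing pattern, and the attribute element are hypothetical, and the output is illustrative XML rather than the actual MXM schema.

```python
import re
from xml.sax.saxutils import quoteattr

def stream_to_xml(lines):
    """Yield a piece of illustrative XML for each recognized input line."""
    in_class = False
    for line in lines:
        # Split into words and the few punctuation marks we care about.
        tokens = re.findall(r"\w+|[{};]", line)
        if "class" in tokens[:-1]:
            name = tokens[tokens.index("class") + 1]
            yield f'<object typeName="Class" name={quoteattr(name)}>'
            in_class = True
        elif in_class and len(tokens) >= 3 and tokens[-1] == ";":
            # Assume "<modifiers> <type> <name> ;" for a simple field.
            attr_type, attr_name = tokens[-3], tokens[-2]
            yield f"  <attribute name={quoteattr(attr_name)} type={quoteattr(attr_type)}/>"
        elif in_class and tokens == ["}"]:
            yield "</object>"
            in_class = False

source = ["class Person {", "  private String name;", "}"]
for fragment in stream_to_xml(source):
    print(fragment)
```

The single flag is all the state this version keeps, which is fine for flat input but gets awkward as soon as the text has nesting or forward references; that is usually the point at which separate phases start to pay off.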
In MetaEdit+ the most straightforward route from text to models is to use MERL to read text files (filename...read), store them in a variable (variable 'textFileContents' write filename...read close), iterate over them line-by-line (do $textFileContents {$line = id}), and split the lines up into the tokens you need (do $line%translateSeparatorsToNewlines). As you identify things that should be transformed into model elements, output the appropriate MXM, MetaEdit+'s XML model import/export format (<object typeName="MyObjectType">).
There's an example of this in the UML Examples project's Reverse Engineered Java graph. It's kept simple enough to be understandable, so it's not a fully-fledged Java reverse engineering tool, but it will read simple classes and their attributes. Even so, this is clearly not a task for beginners: you need to understand parsing, modeling, metamodeling, the MXM format (http://www.metacase.com/support/45/manuals/mwb/Mw-7.html), powerful MERL features like translators (http://www.metacase.com/support/45/manuals/mwb/Mw-5_3_6.html#_Ref190594241), regular expressions (http://www.metacase.com/support/45/manuals/mwb/Mw-Appendix.html), iterating over variables (http://www.metacase.com/support/45/manuals/mwb/Mw-5_3_4.html), and executing MetaEdit+ command-line parameters (http://www.metacase.com/support/45/manuals/mwb/Mw-9_1.html) from a generator.
That's just one approach though: you could also consider using an external program to parse the text and either output MXM or call the MetaEdit+ API (http://www.metacase.com/support/45/manuals/mwb/Mw-8.html) to create models (and maybe even update existing models, although aiming for round-trip engineering is probably a mistake). Whichever way you choose, remember this: as a programmer it's immensely satisfying to be able to build an automatic transformation, regardless of whether it is for generation or reverse engineering. But building a generator is faster and has a better payoff than building a reverse transformation.
|