[Generators] Rexexp-Machine?
bullitt168
Member
Joined: 28.Apr.2008 Location: Bremen, Germany Points: 4
Posted: 29.Apr.2008 at 17:59
Hi!
I need to split and transform some strings in a generator. Sadly, the documentation of the string and variable manipulation operations (to .. translate endto) is not so good. Do you have some more information about the regex machine you're using? My most urgent questions are:
- How do I match a line break?
- How can I store the matches in variables for later use?
E.g., given as a string:
'element1_1;element1_2;element1_3
element2_1;element2_2;element2_3
... '
Results I'd like to have:
variable1 contains 'element1_1;element1_2;element1_3'
variable2 contains 'element2_1;element2_2;element2_3 ...'
I'd like to process these variables further so that I can access
variable1_1 = 'element1_1'
variable1_2 = 'element1_2'
variable1_3 = 'element1_3'
within the generators.
These are pretty easy tasks in languages like Perl or Python, but I'd like to know if it's possible to do this within the generators, so that the user doesn't have to have any external tools installed...
Thank you,
Bastian
stevek
MetaCase
Joined: 11.Mar.2008 Points: 643
Posted: 29.Apr.2008 at 18:46
The documentation of the regular expression syntax is in Appendix A. Often, you don't need to store intermediate results in a variable, so for your needs you could do something like this:
do :MyTextProperty { /* will iterate over the lines */
   do id%bySemicolon { /* will iterate over elements separated by semicolons */
      variable
         'variable' id%indexPart /* forms the name of the variable */
      write
         id
      close
   }
}
You'll need to define the two translators, %bySemicolon to map each semicolon to a newline, and %indexPart to grab the 1_1 from the end of a string like element1_1. They'd go somewhere before the block above, and be something like this:
to '%bySemicolon
; \
' endto /* maps semicolon to newline (escaped here with a backslash) */
to '%indexPart
/.*([0-9]+_[0-9]+)/ $$1'
endto /* maps any string ending in digit(s)_digit(s) to just that end part */
See the _translators generator in Graph for some more example translators. Splitting based on a delimiter character then iterating over the results is a common approach in text processing in MERL, and one of my personal favorite new features in 4.5 SR1. It makes something that would normally be tricky into something fairly easy - for instance I was able to write the above scripts by hand from memory, and they worked first time (partly luck!).
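For readers more at home in Python, as the original question suggests, the same two-level split can be sketched roughly as follows. This is only an analogy, not MERL: the function name split_elements and the dictionary standing in for generator variables are made up for illustration. One deliberate difference: Python's greedy `.*` would leave only the last digit before the underscore for an index like 12_3, so the sketch uses the non-greedy `.*?`. Whether the regex engine in the generators behaves the same way is not stated in this thread.

```python
import re

def split_elements(text):
    """Return a dict mapping 'variable<index>' to each element (hypothetical helper)."""
    variables = {}
    for line in text.splitlines():           # outer loop: one line at a time
        for element in line.split(";"):      # inner loop: semicolon-separated parts
            element = element.strip()
            # Non-greedy prefix, then digit(s)_digit(s) at the very end of the element.
            match = re.fullmatch(r".*?([0-9]+_[0-9]+)", element)
            if match:                         # skip anything without an index suffix
                variables["variable" + match.group(1)] = element
    return variables

text = "element1_1;element1_2;element1_3\nelement2_1;element2_2;element2_3"
result = split_elements(text)
for name, value in result.items():
    print(name, "=", value)
```

The dictionary plays the role of the variable ... write ... close block above: each key is the name formed from 'variable' plus the index part, and each value is the element itself.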
stevek
MetaCase
Joined: 11.Mar.2008 Points: 643
Posted: 29.Apr.2008 at 18:55
The first time you see something like the above, it may well look like gibberish - in particular the translator definitions. Let's try and explain them a bit:
Each translator is formed from the output of the commands between the to and endto keywords - generally just a single literal string. We'll omit the to, endto and string quotes below, so we can concentrate on the syntax of the translators themselves.
The first line of a translator can optionally be a name, like %bySemicolon above. This lets you apply the translator easily later, e.g. id%bySemicolon will return the translated id.
Subsequent lines are rules, separated by newlines. Each rule has a left-hand side and a right-hand side, separated by a space. Things on the left-hand side will be translated into things on the right-hand side. (If you need spaces or newlines in your rules, you can escape them with a backslash.)
A common translation is to translate one character into another:
a A
or multiple characters into their corresponding characters:
abc ABC
or to save time, ranges:
a-z A-Z
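As a rough point of comparison only (this is Python, not MERL), the three character rules above behave much like Python's built-in str.maketrans and str.translate. The table names single, several and ranged are made up for this sketch.

```python
import string

# One character to one character, like the rule: a A
single = str.maketrans("a", "A")

# Several characters to their counterparts, position by position, like: abc ABC
several = str.maketrans("abc", "ABC")

# A whole range, like: a-z A-Z (Python has no range shorthand, so spell it out)
ranged = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

print("banana".translate(single))    # bAnAnA
print("cabbage".translate(several))  # CABBAge
print("merl".translate(ranged))      # MERL
```

Characters with no matching rule pass through unchanged, as the g and e in cabbage show.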
Sometimes, you need to translate a character into a string. The right hand side is then prefixed by a dollar sign:
& $&amp;
Sometimes, you need to translate a string into another string, e.g. the reverse of the above (note both sides are now strings):
$&amp; $&
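A Python analogy for these two rule forms, assuming the example is escaping an ampersand as the HTML entity &amp; and then reversing it. The function names are hypothetical and this is not MERL syntax.

```python
def char_to_string(text):
    # Character -> string: the dict form of str.maketrans allows a
    # replacement longer than one character.
    return text.translate(str.maketrans({"&": "&amp;"}))

def string_to_string(text):
    # String -> string: a plain substring replacement.
    return text.replace("&amp;", "&")

escaped = char_to_string("Tom & Jerry")
print(escaped)                    # Tom &amp; Jerry
print(string_to_string(escaped))  # Tom & Jerry
```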
The most powerful form is a regular expression on the left, and a string on the right. The regular expression is enclosed in slashes, /.../. The parenthesized submatches from the regular expression can be referred to on the right as $1, $2 etc. E.g. in %indexPart above, we return a string ($) that contains just the first submatch ($1):
/.*([0-9]+_[0-9]+)/ $$1
This last form is really useful for parsing bits out of existing text. You can set up several different translators, each to grab a different part of the same line or other string.
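To illustrate the idea of several translators each grabbing a different part of the same string, here is a Python sketch using two submatches, the counterparts of $1 and $2. The names INDEX_RULE, row_part and column_part are invented for this example, and returning the input unchanged when nothing matches is an assumption of the sketch, not documented generator behavior.

```python
import re

# Non-greedy prefix, then two captured numbers separated by an underscore.
INDEX_RULE = re.compile(r".*?([0-9]+)_([0-9]+)")

def apply_rule(text, group):
    """Return the requested submatch, or the text unchanged if the rule does not match."""
    match = INDEX_RULE.fullmatch(text)
    return match.group(group) if match else text

def row_part(text):
    return apply_rule(text, 1)     # like referring to $1

def column_part(text):
    return apply_rule(text, 2)     # like referring to $2

print(row_part("element2_3"))      # 2
print(column_part("element2_3"))   # 3
```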