MetaCase Forum > MetaEdit+

[Generators] Regexp-Machine?

bullitt168
Member

Joined: 28.Apr.2008
Location: Bremen, Germany
Points: 4
Posted: 29.Apr.2008 at 17:59
Hi!
I need to split and transform some strings in a generator. Unfortunately, the documentation of the string and variable manipulation operations (to ... endto translators) is not very detailed. Do you have more information about the regex engine you're using?

My most urgent questions are:
- How do I match a line break?
- How can I store the matches in variables for later use?

For example, given the string:
'element1_1;element1_2;element1_3
element2_1;element2_2;element2_3 ... '

Results I'd like to have:
variable1 contains 'element1_1;element1_2;element1_3'
variable2 contains 'element2_1;element2_2;element2_3 ...'

I'd like to process these variables further so that I can access
variable1_1 = 'element1_1'
variable1_2 = 'element1_2'
variable1_3 = 'element1_3'
within the generators.

These are pretty easy tasks in languages like Perl or Python, but I'd like to know whether it's possible to do this within the generators, so that the user doesn't need any external tools installed...

Thank you,

Bastian
stevek
MetaCase

Joined: 11.Mar.2008
Points: 643
Posted: 29.Apr.2008 at 18:46

The documentation of the regular expression syntax is in Appendix A. Often, you don't need to store intermediate results in a variable, so for your needs you could do something like this:

do :MyTextProperty { /* will iterate over the lines */
   do id%bySemicolon { /* will iterate over elements separated by semicolons */
      variable
         'variable' id%indexPart /* forms the name of the variable */
      write
         id
      close
   }
}
 
You'll need to define two translators: %bySemicolon, to map each semicolon to a newline, and %indexPart, to grab the 1_1 from the end of a string like element1_1. They would go somewhere before the block above, and look something like this:
 
to '%bySemicolon
; \
' endto /* maps semicolon to newline (escaped here with a backslash) */
 
to '%indexPart
/.*([0-9]+_[0-9]+)/ $$1'
endto /* maps any string ending in digit(s)_digit(s) to just that end part */
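For comparison with Python, here is a rough analogue of what these two translators do. It only illustrates the effect of the rules, not how MERL implements them, and the function names are made up. One detail worth knowing: in Python's re module a greedy .* leaves only one digit for the first index (element12_3 gives 2_3), so the sketch uses a non-greedy .*? anchored at the end. Whether MERL's regex engine backtracks the same way is not assumed here.

```python
import re

def by_semicolon(text: str) -> str:
    """Rough analogue of %bySemicolon: map every ';' to a newline."""
    return text.replace(";", "\n")

def index_part(text: str) -> str:
    """Rough analogue of %indexPart: reduce a string ending in
    digits_digits to just that end part; other strings pass through."""
    # Non-greedy .*? so that a multi-digit first index is kept whole.
    return re.sub(r"^.*?([0-9]+_[0-9]+)$", r"\1", text)

print(by_semicolon("element1_1;element1_2").splitlines())   # ['element1_1', 'element1_2']
print(index_part("element12_3"))                             # 12_3
print(re.sub(r"^.*([0-9]+_[0-9]+)$", r"\1", "element12_3"))  # 2_3 (greedy .*)
```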
 
See the _translators generator in Graph for some more example translators. Splitting on a delimiter character and then iterating over the results is a common text processing approach in MERL, and one of my personal favorite new features in 4.5 SR1. It turns something that would normally be tricky into something fairly easy - for instance I was able to write the above scripts by hand from memory, and they worked first time (partly luck!).
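Since the question compares this with Python, the nested MERL loop above does roughly what the following sketch does. It is an illustration of the logic only: the "variables" are kept in a dict, which is how you would store matches for later use in Python, and the names split_into_variables and sample are made up for this example. This sketch simply skips elements that have no digits_digits suffix.

```python
import re

def split_into_variables(text: str) -> dict:
    """One outer pass per line, one inner pass per ';'-separated element,
    storing each element under 'variable' + its index part."""
    variables = {}
    for line in text.splitlines():        # outer loop: one line at a time
        for element in line.split(";"):   # inner loop: one element at a time
            match = re.search(r"([0-9]+_[0-9]+)$", element)
            if match:                     # skip elements without an index part
                variables["variable" + match.group(1)] = element
    return variables

sample = "element1_1;element1_2;element1_3\nelement2_1;element2_2;element2_3"
print(split_into_variables(sample)["variable2_3"])  # element2_3
```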
stevek
Posted: 29.Apr.2008 at 18:55
The first time you see something like the above, it may well look like gibberish - in particular the translator definitions. Let's try and explain them a bit:
 
Each translator is formed from the output of the commands between the to and endto keywords, generally just a single literal string. We'll omit the to, endto and string quotes below, so we can concentrate on the syntax of the translators themselves.
 
The first line of a translator can optionally be a name, like %bySemicolon above. This lets you apply the translator easily later, e.g. id%bySemicolon will return the translated id.
 
Subsequent lines are rules, separated by newlines. Each rule has a left hand side and a right hand side, separated by a space. Things on the left hand side will be translated into things on the right hand side. (If you need spaces or new lines in your rules, you can escape them with a backslash.)
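To make this layout concrete, here is a small hypothetical Python parser for translator text as described above: an optional name line (assumed here to be recognized by its leading %), then one rule per line, split at the first unescaped space, with a backslash making the next character literal. It is a sketch of the described syntax, not MetaEdit+'s actual parser, and it does not apply the rules.

```python
from typing import List, Optional, Tuple

def parse_translator(source: str) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Split translator text into an optional %name and (lhs, rhs) rules."""
    records: List[List[str]] = []   # one entry per line: [lhs] or [lhs, rhs]
    fields: List[str] = [""]
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            fields[-1] += source[i + 1]   # backslash: keep next char literally
            i += 2
            continue
        if ch == "\n":                    # unescaped newline ends the line
            records.append(fields)
            fields = [""]
        elif ch == " " and len(fields) == 1:
            fields.append("")             # first unescaped space: lhs -> rhs
        else:
            fields[-1] += ch
        i += 1
    records.append(fields)
    records = [r for r in records if r != [""]]  # ignore empty lines

    name = None
    if records and len(records[0]) == 1 and records[0][0].startswith("%"):
        name = records.pop(0)[0]          # assumed: the name line starts with %
    rules = [(r[0], r[1]) for r in records if len(r) == 2]
    return name, rules

print(parse_translator("%bySemicolon\n; \\\n"))
# ('%bySemicolon', [(';', '\n')])
```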
 
A common translation is to translate one character into another:
a A
or multiple characters into their corresponding characters:
abc ABC
or to save time, ranges:
a-z A-Z
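In Python, these one-character-to-one-character rules correspond closely to str.maketrans plus str.translate (much like Unix tr). Python has no range shorthand there, so the a-z A-Z case is spelled out with the string module's alphabet constants. This is an analogy only, not MERL.

```python
import string

# "abc ABC": each left-hand character maps to the character at the same position.
abc_map = str.maketrans("abc", "ABC")
# "a-z A-Z": no range syntax in maketrans, so the full alphabets are listed.
upper_map = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

print("abcd".translate(abc_map))        # ABCd
print("metaedit".translate(upper_map))  # METAEDIT
```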
 
Sometimes, you need to translate a character into a string. The right hand side is then prefixed by a dollar sign:
& $&amp;
 
Sometimes, you need to translate a string into another string, e.g. the reverse of the above (note both sides are now strings):
$&amp; $&
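The ampersand rules above are the usual XML escaping pair (& to &amp; and back). As a rough Python analogue: the dict form of str.maketrans accepts whole strings as values, which covers the character-to-string case, and str.replace covers the string-to-string case. The function names are made up for illustration.

```python
# Character -> string: the dict form of str.maketrans allows multi-character values.
escape_map = str.maketrans({"&": "&amp;"})

def escape_amp(text: str) -> str:
    return text.translate(escape_map)

def unescape_amp(text: str) -> str:
    # String -> string: replace every occurrence of the whole left-hand string.
    return text.replace("&amp;", "&")

print(escape_amp("R&D"))        # R&amp;D
print(unescape_amp("R&amp;D"))  # R&D
```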
 
The most powerful form is a regular expression on the left, and a string on the right. The regular expression is enclosed in slashes, /.../. The parenthesized submatches from the regular expression can be referred to on the right as $1, $2 etc. E.g. in %indexPart above, we return a string ($) that contains just the first submatch ($1):
/.*([0-9]+_[0-9]+)/ $$1
 
This last form is really useful for parsing bits out of existing text. You can set up several different translators, each to grab a different part of the same line or other string.
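As a rough Python illustration of these last two points: MERL's $1 corresponds to \1 in a Python re.sub replacement, and two separate functions (hypothetical names) each pull a different part out of the same element name. A non-greedy .*? is used so that a multi-digit row number such as 12 is kept whole; in Python a greedy .* would leave only 2. Whether MERL needs the same care is not assumed here.

```python
import re

def row_part(text: str) -> str:
    """Keep only the first submatch: the digits before the underscore."""
    return re.sub(r"^.*?([0-9]+)_[0-9]+$", r"\1", text)

def column_part(text: str) -> str:
    """Keep only the digits after the underscore."""
    return re.sub(r"^.*?[0-9]+_([0-9]+)$", r"\1", text)

name = "element12_3"
print(row_part(name), column_part(name))  # 12 3
```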