
[Generators] Regexp-Machine?

Printed From: MetaCase
Category:
Forum Name: MetaEdit+
Forum Description: All topics relating to MetaEdit+ or DSM
URL: https://www.metacase.com/forums/forum_posts.asp?TID=31
Printed Date: 27.Mar.2026 at 00:26
Software Version: Web Wiz Forums 12.05 - http://www.webwizforums.com


Topic: [Generators] Regexp-Machine?
Posted By: bullitt168
Subject: [Generators] Regexp-Machine?
Date Posted: 29.Apr.2008 at 17:59
Hi!
I need to split and transform some strings in a generator. Unfortunately, the documentation of the string and variable manipulation operations (to .. translate endto) is rather thin. Do you have more information about the regex engine you're using?

My most urgent questions are:
- How do I match a line break?
- How can I store the matches in variables for later use?

E.g.:
Given as a String:
'element1_1;element1_2;element1_3
element2_1;element2_2;element2_3 ... '

Results I'd like to have:
variable1 contains 'element1_1;element1_2;element1_3'
variable2 contains 'element2_1;element2_2;element2_3 ...'

I'd like to process these variables further, so that I can access
variable1_1 = 'element1_1'
variable1_2 = 'element1_2'
variable1_3 = 'element1_3'
within the generators.

These are pretty easy tasks in languages like Perl or Python, but I'd like to know whether it's possible to do this within the generators, so that the user doesn't need to have any external tools installed...

Thank you,

Bastian



Replies:
Posted By: stevek
Date Posted: 29.Apr.2008 at 18:46

The regular expression syntax is documented in Appendix A (http://www.metacase.com/support/45/manuals/mwb/Mw-Appendix.html). Often, you don't need to store intermediate results in a variable, so for your needs you could do something like this:

do :MyTextProperty { /* will iterate over the lines */
   do id%bySemicolon { /* will iterate over elements separated by semicolons */
      variable
         'variable' id%indexPart /* forms the name of the variable */
      write
         id
      close
   }
}
 
You'll need to define the two translators, %bySemicolon to map each semicolon to a newline, and %indexPart to grab the 1_1 from the end of a string like element1_1. They'd go somewhere before the block above, and be something like this:
 
to '%bySemicolon
; \
' endto /* maps semicolon to newline (escaped here with a backslash) */
 
to '%indexPart
/.*([0-9]+_[0-9]+)/ $$1'
endto /* maps any string ending in digit(s)_digit(s) to just that end part */
 
See the _translators generator in Graph for some more example translators. Splitting on a delimiter character and then iterating over the results is a common approach to text processing in MERL, and one of my personal favorite new features in 4.5 SR1 (http://www.metacase.com/support/45/program/45sr1.html). It turns something that would normally be tricky into something fairly easy: for instance, I was able to write the above scripts by hand from memory, and they worked first time (partly luck!).
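
For readers more at home in Python, here is a rough analogy of what the nested do loops and the two translators achieve. It is only a sketch of the idea, not MERL and not how MetaEdit+ works internally; the function name split_into_variables and the use of a dictionary in place of generator variables are assumptions made for illustration.

```python
import re


def split_into_variables(text):
    """Mimic the nested loops: lines first, then semicolon-separated elements."""
    variables = {}
    for line in text.splitlines():        # outer loop: one pass per line
        for element in line.split(";"):   # inner loop: split at each semicolon
            element = element.strip()
            # Keep only the trailing digits_digits part, as %indexPart is meant to.
            match = re.search(r"([0-9]+_[0-9]+)$", element)
            if match:
                variables["variable" + match.group(1)] = element
    return variables


sample = "element1_1;element1_2;element1_3\nelement2_1;element2_2;element2_3"
result = split_into_variables(sample)
print(result["variable1_2"])  # element1_2
```

Elements that do not end in digits_digits are simply skipped in this sketch; what a generator should do with them is a separate design decision.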


Posted By: stevek
Date Posted: 29.Apr.2008 at 18:55
The first time you see something like the above, it may well look like gibberish - in particular the translator definitions. Let's try and explain them a bit:
 
Each translator is formed from the output of the commands between the to and endto keywords, which is generally just a single literal string. We'll omit the to, endto and string quotes below, so we can concentrate on the syntax of the translators themselves.
 
The first line of a translator can optionally be a name, like %bySemicolon above. This lets you apply the translator easily later, e.g. id%bySemicolon will return the translated id.
 
Subsequent lines are rules, separated by newlines. Each rule has a left hand side and a right hand side, separated by a space. Things on the left hand side will be translated into things on the right hand side. (If you need spaces or new lines in your rules, you can escape them with a backslash.)
 
A common translation is to translate one character into another:
a A
or multiple characters into their corresponding characters:
abc ABC
or to save time, ranges:
a-z A-Z
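
As a loose analogy only (this is Python, not MERL), the three character-to-character rules above behave much like Python translation tables. Python has no range shorthand, so the a-z A-Z rule is spelled out with the string module constants; the variable names are illustrative.

```python
import string

# Rule "a A": one character maps to one character.
single = str.maketrans("a", "A")

# Rule "abc ABC": each character maps to the one at the same position.
several = str.maketrans("abc", "ABC")

# Rule "a-z A-Z": the whole lowercase range maps to the uppercase range.
ranges = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

print("banana".translate(single))    # bAnAnA
print("cabbage".translate(several))  # CABBAge
print("id_42".translate(ranges))     # ID_42
```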
 
Sometimes, you need to translate a character into a string. The right hand side is then prefixed by a dollar sign:
& $&amp;
 
Sometimes, you need to translate a string into another string, e.g. the reverse of the above (note both sides are now strings):
$&amp; $&
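
A small Python analogy of these two rule forms, assuming the pair of rules is meant to escape an ampersand as the entity &amp; and to reverse that again. The function names are hypothetical and the code is not MERL.

```python
def escape_ampersand(text):
    """Character -> string: every '&' becomes the five-character string '&amp;'."""
    return text.translate({ord("&"): "&amp;"})


def unescape_ampersand(text):
    """String -> string: every '&amp;' becomes a single '&' again."""
    return text.replace("&amp;", "&")


escaped = escape_ampersand("R&D")
print(escaped)                      # R&amp;D
print(unescape_ampersand(escaped))  # R&D
```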
 
The most powerful form is a regular expression on the left, and a string on the right. The regular expression is enclosed in slashes, /.../. The parenthesized submatches from the regular expression can be referred to on the right as $1, $2 etc. E.g. in %indexPart above, we return a string ($) that contains just the first submatch ($1):
/.*([0-9]+_[0-9]+)/ $$1
 
This last form is really useful for parsing bits out of existing text. You can set up several different translators, each to grab a different part of the same line or other string.
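
To illustrate the idea of several translators each grabbing a different part of the same string, here is a hedged Python sketch. The helper extract is hypothetical, and so is its behaviour of requiring the whole string to match and returning the text unchanged otherwise; the thread does not say how MERL handles a non-matching rule.

```python
import re


def extract(pattern, text):
    """Return the first submatch if the whole text matches, else the text unchanged."""
    match = re.fullmatch(pattern, text)
    return match.group(1) if match else text


# The %indexPart pattern from the thread, applied to the thread's example.
print(extract(r".*([0-9]+_[0-9]+)", "element1_1"))   # 1_1

# Several "translators" pulling different parts out of the same string.
line = "element12_34"
name_part = extract(r"([a-z]+)[0-9]+_[0-9]+", line)     # element
first_index = extract(r"[a-z]+([0-9]+)_[0-9]+", line)   # 12
second_index = extract(r".*_([0-9]+)", line)            # 34
print(name_part, first_index, second_index)
```

One thing to watch in Python's re module: the leading .* is greedy, so the %indexPart pattern applied to element12_34 yields 2_34 rather than 12_34. Whether the MetaEdit+ regex engine behaves the same way is not stated in this thread, so it is worth testing with multi-digit indexes.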


