|
The first time you see something like the above, it may well look like gibberish, in particular the translator definitions. Let's try to explain them a bit:
Each translator is formed from the output of the commands between the to and endto keywords, which is generally just a single literal string. We'll omit the to, the endto and the string quotes below, so that we can concentrate on the syntax of the translators themselves.
The first line of a translator can optionally be a name, like %bySemicolon above. This lets you apply the translator easily later, e.g. id%bySemicolon will return the translated id.
Subsequent lines are rules, separated by newlines. Each rule has a left-hand side and a right-hand side, separated by a space. Things on the left-hand side will be translated into things on the right-hand side. (If you need spaces or newlines in your rules, you can escape them with a backslash.)
A common translation is to translate one character into another:
a A
or multiple characters into their corresponding characters:
abc ABC
or to save time, ranges:
a-z A-Z
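To make the character rules above more concrete, here is a rough Python sketch of what they amount to. It is not how the translators are implemented; expand_ranges and char_rule are hypothetical helper names, and the sketch assumes both sides of a rule expand to the same number of characters.

```python
def expand_ranges(spec: str) -> str:
    """Expand ranges such as 'a-z' into the full run of characters."""
    out = []
    i = 0
    while i < len(spec):
        # A range is a character, a hyphen, and another character.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


def char_rule(lhs: str, rhs: str) -> dict:
    """Build a table mapping each left-hand character to the right-hand one."""
    left, right = expand_ranges(lhs), expand_ranges(rhs)
    if len(left) != len(right):
        # Assumption: mismatched sides are treated as an error in this sketch.
        raise ValueError("both sides must expand to the same length")
    return str.maketrans(left, right)


# Python counterpart of the rule:  a-z A-Z
table = char_rule("a-z", "A-Z")
print("id_42".translate(table))  # ID_42
```

Characters that do not appear on the left-hand side, such as the underscore and the digits here, pass through unchanged in this sketch.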
Sometimes, you need to translate a character into a string. The right hand side is then prefixed by a dollar sign:
& $&amp;
Sometimes, you need to translate a string into another string, e.g. the reverse of the above (note both sides are now strings):
$&amp; $&
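The two string rules can be pictured as plain substring replacement. The sketch below is only an illustration in Python, taking the ampersand and the HTML entity &amp; as the example pair; apply_string_rule is a hypothetical helper, not part of the translator syntax.

```python
def apply_string_rule(text: str, lhs: str, rhs: str) -> str:
    """Replace every occurrence of lhs with rhs (hypothetical helper)."""
    return text.replace(lhs, rhs)


# Character into string: the ampersand becomes the entity.
escaped = apply_string_rule("Tom & Jerry", "&", "&amp;")
print(escaped)   # Tom &amp; Jerry

# String into string: the reverse rule turns the entity back.
restored = apply_string_rule(escaped, "&amp;", "&")
print(restored)  # Tom & Jerry
```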
The most powerful form is a regular expression on the left, and a string on the right. The regular expression is enclosed in slashes, /.../. The parenthesized submatches from the regular expression can be referred to on the right as $1, $2 etc. E.g. in %indexPart above, we return a string ($) that contains just the first submatch ($1):
/.*([0-9]+_[0-9]+)/ $$1
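For readers who know regular expressions from other tools, the rule above behaves roughly like the following Python sketch. The function name index_part and the choice to return the input unchanged when nothing matches are assumptions made for the illustration; the translator's own behaviour on a non-match is not described here.

```python
import re

# Python counterpart of the rule:  /.*([0-9]+_[0-9]+)/ $$1
INDEX_PART = re.compile(r".*([0-9]+_[0-9]+)")


def index_part(text: str) -> str:
    """Return the first submatch, or the text unchanged if nothing matches."""
    m = INDEX_PART.match(text)
    # Assumption: a non-matching input is passed through as-is.
    return m.group(1) if m else text


print(index_part("row 7_12"))  # 7_12
```

Note that in Python's engine the leading .* is greedy, so with a multi-digit first number (say 12_34) the submatch would be 2_34 rather than 12_34. Whether the translator's regular expression engine behaves the same way is not stated here, so it is worth testing a rule like this on real input.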
This last form is really useful for parsing bits out of existing text. You can set up several different translators, each to grab a different part of the same line or other string.
|