6.5.5 Translating strings
To replace characters or substrings in a string, use the following
template:
to
'A-Z a-z'
translate
'Foo'
endto
which outputs ‘foo’. The clauses
between to and
translate form the translator, and
those between translate and endto form the output to which that translator is
applied. For example, the rule ‘A-Z
a-z’ above means that each uppercase letter is converted into
the corresponding lowercase letter.
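For readers who find it easier to reason about the rule in a general-purpose language, the following is a minimal Python sketch of what a range rule such as 'A-Z a-z' does. It is not MERL and not how the tool is implemented; the helper name range_rule is hypothetical, and the sketch assumes the left-hand range is written in ascending order.

```python
# Hypothetical helper that mimics a range rule such as 'A-Z a-z'.
# This is an illustration in Python, not part of the generator language.
def range_rule(left: str, right: str) -> dict:
    lo_l, hi_l = left.split("-")
    lo_r, hi_r = right.split("-")
    # Left-hand range is assumed to be ascending.
    src = [chr(c) for c in range(ord(lo_l), ord(hi_l) + 1)]
    # The right-hand range may run forwards or backwards.
    step = 1 if ord(hi_r) >= ord(lo_r) else -1
    dst = [chr(c) for c in range(ord(lo_r), ord(hi_r) + step, step)]
    if len(src) != len(dst):
        raise ValueError("ranges must be of equal size")
    # Pair the n-th character on the left with the n-th on the right.
    return str.maketrans(dict(zip(src, dst)))

lower = range_rule("A-Z", "a-z")
print("Foo".translate(lower))  # foo
```

Characters that are not covered by the rule, such as the two lowercase letters in 'Foo', pass through unchanged.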
Translators can be named and reused. For example:
to
'%lower' newline 'A-Z a-z'
endto
defines a translator named ‘lower’
whose rule is ‘A-Z a-z’. The clauses between
to and
endto form the translator definition:
the first line sets the translator name and the next lines contain the rules.
The named translator can then be used any number of times:
to '%lower' translate 'Foo' endto
There is also a
shortcut syntax for using named translators with simple commands, design element
output commands, variables and literal strings:
id;2;%lower
:myName%lower
$variable%lower
A translator definition can contain
multiple rules, each one line of text that maps a left-hand side to a right-hand
side. There are several different kinds of rule, such as character to character,
or string to string. The different kinds of rules available are explained in the
table below. Special characters (newline space \ / $ % - *) must be escaped with
backslash, e.g. to map spaces to underscores use:
'\ _' (backslash, space, space,
underscore). Remember too that if the translator definition is expressed in a
literal string, a single quote ' must
of course be escaped by doubling it.
Name or comment
|
'%myName' as the first
line in a translator definition gives the translator a name
“myName”. Lines starting with % later in the translator are ignored
as comments.
|
Character
|
'a b' maps each
occurrence of character a to character b.
|
Range
|
'1-9 a-i' maps each
character in the range on the left to the corresponding character in the range
on the right. In this example numbers are mapped to letters: 1 becomes a, 2
becomes b and so on. Note that ranges must be of equal size, thus ‘a-c
1-4’ is not legal.
Note: a range can be reversed,
e.g. 'a-z z-a'.
|
Multiple character
|
'123 abc' maps each
number to a letter: 1 to a, 2 to b, 3 to c. The difference from range is that
each character is specified explicitly.
|
String
|
'$dog $cat' means
replace each occurrence of the string ‘dog’ with ‘cat’.
|
Mixed
|
'aeiou $VOWEL' means
replace each vowel with the string "VOWEL". This rule is applied together with
the character translations.
|
Asterisk
|
An asterisk on the left is the default mapping – what to
map all unspecified characters to (the default is to leave them unchanged):
'* $abc' means replace each character
with the string "abc".
An asterisk on the right means
leave the characters on the left unchanged: 'abc *' means do not change a,
b and c.
|
Regular expression
|
'/[A-Z][a-z]*/ $NAME'
means replace each occurrence of a capital letter followed by lowercase letters
with NAME. The left-hand side need not escape special translation characters,
but can use the normal regular expression escapes; / must be escaped by doubling
it.
The right-hand side (after the initial $) can use
$0 to refer to the whole matched string, $1 for the substring matching the first
parenthesized subexpression etc. E.g. the following rule (which must be written
on one line) would turn “Fred Bloggs and John Doe” into “Bloggs, Fred and Doe,
John”:
/([A-Z][a-z]*) ([A-Z][a-z]*)/ $$2,\ $1
|
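The effect of the regular expression rule in the table can be checked with an equivalent substitution in Python. This is only a sketch of the semantics: Python writes backreferences as \1 and \2 where the translator rule uses $1 and $2, and the function name swap_names is hypothetical.

```python
import re

# Python stand-in for the rule  /([A-Z][a-z]*) ([A-Z][a-z]*)/ $$2,\ $1
pattern = re.compile(r"([A-Z][a-z]*) ([A-Z][a-z]*)")

def swap_names(text: str) -> str:
    # \2 and \1 play the role of $2 and $1 in the translator rule.
    return pattern.sub(r"\2, \1", text)

print(swap_names("Fred Bloggs and John Doe"))  # Bloggs, Fred and Doe, John
```

The word "and" is left alone because it does not start with a capital letter, so no pair of capitalized words spans it.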
All rules that apply to single characters are collected
together first to build one large character mapping, which is applied to the
input text in one operation. After that all rules that apply to strings,
including regular expression rules, are applied in order, one at a time, to the
whole text. If you need to change this order, e.g. to translate strings first
then characters, you can use two translators. The first will translate just
strings and the second just characters, and you can apply the first and then the
second to achieve the desired result. Subexpression matches can themselves be
translated on the right-hand side of a regular expression rule, e.g.
$1%upper; finds the match for
$1 and then translates it with the
%upper translator. Note that the
semicolon is obligatory here.
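The order of application described above can be modelled with a short Python sketch. It is an illustration, not the tool's implementation: the function apply_translator and its argument layout are hypothetical, regular expression rules are left out, and it assumes that "one operation" means the character mapping is applied simultaneously to every character.

```python
# Hypothetical model of a translator: character rules plus ordered string rules.
def apply_translator(text, char_rules, string_rules):
    # Phase 1: all character rules form one mapping, applied in a single pass.
    table = str.maketrans(char_rules)
    text = text.translate(table)
    # Phase 2: string rules run one at a time, each over the whole text.
    for old, new in string_rules:
        text = text.replace(old, new)
    return text

# Character rules are simultaneous, so 'a b' and 'b a' swap the two letters.
print(apply_translator("abba", {"a": "b", "b": "a"}, []))  # baab
# String rules cascade: the second rule sees the output of the first.
print(apply_translator("dog", {}, [("dog", "cat"), ("cat", "cow")]))  # cow
# Character rules run first, so the string rule no longer finds 'dog'.
print(apply_translator("dog", {"d": "D"}, [("dog", "cat")]))  # Dog
```

In this model, the two-translator approach corresponds to calling apply_translator twice: first with only string rules, then with only character rules on the result.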
As translators can perform almost any edits on texts, the
result of a translator does not preserve any formatting from the original
text.
Some useful translators such as
%lower can be found in the
_translators generator in the Graph metatype. To be able to use these
translators in your own generators, call
_translators() somewhere near the start
of your outermost
generator.