One of the most powerful functions in Vortex - indeed all of
Texis - is <rex>. The <rex> function searches for a regular
expression in data.
The use of regular expressions permeates Texis because they are
so versatile and fast at parsing data. While occasionally difficult
to understand, they are well worth the mental gymnastics sometimes
needed to write them. (They're called regular expressions
because in Unix they're also so common that they're, well, regular.)
In Vortex we can scan a variable for a regular expression with
<rex>:
<rex "\digit{3}-=\digit{2}-=\digit{4}" $data>
Social Security numbers:
<LOOP $ret>
$ret <BR>
</LOOP>
This scans for the given expression in $data, which in our example
is some plain English text with embedded Social Security numbers we
want to find. <rex> finds each occurrence of such a number in $data
and returns them in a list in $ret.
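For readers more familiar with other languages, here is a rough analogue of the <rex> call above, sketched with Python's `re` module rather than REX. The syntax differs: Python writes `\d` where REX writes `\digit`, and a bare character in Python matches once where REX spells that out with `=`. The function name `find_ssns` and the sample text are made up for illustration.

```python
import re

# Python-regex approximation of the REX expression \digit{3}-=\digit{2}-=\digit{4}.
# This uses Python's re module, not REX.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def find_ssns(data: str) -> list[str]:
    """Return every SSN-shaped match in data, in order (like $ret after <rex>)."""
    # With no capturing groups, findall returns the full text of each match.
    return SSN_PATTERN.findall(data)


if __name__ == "__main__":
    # Hypothetical sample text standing in for $data.
    data = "Alice is 123-45-6789 and Bob is 987-65-4321; call 555-1234."
    for ssn in find_ssns(data):
        print(ssn)
```

As with <rex>, the result is a list of matches; "555-1234" is skipped because it does not have the 3-2-4 digit shape.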
Let's examine that REX expression more closely. A REX expression
is composed of one or more subexpressions which are searched
for adjacently. Each subexpression is terminated with a
repetition operator: = means once, {N} means exactly N times.
So our expression first looks for 3 digits - \digit means
"any single digit". Then it looks for one dash (-=).
Then 2 digits. Then another single dash. Then 4 digits.
Take a look at the Vortex manual on REX Expression Syntax
for more details. It is very useful to become familiar
with REX.
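To make the "subexpression plus repetition operator" structure concrete, the sketch below assembles a Python regular expression from an explicit list of REX-style (atom, operator) pairs. `build_pattern` is a hypothetical helper, not part of Vortex or REX. It understands only the `\digit` atom, literal characters, and the two operators described above (`=` and `{N}`), and it emits Python `re` syntax rather than REX.

```python
import re

# Hypothetical mapping from the REX atoms used in this tutorial to Python regex.
# Anything not listed here is treated as a literal string.
ATOMS = {r"\digit": r"\d"}


def build_pattern(subexpressions: list[tuple[str, str]]) -> str:
    """Join (atom, repetition) pairs into one Python regex, in order.

    Only the two REX repetition operators described above are handled:
    "=" (exactly once) and "{N}" (exactly N times).
    """
    parts = []
    for atom, rep in subexpressions:
        # Wrap each atom in a non-capturing group so {N} applies to all of it.
        body = "(?:" + ATOMS.get(atom, re.escape(atom)) + ")"
        if rep == "=":
            parts.append(body)
        elif rep.startswith("{") and rep.endswith("}") and rep[1:-1].isdigit():
            parts.append(body + rep)
        else:
            raise ValueError(f"unsupported repetition operator: {rep!r}")
    # Concatenation means the pieces must match adjacently, in order.
    return "".join(parts)


# \digit{3}-=\digit{2}-=\digit{4}, written as one pair per subexpression.
SSN_SUBEXPRESSIONS = [
    (r"\digit", "{3}"),
    ("-", "="),
    (r"\digit", "{2}"),
    ("-", "="),
    (r"\digit", "{4}"),
]

if __name__ == "__main__":
    pattern = build_pattern(SSN_SUBEXPRESSIONS)
    print(pattern)
    print(bool(re.fullmatch(pattern, "123-45-6789")))
```

Listing the five pairs separately mirrors how the walkthrough above reads the expression: one subexpression at a time, each with its own count.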
Search and replace
In Vortex, we can also search and replace in a string:
<sandr "\digit{3}-=\digit{2}-=\digit{4}" "XXX-XX-\5" $data>
Full text with partially blacked-out numbers:
$ret
The <sandr> (search and replace) function takes a REX expression and
a replace string, in addition to the search data. Instead of returning
the matches, it returns $data with the replace string substituted for
each match. Certain characters in the replace string are special;
see <sandr> for full details. Here we use \5 to put the 5th
subexpression's match back in. Counting the subexpressions of our
expression (\digit{3}, -=, \digit{2}, -=, \digit{4}), the 5th is
\digit{4}: the last 4 digits. So we replace full Social Security numbers
like 123-45-6789 with XXX-XX-6789, partially blacking them out.
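The same masking can be sketched with Python's `re.sub` as a rough analogue of <sandr>. One difference matters here: in Python, `\5` in the replacement refers to the 5th parenthesized capturing group, not the 5th subexpression. So the sketch wraps each REX subexpression in its own group to keep the numbering aligned. The function name `mask_ssns` is hypothetical.

```python
import re

# Each REX subexpression becomes its own capturing group, so Python's group
# numbers line up with the REX subexpression numbers:
# 1 = \digit{3}, 2 = -=, 3 = \digit{2}, 4 = -=, 5 = \digit{4}.
SSN_GROUPS = re.compile(r"(\d{3})(-)(\d{2})(-)(\d{4})")


def mask_ssns(data: str) -> str:
    """Return data with the first five digits of each SSN replaced by X's."""
    # r"\5" in the replacement string inserts capturing group 5: the last 4 digits.
    return SSN_GROUPS.sub(r"XXX-XX-\5", data)


if __name__ == "__main__":
    print(mask_ssns("Alice is 123-45-6789 and Bob is 987-65-4321."))
    # prints: Alice is XXX-XX-6789 and Bob is XXX-XX-4321.
```

Like <sandr>, this returns the whole input text with each match rewritten, rather than a list of matches.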