Example 1:
Let's say that we want to search for any occurrence of an Intel
80X86 processor on the same line with the concept of "speed" or
"benchmark" as long as the string "Motorola" is not present.
The query is: +/80=[1-4]?86 -/motorola speed benchmark
Explanation:
A leading '+' means "this must be present".
A leading '-' means "this must not be present".
The '/' signals the use of a regular-expression.
'/80=[1-4]?86' will locate an '80' followed by an optional ('1' or
'2' or '3' or '4') followed by an '86'. This will locate: 8086,
80186, 80286, 80386 or 80486.
'/motorola' will locate 'MOTOROLA' or 'Motorola' or 'motorola' (or
any other combination of alphabetic cases).
'speed' will locate any word that means "speed".
'benchmark' will locate any word that means "benchmark".
The beginning and ending delimiting expressions would be defined
as '\n' (meaning a new-line character).
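The operator rules above can be illustrated with a small sketch that classifies each term of the query. This is not Metamorph's parser: the `Term` class, the `parse_query` function, and the labels "required", "excluded", "permuted", "regex" and "concept" are names invented here for illustration, and only the '+', '-' and '/' prefixes described above are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    logic: str  # "required" (+), "excluded" (-), or "permuted" (no prefix)
    kind: str   # "regex" (leading '/') or "concept" (plain word)
    text: str   # the term with its prefixes removed


def parse_query(query: str) -> List[Term]:
    """Classify whitespace-separated query terms (hypothetical helper)."""
    terms = []
    for token in query.split():
        logic = "permuted"
        if token[0] == "+":
            logic, token = "required", token[1:]
        elif token[0] == "-":
            logic, token = "excluded", token[1:]
        kind = "concept"
        if token.startswith("/"):
            kind, token = "regex", token[1:]
        terms.append(Term(logic, kind, token))
    return terms


terms = parse_query("+/80=[1-4]?86 -/motorola speed benchmark")
for t in terms:
    print(t.logic, t.kind, t.text)
```

Quoted phrases and the other prefixes used in the later examples are deliberately left out to keep the sketch short.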
The Metamorph search engine will now optimize this search and will
perform the following actions:
- A: Search for any pattern that matches '/80=[1-4]?86'. When it is
  located do item (B).
- B: Search backwards for the start delimiter '\n' (or beginning of
  file/record, whichever comes first).
- C: Search forwards for the ending delimiter '\n' (or end of
  file/record, whichever comes first).
- D: Search for the pattern '/motorola' between the start and end
  delimiters. If it is not located do item (E), otherwise go to
  item (A).
- E: Search for the set of words that mean "benchmark". If a member
  is located do item (G), otherwise do item (F).
- F: Search for the set of words that mean "speed". If a member is
  located do item (G), otherwise go to item (A).
- G: Inform the user that a hit has been located.
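Steps A through G can be sketched in Python as follows. This is only an approximation under stated assumptions, not the engine's implementation: the Python pattern `80[1-4]?86` is assumed to be equivalent to '/80=[1-4]?86', the two small word sets are hypothetical stand-ins for the thesaurus-derived sets, and `find_hits` and the sample text are made up for the illustration.

```python
import re

ANCHOR = re.compile(r"80[1-4]?86")                  # assumed equivalent of '/80=[1-4]?86'
EXCLUDED = re.compile(r"motorola", re.IGNORECASE)   # '/motorola', any case
# Hypothetical stand-ins for the sets of words meaning "benchmark" and "speed".
BENCHMARK_WORDS = {"benchmark", "benchmarks", "test"}
SPEED_WORDS = {"speed", "fast", "faster", "quick"}


def find_hits(text, delim="\n"):
    hits = []
    pos = 0
    while True:
        m = ANCHOR.search(text, pos)                 # step A
        if m is None:
            break
        start = text.rfind(delim, 0, m.start())      # step B
        start = 0 if start == -1 else start + len(delim)
        end = text.find(delim, m.end())              # step C
        end = len(text) if end == -1 else end
        unit = text[start:end]
        pos = end                                    # continue after this unit
        if EXCLUDED.search(unit):                    # step D: back to A
            continue
        words = set(re.findall(r"[a-z]+", unit.lower()))
        if words & BENCHMARK_WORDS or words & SPEED_WORDS:  # steps E and F
            hits.append(unit)                        # step G
    return hits


sample = (
    "The 80386 is faster than the 80286.\n"
    "Motorola 68030 vs. Intel 80486 benchmark results.\n"
    "The 8086 shipped in 1978.\n"
    "New benchmark figures for the 80186."
)
hits = find_hits(sample)
print(hits)
```

In this sample the second line is rejected at step D because it mentions Motorola, and the third line fails both steps E and F.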
Example 2:
Let's say we are searching an address and phone number list trying
to find an entry for a person whose name has apparently been
entered incorrectly.
The query: "%60 Jane Plaxton" "%60 234 rhoads dr." /OH /49004
Because our database is large, we want to enter as much as
possible of what we know about Ms. Plaxton so that we decrease
the number of erroneous hits. The actual address in our database
looks as follows:
Jane Plxaton
243 Roads Dr.
Middle Town OH 49004
This is a little exaggerated for reasons of clarity, but what has
happened is that the data-entry operator has transposed the 'x'
and the 'a' in 'Plaxton' as well as the '4' and '3' and has also
misspelled 'Rhodes'.
The query we performed has four sets:
A 60% approximation of:  "Jane Plaxton"
A 60% approximation of:  "234 rhoads dr."
The state string:        OH
The zip code string:     49004
The database records are separated by a blank line, therefore our
start and end delimiters will be '\n\n' (two new-line characters).
The Approximate pattern matcher will be looking for the name and
street address information and will match anything that comes
within 60% of our query strings (when no percentage is specified,
the approximate pattern matcher will default to 80%). The
regular-expression pattern matcher will be looking for the state
and zip-code strings. We are searching for three intersections of
the four sets (this is the default action).
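The idea behind this query can be sketched in Python. Metamorph's approximate pattern matcher is not described here, so the sketch substitutes the similarity ratio from the standard library's `difflib.SequenceMatcher`, which is a different measure and will not give the same percentages. The helper names `approx_in` and `search`, and the first (non-matching) sample record, are made up for the illustration. "Three intersections of four sets" is read here as "all four sets must be present".

```python
import re
from difflib import SequenceMatcher


def approx_in(query, record, threshold):
    """True if some line of the record is at least `threshold` similar to query."""
    q = query.lower()
    return any(
        SequenceMatcher(None, q, line.strip().lower()).ratio() >= threshold
        for line in record.splitlines()
    )


def search(database, threshold=0.60):
    hits = []
    for record in database.split("\n\n"):       # '\n\n' start and end delimiters
        matched = [
            approx_in("Jane Plaxton", record, threshold),
            approx_in("234 rhoads dr.", record, threshold),
            re.search(r"OH", record, re.IGNORECASE) is not None,
            re.search(r"49004", record) is not None,
        ]
        if all(matched):                         # every one of the four sets located
            hits.append(record)
    return hits


database = (
    "John Smith\n12 Elm St.\nCleveland OH 44101\n\n"   # invented filler record
    "Jane Plxaton\n243 Roads Dr.\nMiddle Town OH 49004"
)
hits = search(database)
print(hits)
```

Despite the transposed letters and digits, the mistyped record clears the 60% threshold for both approximate sets, while the filler record is rejected because its zip code does not match.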
Example 3:
We are reading the electronic version of the Wall Street Journal
and we are interested in locating any occurrence of profits and/or
losses that amount to more than a million dollars.
The query: +#>1,000,000 +dollar @0 profit loss gain
The '+' symbol in front of the first two terms indicates that they
must be present in the hit. The '@0' tells Metamorph to find zero
intersections of the following sets. Put another way, only one of
the remaining sets needs to be located.
The sets:
- Mandatory (because of the '+' symbol):
  - Any quantity in the text that is greater than one million.
  - Any word (or string) that means "dollar".
- Permutation (because of the '@0'):
  - Anything that means "profit".
  - Anything that means "loss".
  - Anything that means "gain".
We would probably define the delimiters to be either a sentence or
a paragraph.
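The set logic of this query can be written down as a small rule. The sketch below assumes that "N intersections" means at least N+1 of the permuted sets must be located, which is how the text above describes '@0'. The function `is_hit` and its arguments are hypothetical and only model the decision, not the matching itself.

```python
def is_hit(required, permuted, intersections):
    """Decide whether one delimited unit of text qualifies as a hit.

    required / permuted: lists of booleans, one per set, True when that
    set was located between the delimiters.
    """
    needed = intersections + 1          # '@0' -> one permuted set is enough
    return all(required) and sum(permuted) >= needed


# Example 3: required = [quantity > 1,000,000, "dollar"],
#            permuted = ["profit", "loss", "gain"], with '@0'.
print(is_hit([True, True], [False, True, False], 0))   # a mandatory pair plus "loss"
print(is_hit([True, False], [True, True, True], 0))    # "dollar" missing
```

Under the same reading, the default of three intersections over the four sets in Example 2 requires all four sets to be located.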
The following would qualify as hits to this query:
- Congress has spent 2.5 billion dollars on the stealth bomber.
- Lockheed Corp. has taken a four million dollar contract from
  Boeing.
- The Lottery income from John Q. Public last week was One Million
  Two Hundred and Fifty Thousand dollars and twenty five cents.
Copyright © Thunderstone Software Last updated: Sun Mar 17 21:14:49 EDT 2013