Texis' Metamorph Compound Index
The METAMORPH and METAMORPH INVERTED indexes in Texis are used to
improve the performance of text searches using full-text queries with
LIKE, LIKEP, and the rest of the LIKE family. Often the query involves
other values, which are used to either sort the results, or further
restrict the results returned.
One example is in the Webinator application, which provides the
option to sort the results by date. Historically, the way to improve
the performance of the ORDER BY was to use an INVERTED INDEX. If you
also wanted to do date range restriction, then you could add a regular
INDEX as well.
The Metamorph compound index will provide better performance than
the three indexes since all the data is available from a single index,
and also requires less maintenance.
For the query:
SELECT Url FROM html
WHERE
Title\Description\Keywords\Meta\Body LIKE $query
AND Visited BETWEEN $first AND $last
ORDER BY Visited DESC;
You could create the index as:
CREATE METAMORPH INVERTED INDEX
xhtmlbodv ON HTML(Title\Description\Keywords\Meta\Body, Visited);
Which is the CREATE INDEX statement you will find in the
Webinator dowalk script.
If there are several fields that you might use in the query or ORDER
BY, then you can specify all of them as additional fields. The order
of the fields does not matter, and the engine may use any combination
of them. If in Webinator you also wanted to allow searches and sorts
based on the Depth field, you could add Depth to the index:
CREATE METAMORPH INVERTED INDEX
xhtmlbodvd ON HTML(Title\Description\Keywords\Meta\Body, Visited, Depth);
Then, with the ability of Vortex to ignore parts of the
query you could write a query:
<switch $o>
<case
d><$orderby="ORDER BY Depth">
<case
v><$orderby="ORDER BY Visited DESC">
</switch>
<SQL ROW "SELECT Url FROM html
WHERE Title\Description\Keywords\Meta\Body
LIKE $query
AND (Visited BETWEEN $first AND $last
AND Depth BETWEEN $low and $high) "
$orderby>
That will allow efficient searching and ordering on any combination
of Visited and Depth, as long as a query is specified for the LIKE.
The compound index can also be used for GROUP BY or other queries
that can fully rely on the index data, e.g.:
SELECT Depth, count(*) from html
WHERE
Title\Description\Keywords\Meta\Body LIKE $query
GROUP BY Depth;
Key facts
- In a
full-text index (any of the variations of METAMORPH INDEX) the first field
specified must be the full-text field, and will be indexed accordingly.
- The
first field may be a virtual field, if you want to search across multiple
database fields. In the above
example we would search the Title, Description, Keywords, Meta and Body
fields as if they were a single field.
- The
full-text index will only be used if the full-text field is being queried
with a full-text query. In the
above example, if there was no LIKE clause, or it was dropped by Vortex
because it matches $null, then the METAMORPH INDEX would not be used.
- The
additional fields beyond the full-text field should be small, fixed size
fields, most commonly dates and numbers.
- Using
too many additional fields can negate the performance benefits of having
the index. Care should be taken to
ensure that only those fields actually used in queries are represented in
the index.
- The
total size of the additional fields should be small relative to the size
of the record, and should not exceed a few hundred bytes per record.
- The
total size of additional indexed data (number or rows multiplied by size
per row) should be no larger than 25% of physical memory on the server.
- If you
specify a VARCHAR(N) field as an additional field, you will get a warning
message "Variable size warning".
The index will still be created, and N bytes of the field will be
indexed (where N is from the declaration of the field) for each row.
If N is large, this will bloat the index,
reducing performance.
- Updating
fixed size fields, including the additional fields can be done without
causing the index to go out of date and needing to be updated.
Updating the full-text field, or any
variable sized field (e.g. VARCHAR, BLOB, INDIRECT) will still cause the index
to require an update.
- Parts
of the where clause that use the compound should be grouped together with
parentheses for maximum efficiency.
|