Navigation Toggle

Texis' Metamorph Compound Index

January 9, 2013
Texis' Metamorph Compound Index

The METAMORPH and METAMORPH INVERTED indexes in Texis are used to improve the performance of text searches using full-text queries with LIKE, LIKEP, and the rest of the LIKE family. Often the query involves other values, which are used to either sort the results, or further restrict the results returned.

One example is in the Webinator application, which provides the option to sort the results by date. Historically, the way to improve the performance of the ORDER BY was to use an INVERTED INDEX. If you also wanted to do date range restriction, then you could add a regular INDEX as well.

The Metamorph compound index will provide better performance than the three indexes since all the data is available from a single index, and also requires less maintenance. For the query:

 WHERE Title\Description\Keywords\Meta\Body LIKE $query
   AND Visited BETWEEN $first AND $last

You could create the index as:

CREATE METAMORPH INVERTED INDEX xhtmlbodv ON HTML(Title\Description\Keywords\Meta\Body, Visited);

Which is the CREATE INDEX statement you will find in the Webinator dowalk script.

If there are several fields that you might use in the query or ORDER BY, then you can specify all of them as additional fields. The order of the fields does not matter, and the engine may use any combination of them. If in Webinator you also wanted to allow searches and sorts based on the Depth field, you could add Depth to the index:

CREATE METAMORPH INVERTED INDEX xhtmlbodvd ON HTML(Title\Description\Keywords\Meta\Body, Visited, Depth);

Then, with the ability of Vortex to ignore parts of the query you could write a query:

<switch $o>
    <case d><$orderby="ORDER BY Depth">
    <case v><$orderby="ORDER BY Visited DESC">
 WHERE Title\Description\Keywords\Meta\Body LIKE $query
    AND (Visited BETWEEN $first AND $last
    AND Depth BETWEEN $low and $high) " $orderby>

That will allow efficient searching and ordering on any combination of Visited and Depth, as long as a query is specified for the LIKE.

The compound index can also be used for GROUP BY or other queries that can fully rely on the index data, e.g.:

SELECT Depth, count(*) from html
 WHERE Title\Description\Keywords\Meta\Body LIKE $query
 GROUP BY Depth;


Key facts

  • In a full-text index (any of the variations of METAMORPH INDEX) the first field specified must be the full-text field, and will be indexed accordingly.
  • The first field may be a virtual field, if you want to search across multiple database fields. In the above example we would search the Title, Description, Keywords, Meta and Body fields as if they were a single field.
  • The full-text index will only be used if the full-text field is being queried with a full-text query. In the above example, if there was no LIKE clause, or it was dropped by Vortex because it matches $null, then the METAMORPH INDEX would not be used.
  • The additional fields beyond the full-text field should be small, fixed size fields, most commonly dates and numbers.
  • Using too many additional fields can negate the performance benefits of having the index. Care should be taken to ensure that only those fields actually used in queries are represented in the index.
  • The total size of the additional fields should be small relative to the size of the record, and should not exceed a few hundred bytes per record.
  • The total size of additional indexed data (number or rows multiplied by size per row) should be no larger than 25% of physical memory on the server.
  • If you specify a VARCHAR(N) field as an additional field, you will get a warning message "Variable size warning". The index will still be created, and N bytes of the field will be indexed (where N is from the declaration of the field) for each row. If N is large, this will bloat the index, reducing performance.
  • Updating fixed size fields, including the additional fields can be done without causing the index to go out of date and needing to be updated. Updating the full-text field, or any variable sized field (e.g. VARCHAR, BLOB, INDIRECT) will still cause the index to require an update.
  • Parts of the where clause that use the compound should be grouped together with parentheses for maximum efficiency.