
Texis Overview

January 9, 2013

Executive Summary:

  • TEXIS is the only fully integrated SQL RDBMS optimized for full-text search.
  • TEXIS offers high-performance, intelligent querying and management of databases containing natural language text, numeric values, standard data types, geographic information, images, video, audio and other payload data.
  • TEXIS powers real-time applications with zero-latency data insertion, providing immediate search availability of key data without waiting for scheduled index updates.
  • TEXIS efficiently sorts and groups search results by any field(s) in the data. It can quickly sort tens of thousands of hits or more.
  • TEXIS, the innovative development platform behind Thunderstone's entire line of enterprise search products, lets users and developers incorporate their own unique knowledge and expertise into customized search solutions that easily integrate with other applications.

What makes Texis different from other search engines and databases?

Thunderstone's Texis is the only search engine developed from the ground up as a fully integrated SQL RDBMS optimized for full-text search, and it's the only relational database that can store and search text documents of unlimited size within standard database tables.

Used by hundreds of thousands of database application developers around the world, Structured Query Language provides many advantages for satisfying complicated search requirements. SQL also holds great promise as a reliable, well-defined path for implementing unanticipated new search functionality in the future. All other search engines offer a much narrower range of possibilities based on proprietary interfaces.

While typical search engines do a nice job of searching unstructured text and traditional databases have an impressive ability to handle queries on fielded or structured data, text searching and relational database management each rely upon radically different paradigms for organizing and retrieving information. They both developed and matured over decades as completely separate technologies, and they don't “marry” easily.

Thunderstone is the only company that has accomplished the true marriage of a full-text search engine and a powerful SQL relational database in a single platform. Addressing this challenge, the simultaneous searching of structured and unstructured data, remains one of Thunderstone's core competencies.

Deep in the heart of the Texis RDBMS resides Thunderstone's Metamorph, a concept-based natural language search engine utilizing advanced lexical set logic.

Metamorph has often been classified as a form of Artificial Intelligence, since its functions fall into the categories of knowledge acquisition, natural language processing and intelligent text retrieval. The software attempts in its own way to understand your search queries, to represent its understanding to the data in the files and to come up with relevant responses as retrieved portions of full-text information which best correspond to your submitted queries.

Metamorph's starting vocabulary has 250,000+ word connections, constructed in a dense web of associations and equivalences. Search parameters can be adjusted to dynamically dictate surface and deep inference. The program's responses can be controlled so that they are direct or abstract in relation to user queries. Proximity of concept can be fine tuned so as to qualify degree of relevance, providing matches which are sometimes concrete, sometimes abstract, as desired.

Metamorph allows for editing word sets. This means that you may select which associations you would like in connection to any search. You can create your own concept sets permanently for future use. You can fine tune the search to use associations of only a certain part of speech. You can enter all known spelling variations of any particular search word in the same way. You can generally customize the program to include your own nomenclature and vocabulary, making it increasingly intelligent the longer it is in use. When you want to control exactly what associations are made with any or all of the words or expressions in your searches, you can do so by editing the equivalence set associated with any word already known by the Equivalence File or by creating associations for a new or created word not yet known.

You can call up the ApproXimate Pattern Matcher (XPM) and tell it to look for a certain percentage of proximity to an entered string, finding misspelled names and typos. You can also look for numeric quantities entered as text, thanks to the Numeric Pattern Matcher (NPM) which recognizes that “four score and seven” is the same amount as “87.”
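
The two ideas in the paragraph above can be illustrated with a minimal Python sketch. This is not Thunderstone's XPM or NPM implementation: the function names `approx_find` and `words_to_number` are hypothetical, the similarity measure is the standard library's `difflib.SequenceMatcher` ratio rather than whatever XPM uses internally, and the number parser only handles simple phrases such as the one quoted.

```python
from difflib import SequenceMatcher

def approx_find(target, text, threshold=0.8):
    """Return words in text whose similarity to target is at least threshold.

    Assumption: similarity is difflib's ratio (0.0 to 1.0), used here only
    as a stand-in for a "percentage of proximity" to the entered string.
    """
    hits = []
    for word in text.split():
        cleaned = word.strip(".,;:!?\"'()")
        ratio = SequenceMatcher(None, target.lower(), cleaned.lower()).ratio()
        if ratio >= threshold:
            hits.append(cleaned)
    return hits

# Toy vocabulary for numbers written out as words.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
MULTIPLIERS = {"score": 20, "hundred": 100}

def words_to_number(phrase):
    """Convert a simple spelled-out quantity to an integer."""
    total = 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        elif word in MULTIPLIERS:
            # A bare multiplier ("score") counts as one of that unit.
            total = max(total, 1) * MULTIPLIERS[word]
        else:
            raise ValueError(f"unrecognized number word: {word}")
    return total

print(approx_find("Thunderstone", "Contact Thunderston or Thundrestone, not Limestone."))
print(words_to_number("four score and seven"))  # 87
```

Lowering `threshold` admits looser matches, which mirrors the idea of asking for a smaller percentage of proximity.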

Metamorph allows users to search for intersections of sets of lexical items, while also performing prefix and suffix morpheme processing. Users can specify, right in their queries, the delimiters of choice: i.e., they can look for lexical intersections within a sentence, a paragraph, a page, a designated amount of text or some other defined textual unit such as a memo.
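
As a rough illustration of set intersection within a delimiter, the following Python sketch returns the sentences that contain at least one member of every term set. It is a simplification under stated assumptions: the function name `sentences_with_all_sets` is hypothetical, sentences are split naively on end punctuation, and no prefix or suffix morpheme processing is attempted.

```python
import re

def sentences_with_all_sets(text, term_sets):
    """Return sentences containing at least one term from each set.

    Each set plays the role of one query concept (a word plus its
    equivalents); the sentence is the delimiter for the intersection.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        # Every set must intersect the words of this sentence.
        if all(words & terms for terms in term_sets):
            hits.append(sentence)
    return hits

text = ("The attorney filed a motion. The lawyer went home. "
        "A judge denied the motion.")
term_sets = [{"attorney", "lawyer"}, {"motion", "filing"}]
print(sentences_with_all_sets(text, term_sets))
# ['The attorney filed a motion.']
```

Changing the split pattern (for example, splitting on blank lines) would move the delimiter from a sentence to a paragraph.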

Texis, with Metamorph inside it, provides a modular set of tools to attack the formidable problem of how to get at and deal with a large volume of information when you don't really know precisely what you need or where to find it. Thunderstone's Texis gives you the power, speed and flexibility to rapidly implement a customized search solution that will accomplish your data access/retrieval objectives in the most dynamic, efficient and pragmatic way possible.

Thunderstone's Texis has a number of characteristics and built-in advantages that differentiate it from other search solutions:

  1. Texis is a fully integrated SQL database management system (DBMS) that follows the relational database model and is optimized for storing unlimited quantities of narrative full text. It provides a method for managing and manipulating an organization's shared data, where intelligent text retrieval is harnessed as a qualifying action for selecting the desired information. Texis simultaneously provides full-text, fielded and Boolean searching of both structured and unstructured content.
  2. Texis powers real-time applications with zero-latency data insertion, providing immediate search availability of new data without waiting for scheduled index updates. Unlike other search tools, Texis ensures that all information which has been added to any table can be searched immediately -- regardless of whether the table has been indexed and regardless of whether it has been suggested that an index be maintained on that table or not. Sequential table space scans and index-based scans are efficiently managed by Texis so that the database can always be searched in the most optimized manner with the most current information available to the user.
  3. Texis enables searchers to sort and group query results by any field(s) in the data, and it can quickly sort tens of thousands of hits or more. Other search tools either bog down when sorting more than a few hundred items or offer sorting features far more limited than those of Texis.
  4. Texis allows you to treat the concatenation of any number of text fields as a single “virtual field.” Because the virtual field behaves as a single field, you can create an index on it, search it and perform any other operation allowable on a field.
  5. Texis offers high-performance, intelligent querying and management of databases containing natural language text, numeric values, standard data types, geographic information, images, video, audio and other payload data. While Texis excels at purposeful manipulation of textual information, it also performs useful mathematical operations on your data. You can construct queries that combine calculated values with text search.
  6. Texis lets you create an unlimited number of independent search collections -- each with its own unique data types, fields, attributes or parameters. It also empowers users to submit queries to multiple search engines and/or multiple collections and have the results displayed together or combined.
  7. Texis gives developers prototype-friendly customization tools, extreme flexibility, rapid deployments and a feature-rich API. It supports multiple Search User Interfaces that offer specially-defined views of query input and results for different audience types or even for each unique individual. Thunderstone's Texis imposes no user interface requirements. Texis Web Script (Vortex) maintains “neutrality” with regard to whatever HTML markup (or JavaScript or other user interface technology) is employed for the user results presentation.

Which enterprise search applications require the robustness and flexibility of Texis?

Texis is the premier solution when large-scale, mission-critical and/or complex information retrieval challenges call for full-text searching tightly integrated with traditional structured database querying. Businesses, governments, NGOs and educational institutions use Texis in a wide range of applications such as online catalogs, auctions, classifieds, automated categorization, litigation support, intelligence collection/analysis, risk assessment, quality control, CRM, knowledge discovery, document and multimedia management, internet publishing, vertical portals, real-time message handling, web searching and many others.

Thunderstone's Texis provides the ideal development platform for rapidly deployable, custom-designed applications that require both unstructured and structured types of searching:

  • Online catalogs contain unstructured text (product name, description, etc.) and structured content (style/size, price, in-stock availability, etc.). Users expect the ability to search by item description, to navigate by price range or to do both in combination.
  • Knowledge management systems demand very efficient and secure enterprise-wide information retrieval across multiple repositories that serve different types of users, who all want dynamic, context-sensitive views of defined content (structured data) with the ability to refine results through full-text searching (unstructured data).
  • A Thunderstone solution provider customer has deployed Texis in a "brute force" full-text search scenario for its DoD Intelligence Community customer, using Thunderstone's Texis to search the contents of a massive Oracle database in a counter-terrorism effort. Texis is being used as an adjunct to Oracle full-text search because of its ability to scale while still providing superior performance in both ingestion rate and search. Thunderstone's Texis enables this customer to search across a 20 terabyte index, ingesting 70-80 million new records per hour and returning typical search results in under 10 seconds.
  • A Fortune 20 customer is using Thunderstone's Texis as the search platform for what they describe as the "single largest knowledge management system currently deployed at any corporation in the world." The application encompasses knowledge, people and processes, and it is used globally within the organization to access more than 30 terabytes of data. Users access the application 20+ million times per day, retrieving and sharing information from across the global enterprise. The application is the most-used corporate I.T. resource after e-mail.

Thunderstone's Texis lets users and developers incorporate their own unique knowledge and expertise into customized search solutions that easily integrate with other applications. For additional information call +1 216 820 2200 or visit us online at http://www.thunderstone.com.

Texis' Metamorph Compound Index

January 9, 2013

The METAMORPH and METAMORPH INVERTED indexes in Texis improve the performance of full-text queries that use LIKE, LIKEP, and the rest of the LIKE family. Often the query also involves other values, which are used either to sort the results or to further restrict the results returned.

One example is in the Webinator application, which provides the option to sort the results by date. Historically, the way to improve the performance of the ORDER BY was to use an INVERTED INDEX. If you also wanted to do date range restriction, then you could add a regular INDEX as well.

The Metamorph compound index provides better performance than the three separate indexes (Metamorph, inverted and regular), since all the data is available from a single index, and it also requires less maintenance. For the query:

SELECT Url FROM html
 WHERE Title\Description\Keywords\Meta\Body LIKE $query
   AND Visited BETWEEN $first AND $last
 ORDER BY Visited DESC;

You could create the index as:

CREATE METAMORPH INVERTED INDEX xhtmlbodv ON HTML(Title\Description\Keywords\Meta\Body, Visited);

This is the CREATE INDEX statement you will find in the Webinator dowalk script.

If there are several fields that you might use in the query or ORDER BY, then you can specify all of them as additional fields. The order of the fields does not matter, and the engine may use any combination of them. If in Webinator you also wanted to allow searches and sorts based on the Depth field, you could add Depth to the index:

CREATE METAMORPH INVERTED INDEX xhtmlbodvd ON HTML(Title\Description\Keywords\Meta\Body, Visited, Depth);

Then, using the ability of Vortex to ignore parts of the query, you could write:

<switch $o>
    <case d><$orderby="ORDER BY Depth">
    <case v><$orderby="ORDER BY Visited DESC">
</switch>
<SQL ROW "SELECT Url FROM html
 WHERE Title\Description\Keywords\Meta\Body LIKE $query
   AND (Visited BETWEEN $first AND $last
   AND Depth BETWEEN $low AND $high) " $orderby>

That will allow efficient searching and ordering on any combination of Visited and Depth, as long as a query is specified for the LIKE.
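
The effect of ignoring parts of the query can be pictured with a rough Python analogue. This is not how Vortex is implemented, and the function `build_query`, its parameters and the `?` placeholders are hypothetical; it only shows the resulting shape of the statement when some values are not supplied, with the Visited and Depth restrictions kept together in parentheses.

```python
TEXT_FIELD = r"Title\Description\Keywords\Meta\Body"
ORDER_BY = {"d": "ORDER BY Depth", "v": "ORDER BY Visited DESC"}

def build_query(query=None, visited=None, depth=None, order=None):
    """Assemble the SELECT, leaving out any clause whose value is missing.

    visited and depth are optional (low, high) pairs; order is "d" or "v".
    """
    where, params = [], []
    if query is not None:
        where.append(f"{TEXT_FIELD} LIKE ?")
        params.append(query)
    ranges = []
    for column, bounds in (("Visited", visited), ("Depth", depth)):
        if bounds is not None:
            ranges.append(f"{column} BETWEEN ? AND ?")
            params.extend(bounds)
    if ranges:
        # Keep the additional-field restrictions grouped together.
        where.append("(" + " AND ".join(ranges) + ")")
    sql = "SELECT Url FROM html"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order in ORDER_BY:
        sql += " " + ORDER_BY[order]
    return sql, params

sql, params = build_query("texis", depth=(0, 3), order="d")
print(sql)
print(params)
```

In this sketch a call without `query` produces a statement with no LIKE clause, which is exactly the case in which the full-text index would not be used.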

The compound index can also be used for GROUP BY or other queries that can fully rely on the index data, e.g.:

SELECT Depth, COUNT(*) FROM html
 WHERE Title\Description\Keywords\Meta\Body LIKE $query
 GROUP BY Depth;

 

Key facts

  • In a full-text index (any of the variations of METAMORPH INDEX) the first field specified must be the full-text field, and will be indexed accordingly.
  • The first field may be a virtual field, if you want to search across multiple database fields. In the above example we would search the Title, Description, Keywords, Meta and Body fields as if they were a single field.
  • The full-text index will only be used if the full-text field is being queried with a full-text query. In the above example, if there were no LIKE clause, or if it were dropped by Vortex because $query matches $null, then the METAMORPH INDEX would not be used.
  • The additional fields beyond the full-text field should be small, fixed size fields, most commonly dates and numbers.
  • Using too many additional fields can negate the performance benefits of having the index. Care should be taken to ensure that only those fields actually used in queries are represented in the index.
  • The total size of the additional fields should be small relative to the size of the record, and should not exceed a few hundred bytes per record.
  • The total size of additional indexed data (number of rows multiplied by size per row) should be no larger than 25% of physical memory on the server.
  • If you specify a VARCHAR(N) field as an additional field, you will get a warning message "Variable size warning". The index will still be created, and N bytes of the field will be indexed (where N is from the declaration of the field) for each row. If N is large, this will bloat the index, reducing performance.
  • Fixed-size fields, including the additional fields, can be updated without causing the index to go out of date. Updating the full-text field, or any variable-sized field (e.g. VARCHAR, BLOB, INDIRECT), will still cause the index to require an update.
  • Parts of the WHERE clause that use the compound index should be grouped together with parentheses for maximum efficiency.
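
The memory guideline in the key facts above is simple arithmetic, and a short Python sketch can make it concrete. The function name `additional_field_budget` and the example figures (50 million rows, 12 bytes of additional fields per row, 16 GiB of RAM) are assumptions chosen for illustration, not values from Texis documentation.

```python
def additional_field_budget(row_count, bytes_per_row, physical_memory_bytes,
                            max_fraction=0.25):
    """Compare additional indexed data against a share of physical memory.

    Returns (total_bytes, limit_bytes, fits), where total_bytes is
    rows multiplied by size per row and limit_bytes is 25% of RAM
    by default.
    """
    total_bytes = row_count * bytes_per_row
    limit_bytes = physical_memory_bytes * max_fraction
    return total_bytes, limit_bytes, total_bytes <= limit_bytes

GIB = 2 ** 30
total, limit, fits = additional_field_budget(
    row_count=50_000_000,        # hypothetical table size
    bytes_per_row=12,            # hypothetical size of the additional fields
    physical_memory_bytes=16 * GIB,
)
print(total, limit, fits)  # 600000000 4294967296.0 True
```

If the check fails, the guidance in the list points to the remedy: drop additional fields that queries do not actually use, or avoid wide VARCHAR(N) fields that inflate the per-row size.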
