Categorize documents or text records to add value to full-text search systems
Texis Categorizer—also known as a classifier—automatically attaches categories, subject codes, metadata and more to documents or text records. The categorizer is an application of the Texis platform and Texis Web Script (Vortex) product. The Texis underpinnings provide a broad range of "hooks" for using the categorizer and tying it into other computer applications. For added flexibility, the categorizer handles most European languages.
Manual, Automatic or Mixed Operation
Each automatic category "recommendation" receives a statistical confidence score. Operation may be manual, automatic or mixed. In manual mode, an operator accepts or rejects each recommendation. In automatic mode, categories are applied without user intervention. In mixed operation, one designates a confidence score threshold so that recommendations above the threshold are accepted automatically and those below are held for human review.
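The mixed mode described above can be sketched as a simple routing rule. This is an illustrative Python sketch only, not the Texis Categorizer API: the `Recommendation` type, the `route` function, the 0.0 to 1.0 score range, and the choice to hold scores exactly at the threshold for review are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Recommendation:
    """Hypothetical record: one suggested category with its confidence score."""
    doc_id: str
    category: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def route(
    recs: List[Recommendation], threshold: float
) -> Tuple[List[Recommendation], List[Recommendation]]:
    """Split recommendations into (accepted automatically, held for review).

    Scores strictly above the threshold are accepted. Scores at or below it
    are held for a human operator (the tie-breaking rule is an assumption).
    """
    accepted = [r for r in recs if r.confidence > threshold]
    held = [r for r in recs if r.confidence <= threshold]
    return accepted, held


recs = [
    Recommendation("doc-1", "Sports", 0.93),
    Recommendation("doc-1", "Finance", 0.41),
    Recommendation("doc-2", "Health", 0.80),
]
accepted, held = route(recs, threshold=0.80)
```

Under these assumptions, the other two modes fall out as special cases: a threshold of 1.0 holds every recommendation for review (manual operation), and a threshold below 0.0 accepts every recommendation (automatic operation).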
Enhance Text Searching
The benefits that categories bring to full-text search systems include:
- Sorting: keys for sorting or grouping search results.
- Menus: a "controlled vocabulary" that users can select from instead of, or in addition to, trial-and-error searching.
- Browsing: a finite set of hyperlinks can be "navigated" as a means to browse through data in an organized fashion.
Classification results can be stored in the database or used to drive further processing. Customers generally begin with a taxonomy, a predetermined set of categories, but the system is dynamic: authorized users can create new categories as needed.
A Highly Scalable Classifier
Because Texis Categorizer is highly scalable, one typical server can classify tens of thousands of documents daily. It can also operate in real time, categorizing new documents as soon as they become available.
More Accurate Search
Accuracy is a function of both the quantity and the quality of the training examples. Categorization results approved or corrected by an operator are fed back into the training base, so the categorizer's results become more accurate over time. In addition, hierarchical category schemes are easily accommodated.
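The feedback loop can be pictured with a small sketch. It is a hypothetical Python illustration, not how Texis Categorizer stores its training data: the `record_review` function and the in-memory dictionary standing in for the training base are assumptions made for the example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional

# Hypothetical training base: category name -> list of example texts.
TrainingBase = DefaultDict[str, List[str]]


def record_review(
    training_base: TrainingBase,
    text: str,
    recommended: str,
    corrected: Optional[str] = None,
) -> str:
    """Store an operator-reviewed document as a training example.

    If the operator supplied a correction, the text is filed under the
    corrected category. Otherwise the recommendation counts as approved.
    Returns the category the example was filed under.
    """
    final_category = corrected if corrected is not None else recommended
    training_base[final_category].append(text)
    return final_category


training_base: TrainingBase = defaultdict(list)
# Operator approves the recommendation as-is.
record_review(training_base, "Quarterly earnings rose sharply", "Finance")
# Operator corrects a wrong recommendation.
record_review(training_base, "Home team wins the cup", "Finance", corrected="Sports")
```

Each reviewed document adds one more example, and corrected documents are filed under the category the operator chose, so later classifications can draw on them.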