Webinator Search Help


Forming a query

The Webinator's search can be as simple or as complex as you need it to be. Usually you will only need to enter a few words that best describe what you are trying to find. For more complicated searches, you can use any combination of logic operators, special pattern matchers, concept expansion, and proximity operations.

Example: nature conservation organization

Query Rules of Thumb:

  • If you get too many junk or nonsense answers, try:
    1. Add some more words to your query.
    2. Decrease the range of the Proximity control.
    3. Change the Word Forms control to Exact.
    4. Check the Match Info to see why those results are showing up.
    5. Use the Exclusion Operator (-) to remove unwanted terms.
    6. If you are searching for a phrase, hyphenate the words together.

  • If you don't get any answers, or just too few:
    1. Remove some words from your query.
    2. Check your spelling.
    3. Increase the scope of the Proximity control.
    4. It might simply not be there.

Overview of query abilities

The Webinator is an example Texis application, and as such it shares its text query abilities with all of Thunderstone's products. Throughout our documentation you will see references to Metamorph or Texis; this is because all of our products share a common text query language. This document provides only a brief overview of that language.

If you'd like to know more, see this document, and if you really want the gory details, see the online manual.

  • Controlling proximity:
    Mastering proximity gives you the ability to locate answers with greater precision. The Webinator input form gives you several options to control the search proximity:

    • line
      All query terms must occur on the same line
    • sentence
      All query terms must occur within the same sentence
    • paragraph
      All query terms must occur within the same paragraph or text block
    • page (default)
      All query terms must occur within the same HTML document

    The bar-graph display ( ********___ ) is shown whenever a ranking search was performed (i.e., all searches except Show Parents).
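To make the proximity settings concrete, here is a minimal Python sketch of the rule "every query term must fall inside one region". The Webinator's real line, sentence and paragraph detection is not described here, so the splitting rules below (sentences end at . ! or ?, paragraphs are separated by blank lines) and the function names are hypothetical.

```python
import re

def split_regions(text: str, scope: str) -> list[str]:
    """Split text into regions for a proximity check (assumed, simplified rules)."""
    if scope == "line":
        return text.splitlines()
    if scope == "sentence":
        # Assumption: a sentence ends at ., ! or ? followed by whitespace.
        return re.split(r"(?<=[.!?])\s+", text)
    if scope == "paragraph":
        # Assumption: paragraphs are separated by one or more blank lines.
        return re.split(r"\n\s*\n", text)
    if scope == "page":
        return [text]
    raise ValueError(f"unknown scope: {scope}")

def within_proximity(text: str, terms: list[str], scope: str = "page") -> bool:
    """True if a single region contains every term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    for region in split_regions(text, scope):
        words = set(re.findall(r"[\w']+", region.lower()))
        if all(t in words for t in wanted):
            return True
    return False

doc = "Our nature walks are popular.\nThe conservation organization meets monthly."
print(within_proximity(doc, ["nature", "conservation"], "line"))  # False
print(within_proximity(doc, ["nature", "conservation"], "page"))  # True
```

Widening the scope can only add matches, which is why increasing the Proximity range helps when a query returns too few answers.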

  • Ranking Factors
    The ranking algorithm takes into consideration relative word ordering, word proximity, database frequency, document frequency, and position in text. The relative importance of these factors in computing the quality of a hit can be altered under RANKING FACTORS on the Options page.
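The exact ranking formula is not given in this help text. As a hypothetical illustration of what changing RANKING FACTORS does, the sketch below combines per-factor scores in the range 0 to 1 with user weights as a weighted average; the factor names and the averaging scheme are assumptions, not Thunderstone's algorithm.

```python
def combined_score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-factor scores in [0, 1] (hypothetical scheme).

    Factor names such as "proximity" or "position" are illustrative only.
    """
    total_weight = sum(weights.get(name, 0.0) for name in factors)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights.get(name, 0.0) for name, score in factors.items())
    return weighted / total_weight

# A hit with perfect proximity but a late position in the text.
factors = {"proximity": 1.0, "position": 0.5}
print(combined_score(factors, {"proximity": 3, "position": 1}))  # 0.875
print(combined_score(factors, {"proximity": 1, "position": 1}))  # 0.75
```

Raising the weight of a factor pulls the overall quality toward that factor's score, which is the general effect the Options page controls.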

  • Keywords, Phrases and Wild-cards:
    To locate words, just type them in as you would in a word processor. Letter case is ignored.

    The wild-card character * (asterisk) may be used to match just the prefix of a word or to ignore the middle of something.

    If the item you wish to locate is more complicated than the simple * wild-card can accomplish, try using the regular expression matcher.

    To locate a number of adjacent words in a specific order, surround them with " (double quotation) characters. Putting a '-' (hyphen) between words will also force order and one word proximity.

    Examples:

    Query                  Locates
    ----------------------------------------------------------
    john                    john, John
    "john public"           John Public
    web-browser             Web browser, web-browser
    John*Public             John Q. Public, John Public
    456*a*def               1-23456-789-ABCDEF
    activate                activate, activation, activated... (see Word Forms)
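The wild-card examples above can be approximated with ordinary regular expressions. This Python sketch assumes that * matches any run of characters, including spaces and punctuation, as the John*Public and 456*a*def rows suggest; wildcard_to_regex is a hypothetical helper, and it searches substrings rather than reproducing the Webinator's exact word-boundary handling.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a query term containing * into a case-insensitive regex.

    Assumption: * stands for any run of characters (non-greedy).
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*?".join(parts), re.IGNORECASE)

print(bool(wildcard_to_regex("John*Public").search("John Q. Public")))     # True
print(bool(wildcard_to_regex("456*a*def").search("1-23456-789-ABCDEF")))  # True
print(bool(wildcard_to_regex("John*Public").search("Jane Public")))       # False
```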
    

  • Applying Search Logic
    Texis and Metamorph use set logic for text queries. Set logic is easier to use and provides more abilities than Boolean logic. The examples below refer to single keywords, but keep in mind that each keyword can represent an entire list of things or any of the special pattern matchers.

    Sets (or lists) of things are specified by placing the elements within parentheses, separated by commas, for example: (bob,joe,sam,sue). In the examples below, you could replace any of the keywords with a list like this.

    By default, the search locates the intersection (or 'AND') of every element in a query. This means that the query "microsoft bob interface" is equivalent to the Boolean query "microsoft AND bob AND interface".

    • '-' (without)
      The '-'(minus) is the most commonly used logic symbol. It means the answer should EXCLUDE references to that item.

    • '+' (mandatory)
      The '+'(plus) symbol in front of a search item means that the answer MUST INCLUDE that item. This is generally used in conjunction with the permutation operation.

    • '@N' (permute)
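      The '@' followed by a number N sets how many intersections of the query terms must be found: @1 requires any two of the unmarked terms to occur together, @2 any three, and so on, while terms marked with '+' or '-' are still required or excluded. This may be confusing at first, but it is very powerful.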

    Note: Only the '+' and '-' operations are valid with a relevance-ranked search.
    Example               Finds 
    ----------------------------------------------------------------- 
    bob sam joe           Bob with Sam and Joe  (within the selected proximity)
    bob sam -joe          Bob with Sam without Joe
    bob sam joe @1        Bob with Sam, or, Bob with Joe, or, Joe with Sam
    A B C D @1            AB or AC or AD or BC or BD or CD       
    +A B C D @1           ABC or ABD or ACD 
    A B C -D @1           ( AB or AC or BC ) without D
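The table above can be reproduced with a small Python sketch of the set logic applied to the words found in one proximity region. It is a simplification under stated assumptions: each term is a single word (no lists, phrases or pattern matchers), and parse_query and region_matches are hypothetical helper names.

```python
def parse_query(query: str):
    """Split a query into optional, mandatory (+) and excluded (-) terms and @N."""
    optional, mandatory, excluded = [], [], []
    permute = None
    for token in query.lower().split():
        if token.startswith("@") and token[1:].isdigit():
            permute = int(token[1:])
        elif token.startswith("+"):
            mandatory.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    return optional, mandatory, excluded, permute

def region_matches(query: str, region_words: set[str]) -> bool:
    """Apply the set logic to the words present in a single region."""
    optional, mandatory, excluded, permute = parse_query(query)
    words = {w.lower() for w in region_words}
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in mandatory):
        return False
    # Default: every optional term is required (an AND of all terms).
    # With @N: N intersections, i.e. at least N + 1 of the optional terms.
    needed = len(optional) if permute is None else min(permute + 1, len(optional))
    return sum(term in words for term in optional) >= needed

print(region_matches("bob sam joe @1", {"bob", "joe"}))   # True
print(region_matches("+A B C D @1", {"b", "c", "d"}))     # False: A is missing
print(region_matches("A B C -D @1", {"a", "b", "d"}))     # False: D is excluded
```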

  • Natural Language Query:
    You may enter a query in the form of a sentence or question. The software will automatically identify the important words and phrases within your query and remove the "noise words".

    Example:
    What is the state of the art in text retrieval?

    The software will search for:
    state of the art AND text AND retrieval
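The transformation above can be sketched in Python as noise-word removal plus phrase detection. The real noise-word list and phrase vocabulary are much larger; NOISE_WORDS and KNOWN_PHRASES below are tiny hypothetical stand-ins chosen only to reproduce this one example.

```python
import re

# Hypothetical stand-ins for the real noise-word list and phrase vocabulary.
NOISE_WORDS = {"what", "is", "the", "in", "a", "an", "of", "to"}
KNOWN_PHRASES = [("state", "of", "the", "art")]

def extract_query_terms(question: str) -> list[str]:
    """Keep known phrases whole and drop noise words from a question."""
    words = re.findall(r"[a-z']+", question.lower())
    terms, i = [], 0
    while i < len(words):
        for phrase in KNOWN_PHRASES:
            if tuple(words[i:i + len(phrase)]) == phrase:
                terms.append(" ".join(phrase))
                i += len(phrase)
                break
        else:  # no phrase starts here
            if words[i] not in NOISE_WORDS:
                terms.append(words[i])
            i += 1
    return terms

print(extract_query_terms("What is the state of the art in text retrieval?"))
# ['state of the art', 'text', 'retrieval']
```

Checking phrases before noise words matters: otherwise "of" and "the" would be discarded from "state of the art".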

  • Using the Special Pattern Matchers
    These pattern matchers are used to locate hard-to-find items within text:

    If used improperly, these pattern matchers can slow queries. For this reason they require other keyword(s) in the query, and they are disabled entirely under Page proximity. For more details, see Query Protection in the Vortex manual.

    Example              Matcher      Finds
    ------------------------------------------------------------------------
    ronald %regan        Approx    Ronald Raygun, Ronald Re~an, Ronald 8eagan
    %75MYPARTNO9045d/6a  Approx    Anything within 75% of looking like MYPARTNO9045d/6a   
    /19[789][0-9]        Reg.Expr. 1970-1999
    /[0-9]{3}\-=[0-9]{4} Reg.Expr. Phone numbers like 555-1212, 820-2200
    #87                  Numeric   four score and seven, 87 
    #>0<1                Numeric   Fractions like 9/16, 55%, 0.123, 15 nanoseconds
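For readers who know Python regular expressions, the sketch below shows rough equivalents of the Reg.Expr. rows and a simplified version of the #>0<1 numeric test. It is an approximation: Thunderstone's REX syntax and numeric matcher are richer (the numeric matcher also understands words like "four score and seven"), and parse_number and is_fraction are hypothetical helpers that only handle the plain forms shown.

```python
import re
from fractions import Fraction

# Rough Python equivalents of the Reg.Expr. examples in the table.
YEAR_1970S_TO_1990S = re.compile(r"19[789][0-9]")
PHONE_NUMBER = re.compile(r"[0-9]{3}-[0-9]{4}")  # REX \-= means one literal hyphen

def parse_number(text: str) -> Fraction:
    """Parse simple numeric forms such as 9/16, 55% or 0.123 (illustrative only)."""
    text = text.strip()
    if text.endswith("%"):
        return Fraction(text[:-1]) / 100
    return Fraction(text)  # Fraction accepts "9/16" and "0.123"

def is_fraction(text: str) -> bool:
    """Simplified #>0<1: the value is strictly between 0 and 1."""
    return 0 < parse_number(text) < 1

print(YEAR_1970S_TO_1990S.search("founded in 1984").group())  # 1984
print(PHONE_NUMBER.findall("Call 555-1212 or 820-2200"))      # ['555-1212', '820-2200']
print([is_fraction(s) for s in ["9/16", "55%", "0.123", "87"]])  # [True, True, True, False]
```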
    

  • Invoking Thesaurus Expansion
    Metamorph and Texis have an editable vocabulary of over 250,000 word and phrase associations. Each entry is generally classifiable by either its meaning or its part of speech.

    To expand the meaning of a word or phrase within your query, precede it with a '~' (tilde) character.
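Conceptually, a ~ term becomes a set of alternatives, just like an explicit list such as (car,automobile,vehicle) in the set-logic section. The Python sketch below uses a hypothetical two-entry THESAURUS dictionary and handles single words only; the real vocabulary also covers phrases.

```python
# Hypothetical miniature thesaurus; the real one has over 250,000 associations.
THESAURUS = {
    "alert": ["warn", "notify", "alarm"],
    "car": ["automobile", "vehicle"],
}

def expand_query(query: str) -> list[list[str]]:
    """Turn each query term into a list of alternatives.

    Terms prefixed with ~ gain their thesaurus entries; others stay as-is.
    """
    sets = []
    for token in query.split():
        if token.startswith("~"):
            word = token[1:].lower()
            sets.append([word] + THESAURUS.get(word, []))
        else:
            sets.append([token.lower()])
    return sets

print(expand_query("~car rental"))
# [['car', 'automobile', 'vehicle'], ['rental']]
```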


Using word forms

The Word Forms options give you control over how many variations of your query terms will be sought in your search.
  • Exact: (default) Only exact matches will be allowed.

  • Plural & possessives: Plural and possessive forms will be found. (s, es, 's)

  • Any word forms: As many word forms as can be derived will be located.
EXAMPLES:

 president 
EXACT : president
PLURAL: (above) + presidents president's
ANY   : (above) + presidential presidency preside presides presiding presided

 tight 
EXACT : tight
PLURAL: (above) + tights
ANY   : (above) + tightly tightening tightened tighter tightest

 program 
EXACT : program
PLURAL: (above) + programs program's
ANY   : (above) + programming programmatic programmed programmer programmable 
We call this morpheme processing, and it is generally smarter than a traditional "stemming" algorithm. Rather than simply chopping the end off a word, it checks whether a candidate could be a valid form of the search term.

Notes: Thesaurus terms are processed in the same way. Words shorter than 4-5 characters are not processed.
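The Exact and Plural & possessives settings are simple enough to sketch in Python. "Any word forms" depends on Thunderstone's morpheme processing, which is not described here, so the sketch deliberately leaves it out. MIN_LENGTH = 5 is an assumption, since the notes above only say that words under 4-5 characters are not processed.

```python
PLURAL_SUFFIXES = ("s", "es", "'s")
MIN_LENGTH = 5  # assumption: shorter query words are matched exactly

def word_form_matches(query: str, candidate: str, mode: str = "exact") -> bool:
    """Check a candidate word against a query word.

    Supports the "exact" and "plural" (plural & possessive) settings only.
    """
    q, c = query.lower(), candidate.lower()
    if q == c:
        return True
    if mode == "exact" or len(q) < MIN_LENGTH:
        return False
    if mode == "plural":
        return any(c == q + suffix for suffix in PLURAL_SUFFIXES)
    raise ValueError(f"unsupported mode: {mode}")

print(word_form_matches("president", "president's", "plural"))   # True
print(word_form_matches("president", "presidential", "plural"))  # False
print(word_form_matches("president", "presidents", "exact"))     # False
```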


Controlling proximity

These options give you control over the region in which a match must be found.
  • line: match terms must be located within the same line.

  • sentence: all terms within the same sentence.

  • paragraph: match terms must be located within the same paragraph.

  • page: (default) all terms within the same document.

In all cases the best possible matches for your query are located and ordered by decreasing quality. A bar graph is produced to indicate the quality of each answer.


Note: The look and feel described here is for the standard search interface. The interface may have been customized by the web site administrator.

Interpreting search results

When you submit a query, the results page contains another query form and up to 10 matching documents. If there are more than 10 answers, a link at the top and bottom of the list lets you view the next 10 in sequence.
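The paging arithmetic behind the "next 10" link can be sketched as follows. results_page is a hypothetical helper, pages are numbered from 1, and the page size of 10 is taken from the description above.

```python
def results_page(results: list, page: int, page_size: int = 10):
    """Return one page of results (pages start at 1) and whether more remain."""
    start = (page - 1) * page_size
    chunk = results[start:start + page_size]
    has_next = start + page_size < len(results)
    return chunk, has_next

answers = list(range(1, 26))    # 25 hypothetical matches
print(results_page(answers, 1))  # ([1, 2, ..., 10], True)
print(results_page(answers, 3))  # ([21, 22, 23, 24, 25], False)
```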

The input form at the top allows you to further tailor your query to home in on the desired answers, or to submit a completely new query without having to navigate back to the original input form.

Each answer in the result set will have a format similar to the following:

1: THE DOCUMENT TITLE (hyperlink to original)
This is the document abstract. It consists of the first few hundred characters of text
of the matching document. It is followed by the size of the document in parenthesis...
http://www.thesite.com/thepage.html
84%********__
Size: 11K
Depth: 3
Find Similar
Match Info
Show Parents

The components of each result are:

  • Result number

  • Document title ( clicking on this will take you to the original document )

  • Abstract (The first few hundred characters of the document )

  • Match quality graph, e.g. 84%********__ ( Only shown if relevancy ranking was used )

  • Size ( How big is the original document )

  • Depth ( How many clicks from the top of the site )

  • Find Similar ( Find other documents similar to this one )

  • Match Info ( View the matches and other information about the document )

  • Show Parents ( List pages that link to this one )
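As an illustration of the result layout, the Python sketch below models one result entry and renders the text portion. The field names are hypothetical, and the bar graph rule (one * per 10% on a 10-character bar) is an assumption that happens to reproduce the 84%********__ sample.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchResult:
    # Hypothetical field names mirroring the components listed above.
    number: int
    title: str
    abstract: str
    url: str
    quality: Optional[int]  # percent, or None when no relevance ranking was used
    size_kb: int
    depth: int

def quality_bar(percent: int, width: int = 10) -> str:
    """Render a graph like 84%********__ (assumed: one * per 10%)."""
    filled = percent * width // 100
    return f"{percent}%" + "*" * filled + "_" * (width - filled)

def render(result: SearchResult) -> str:
    lines = [f"{result.number}: {result.title}", result.abstract, result.url]
    if result.quality is not None:  # graph only for ranked searches
        lines.append(quality_bar(result.quality))
    lines += [f"Size: {result.size_kb}K", f"Depth: {result.depth}"]
    return "\n".join(lines)

print(quality_bar(84))  # 84%********__
```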


Viewing match info

The Match Info link shows the context of your answers within the matching document. Matching words are highlighted.

Clicking on any match term will take you to the next matching term. A summary at the top of the in-context view shows information about the document including the time it was last indexed by the Webinator.
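One way to implement "click a match to jump to the next one" is to wrap each hit in an HTML anchor whose link points at the following hit. The Python sketch below is a hypothetical rendering (anchor names like hit1 and the wrap-around to the first hit are assumptions), not the Webinator's actual markup; it also matches substrings rather than whole words.

```python
import html
import re

def highlight_matches(text: str, terms: list[str]) -> str:
    """Bold each match and link it to the next one (last links back to first)."""
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    out, last = [], 0
    for i, m in enumerate(matches, start=1):
        next_hit = i % len(matches) + 1  # wrap around after the last match
        out.append(html.escape(text[last:m.start()]))
        out.append(f'<a name="hit{i}" href="#hit{next_hit}"><b>{html.escape(m.group())}</b></a>')
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)

print(highlight_matches("Nature and nature", ["nature"]))
```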


Finding similar documents

The Find Similar link will find documents that are similar to the corresponding result. It does this by reading the original document to ascertain its main subject matter, and then conducting a relevance ranked search for those subjects.

Result documents are ordered from best to worst match. The bar graph display will indicate the overall quality of the match.
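The help text does not say how the main subject matter is determined. As a simple stand-in, the Python sketch below takes the most frequent non-noise words of a document and joins them into a query that could then be run as a relevance-ranked search. The noise-word list, the minimum word length and the term count are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical noise-word list for this sketch.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}

def similar_query(document_text: str, max_terms: int = 5) -> str:
    """Build a query from a document's most frequent non-noise words."""
    words = re.findall(r"[a-z]+", document_text.lower())
    counts = Counter(w for w in words if w not in NOISE_WORDS and len(w) > 2)
    return " ".join(word for word, _ in counts.most_common(max_terms))

doc = "Wetland conservation. Conservation groups protect wetland birds. Birds need wetland habitat."
print(similar_query(doc, 1))  # wetland
```

Because the generated query reflects the whole document, another page that covers those subjects more thoroughly can outrank the original, as the note below explains.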

Note: The document you click on may not be ranked as the best match. This is because other documents may contain more information about the overall subject matter than the original does.


Showing document parents

It is often difficult to navigate using a search engine because the matching document has no back-link. The Show Parents link solves this.

This link will show other documents that contain hyperlinks to the one you click on. In other words, it is an automated back button.
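Show Parents amounts to inverting the link graph collected while the site was walked. A minimal Python sketch, assuming a hypothetical in-memory map from each page to the links it contains:

```python
from collections import defaultdict

def build_parent_index(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert a page -> outgoing-links map into page -> pages that link to it."""
    parents = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            if target != page:  # ignore self-links
                parents[target].add(page)
    return parents

links = {
    "/": ["/about", "/products"],
    "/about": ["/products/widget"],
    "/products": ["/products/widget", "/"],
}
index = build_parent_index(links)
print(sorted(index["/products/widget"]))  # ['/about', '/products']
```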

Copyright © 2024 Thunderstone Software LLC. All rights reserved.