Thunderstone Software Document Retrieval and Management
Texis Manual

Indexing properties

 

indexspace
A directory in which to store the index files. The default is the empty string, which means use the database directory. This can be used to put the indexes onto another disk to balance load or for space reasons. If indexspace is set to a non-default value when a Metamorph index is being updated, the new index will be stored in the new location.

indexblock
When a Metamorph index is created on an indirect field, the indirect files are read in blocks. This property allows the size of the block used to be redefined.

indexmem
When indexes are created, Texis uses memory to speed up the process. This setting adjusts the amount of memory used. The default is 40% of physical memory if that can be determined, and 16MB if not. If the value is less than 100, it is treated as a percentage of physical memory. If the value is greater than 100, it is treated as the number of bytes of memory to use. Setting this value too high can cause excessive swapping, while setting it too low causes unneeded extra merges to disk.
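The percentage-versus-bytes rule for indexmem can be sketched as follows. This is an illustration of the rule as described above, not Texis code: the function name and arguments are hypothetical, 16MB is assumed to mean 16 * 1024 * 1024 bytes, and the fallback used for an explicit percentage when physical memory is unknown is an assumption. The text does not say how a value of exactly 100 is treated, so the sketch rejects it.

```python
FALLBACK_BYTES = 16 * 1024 * 1024  # documented fallback of 16MB (binary megabytes assumed)


def resolve_indexmem(value=40, physical_bytes=None):
    """Return the byte budget implied by an indexmem value (hypothetical helper).

    value          -- the indexmem setting; 40 is the documented default percentage
    physical_bytes -- physical memory in bytes, or None if it cannot be determined
    """
    if value == 100:
        # The manual covers "less than 100" and "greater than 100" only.
        raise ValueError("behavior for exactly 100 is not documented")
    if value > 100:
        # Values above 100 are taken as a literal number of bytes.
        return value
    if physical_bytes is None:
        # Documented for the default; assumed here for explicit percentages too.
        return FALLBACK_BYTES
    # Values below 100 are a percentage of physical memory.
    return physical_bytes * value // 100
```

For example, on a machine with 8GB of physical memory the default setting would yield a budget of 40% of that amount, while `resolve_indexmem(500_000_000)` would be taken as a byte count.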

indexmeter
Whether to print a progress meter during index creation/update. The default is 0 or 'none', which suppresses the meter. A value of 1 or 'simple' prints a simple hash-mark meter (with no tty control codes; suitable for redirection to a file and reading by other processes). A value of 2 or 'percent' or 'pct' prints a hash-mark meter with a more detailed percentage value (suitable for large indexes). Added in version 4.00.998688241 Aug 24 2001.

addexp
An additional REX expression to match words to be indexed. This is useful if there are non-English words to be searched for, such as part numbers. When an index is first created, the expressions used are stored with it so they will be updated properly. The default expression is \alnum{2,99}. Note: Only the expressions set when the index is first created are saved. Expressions set during an update (issuance of ``create metamorph [inverted] index'' on an existent index) will not be added.
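The effect of adding an expression can be illustrated with a rough Python analogue. Python's re syntax is not REX: the pattern [A-Za-z0-9]{2,99} below only approximates the default \alnum{2,99}, and the part-number pattern, function name, and sample text are all hypothetical.

```python
import re

# Approximation of the default REX expression \alnum{2,99} in Python regex syntax.
DEFAULT_EXPR = r"[A-Za-z0-9]{2,99}"
# Hypothetical additional expression for part numbers such as "AB-1234".
PART_NUMBER_EXPR = r"[A-Z]{2}-[0-9]{4}"


def index_terms(text, expressions):
    """Collect every substring matched by any of the index expressions."""
    terms = set()
    for expr in expressions:
        terms.update(re.findall(expr, text))
    return terms


text = "Order part AB-1234 today"
default_terms = index_terms(text, [DEFAULT_EXPR])
extended_terms = index_terms(text, [DEFAULT_EXPR, PART_NUMBER_EXPR])
```

With only the default pattern, the part number is split into "AB" and "1234". With the extra pattern, "AB-1234" is also captured as a single term, which is the kind of case addexp is meant for.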

delexp
This removes an index expression. Expressions can be removed either by number or by expression.

lstexp
Lists the current index expressions. The value specified is ignored (but required).

addindextmp
Add a directory to the list of directories to use for temporary files while creating the index. If temporary files are needed while creating a Metamorph index they will be created in one of these directories, the one with the most space at the time of creation. If no addindextmp dirs are specified, the default list is the index's destination dir (eg. database or indexspace), and the environment variables TMP and TMPDIR.
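The directory selection described above (default list, then the candidate with the most free space) can be sketched like this. The function names are hypothetical, and the use of shutil.disk_usage to measure free space is an assumption made for illustration; how Texis itself measures free space is not stated here.

```python
import os
import shutil


def default_tmp_dirs(index_dir, environ=os.environ):
    """Default candidates: the index's destination dir, then TMP and TMPDIR if set."""
    dirs = [index_dir]
    for name in ("TMP", "TMPDIR"):
        if environ.get(name):
            dirs.append(environ[name])
    return dirs


def pick_tmp_dir(dirs, free_bytes=lambda d: shutil.disk_usage(d).free):
    """Return the candidate directory with the most free space right now."""
    return max(dirs, key=free_bytes)
```

The free_bytes argument is injectable so that the choice can be exercised without touching real disks, for example `pick_tmp_dir(["/a", "/b"], free_bytes={"/a": 1, "/b": 5}.get)`.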

delindextmp
Remove a directory from the list of directories to use for temporary files while creating a Metamorph index.

lstindextmp
List the directories used for temporary files while creating Metamorph indices. Aka listindextmp.

btreethreshold
This sets a limit as to how much of an index should be used. If a particular portion of the query matches more than the given percent of the rows the index will not be used. It is often more efficient to try and find another index rather than use an index for a very frequent term. The default is set to 50, so if more than half the records match, the index will not be used. This only applies to ordinary indices. See infthresh and infpercent for control of Metamorph indices.
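The threshold test can be written as a small predicate. This is a sketch of the rule as stated above, with hypothetical names. The handling of an empty table is an assumption.

```python
def use_ordinary_index(matching_rows, total_rows, btreethreshold=50):
    """Return True if an ordinary index would still be used under btreethreshold."""
    if total_rows == 0:
        return True  # assumption: nothing to compare against, so keep the index
    percent = 100.0 * matching_rows / total_rows
    # The index is skipped only when MORE than the given percent of rows match.
    return percent <= btreethreshold
```

With the default of 50, a term matching 500 of 1000 rows still uses the index, while one matching 501 rows does not.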

maxlinearrows
This sets the maximum number of records that should be searched linearly. If the indices used so far yield a result set larger than maxlinearrows, the program will try to find more indices to use. Once the result set is smaller than maxlinearrows, or all possible indices are exhausted, the records will be processed. The default is 1000.
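The narrowing loop can be modeled roughly as follows. This is a simplified sketch, not the Texis planner: each usable index is represented as a set of matching recids, the sets are combined by intersection (which assumes the query terms are ANDed), and all names are hypothetical.

```python
def narrow_with_indices(candidate_sets, maxlinearrows=1000):
    """Apply indices one at a time until the result is small enough to scan linearly.

    candidate_sets -- list of sets of recids, one per usable index
    Returns (result, used), where result is None if no index was available.
    """
    result = None
    used = 0
    for recids in candidate_sets:
        # Combine this index's matches with what has been found so far.
        result = set(recids) if result is None else result & recids
        used += 1
        if len(result) < maxlinearrows:
            break  # small enough: remaining work is done by linear processing
    return result, used
```

In this model, a lower maxlinearrows makes the loop consult more indices before the remaining records are processed.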

likerrows
How many rows a single term can appear in, and still be returned by liker. When searching for multiple terms with liker and likep one does not always want documents only containing a very frequent term to be displayed. This sets the limit of what is considered frequent. The default is 1000.

indexaccess
If this option is turned on then data from an index can be selected as if it were a table. When selecting from an ordinary index, the fields that the index was created on will be listed. When selecting from a Metamorph index a list of words and number of documents containing each word will be returned.

indexchunk
In versions of Texis after October 1998, the indexchunk setting is deprecated and unused. In prior releases, when creating a Metamorph index temporary files are used which in the worst case can grow to twice the size of the data being indexed. This process can be broken into stages, such that after indexing a certain amount of data the temporary files are processed, to generate a partial index, and then the process repeats for the rest of the data. By default the amount of free disk space is checked on startup, and used to calculate when it will need to perform the processing step. If the system does not report free disk space accurately, or to free more disk space, this value can be changed. The default is 0, which automatically calculates a value. Otherwise it is set to the number of bytes of data to index before processing the temporary files. Lower values conserve disk space, at the expense of more time to process intermediate files.

cleanupwait
Windows/NT specific. After updating a Metamorph index, the database will wait this long before trying to remove the old copy of the index. This gives any other process currently using the index time to stop using it, so that it can be removed. The default is twenty seconds. If a batch of Metamorph indices is being updated one right after another, it may be useful to set this to 0 for all but the last index, as an attempt will be made to remove all old indices after every index update.

indextrace
For debugging: trace index usage, especially during searches, issuing informational putmsgs. Greater values produce more messages. Note that the meaning of values, as well as the messages printed, are subject to change without notice. Aka traceindex, traceidx. Added in version 3.00.942186316 19991109.

tracerecid
For debugging: trace index usage for this particular recid. Added in version 3.01.945660772 19991219.

indexdump
For debugging: dump index recids during search/usage. Value is a bitwise OR of the following flags:
Bit 0: for new list
Bit 1: for delete list
Bit 2: for token file
Bit 3: for overall counts too

The default is 0.
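Because indexdump is a bitwise OR of flags, a value can be composed from the individual bits. The sketch below uses Python's enum.IntFlag for this. The member names are hypothetical labels for the four documented bits, not Texis identifiers.

```python
from enum import IntFlag


class IndexDump(IntFlag):
    """Hypothetical names for the documented indexdump bits."""
    NEW_LIST = 1 << 0        # bit 0
    DELETE_LIST = 1 << 1     # bit 1
    TOKEN_FILE = 1 << 2      # bit 2
    OVERALL_COUNTS = 1 << 3  # bit 3


# Dump the new list and the token file: bits 0 and 2.
indexdump_value = int(IndexDump.NEW_LIST | IndexDump.TOKEN_FILE)
```

Here indexdump_value is the integer that would be supplied as the property value to request those two dumps.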

indexmmap
Whether to use memory-mapping to access Metamorph index files, instead of read(). The value is a bitwise OR of the following flags:
Bit 0: for token file
Bit 1: for .dat file

The default is 1 (ie. for token file only). Note that memory-mapping may not be supported on all platforms.
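Going the other way, an indexmmap value can be decoded into the files it covers. The function name and dictionary keys below are hypothetical. Only the bit positions and the default of 1 come from the text above.

```python
def indexmmap_targets(value=1):
    """Decode an indexmmap value into which files would be memory-mapped."""
    return {
        "token_file": bool(value & 0b01),  # bit 0
        "dat_file": bool(value & 0b10),    # bit 1
    }
```

The default of 1 maps only the token file, 3 would map both files, and 0 would fall back to read() for both.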

indexreadbufsz
Read buffer size, when reading (not memory-mapping) Metamorph index .tok and .dat files. The default is 64KB; suffixes like ``KB'' are respected. During search, actual read block size could be less (if predicted) or more (if blocks merged). Also used during index create/update. Decreasing this size when creating large indexes can save memory (due to the large number of intermediate files), at the potential expense of time. Aka indexreadbufsize. Added in version 4.00.1006398833 20011121.

indexwritebufsz
Write buffer size for creating Metamorph indexes. The default is 128KB; suffixes like ``KB'' are respected. Aka indexwritebufsize. Added in version 4.00.1007509154 20011204.

indexmmapbufsz
Memory-map buffer size for Metamorph indexes. During search, it is used for the .dat file, if it is memory-mapped (see indexmmap); it is ignored for the .tok file since the latter is heavily used and thus fully mapped (if indexmmap permits it). During index update, indexmmapbufsz is used for the .dat file, if it is memory-mapped; the .tok file will be entirely memory-mapped if it is smaller than this size, else it is read. Aka indexmmapbufsize. The default is 0, which uses 25% of RAM. Added in version 3.01.959984092 20000602. In version 4.00.1007509154 20011204 and later, ``KB'' etc. suffixes are allowed.

indexslurp
Whether to enable index ``slurp'' optimization during Metamorph index create/update, where possible. Optimization is always possible for index create; during index update, it is possible if the new insert/update recids all occur after the original recids (eg. the table is insert-only, or all updates created a new block). Optimization saves about 20% of index create/update time by merging piles an entire word at a time, instead of word/token at a time. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1004391616 20011029.

indexappend
Whether to enable index ``append'' optimization during Metamorph index update, where possible. Optimization is possible if the new insert recids all occur after the original recids, and there were no deletes/updates (eg. the table is insert-only); it is irrelevant during index create. Optimization saves index build time by avoiding original token translation if not needed. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1006312820 20011120.
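The eligibility conditions for the indexslurp and indexappend optimizations can be compared side by side. This sketch models recids as integers and uses hypothetical function names. Treating an empty recid list as eligible is an assumption, and the real checks inside Texis may differ.

```python
def slurp_possible(original_recids, new_recids, creating=False):
    """indexslurp: always possible on create; on update, new recids must follow old ones."""
    if creating:
        return True
    if not original_recids or not new_recids:
        return True  # assumption: nothing to conflict with
    return min(new_recids) > max(original_recids)


def append_possible(original_recids, new_recids, had_deletes_or_updates):
    """indexappend: same ordering condition, and no deletes or updates at all."""
    if had_deletes_or_updates:
        return False
    return slurp_possible(original_recids, new_recids)
```

An insert-only table satisfies both conditions. A table whose updates each created a new block after the existing data can still qualify for indexslurp, but not for indexappend.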

indexwritesplit
Whether to enable index ``write-split'' optimization during Metamorph index create/update. Optimization saves memory by splitting the writes for (potentially large) .dat blocks into multiple calls, thus needing less buffer space. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1015532186 20020307.

indexbtreeexclusive
Whether to optimize access to certain index B-trees during exclusive access. The optimization may reduce seeks and reads, which may lead to increased index creation speed on platforms with slow large-file lseek behavior. The default is 1 (enabled); set to 0 to disable. Added in version 5.01.1177548533 20070425.

mergeflush
Whether to enable index ``merge-flush'' optimization during Metamorph index create/update. Optimization saves time by flushing in-memory index piles to disk just before final merge; generally saves time where indexslurp is not possible. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1011143988 20020115.

indexversion
Which version of Metamorph index to produce or update, when creating or updating Metamorph indexes. The supported values are 0 through 3; the default is 2. Setting version 0 sets the default index version for that Texis release. Note that old versions of Texis may not support version 3 indexes. Version 3 indexes may use less disk space than version 2, but are considered experimental. Added in version 3.00.954374722 20000329.

indexmaxsingle
For Metamorph indexes: the maximum number of locations that a single-recid dictionary word may have and still be stored solely in the .btr B-tree file (without needing a .dat entry). Single-recid-occurrence words usually have their data stored solely in the B-tree to save a .dat access at search time. However, if the word occurs many times in that single recid, the data (for a Metamorph inverted index) may be large enough to bloat the B-tree and thus negate the savings, so if the single-recid word occurs more than indexmaxsingle times, it is stored in the .dat. The default is 8.
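The storage decision can be summarized in a few lines. The function name and return values are hypothetical. The treatment of words that occur in more than one recid (always needing a .dat entry) is inferred from the description above rather than stated outright.

```python
def word_storage(recid_count, location_count, indexmaxsingle=8):
    """Return which file would hold a word's data under the indexmaxsingle rule."""
    if recid_count == 1 and location_count <= indexmaxsingle:
        return ".btr"  # kept solely in the B-tree, saving a .dat access at search time
    return ".dat"      # too many locations, or more than one recid (inferred)
```

With the default of 8, a word occurring 8 times in its only recid stays in the B-tree, and a ninth occurrence moves it to the .dat file.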

uniqnewlist
Whether/how to unique the new list during Metamorph index searches. Works around a potential bug in old versions of Texis; not generally set. The possible values are:
0: do not unique at all
1: unique auxiliary/compound index new list only
2: unique all new lists
3: unique all new lists and report first few duplicates

The default is 0.

tablereadbufsz
Size of read buffer for tables, used when it is possible to buffer table reads (eg. during some index creations). The default is 16KB. When setting, suffixes such as ``KB'' etc. are supported. Set to 0 to disable read buffering. Added in version 5.01.1177700467 20070427. Aka tablereadbufsize.


Copyright © Thunderstone Software     Last updated: Wed Sep 10 11:42:21 EDT 2008
 
Copyright © 2008 Thunderstone Software LLC. All rights reserved.