Vortex Manual
 

wordlist, wordcount, wordoccurrencecounts - get words and frequencies from index

 

SYNOPSIS

<wordlist $table [$field [$wordsOrWildcards [$options]]]>
<wordcount>
<wordoccurrencecounts>


DESCRIPTION
The wordlist function returns a list of the words in the given $table, as found in a Metamorph index. The first Metamorph index found is used: one on $field if given, otherwise one on any field. Each value of $wordsOrWildcards can be a single word, in which case only that word is returned, or a word prefix followed by "*" (asterisk), in which case only words having that prefix are returned. If $wordsOrWildcards is not given, all words are returned.
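The selection rule for $wordsOrWildcards can be modeled outside Vortex. The following is a minimal Python sketch of that rule only, not the Texis implementation: the function name match_words and the sample word list are hypothetical, and exact, case-sensitive comparison and index-order output are assumptions made for illustration.

```python
def match_words(index_words, patterns=None):
    """Select index words by exact word or by prefix* pattern.

    Illustrative model only; case-sensitive matching is an assumption.
    """
    if not patterns:
        # No words or wildcards given: every word is returned.
        return list(index_words)
    selected = []
    for word in index_words:
        for pattern in patterns:
            if pattern.endswith("*"):
                # Prefix followed by "*": match words with that prefix.
                hit = word.startswith(pattern[:-1])
            else:
                # Single word: match only that word.
                hit = word == pattern
            if hit:
                selected.append(word)
                break
    return selected


# Hypothetical index contents, for illustration only.
index_words = ["compile", "computer", "data", "database", "search"]
print(match_words(index_words, ["comp*", "data"]))
```

Here "comp*" selects both words with that prefix, while the plain word "data" selects only itself and not "database".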

The wordcount function returns a list of the row counts of each corresponding word returned by the previous wordlist, i.e. the number of rows each word occurs in. If the $options value to wordlist was "NOCOUNTS", then no counts are available via wordcount. This conserves memory when examining a large list whose counts are not needed.

In version 6 and later, the wordoccurrencecounts function returns a list of the occurrence counts of the corresponding words. E.g. if a word occurs twice in each of 10 documents, its wordcount value will be 10, while its wordoccurrencecounts value will be 20. Note that word occurrence information is only stored for inverted Metamorph indexes: non-inverted indexes will return 0 or nothing for word occurrence values.
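The difference between the two counts can be checked with a small model. This is a Python sketch of the arithmetic only, not of how a Metamorph index stores its data: the function name count_words is hypothetical, and splitting on whitespace stands in for whatever the real index expression treats as a word.

```python
from collections import Counter


def count_words(rows):
    """Return (row_counts, occurrence_counts) for a list of row texts.

    Whitespace tokenization is an assumption for illustration; a real
    index defines words through its index expression.
    """
    row_counts = Counter()         # models wordcount: rows containing the word
    occurrence_counts = Counter()  # models wordoccurrencecounts: total hits
    for text in rows:
        words = text.lower().split()
        occurrence_counts.update(words)   # every occurrence counts
        row_counts.update(set(words))     # each row counts at most once
    return row_counts, occurrence_counts


# Ten documents, each containing "search" twice.
rows = ["search engine search"] * 10
row_counts, occurrence_counts = count_words(rows)
print(row_counts["search"], occurrence_counts["search"])  # prints: 10 20
```

A word that appears once per row, such as "engine" here, has the same value under both counts.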


DIAGNOSTICS
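wordlist returns a list of the words found in a Metamorph index. wordcount returns the corresponding document frequencies of those words, i.e. the number of rows containing each word. wordoccurrencecounts returns the corresponding hit counts, i.e. the total number of occurrences of each word across all documents.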


EXAMPLE
This example prints a list of the words and their frequencies in the title field of the table books, sorted by ascending frequency (i.e. rarest first):

<wordlist "books" "title"><$words = $ret>
<wordcount>
<sort $ret $words>
<LOOP $ret $words>
  $words $ret
</LOOP>


CAVEATS
The wordlist and wordcount functions were added Feb. 20 1997. The NOCOUNTS option was added in March 1999.

A Metamorph index must exist on the named table/field for wordlist to work. Note that what constitutes a word, and how many words there are, is dependent on the Metamorph index, how it was created (e.g. the index expression), and when it was last updated.


Copyright © Thunderstone Software     Last updated: Mon Feb 18 10:28:15 EST 2013
 
Copyright © 2013 Thunderstone Software LLC. All rights reserved.