THUNDERSTONE NEWS
NEW WEBINATOR UNPACKING FEATURES
Webinator and Texis can now automatically unpack many types of
compressed files and index the contents. The most common use of this
feature will be to index *.zip, gzip (*.gz), and Unix tape archive
(*.tar) files. However, it also handles a variety of other formats if
you have the appropriate unpacker or translator. Examples include Rich
Text Format (*.rtf), Microsoft Help (*.hlp and *.chm), and Microsoft
TNEF files (attachments).
In addition, any other format for which you have a translator
or unpacker can be handled by editing the config file!
The unpacking feature is part of the Webinator and Texis File-Format
plug-in (anytotx) from version 4.3 on. Texis maintenance customers or
those with paid Webinator versions 4.0 or later may request a copy of
the new plug-in from Tech Support. Other customers may obtain the new
plug-in by upgrading Webinator or joining Texis Maintenance.
NEW TEXIS FILE CRAWLING FEATURE
Texis can now crawl both local and network accessible files. This makes
it easier to index documents that are not served by a web or FTP
server. The feature is implemented as an enhancement to the Vortex
<fetch> statement, which can now fetch file:// URLs, so network files
can be indexed directly.
For example, to get the file C:\myfile.txt, use the statement
<fetch "file:///c|/myfile.txt">.
This feature is available as of Texis version 4.3. The enhancement
eliminates a step some customers used in the past, involving an
<exec> or <stat> of a directory listing, then reading the individual
file names to insert into Texis. It also eliminates having to manually
map such files to a file:// URL during the search, since the URL can
be fetched and stored as-is.
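If you need to build such URLs for many files, the mapping from a Windows path can be scripted. The Python sketch below is an illustration only and is not part of Texis: the helper name windows_path_to_fetch_url is hypothetical, and it simply assumes the form shown in the example above (lowercase drive letter, "|" in place of ":"), with no escaping of special characters.

```python
from pathlib import PureWindowsPath


def windows_path_to_fetch_url(path: str) -> str:
    """Map an absolute Windows path such as C:\\myfile.txt to a file:// URL.

    Hypothetical helper: it follows the form of the newsletter's example
    (lowercase drive letter, "|" instead of ":") and does no escaping.
    """
    p = PureWindowsPath(path)
    drive = p.drive  # "C:" for a drive-letter path
    if len(drive) != 2 or drive[1] != ":" or not p.root:
        raise ValueError(f"expected an absolute drive-letter path: {path!r}")
    # p.parts[0] is the anchor ("C:\\"); the rest are directory and file names.
    segments = "/".join(p.parts[1:])
    return f"file:///{drive[0].lower()}|/{segments}"


print(windows_path_to_fetch_url(r"C:\myfile.txt"))
```

For C:\myfile.txt this prints the same URL used in the <fetch> example. Whether other spellings (an uppercase drive letter, or ":" instead of "|") are also accepted is not covered here.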
The new feature is also used in the dowalk script distributed with
Texis releases. This means Texis customers can use the Webinator
application to crawl network directories together with http:// and
other URLs, right out of the box. Customers who have only Webinator
need to upgrade to Texis to take advantage of this feature.
MEET US IN WASHINGTON IN APRIL
Thunderstone will be exhibiting at FOSE, April 8-10, and at the e-gov
Knowledge Management Conference, April 16, both in Washington, D.C.
Please stop by our display to talk about how you use Texis, Webinator,
or Vortex! (Let us know you read about it here!) Admission to the
exposition hall is free for government employees at both events.
TECH CORNER: CALCULATED RANKS
With the ability in Texis to evaluate SQL expressions, you can order by
a computed value. One common use is to modify the relevance ranking to
include data from another field, such as date or price.
For example, John Punshon of BMW UK wanted to return results ordered
such that 80% weight was given to $rank (the relevance score assigned
by Texis), and 20% to the age of the document. He came up with the
following Vortex code:
<$now=(convert( 'now' , 'date' ))>
<SQL ROW SKIP=$skip MAX=10 "select id, Date,
(($$rank+5)/10)+(7300/((($now - Date)/86400)+365)) relevance
from content where body likep $query
order by 3 desc">
In their experience the value of $rank was between 0 and 799, so
(($$rank+5)/10) returns between 0 and 80. ($now - Date)/86400 is the
age of the record in days (86400 is the number of seconds in a day).
For today's records the second term is 7300/365, or 20, and it tapers
off toward 0 as records get older. ORDER BY 3 sorts on the third
field selected, the computed relevance value.
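To see how the two terms trade off, the arithmetic can be reproduced outside Texis. The Python sketch below is an illustration under stated assumptions, not Texis code: the function name blended_relevance is hypothetical, the rank term uses integer division (which matches the 0 to 80 range described above), and the age term is left as a floating-point value.

```python
SECONDS_PER_DAY = 86400


def blended_relevance(rank: int, age_seconds: float) -> float:
    """Mirror (($rank+5)/10) + (7300/((age in days)+365))."""
    # Assumption: integer division, so ranks 0..799 map to 0..80.
    rank_term = (rank + 5) // 10
    age_days = age_seconds / SECONDS_PER_DAY
    # 7300/365 = 20 for a record dated now; the term shrinks with age.
    age_term = 7300 / (age_days + 365)
    return rank_term + age_term


for label, age_days in [("today", 0), ("1 year", 365), ("5 years", 5 * 365)]:
    score = blended_relevance(799, age_days * SECONDS_PER_DAY)
    print(f"rank 799, {label}: {score:.2f}")
```

Under these assumptions a top-ranked record scores 100 when new and 90 at one year old, since the age term halves from 20 to 10 once the age equals the 365-day offset. Changing the 7300 and 365 constants adjusts how much weight recency gets and how quickly it fades.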
You can use this example as a guideline for your own sorts. Please let
us know of other cool examples you come up with!
Feedback, suggestions and questions are welcome to