<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="https://www.thunderstone.com/blog/rss/xslt"?>
<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Thunderstone Blog: Customized Search Engine &amp; Search Software</title>
    <link>https://www.thunderstone.com/blog/</link>
    <description>The latest on custom search solutions, search appliance, and website search from Thunderstone Software.</description>
    <generator>Articulate, blogging built on Umbraco</generator>
    <item>
      <guid isPermaLink="false">1533</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-releases-version-20/</link>
      <category>Thunderstone</category>
      <category>Main</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <category>parametric search appliance</category>
      <title>Thunderstone Releases Version 20</title>
      <description>&lt;h3&gt;New features include&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Rank Bias&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Parametric profiles &lt;span&gt;gain new capabilities to set a per-document bias using data from field rules.  This allows documents to be biased up or down in the search results; for example, PDF results could be biased down, or documents with "Important" in a meta tag could be ranked higher.&lt;/span&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3&gt;Improvements&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Content Repositories&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;span&gt;Azure Blob Storage support was added for crawling content stored in Azure Blobs.&lt;/span&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;File Share Robustness&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Network shares are remounted if needed at the start of a crawl.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Audit Logging&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;The ability to log all setting modifications was added to aid with auditing setting changes.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Other fixes&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Miscellaneous other bug fixes and enhancements to the crawl and search interfaces.  See the &lt;a href="https://forums.thunderstone.com/viewtopic.php?f=7&amp;amp;t=5541" title="announcement" data-anchor="?f=7&amp;amp;t=5541"&gt;announcement&lt;/a&gt; on the message board for full details.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3&gt;Availability&lt;/h3&gt;
&lt;p&gt;Search Appliance Version 20 is available to all Search Appliance and Parametric Search Appliance customers who have a current maintenance plan by following the menu System / System Setup / Update Software.&lt;/p&gt;
&lt;p&gt;Webinator Version 20 is available to all Webinator customers who have a current maintenance plan by &lt;a href="/contact_us"&gt;contacting Thunderstone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For 36 years Thunderstone Software has been dedicated to providing the best search solutions possible to government, corporate, and non-profit customers, with exceptional service and devotion to finding the right solution for the customer.&lt;/p&gt;
&lt;p&gt;For further information call +1 216-820-2200 or visit &lt;a href="http://www.thunderstone.com/"&gt;www.thunderstone.com&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Tue, 16 Jan 2018 00:00:00 -0500</pubDate>
      <a10:updated>2018-01-16T00:00:00-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1521</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-releases-version-19/</link>
      <category>Thunderstone</category>
      <category>Main</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <category>parametric search appliance</category>
      <title>Thunderstone Releases Version 19</title>
      <description>&lt;h3&gt;New features include&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Entity Recognition&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Parametric profiles &lt;span&gt;gain new capabilities to extract information from document text. Now even documents with no metadata can contribute terms to faceted navigation, via entity recognition. For example, a list of terms, e.g. cities or manufacturers, can be defined for extraction and then searched or grouped by. Or a regular expression can be crafted to match a specific pattern of data — e.g. a defined syntax for part or social security numbers. And the XML syntax for such entities is largely compatible with GSA syntax, making upgrading a snap.&lt;/span&gt;&lt;/dd&gt;
&lt;/dl&gt;
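&lt;p&gt;To make the two extraction styles concrete, here is a minimal Python sketch of the idea, not the appliance's actual configuration syntax. The city list, the part-number pattern and the function name are all hypothetical, chosen only for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Hypothetical entity definitions: a term list and a pattern (assumptions).
CITIES = ["Cleveland", "Philadelphia", "Orlando"]
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")  # assumed syntax, e.g. AB-1234


def extract_entities(text):
    """Return facet values found in plain document text."""
    lowered = text.casefold()
    # Term-list entity: case-insensitive whole-word match for each listed city.
    cities = [
        city for city in CITIES
        if re.search(r"\b" + re.escape(city.casefold()) + r"\b", lowered)
    ]
    # Pattern entity: every distinct string matching the part-number syntax.
    parts = sorted(set(PART_NUMBER.findall(text)))
    return {"city": cities, "part": parts}


facets = extract_entities("Shipped part AB-1234 and CD-0042 from Cleveland to orlando.")
print(facets)
```
&lt;/pre&gt;
&lt;p&gt;In this sketch the document has no metadata at all, yet it still yields two city values and two part numbers that could be searched or grouped by.&lt;/p&gt;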
&lt;h3&gt;Improvements&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Dashboard Statistics&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;span&gt;Several new statistics have been added to help monitor activity, including the search rate, fetch rate, and document growth rate so you can identify problems that may arise in the future.&lt;/span&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Unmount Shares&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Network shares can now be unmounted without removing the settings.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;XSS Protection fix&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Browsers may have warned on the settings page about possible XSS problems.  There was no vulnerability in the appliance, and the settings were updated correctly; however, the XSS warnings are no longer triggered.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Other fixes&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Miscellaneous other bug fixes and enhancements to the crawl and search interfaces including result counts and some REX expression handling.  See the &lt;a href="https://forums.thunderstone.com/viewtopic.php?p=22944" title="announcement" data-anchor="?p=22944"&gt;announcement&lt;/a&gt; on the message board for full details.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3&gt;Availability&lt;/h3&gt;
&lt;p&gt;Search Appliance Version 19 is available to all Search Appliance and Parametric Search Appliance customers who have a current maintenance plan by following the menu System / System Setup / Update Software.&lt;/p&gt;
&lt;p&gt;Webinator Version 19 is available to all Webinator customers who have a current maintenance plan by &lt;a href="/contact_us"&gt;contacting Thunderstone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For 36 years Thunderstone Software has been dedicated to providing the best search solutions possible to government, corporate, and non-profit customers, with exceptional service and devotion to finding the right solution for the customer.&lt;/p&gt;
&lt;p&gt;For further information call +1 216-820-2200 or visit &lt;a href="http://www.thunderstone.com/"&gt;www.thunderstone.com&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Mon, 07 Aug 2017 00:00:00 -0400</pubDate>
      <a10:updated>2017-08-07T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1364</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-releases-version-16/</link>
      <category>Thunderstone</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <category>Main</category>
      <title>Thunderstone Releases Version 16</title>
      <description>&lt;p&gt;CLEVELAND, OH, February 1, 2016 — Improving the quality of search for intranet, public facing websites and aggregator sites -- Thunderstone Search Appliance and Webinator Version 16 is here.&lt;/p&gt;
&lt;h4&gt;New crawl features include&lt;/h4&gt;
&lt;dl&gt;
&lt;dt&gt;Additional authorization options&lt;/dt&gt;
&lt;dd&gt;If you have websites that require user authentication, the latest version of the Search Appliance and Webinator includes support for:
&lt;ul&gt;
&lt;li&gt;Central Authentication Service (CAS): These form-based logins may now be crawled simply with a user name and password, without needing to create custom authentication rules.&lt;/li&gt;
&lt;li&gt;Negotiate (Kerberos) authentication is now supported for crawls.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Simplified crawl configuration&lt;/dt&gt;
&lt;dd&gt;In addition to CAS, version 16 also adds support for:
&lt;ul&gt;
&lt;li&gt;Sitemaps, allowing easier crawling of sites where URLs are not easily determined from a crawl.&lt;/li&gt;
&lt;li&gt;XML/XSL sites: stylesheets are applied to sites that deliver content via XML and XSL instead of HTML, so the searchable text is better identified.&lt;/li&gt;
&lt;li&gt;Proxy Auto-config (PAC) files, which make it easier to crawl and index enterprises composed of different networks with varying proxy rules: the same config files used by browsers may now be used at crawl time.&lt;/li&gt;
&lt;li&gt;The Ajax crawlable URL scheme from Google is supported, allowing Ajax based dynamic sites that support it to be crawled and indexed more effectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Additional fixes and enhancements&lt;/dt&gt;
&lt;dd&gt;In addition, improvements have been made to the administrative interface as well as some file-format processing.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h4&gt;Availability&lt;/h4&gt;
&lt;p&gt;Search Appliance Version 16 is available to all Search Appliance and Parametric Search Appliance customers who have a current maintenance plan by following the menu System / System Setup / Update Software.&lt;/p&gt;
&lt;p&gt;Webinator Version 16 is available to all Webinator customers who have a current maintenance plan by &lt;a href="https://www.thunderstone.com/texis/site/pages/Contact_Us.html"&gt;contacting Thunderstone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For 34 years Thunderstone Software has been dedicated to providing the best search solutions possible to government, corporate, and non-profit customers, with exceptional service and devotion to finding the right solution for the customer.&lt;/p&gt;
&lt;p&gt;For further information call +1 216-820-2200 or visit &lt;a href="http://www.thunderstone.com/"&gt;www.thunderstone.com&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Mon, 01 Feb 2016 00:00:00 -0500</pubDate>
      <a10:updated>2016-02-01T00:00:00-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1360</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-releases-webinator-web-index-retrieval-system-version-13/</link>
      <category>Webinator</category>
      <category>Main</category>
      <title>Thunderstone Releases Webinator™ Web Index &amp; Retrieval System Version 13</title>
      <description>&lt;p&gt;CLEVELAND, OH — Making it even easier to integrate high quality search into your website -- Webinator Version 13 is here.&lt;/p&gt;
&lt;p&gt;New features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Query Autocomplete guides your users to the search they want&lt;/li&gt;
&lt;li&gt;HTML Highlighting lets users see the results in the original HTML for better contextual information&lt;/li&gt;
&lt;li&gt;Expanded XML/SOAP API allows integration of the administrative interface&lt;/li&gt;
&lt;li&gt;And much more: see the &lt;a href="http://www.thunderstone.com/webinator"&gt;Webinator home page for full details&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For 34 years Thunderstone Software has been dedicated to providing the best search solutions possible to government, corporate, and non-profit customers, with exceptional service and devotion to finding the right solution for the customer.&lt;/p&gt;
&lt;p&gt;For further information call +1 216-820-2200 or visit &lt;a href="http://www.thunderstone.com/"&gt;www.thunderstone.com&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Tue, 07 Apr 2015 00:00:00 -0400</pubDate>
      <a10:updated>2015-04-07T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1372</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-releases-webinator-web-index-retrieval-system-version-6-and-search-appliance-version-8-1/</link>
      <category>Webinator</category>
      <category>Main</category>
      <category>Search Appliance</category>
      <title>Thunderstone Releases Webinator™ Web Index &amp; Retrieval System Version 6 and Search Appliance Version 8 (1)</title>
      <description>&lt;p&gt;CLEVELAND, OH — Making it even easier to integrate high quality search into your website -- Webinator Version 6 and Search Appliance Version 8 are here.&lt;/p&gt;
&lt;p&gt;New features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More intuitive searches, including Unicode support and accent-insensitive queries:&lt;br /&gt;This improves non-English searches; e.g. "cœur" will also match "coeur", "resume" will match "resumé".&lt;/li&gt;
&lt;li&gt;Enhanced search results, including multiple snippets and styled highlighting.&lt;/li&gt;
&lt;li&gt;XML/SOAP API to ease integration of search into dynamic sites.&lt;/li&gt;
&lt;li&gt;Meta Search to allow searching multiple profiles together in one transaction.&lt;/li&gt;
&lt;li&gt;Results Authorization so unauthorized documents don't show up in search results.&lt;/li&gt;
&lt;li&gt;HTTP/1.1 support including gzip compression to reduce crawl times and bandwidth utilization:&lt;br /&gt;Reduces load on targeted servers, and potentially allows access to more content.&lt;/li&gt;
&lt;li&gt;And much more: see the &lt;a href="http://www.thunderstone.com/site/vortexman/webinator_changes.html"&gt;full list of new features here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
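&lt;p&gt;As a rough illustration of accent-insensitive matching, the following Python sketch folds both the query and the indexed word to a common form before comparing them. It is only an approximation of the idea and says nothing about how Webinator implements it; the function names and the ligature table are assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
import unicodedata

# Ligatures have no Unicode decomposition, so they are expanded by hand.
# This table is a small illustrative subset, not a complete list.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def fold(text):
    """Return a case-, accent- and ligature-insensitive form of text."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    # NFD splits an accented letter into base letter plus combining mark.
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query, word):
    """True when query and word are equal after folding."""
    return fold(query) == fold(word)


print(matches("cœur", "coeur"))
print(matches("resume", "resumé"))
```
&lt;/pre&gt;
&lt;p&gt;Both calls print True, while an unrelated pair such as "coeur" and "cour" still fails to match.&lt;/p&gt;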
&lt;p&gt;For 30 years Thunderstone Software has been dedicated to providing the best search solutions possible to government, corporate, and non-profit customers, with exceptional service and devotion to finding the right solution for the customer.&lt;/p&gt;
&lt;p&gt;For further information call +1 216-820-2200, visit &lt;a href="http://www.thunderstone.com/"&gt;www.thunderstone.com&lt;/a&gt; or email &lt;img src="/site/images/infoemail.gif" alt="" /&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 22 Dec 2011 00:00:00 -0500</pubDate>
      <a10:updated>2011-12-22T00:00:00-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1419</guid>
      <link>https://www.thunderstone.com/blog/archive/may-2009-newsletter/</link>
      <category>Main</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <title>May 2009 Newsletter</title>
      <description>&lt;h3&gt;May 2009 - &lt;a href="http://www.thunderstone.com/texis/site/newsletter/archive.html"&gt;Archive&lt;/a&gt;&lt;/h3&gt;
&lt;h3&gt;CONTENTS&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200905.html#happenings"&gt;Happenings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200905.html#customerquote"&gt;Customer Quote of the Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200905.html#upcoming"&gt;Upcoming Events&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200905.html#tips"&gt;Tech Tips: Using Keep/Ignore Tags on Your Search Appliance or Webinator Software&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200905.html#unsub"&gt;Subscription/Unsubscription and Contacts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="happenings"&gt;&lt;/a&gt;HAPPENINGS&lt;/h3&gt;
&lt;p&gt;SUPERIOR TECH SUPPORT LEADS TO DEVELOPMENT OF THUNDERSTONE SEARCH APPLIANCE VERSION 7&lt;/p&gt;
&lt;p&gt;As a direct response to ongoing practical input from customers who regularly interact with our tech support engineers, Thunderstone Software will release Version 7 of the Thunderstone Search Appliance on June 8, 2009. We've added a number of desirable new features — which also apply to our Parametric Search Appliances and to Thunderstone's entire line of Appliance products. These performance enhancements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Faster Category Searching&lt;br /&gt;We've provided a new setting to improve category search speed when categories are distinct/non-overlapping (i.e., when no URL belongs to more than one category.)&lt;/li&gt;
&lt;li&gt;Enhanced Federated Search (Thunderstone Meta Search)&lt;br /&gt;Search users can now select which back-end profiles to actually search, from a list configured by their Appliance administrator.&lt;/li&gt;
&lt;li&gt;Character Match Mode&lt;br /&gt;Non-English/Unicode character support is greatly improved with this setting, which not only allows case-insensitive searching of foreign characters but also enables ignore-accents ("e" matches "é"), ligature expansion ("oe" matches "œ") and more.&lt;/li&gt;
&lt;li&gt;Language Analysis Module&lt;br /&gt;An optional feature for Appliance owners who pay to have it activated, this setting improves CJK (Chinese/Japanese/Korean) searches with extra processing to put spaces between words so they can be found without wildcards when adjacent to others.&lt;/li&gt;
&lt;li&gt;MySQL in DB Walker&lt;br /&gt;We've added support for crawling MySQL databases to our DB Walker.&lt;/li&gt;
&lt;li&gt;Updated "Look and Feel"&lt;br /&gt;As well as receiving some improvement from having its text slightly reorganized, the administrative interface also now employs Cascading Style Sheets (CSS) for more modern HTML usage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All Thunderstone customers with current Appliance maintenance agreements in place automatically qualify to receive the downloadable Version 7 software update. Please phone +1 216 820 2200 if you have any questions (business days, 10 a.m. - 6 p.m. Eastern Time.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="customerquote"&gt;&lt;/a&gt;CUSTOMER QUOTE OF THE MONTH&lt;/h3&gt;
&lt;blockquote&gt;“We use our Thunderstone Search Appliance 1000 not for web searching but for internal directory searches of call logs, packing lists, sales lists, invoices, load sheets, etc. People would formerly request needed documents, and we'd have to manually make photocopies and deliver them — which required lots of time and effort. Now that we have a Thunderstone Search Appliance, everybody can search and find what they want quickly and easily. It seems to work very well for what we need.”
&lt;p&gt;Michael Lee&lt;br /&gt;Director of I.T.&lt;br /&gt;Carry-On Trailer Corporation&lt;br /&gt;&lt;a href="http://www.carry-ontrailer.com/"&gt;http://www.carry-ontrailer.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="upcoming"&gt;&lt;/a&gt;UPCOMING EVENTS&lt;/h3&gt;
&lt;p&gt;Thunderstone's R &amp;amp; D team has several other exciting enterprise search development projects in the 2009 pipeline, as our staff continues to work on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TEXIS Version 6&lt;/li&gt;
&lt;li&gt;Webinator Version 6&lt;/li&gt;
&lt;li&gt;TEXIS Catalog, our newest eCommerce search engine for online catalogs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Plus, we will continue adding even more features to the Search Appliance products this year. Look for details on all these scheduled releases in future issues of Thunderstone News.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="tips"&gt;&lt;/a&gt;TECH TIPS: USING KEEP/IGNORE TAGS ON YOUR SEARCH APPLIANCE OR WEBINATOR SOFTWARE&lt;/h3&gt;
&lt;p&gt;Just the Facts, Ma'am: Specifying the content with Keep/Ignore tags&lt;/p&gt;
&lt;p&gt;As websites grow larger and more complex, they contain more and more "cruft" that search engines stumble upon. Things that may be small or even hidden by JavaScript and CSS are front and center to search engines. Standard headers, menus and breadcrumbs are just some of the things that may be polluting your search's data.&lt;/p&gt;
&lt;p&gt;Keep Tags and Ignore Tags allow your page authors to indicate what should and shouldn't be used from a webpage. They allow you to trim the fat from your pages so the only thing that gets searched is the content, instead of all the other fluff.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep Tags specify a beginning and ending expression, and only the content between the beginning and end is kept. This is performed on the HTML source, so it's common to use comments as the begin/end tags.&lt;/li&gt;
&lt;li&gt;Ignore Tags specify a beginning and ending expression, and the content between the beginning and end tags is DISCARDED. This is also performed on the HTML source, so HTML tags/comments are fair game.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Which to use? Both can accomplish the same goal. It's just a question of which will be logistically easier for you. If you have mostly content with just a little bit of extra info, Ignore Tags will probably be easier. If you have a small amount of content awash in a sea of cruft, then putting Keep Tags around the content may be easiest. Plus, you're not limited to using just one or the other. You can also use a combination of Keep Tags and Ignore Tags on your content as you see fit.&lt;/p&gt;
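&lt;p&gt;The effect of the two settings can be sketched in a few lines of Python. This is an illustration of the idea only: the marker comments, the function names, the order in which the two rules are applied, and the behavior when no Keep markers are present are all assumptions, not the product's documented behavior.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Hypothetical begin/end expressions; the real ones are whatever you configure.
# HTML comments are used as the markers, as suggested above.
KEEP_BEGIN, KEEP_END = "&lt;!-- keep --&gt;", "&lt;!-- /keep --&gt;"
IGNORE_BEGIN, IGNORE_END = "&lt;!-- ignore --&gt;", "&lt;!-- /ignore --&gt;"


def apply_ignore(source, begin=IGNORE_BEGIN, end=IGNORE_END):
    """Discard everything between each begin/end pair, markers included."""
    pattern = re.escape(begin) + r".*?" + re.escape(end)
    return re.sub(pattern, "", source, flags=re.DOTALL)


def apply_keep(source, begin=KEEP_BEGIN, end=KEEP_END):
    """Keep only the text found between begin/end pairs."""
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    return "".join(re.findall(pattern, source, flags=re.DOTALL))


page = (
    "MENU "
    + KEEP_BEGIN
    + "Article text. "
    + IGNORE_BEGIN + "Related links. " + IGNORE_END
    + "More article text."
    + KEEP_END
    + " FOOTER"
)

# Combine both: keep the marked region, then drop the ignored part inside it.
content = apply_ignore(apply_keep(page))
print(content)
```
&lt;/pre&gt;
&lt;p&gt;Here the menu and footer fall outside the Keep markers and the related links sit inside Ignore markers, so only "Article text. More article text." is left to be searched.&lt;/p&gt;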
&lt;hr /&gt;
&lt;p&gt;Feedback, suggestions and questions are welcome. Send your email to &lt;img src="https://www.thunderstone.com/site/images/editoremail.gif" alt="" /&gt;&lt;/p&gt;</description>
      <pubDate>Sun, 31 May 2009 00:00:00 -0400</pubDate>
      <a10:updated>2009-05-31T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1422</guid>
      <link>https://www.thunderstone.com/blog/archive/march-2009-newsletter/</link>
      <category>Main</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <title>March 2009 Newsletter</title>
      <description>&lt;h3&gt;March 2009 - &lt;a href="http://www.thunderstone.com/texis/site/newsletter/archive.html"&gt;Archive&lt;/a&gt;&lt;/h3&gt;
&lt;h3&gt;CONTENTS&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200903.html#quote"&gt;Customer Quote of the Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200903.html#tips"&gt;Tech Tips: Controlling Your Crawl — Exclude by Field&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200903.html#happenings"&gt;Happenings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200903.html#specialoffer"&gt;FREE Expo Floor Passes to AIIM 2009&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200903.html#unsub"&gt;Subscription/Unsubscription and Contacts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="quote"&gt;&lt;/a&gt;CUSTOMER QUOTE OF THE MONTH&lt;/h3&gt;
&lt;blockquote&gt;"We use the Thunderstone Search Appliance to crawl, index and search Word files, PDFs and other content in our law firm's internal document management system. The Appliance gives us a lot of customization options in the way it operates, with excellent control over precisely what we want to make searchable and what we don't want included. It does everything we need it to do. You can just plug it in and forget about it. It works great. After years of trouble-free performance, when we finally did have a hardware failure — Thunderstone had us quickly up and running again on the same day we received our replacement unit. Their level of customer support is almost unheard of in the I.T. industry."
&lt;p&gt;Michael E. Salopek&lt;br /&gt;I.T. Manager&lt;br /&gt;Janik, Dorman &amp;amp; Winter, L.L.P.&lt;br /&gt;&lt;a href="http://www.janiklaw.com/"&gt;http://www.janiklaw.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="tips"&gt;&lt;/a&gt;TECH TIPS: CONTROLLING YOUR CRAWL WITH WEBINATOR OR THUNDERSTONE SEARCH APPLIANCES — EXCLUDE BY FIELD&lt;/h3&gt;
&lt;p&gt;Last time we discussed exclusions and requirements for managing what pages your crawler gets, but there's one setting that gets a Tech Tips all its own: Exclude by Field. It gives you extra power in how you're excluding and what exactly is being excluded.&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;"Metamorph query" matching
&lt;p&gt;Rather than a prefix or substring match, Exclude by Field uses a "Metamorph query", which is the full-text matching engine used for our normal searches. You can simply type in words to match, or if you begin with a slash (/) then it is treated as a REX expression (our RegEx-like pattern matching language; see the "REX" section in the Vortex documentation on our website for more details).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Multiple fields for exclusion
&lt;p&gt;All previously discussed exclusion &amp;amp; requirement options operate only on the URL itself. Exclude by Field allows you to exclude based on a number of different other areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HTML — Matches against the raw HTML of the page. Useful if there's something in an HTML comment that you'd like to base the match on.&lt;/li&gt;
&lt;li&gt;Text: The formatted text of the URL. This is the same text you'd see if you looked at the list/edit info of a page or at the "Match Info" in the search results. Useful if you want to match text but want to ignore any HTML markup that may or may not be present.&lt;/li&gt;
&lt;li&gt;All Meta — The contents of all available meta fields are put together and then matched against.&lt;/li&gt;
&lt;li&gt;Meta Field -&amp;gt; — Matches against the contents of a specific meta field, which you specify in the next column "From Meta Field".&lt;/li&gt;
&lt;li&gt;Keywords, Description, &amp;amp; Mime Type — Matches against the text of these common meta fields.&lt;/li&gt;
&lt;li&gt;URL — Matches against the URL, just like Exclusion REX. You may want to use this to get the extra Exclude options, listed below.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;What to exclude
&lt;p&gt;Beyond more power in specifying what to match, Exclude by Field also gives you more control with what to do when you get a match.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pages and Links — This acts like any other exclusion rule. The page and its links are kept completely out of the walk data.&lt;/li&gt;
&lt;li&gt;Pages only — The content of the page is not included in the walk, but the links from the page ARE followed.&lt;/li&gt;
&lt;li&gt;Links only — The page is included in the walk, but the links from the page are not followed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;A word on efficiency
&lt;p&gt;A disadvantage of Exclude by Field, when using any Field except URL, is that the page must be fully fetched before the rule can be applied.&lt;/p&gt;
&lt;p&gt;With all other exclusion rules (and Exclude by Field on URL), the URL can be thrown out before the page is fetched and processed.&lt;/p&gt;
&lt;p&gt;When performing Exclude by Field on the content of the page, though, the page must be downloaded and fully processed before we can know if it has HTML or a Body that matches the rules specified.&lt;/p&gt;
&lt;p&gt;When possible, it's better to use other exclusion rules or the URL target for Exclude by Field, as this will allow you to prune URLs before they are fetched. Still, there are many things that Exclude by Field can do that the other settings simply can't (as mentioned below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Example — Excluding directories from a file crawl
&lt;p&gt;A perfect example of Exclude by Field is directories when performing a file crawl — we can't fully exclude directories because they are what link to all the files, and without them we'd have nothing. Still, we might want them not to show up in the search. We can get this with Exclude by Field.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Metamorph Query "//=&amp;gt;&amp;gt;=" (without the quotes) — This is a REX expression for "match anything that ends in a slash". Please see the REX section of the Vortex documentation if you'd like more details on REX syntax.&lt;/li&gt;
&lt;li&gt;Field - URL&lt;/li&gt;
&lt;li&gt;Exclude - Pages only — This will keep the contents of the directory "pages" out of the crawl but will still follow the links to get the actual files and use them in the search.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
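&lt;p&gt;The difference between the three Exclude options can be seen in a small Python simulation of the directory example. This is a sketch under stated assumptions: the in-memory file share, the function name and the option strings are hypothetical, and an ordinary regular expression stands in for the REX expression.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Hypothetical in-memory "file share": each URL maps to the links found on it.
SITE = {
    "file:///docs/": ["file:///docs/a.pdf", "file:///docs/sub/"],
    "file:///docs/sub/": ["file:///docs/sub/b.txt"],
    "file:///docs/a.pdf": [],
    "file:///docs/sub/b.txt": [],
}

# Ordinary regex standing in for the REX expression: the URL ends in a slash.
ENDS_IN_SLASH = re.compile(r"/$")


def crawl(start, rule, exclude):
    """Walk SITE from start and return the URLs that end up in the index.

    exclude is "pages and links", "pages only" or "links only".
    """
    indexed, queue, seen = [], [start], {start}
    while queue:
        url = queue.pop(0)
        matched = bool(rule.search(url))
        keep_page = not (matched and exclude in ("pages and links", "pages only"))
        follow = not (matched and exclude in ("pages and links", "links only"))
        if keep_page:
            indexed.append(url)
        if follow:
            for link in SITE[url]:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return indexed


for option in ("pages and links", "pages only", "links only"):
    print(option, crawl("file:///docs/", ENDS_IN_SLASH, option))
```
&lt;/pre&gt;
&lt;p&gt;With "pages only" the two directory listings stay out of the index while a.pdf and b.txt are still reached and indexed. With "pages and links" nothing is indexed, because the starting directory is excluded and its links are never followed, and with "links only" the walk stops at the starting directory itself.&lt;/p&gt;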
&lt;p&gt;If you have any questions about how to use Exclude by Field, please feel free to contact Thunderstone Support — and we'll discuss it.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="happenings"&gt;&lt;/a&gt;HAPPENINGS&lt;/h3&gt;
&lt;p&gt;The February 2009 issue of CRN, a publication of Everything Channel and ChannelWeb.com, recognized the "top Channel Chiefs in the industry based upon their record of business innovation and dedication to the partner community." This annual list, which CRN calls "Our definitive guide to the movers and shakers of I.T. channel management," included Frederick A. Harmon (Thunderstone's Channel Director &amp;amp; CSO.)&lt;/p&gt;
&lt;p&gt;You can visit the CRN website (&lt;a href="http://www.crn.com/crn/chiefs/2009cc.jhtml?chief=136"&gt;http://www.crn.com/crn/chiefs/2009cc.jhtml?chief=136&lt;/a&gt;) to view pertinent information about Fred Harmon in the 2009 Channel Chiefs list.&lt;/p&gt;
&lt;p&gt;UPCOMING&lt;/p&gt;
&lt;p&gt;Thunderstone's John Turnbull (President and CEO) will present a workshop session entitled The Next Generation in Search: Today's Best Practices on Friday, April 17, 2009, (2:00 p.m. - 3:30 p.m.) during the DigitalNow 2009 Conference at Disney's Yacht and Beach Club Resorts in Lake Buena Vista, Florida.&lt;/p&gt;
&lt;p&gt;Session Description&lt;br /&gt;Search has progressed from a complex tool used by librarians through simple tools that let users perform a keyword search, to today's information access tools that can still provide users a simple interface but make use of much of an association's collective knowledge. In this workshop participants will learn what sorts of information can be behind a search engine and how to make it more valuable to users. The session includes a case study from IEEE, the world's largest technical membership association that significantly improved their business by focusing on their customers and helping them access content in new ways.&lt;/p&gt;
&lt;p&gt;DigitalNow (&lt;a href="http://www.fusionproductions.com/digitalnow/"&gt;http://www.fusionproductions.com/digitalnow/&lt;/a&gt;) is an annual conference that brings together senior-level executives and volunteer leaders from some of the most influential professional and trade associations in America. Produced by Fusion Productions and Disney Institute, two of the foremost authorities in adult educational design, with input from registered attendees and a conference advisory board, DigitalNow addresses the critical issues facing association leaders in the digital age.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="specialoffer"&gt;&lt;/a&gt;GET YOUR FREE FLOOR PASS (A $75 VALUE) TO THE MARCH 30 - APRIL 2, 2009 AIIM INTERNATIONAL EXPOSITION + CONFERENCE IN PHILADELPHIA&lt;/h3&gt;
&lt;p&gt;The AIIM International Exposition + Conference, the yearly gathering for information management professionals across industries and lines of business, will take place Monday, March 30, through Thursday, April 2, 2009, at the Pennsylvania Convention Center in Philadelphia, PA. With 19 tracks, more than 135 conference sessions featuring more than 100 real-world case studies, and an Expo floor showcasing 200+ information management technology solution providers, the event aims to provide attendees with actionable insight they can use.&lt;/p&gt;
&lt;p&gt;REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS&lt;br /&gt;and get access to all keynotes, general sessions,&lt;br /&gt;Expo floor education and the co-located ON DEMAND Expo!&lt;/p&gt;
&lt;p&gt;To receive your free pass, use Registration Code: 615M&lt;br /&gt;when you register at &lt;a href="http://www.aiimexpo.com/"&gt;WWW.AIIMEXPO.COM&lt;/a&gt;&lt;br /&gt;or call +1 888 824 3004.&lt;/p&gt;
&lt;p&gt;Your FREE pass comes to you compliments of Thunderstone Software. Please stop by and visit Fred Harmon (Channel Director &amp;amp; CSO) and Peter Thusat (Communication Director &amp;amp; CMO) at Booth 1045.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt; &lt;/p&gt;</description>
      <pubDate>Tue, 31 Mar 2009 00:00:00 -0400</pubDate>
      <a10:updated>2009-03-31T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1425</guid>
      <link>https://www.thunderstone.com/blog/archive/february-2009-newsletter/</link>
      <category>Main</category>
      <category>Webinator</category>
      <title>February 2009 Newsletter</title>
      <description>&lt;h3&gt;February 2009 - &lt;a href="http://www.thunderstone.com/texis/site/newsletter/archive.html"&gt;Archive&lt;/a&gt;&lt;/h3&gt;
&lt;h3&gt;CONTENTS&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200902.html#tips"&gt;Tech Tips: Controlling Your Crawl — Requirements &amp;amp; Exclusions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200902.html#happenings"&gt;Happenings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200902.html#specialoffer"&gt;FREE Expo Floor Passes to AIIM 2009&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200902.html#success"&gt;Customer Success Story: Using Webinator to search online collections of Eurasian and East European research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200902.html#unsub"&gt;Subscription/Unsubscription and Contacts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="tips"&gt;&lt;/a&gt;TECH TIPS: CONTROLLING YOUR CRAWL WITH WEBINATOR OR THUNDERSTONE SEARCH APPLIANCES — REQUIREMENTS &amp;amp; EXCLUSIONS&lt;/h3&gt;
&lt;p&gt;The crawler provides many ways of controlling what you do and don't crawl. Note that URLs manually specified by you (Base URLs, URL URLs, Single Pages, etc.) are exempt from all inclusion/exclusion rules — they will always be used.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Exclusions
&lt;p&gt;The shotgun approach: any URL that contains any of the text listed in an exclusion line &lt;u&gt;anywhere in the URL&lt;/u&gt; will not be included in the walk. The text doesn't need to be a full path or filename; sub-matches are okay.&lt;/p&gt;
&lt;p&gt;If you specify "archive" as an exclusion, then "http://www.example.com/archive/index.htm" will be excluded and "http://www.example.com/site/newsarchivefrom2004" will also be excluded.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Exclusion prefix
&lt;p&gt;Like Exclusion, except the text must match starting from the beginning of the URL. This gives a bit more control over what exactly matches.&lt;/p&gt;
&lt;p&gt;If you specify "http://www.example.com/archive" as an exclusion prefix, then "http://www.example.com/archive/index.htm" will be excluded and "http://www.example.com/archivePages.htm" will be excluded, but "http://www.example.com/site/newsarchivefrom2004" will be allowed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Required prefix
&lt;p&gt;The opposite of Exclusion prefix: instead of rejecting URLs that DO match the prefix, it rejects URLs that DON'T match the prefix. Both settings weed out URLs; they just swap which URLs are used and which aren't. Multiple Required prefixes can be specified, and URLs are allowed if they match at least one.&lt;/p&gt;
&lt;p&gt;If you specify "http://www.example.com/archive" as a required prefix, then "http://www.example.com/archive/index.htm" will be used and "http://www.example.com/archivePages.htm" will be used, but "http://www.example.com/site/newsarchivefrom2004" will be excluded.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Exclusion REX &amp;amp; Required REX
&lt;p&gt;Similar ideas to Exclusion prefix &amp;amp; Required prefix, except you use our powerful REX pattern matcher to specify what should match instead of just a prefix. It's similar to regular expressions but much faster. Please see the REX pages in the Vortex manual on our website (&lt;a href="http://www.thunderstone.com/site/vortexman/rex_split.html"&gt;http://www.thunderstone.com/site/vortexman/rex_split.html&lt;/a&gt;) for more details on the exact syntax.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you specify both requirements and exclusions, then URLs must satisfy both to be used — they must not match any Exclusion Prefix, AND they must match at least one Required Prefix (if specified).&lt;/p&gt;
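&lt;p&gt;The way these rules combine can be sketched in a few lines of Python. This is only an illustration of the logic described in this article, not the crawler's actual implementation: the function name is_allowed and its parameter names are hypothetical, matching is assumed to be case-sensitive, REX patterns are left out, and the check is meant for URLs discovered during the walk (manually specified URLs are exempt from these rules).&lt;/p&gt;
&lt;pre&gt;
```python
def is_allowed(url, exclusions=(), exclusion_prefixes=(), required_prefixes=()):
    """Return True if a discovered URL passes every rule (illustrative only)."""
    # Exclusion: the text may appear anywhere in the URL.
    if any(text in url for text in exclusions):
        return False
    # Exclusion prefix: the URL is rejected only if it starts with the prefix.
    if any(url.startswith(prefix) for prefix in exclusion_prefixes):
        return False
    # Required prefix: if any are given, the URL must start with at least one.
    if required_prefixes and not any(
        url.startswith(prefix) for prefix in required_prefixes
    ):
        return False
    return True


# The example URLs used in this article.
ARCHIVE_INDEX = "http://www.example.com/archive/index.htm"
ARCHIVE_PAGES = "http://www.example.com/archivePages.htm"
NEWS_ARCHIVE = "http://www.example.com/site/newsarchivefrom2004"
PREFIX = "http://www.example.com/archive"
```
&lt;/pre&gt;
&lt;p&gt;Under these assumptions, is_allowed(NEWS_ARCHIVE, exclusions=["archive"]) rejects the URL, while is_allowed(NEWS_ARCHIVE, exclusion_prefixes=[PREFIX]) lets it through, which mirrors the difference between Exclusion and Exclusion prefix.&lt;/p&gt;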
&lt;p&gt;There's an even more powerful way to exclude pages with Exclude by Field, but that's for another Tech Tips article. (Watch for it in next month's newsletter.)&lt;/p&gt;
&lt;p&gt;If you have questions about how any of these operate, feel free to contact Thunderstone Support.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="happenings"&gt;&lt;/a&gt;HAPPENINGS&lt;/h3&gt;
&lt;p&gt;Steve Kolowich, a reporter for THE CHRONICLE OF HIGHER EDUCATION, noted what he referred to as Thunderstone's determined efforts at "Out-Googling Google" in his article entitled "In Search of a Better Search Engine" (&lt;a href="http://chronicle.com/free/v55/i24/24a01501.htm"&gt;http://chronicle.com/free/v55/i24/24a01501.htm&lt;/a&gt;) for the February 20, 2009 issue of The Chronicle. He wrote, in part:&lt;/p&gt;
&lt;blockquote&gt;The Virginia Bioinformatics Institute at Virginia Tech, facing a thickening swamp of digital documents, opted for Thunderstone's search appliance, which starts at $13,000, about six months ago. The institute uses the device to index reams of unpublished data and notes stored on its intranet. James E. Stoll, who leads Internet projects at the institute, said the appliance allowed research collaborators and other authorized users to retrieve items from across the institute's network of repositories without exposing those documents to the public Web, as basic site-search software would require. Researchers "don't want to be scooped," Mr. Stoll said. "This is their livelihood."&lt;/blockquote&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;UPCOMING&lt;/p&gt;
&lt;p&gt;Thunderstone's Fred Harmon (Channel Director and CSO) and Peter Thusat (Communication Director and CMO) will participate as exhibitors (Thunderstone Software Booth: 1045) during the AIIM International Exposition and Conference March 30 - April 2, 2009 at the Pennsylvania Convention Center in Philadelphia, PA.&lt;/p&gt;
&lt;p&gt;Conference: March 30 - April 2, 2009&lt;br /&gt;Exhibits: March 31 - April 2, 2009&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="specialoffer"&gt;&lt;/a&gt;GET YOUR FREE FLOOR PASS (A $75 VALUE) TO THE MARCH 31 - APRIL 2, 2009 AIIM INTERNATIONAL EXPOSITION IN PHILADELPHIA&lt;/h3&gt;
&lt;p&gt;REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS&lt;br /&gt;and get access to all keynotes, general sessions,&lt;br /&gt;Expo floor education and the ON DEMAND Expo!&lt;/p&gt;
&lt;p&gt;To receive your free pass, use Registration Code: 615M&lt;br /&gt;when you register at &lt;a href="http://www.aiimexpo.com/"&gt;WWW.AIIMEXPO.COM&lt;/a&gt;&lt;br /&gt;or call +1 888 824 3004.&lt;/p&gt;
&lt;p&gt;Your FREE pass comes to you compliments of Thunderstone&lt;br /&gt;Software. Please stop by and visit us at Booth 1045.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="success"&gt;&lt;/a&gt;CUSTOMER SUCCESS STORY: USING WEBINATOR TO SEARCH ONLINE COLLECTIONS OF EURASIAN AND EAST EUROPEAN RESEARCH&lt;/h3&gt;
&lt;p&gt;The Center for Russian &amp;amp; East European Studies, a sub-unit of the larger University Center for International Studies (UCIS) at the University of Pittsburgh, won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library — a collection that includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. Thunderstone's Webinator indexing and retrieval software enabled the responsible Informatics team to accomplish this goal in an efficient and affordable manner.&lt;/p&gt;
&lt;p&gt;The University Center for International Studies (UCIS) provides the organizational framework that supports the University of Pittsburgh's mission to integrate and reinforce all its strands of international scholarship in research, teaching and public service. UCIS includes — in addition to many other highly-acclaimed programs and component units — a Center for Russian &amp;amp; East European Studies, an Asian Studies Center, a Center for Latin American Studies, a European Studies Center, an International Business Center (jointly sponsored with the Katz School of Business) and a European Union Center of Excellence (funded by the European Union.)&lt;/p&gt;
&lt;p&gt;As a thin layer on top of the whole UCIS structure, Central Administration handles all business-related core functions and technology issues. When individuals in any of the sub-units need advice or consulting related to I.T. Services, Knowledge Management, database planning, upgrading of their websites or anything else that would fall into technology-mediated information, they call upon Mark J. Weixel, Director of Informatics at UCIS.&lt;/p&gt;
&lt;h4&gt;Discovering Webinator and Getting Started With Using It as an Easily Customizable Development Tool&lt;/h4&gt;
&lt;p&gt;Weixel recalled, "Back in I guess it was '98, I found out about Webinator from a friend of mine who was at Princeton at the time. We had a particular niche here in International Studies, and we wanted to create mini search engines for web content that was specific to certain world regions. We were hoping to create search engines like AltaVista, since Google wasn't even around then, that would allow people to do full-text searching of those websites. But, because we were vetting the list of sites, we thought we could increase the probability that searchers would come across something really relevant to the part of the world we were focusing on.&lt;/p&gt;
&lt;p&gt;"We used Webinator to index and search collections of websites that were in and dealt with Russia and Eastern Europe.&lt;/p&gt;
&lt;p&gt;"So, that was my original introduction to Webinator. We bought the entry-level product to begin with, and we currently have the Enterprise version. What I really like about it, still, is the fact that it's relatively easy to configure. It's much easier to configure than it was back when we bought the original product, when everything was run through command lines. I like the notion of relevance in terms of returned hits. It seems to make a lot more sense to me than, for example, Google page ranking, which places a much higher priority on popularity than it does on the actual content of the pages where text matches.&lt;/p&gt;
&lt;p&gt;"Another thing that has been nice is the fact there is support for synonym matching within the server. And I think Vortex as a scripting language is very powerful. Even though I haven't used it to its fullest ability, it's proven to be quite flexible when we've needed to make modifications." &lt;a href="http://www.thunderstone.com/texis/site/casestudy/ucis.html"&gt;Read More...&lt;/a&gt; &lt;br /&gt;&lt;a href="http://www.thunderstone.com/site/cases/ucis-webinator-casestudy.pdf"&gt;Download the 3-page UCIS case study PDF here.&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Feedback, suggestions and questions are welcome. Send your email to &lt;img src="https://www.thunderstone.com/site/images/editoremail.gif" alt="" /&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Feb 2009 09:38:44 -0500</pubDate>
      <a10:updated>2009-02-28T09:38:44-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1378</guid>
      <link>https://www.thunderstone.com/blog/archive/customer-success-story-using-webinator-to-search-online-collections-of-eurasian-and-east-european-research/</link>
      <category>Webinator</category>
      <category>Case Study</category>
      <category>Main</category>
      <title>Customer Success Story: Using Webinator To Search Online Collections Of Eurasian And East European Research</title>
      <description>&lt;p&gt;The Center for Russian &amp;amp; East European Studies, a sub-unit of the larger University Center for International Studies (UCIS) at the University of Pittsburgh, won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library — a collection that includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. &lt;a href="/products-for-search/webinator/"&gt;Thunderstone's Webinator&lt;/a&gt; indexing and retrieval software enabled the responsible Informatics team to accomplish this goal in an efficient and affordable manner.&lt;/p&gt;
&lt;p&gt;The University Center for International Studies (UCIS) provides the organizational framework that supports the University of Pittsburgh's mission to integrate and reinforce all its strands of international scholarship in research, teaching and public service. UCIS includes — in addition to many other highly-acclaimed programs and component units — a Center for Russian &amp;amp; East European Studies, an Asian Studies Center, a Center for Latin American Studies, a European Studies Center, an International Business Center (jointly sponsored with the Katz School of Business) and a European Union Center of Excellence (funded by the European Union.)&lt;/p&gt;
&lt;p&gt;As a thin layer on top of the whole UCIS structure, Central Administration handles all business-related core functions and technology issues. When individuals in any of the sub-units need advice or consulting related to I.T. Services, Knowledge Management, database planning, upgrading of their websites or anything else that would fall into technology-mediated information, they call upon Mark J. Weixel, Director of Informatics at UCIS. &lt;/p&gt;
&lt;h4&gt;Discovering Webinator and Getting Started With Using It as an Easily Customizable Development Tool&lt;/h4&gt;
&lt;p&gt;Weixel recalled, "Back in I guess it was '98, I found out about Webinator from a friend of mine who was at Princeton at the time. We had a particular niche here in International Studies, and we wanted to create mini search engines for web content that was specific to certain world regions. We were hoping to create search engines like AltaVista, since Google wasn't even around then, that would allow people to do full-text searching of those websites. But, because we were vetting the list of sites, we thought we could increase the probability that searchers would come across something really relevant to the part of the world we were focusing on.&lt;/p&gt;
&lt;p&gt;"We used Webinator to index and search collections of websites that were in and dealt with Russia and Eastern Europe.&lt;/p&gt;
&lt;p&gt;"So, that was my original introduction to Webinator. We bought the entry-level product to begin with, and we currently have the Enterprise version. What I really like about it, still, is the fact that it's relatively easy to configure. It's much easier to configure than it was back when we bought the original product, when everything was run through command lines. I like the notion of relevance in terms of returned hits. It seems to make a lot more sense to me than, for example, Google page ranking, which places a much higher priority on popularity than it does on the actual content of the pages where text matches.&lt;/p&gt;
&lt;p&gt;"Another thing that has been nice is the fact there is support for synonym matching within the server. And I think Vortex as a scripting language is very powerful. Even though I haven't used it to its fullest ability, it's proven to be quite flexible when we've needed to make modifications."&lt;/p&gt;
&lt;h4&gt;Implementing a Sophisticated Indexing and Retrieval Package with an Attractive ROI Track Record&lt;/h4&gt;
&lt;p&gt;Did they look at any competing products? According to Weixel, no, they didn't — for a couple of reasons. One, they're a small shop and they have to ask, "How much is this going to cost?" And, he said, the ROI for a one-time investment in a perpetual Webinator license was always pretty clear. It was a known quantity to them. Plus, Weixel strongly believed, as the person in charge of actually setting up and administering it, Webinator provided an affordable and high-quality solution for his specific application requirements. The business manager trusted Weixel's judgment, and by all accounts Webinator has delivered excellent results.&lt;/p&gt;
&lt;p&gt;As to future expansion beyond the Center for Russian &amp;amp; East European Studies, discussions have begun with several of the other sub-units within UCIS. The Center for Latin American Studies and the European Studies Center also seem interested in putting more and more of their materials online — newsletters, conference reports, etc.&lt;/p&gt;
&lt;p&gt;Webinator offers UCIS sub-units the possibility of acquiring a well-proven search engine that they could customize as desired and manage on their own.&lt;/p&gt;
&lt;h4&gt;Digitizing, Capturing and Making Searchable the Publications that Comprise the Vladimir I. Toumanoff Virtual Library&lt;/h4&gt;
&lt;p&gt;Weixel said their Webinator-powered search implementation getting the heaviest use right now is a project that the University of Pittsburgh's Center for Russian and East European Studies (REES) has done in conjunction with The National Council for Eurasian &amp;amp; East European Research (NCEEER, frequently pronounced 'Nickser') — a federally funded organization charged with supporting research, typically in social sciences, focusing on the former Soviet Union and Eastern Europe.&lt;/p&gt;
&lt;p&gt;REES won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library comprised of research reports and working papers submitted to NCEEER by scholars under their grants over the last two decades. This collection includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. NCEEER continues adding to the collection as its funded researchers prepare new papers.&lt;/p&gt;
&lt;p&gt;"We proposed scanning and digitizing more than 20 years' worth of reports and then taking it and essentially pointing Webinator at it and, using the documents plug-in, doing a full-text index of the entire corpus. And I think one of the reasons that we won the competition is because, once we had done the really hard work of creating PDFs out of all the printed documents, we were going to be able to put it in one place and, overnight, have a full-text search index. It's my understanding that that was not a component of the other proposals," said Weixel.&lt;/p&gt;
&lt;p&gt;He continued, "We successfully contended for that particular project, got it, spent the better part of nine months digitizing the materials and, I kid you not, it took, I think, less than 24 hours, and we had a fully searchable index of the entire corpus of research products. And it worked out well. We have this nice, targeted archive of material. We've got it set to re-index on a regular schedule, so anytime NCEEER gets a new batch of project reports — they upload them, they get caught in the next cycle of indexing, and it makes us very happy.&lt;/p&gt;
&lt;p&gt;"The search interface for the archive materials of NCEEER is available through the Vladimir I. Toumanoff Virtual Library at the website of &lt;a href="http://www.nceeer.org/toumanoff.php"&gt;The National Council for Eurasian &amp;amp; East European Research&lt;/a&gt;. You kick off the search there, and then you're transported to Pittsburgh for the actual results set.&lt;/p&gt;
&lt;p&gt;"Recently we put the server housing Webinator behind the firewall as part of our new increased security policy at the University of Pittsburgh. The fact that the folks at Thunderstone — John, in particular, in the Support Group — were able to work with me in coming up with a way to take a search query and pipe it through a back door into Webinator and then take the result set and present that to users in an accessible front-end, was just fantastic. It took me about two weeks once I had access to the beta version of the code, and that worked out really well. It was satisfying for me on a number of levels, not just because the product did what it was supposed to, but because I had support from people who could actually help me efficiently accomplish what I needed to do. That worked out very, very well."&lt;/p&gt;
&lt;p&gt;Weixel added, "Our audience is interesting. Of course, we're housed within a major research university. So, we do have a number of our projects where we're trying to target our students and our faculty. But the area studies centers, these sub-units underneath the University Center for International Studies, most of them have federal funding that mandates what they call 'outreach' — trying to bring the message of international studies to a larger community, whether it's a local business community or whether it's local educators at the Kindergarten through high school level. Most of them probably have some kind of academic interest in one of the regions of focus. However you look at it, it's a pretty large and diverse audience.&lt;/p&gt;
&lt;p&gt;"Being in an international studies environment, one thing that is important to us is foreign language support. I will admit to not having tried this yet with any of the CJK languages. But, in terms of the European and Cyrillic-based languages that we've indexed, Webinator has been a really good performer. And we've been quite happy with that."&lt;/p&gt;
&lt;p&gt;For more information about UCIS or any of its area studies centers, you may contact UCIS by mail or email at:&lt;/p&gt;
&lt;p&gt;University of Pittsburgh&lt;br /&gt;University Center for International Studies&lt;br /&gt;4400 Wesley W. Posvar Hall&lt;br /&gt;Pittsburgh, PA 15260&lt;br /&gt;&lt;img src="/site/images/ucisemail.gif" alt="" /&gt;&lt;/p&gt;</description>
      <pubDate>Tue, 24 Feb 2009 00:00:00 -0500</pubDate>
      <a10:updated>2009-02-24T00:00:00-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1426</guid>
      <link>https://www.thunderstone.com/blog/archive/january-2009-newsletter/</link>
      <category>Main</category>
      <category>Webinator</category>
      <category>Search Appliance</category>
      <title>January 2009 Newsletter</title>
      <description>&lt;h3&gt;January 2009 - &lt;a href="http://www.thunderstone.com/texis/site/newsletter/archive.html"&gt;Archive&lt;/a&gt;&lt;/h3&gt;
&lt;h3&gt;CONTENTS&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200901.html#specialoffer"&gt;Improving Efficiency in the New Year&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200901.html#tips"&gt;Tech Tips: The Many Ways of Specifying URLs for the Crawl with Webinator or Your Search Appliance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200901.html#quote"&gt;Quote of the Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200901.html#upcoming"&gt;More Customer Success Stories Coming This Year&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thunderstone.com/texis/site/newsletter/n200901.html#unsub"&gt;Subscription/Unsubscription and Contacts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="specialoffer"&gt;&lt;/a&gt;IMPROVING EFFICIENCY IN THE NEW YEAR&lt;/h3&gt;
&lt;p&gt;Was one of your New Year's resolutions to improve the efficiency of your I.T. infrastructure? Our expert engineering staff is available to clients with a current maintenance contract, and for the next month you can schedule a free 15-minute consultation on a first-come, first-served basis to discuss your architecture and get immediate advice on improving your solution. If you need more time, that can also be arranged. There are a limited number of time slots available, so make sure to call today at +1 216 820 2200.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="tips"&gt;&lt;/a&gt;TECH TIPS: THE MANY WAYS OF SPECIFYING URLS FOR THE CRAWL WITH WEBINATOR OR YOUR SEARCH APPLIANCE&lt;/h3&gt;
&lt;p&gt;There are a number of ways to specify what URLs you'd like the software to crawl, and which will be easiest to use can depend on your situation.&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Base URL&lt;br /&gt;The old standby: URLs listed in Base URL will be crawled, and all the pages they link to will be included. If you only have one or two sites and start from the top, this is definitely the way to go.&lt;/li&gt;
&lt;li&gt;URL URL&lt;br /&gt;Sometimes you may have dozens or hundreds of base URLs, for example to cover many (but not all) of the folders on a site. If putting them all in a text box is starting to feel unwieldy, you can use the URL URL instead.
&lt;p&gt;Create a text file somewhere on your website that contains the URLs you want to use as Base URLs, each on its own line. Then specify the URL of that file in the URL URL setting. The crawler fetches the file and treats every URL in it as a Base URL. This can make it easier to manage a frequently-changing list of URLs.&lt;/p&gt;
&lt;p&gt;This is the only benefit of URL URL. The pages will still be crawled EXACTLY as if they were all listed as Base URLs. It exists only to make the list easier for you to manage.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Single Page&lt;br /&gt;Sometimes you have a single page, or a handful of pages, where you just want that page crawled, but none of its links. This is exactly what Single Page is for.
&lt;p&gt;The URLs listed in Single Page are fetched, and their links are ignored.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Page URL&lt;br /&gt;Just as "URL URL" is a list of URLs for "Base URL", "Page URL" is a list of URLs for "Single Page". Each URL listed here should point to a plain text file on your server that lists page URLs, one per line. Every one of those pages is fetched, and their links are completely ignored.&lt;/li&gt;
&lt;/ul&gt;
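&lt;p&gt;As a rough illustration of the file format used by URL URL and Page URL, the following Python sketch reads such a list. It is not the crawler's code: the function name parse_url_list and the sample URLs are hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions rather than documented behavior.&lt;/p&gt;
&lt;pre&gt;
```python
def parse_url_list(text):
    """Return the URLs found in a list file, one URL per line (illustrative only)."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            urls.append(line)
    return urls


# Hypothetical contents of a file referenced by the URL URL setting.
SAMPLE_FILE = """http://www.example.com/products/
http://www.example.com/support/

http://www.example.com/news/
"""

base_urls = parse_url_list(SAMPLE_FILE)
```
&lt;/pre&gt;
&lt;p&gt;Here base_urls holds the three example URLs. Whether the crawler then follows their links (URL URL) or fetches only those pages (Page URL) depends on which setting the file is referenced from.&lt;/p&gt;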
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="quote"&gt;&lt;/a&gt;QUOTE OF THE MONTH&lt;/h3&gt;
&lt;p&gt;"I should take a moment to let you know how much we appreciate the Webinator product. For us, it's very fast, easy to configure and meets all our needs. Thanks for such a great product!"&lt;/p&gt;
&lt;p&gt;David Arbuthnot&lt;br /&gt;VP IT&lt;br /&gt;MS Society of Canada&lt;br /&gt;&lt;a href="http://www.mssociety.ca/"&gt;http://www.mssociety.ca&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name="upcoming"&gt;&lt;/a&gt;MORE CUSTOMER SUCCESS STORIES COMING THIS YEAR&lt;/h3&gt;
&lt;p&gt;In the coming months you'll find a number of interesting new case studies at the &lt;a href="http://www.thunderstone.com/texis/site/pages/Exposition.html"&gt;Thunderstone.com&lt;/a&gt; website. As usual, we'll also feature links to them here in this newsletter. Keep an eye out for them. You won't want to miss these case studies of TEXIS, Webinator and Thunderstone Search Appliances "in action".&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Feedback, suggestions and questions are welcome. Send your email to &lt;a href="mailto:editor@thunderstone.com" target="_top"&gt;editor@thunderstone.com&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Sat, 31 Jan 2009 00:00:00 -0500</pubDate>
      <a10:updated>2009-01-31T00:00:00-05:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1381</guid>
      <link>https://www.thunderstone.com/blog/archive/webinator-as-a-customizable-way-to-add-vertical-search-engines-to-multiple-industry-web-portals/</link>
      <category>Webinator</category>
      <category>Main</category>
      <category>Thunderstone</category>
      <title>Webinator as a customizable way to add vertical search engines to multiple industry web portals</title>
      <description>&lt;p&gt;When implementing an optimal solution for the heavy search demands of multiple online properties, a website administrator needs a practical way to easily create and provide a high quality retrieval interface to collections of HTML documents. In this article we review how Trade Press Publishing successfully added powerful and flexible vertical search engines to its popular web portals with the help of Thunderstone's Webinator web index and retrieval system. Webinator serves as an example of the type of applications that can be built around Thunderstone's Texis RDBMS and Web Script.&lt;/p&gt;
&lt;p&gt;Trade Press Publishing Corporation (&lt;a href="http://www.tradepress.com/"&gt;http://www.tradepress.com&lt;/a&gt;) is a privately-held company based in Milwaukee and a leading provider of market intelligence to the facilities management, building service contractor, housekeeping, cleaning supplies distribution and railroad industries. In addition to publishing business-to-business magazines and eNewsletters, it also produces trade shows and conferences and offers a variety of related educational and marketing opportunities.&lt;/p&gt;
&lt;p&gt;Jesus Carrillo, Director of Information Technology, joined the company's Pre-Press Division more than 16 years ago. According to him, “I started in the Desktop Publishing Department at an entry-level position that was my first job out of college. And I've been at the same place ever since. The company grew. About ten years ago they dissolved the pre-press part of the business to focus on educational media products and business-to-business publishing. They wanted someone to lead their technology efforts, and they asked me to do that. So, I stayed around and have continued to search out technology applications in the b-to-b publishing space.”&lt;/p&gt;
&lt;p&gt;Special Requirements to Index and Search Industry-Focused Web Content&lt;/p&gt;
&lt;p&gt;Trade Press Publishing Corporation uses Webinator on four “vertical portal” web sites, including two in the facility management space and two in the sanitary distribution/cleaning space. The main site is at FacilityZone.com (&lt;a href="http://www.facilityzone.com/"&gt;http://www.facilityzone.com&lt;/a&gt;). Carrillo said the biggest reason he selected Webinator as the indexing and searching tool for Trade Press was Webinator's open-ended customizability.&lt;/p&gt;
&lt;p&gt;According to Carrillo, “Probably the single identifying characteristic of the Webinator software, for us, was the ability to get to the source code. And that allowed us the flexibility to put it to work to do the things that we wanted to accomplish with its back-end. For example, we were indexing over six thousand web sites, which is quite a bit of data. And the first results that came up were kind of cool. We could see how, out of these millions of pages, you do a search, and there's some logic in there that says 'these are the ten best ones' out of the millions of pages I've got.&lt;/p&gt;
&lt;p&gt;“Taking a closer look at them, we felt that to really bring it to a marketplace and have it be as meaningful as possible to our end users -- we needed to go in and play with a lot of the settings to get the search engine to produce the particular kind of results we believed that our users would typically want to find. The straight-out-of-the-box algorithm for searching didn't have an immediate correlation to precisely what we thought our users would be looking for.&lt;/p&gt;
&lt;p&gt;“We spent some pretty significant time working with Thunderstone's tech support, doing tests and evaluations and changes and modifications, trial and error, to get things to the point where now it seems on a regular basis the terms we're punching in are getting the types of results that we know will make our users happy,” Carrillo explained.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Webinator's user features include:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;simple navigation&lt;/li&gt;
&lt;li&gt;intelligent query capabilities with natural language processes, special pattern matchers (regular expressions, numeric quantities, fuzzy patterns), document similarity searches, in-context result listings, link reference reports, proximity controls and set logic&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Carrillo continued, “The setup and deployment of Webinator is extremely easy and straightforward. All the core functionality is there plus the ability to access the source code and be as creative and as customized as your capabilities will let you be. In other words, Thunderstone doesn't hold you back. Thunderstone lets you take the product to whatever level you're ready, willing and able to take it. For that reason we've stuck with it, we've used it, and it's been great in that regard. That's not something you're going to get from the Googles of the world.”&lt;/p&gt;
&lt;p&gt;‘Locked Box’ Approach of Others Inadequate&lt;/p&gt;
&lt;p&gt;“We took a look at the Google appliance. It was brand new at the time. And the reason we didn't go with the Google appliance was we had no control over it. No flexibility. No ability to customize. Basically it was a ‘locked box’ sitting in our office, you know? And that's really not the way we wanted to go about it. We've got technical expertise on staff. We can go in. We can study and learn the scripts. We can make our modifications.&lt;/p&gt;
&lt;p&gt;“For instance, when you execute a search on facilityzone.com -- it executes a search first off of a SQL Server database that we've got on our end. Then it goes and executes it against the Webinator database and combines the two sets of results. So, we've got results that are built into a page that kind of fall on top of the results that come out of the search engine. There's no way that you're going to be able to do that with the Google Search Appliance. You just won't be able to.&lt;/p&gt;
&lt;p&gt;“The access to the source code and the flexibility of Webinator were definitely both something of value to us. Basically, we could not have done what we did without it. Working with the tech support team at Thunderstone, we have access to people who will actually call you back and work with you on some crazy questions.”&lt;/p&gt;
&lt;p&gt;“We're hoping that Thunderstone will continue to be a leader and help pave the way for how search technology is going to evolve. We'd like to take advantage of the new applications that Thunderstone develops and apply them to our industries and to our users,” Carrillo said.&lt;/p&gt;
&lt;p&gt;Web portal administrators looking for a web walking and indexing package to help them add vertical search engines to multiple online properties will appreciate the fact that Thunderstone's Webinator:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;indexes multiple sites into one common index&lt;/li&gt;
&lt;li&gt;offers administrators detailed verification and logging of document linkages&lt;/li&gt;
&lt;li&gt;can index/update documents while the database is in use&lt;/li&gt;
&lt;li&gt;permits multiple databases at a site&lt;/li&gt;
&lt;li&gt;features a simple browser interface&lt;/li&gt;
&lt;li&gt;is written in Texis Web Script for complete flexibility&lt;/li&gt;
&lt;li&gt;provides an SQL query interface to the database for maintenance and reports&lt;/li&gt;
&lt;li&gt;allows remote sites to be copied to the local file system&lt;/li&gt;
&lt;li&gt;lets multiple index engines run concurrently against a common database&lt;/li&gt;
&lt;/ul&gt;</description>
      <pubDate>Wed, 03 Oct 2007 00:00:00 -0400</pubDate>
      <a10:updated>2007-10-03T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1448</guid>
      <link>https://www.thunderstone.com/blog/archive/change-in-daylight-saving-rules/</link>
      <category>Main</category>
      <category>Texis</category>
      <category>Webinator</category>
      <title>Change In Daylight Saving Rules</title>
      <description>&lt;p&gt;&lt;u&gt;Are Thunderstone Software's products affected?&lt;/u&gt;&lt;/p&gt;
&lt;p&gt;Texis and Webinator do not require any patches to accommodate the upcoming changes to daylight saving rules as they store dates in UTC, and use the configured timezone to output and import dates. Daylight saving tracking is a feature provided by the operating system and Texis will respect the OS configuration. However, your OS may need patching if it's not already properly configured to handle the new rules.&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Thunderstone will be issuing patches for the Search Appliance on March 1 (possibly earlier). Use your appliance's "Maintenance-&amp;gt;Check for updates" feature if you haven't configured it for automatic updates. Select the package called "timezone-1.0.0".&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Are the dates in my database correct?&lt;/u&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If you have used, or are using, Texis on an unpatched OS, and you import or convert string dates whose string values lie between the old and new DST changeover dates (e.g. Sun Mar 11 02:00:00 2007 and Sun Apr 01 02:00:00 2007 local time for the US), for &lt;em&gt;any&lt;/em&gt; year after 2006, then the imported value will be based on the prior rules, and will be output differently after the patch because the OS parsed it incorrectly. You will need to re-import/convert those dates after patching your OS.&lt;/p&gt;
&lt;p&gt;For example the string "2007-03-20 16:30:00" (4:30pm on March 20, 2007) imported to a date field on an unpatched operating system configured for a US timezone will print as "2007-03-20 17:30:00" (5:30pm on March 20, 2007) after patching.&lt;/p&gt;
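&lt;p&gt;The one-hour shift in that example can be reproduced with a short sketch. This is an illustration only, not Texis code: it assumes US Eastern time, models the unpatched and patched operating systems with fixed UTC offsets, and the function name reprint_after_patch is hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for US Eastern time on 2007-03-20 (illustration only).
EST = timezone(timedelta(hours=-5))  # what an unpatched OS applies (old rules)
EDT = timezone(timedelta(hours=-4))  # what a patched OS applies (new rules)
FMT = "%Y-%m-%d %H:%M:%S"


def reprint_after_patch(local_string):
    """Hypothetical helper: import under old rules, print under new rules."""
    # Import: the unpatched OS reads the string as standard time,
    # and the value is stored in UTC.
    parsed = datetime.strptime(local_string, FMT).replace(tzinfo=EST)
    stored_utc = parsed.astimezone(timezone.utc)
    # Output: the patched OS renders the same UTC instant as daylight time.
    return stored_utc.astimezone(EDT).strftime(FMT)


print(reprint_after_patch("2007-03-20 16:30:00"))
```
&lt;/pre&gt;
&lt;p&gt;The stored UTC value never changes; only the offset used to parse and print it does, which is why affected dates must be re-imported after patching.&lt;/p&gt;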
&lt;p&gt;&lt;u&gt;Updating your operating system&lt;/u&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Unix and Linux&lt;/p&gt;
&lt;p&gt;Your best bet would be to check with your OS distribution vendor for timezone or tzdata updates. Here are links to a few popular ones.&lt;br /&gt;For Solaris see Sun Alert ID 102775.&lt;br /&gt;For RedHat see &lt;a href="http://kbase.redhat.com/faq/FAQ_80_7909.shtm"&gt;knowledge base article 7909&lt;/a&gt;.&lt;br /&gt;For SUSE see &lt;a href="http://www.novell.com/support/dynamickc.do?externalId=3853518&amp;amp;sliceId=SAL_Public&amp;amp;command=show&amp;amp;forward=nonthreadedKC&amp;amp;kcId=3853518"&gt;document 3853518&lt;/a&gt;.&lt;br /&gt;For do-it-yourselfers check out tzdata.tar.gz at &lt;a href="ftp://elsie.nci.nih.gov/pub/"&gt;ftp://elsie.nci.nih.gov/pub/&lt;/a&gt; for new timezone data. Be sure to link or copy your timezone file to /etc/localtime or whatever's appropriate for your system.&lt;/p&gt;
&lt;p&gt;Microsoft Windows&lt;/p&gt;
&lt;p&gt;Visit &lt;a href="http://www.windowsupdate.com/"&gt;windowsupdate.com&lt;/a&gt; to download Update KB928388 in the Optional category. Note that this fix is NOT included in automatic updates. For full details from Microsoft visit &lt;a href="http://www.microsoft.com/windows/timezone/dst2007.mspx"&gt;http://www.microsoft.com/windows/timezone/dst2007.mspx&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Testing compliance for Texis and Webinator&lt;/u&gt;&lt;/p&gt;
&lt;p&gt;Note: The following tests assume a US timezone that used and will use the conventional DST rules. Some localities and other countries follow different rules.&lt;br /&gt;The lines here may wrap to fit the page. Enter each command on one long line.&lt;/p&gt;
&lt;p&gt;Open a shell or MS-DOS window and cd to the Texis install directory. Then run the following (for Windows use "texis"&lt;sup&gt;1&lt;/sup&gt; instead of "bin/texis" in the examples below):&lt;/p&gt;
&lt;p&gt;bin/texis -h -d texis/testdb -s "select convert('2007-03-11 03:01:00','date')-convert('2007-03-11 01:59:00','date')"&lt;/p&gt;
&lt;p&gt;This test compares one minute before and after the new transition time.&lt;br /&gt;Unpatched you should get "3720". Patched you should get "120".&lt;/p&gt;
&lt;p&gt;Then run&lt;/p&gt;
&lt;p&gt;bin/texis -h -d texis/testdb -s "select convert('2007-04-01 03:01:00','date')-convert('2007-04-01 01:59:00','date')"&lt;/p&gt;
&lt;p&gt;This test compares one minute before and after the old transition time.&lt;br /&gt;Unpatched you should get "120". Patched you should get "3720".&lt;/p&gt;
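&lt;p&gt;The expected values in both tests follow from the same arithmetic: the two wall-clock times are 62 minutes apart (3720 seconds) unless the clock jumps forward an hour between them, in which case only 2 minutes (120 seconds) actually elapse. A minimal sketch of that reasoning, assuming US Eastern time modeled as fixed offsets; the helper name elapsed_seconds is hypothetical and this is not how Texis itself computes the result.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import datetime, timedelta, timezone

# Fixed US Eastern offsets, assumed here for illustration only.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")
FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, start_tz, end, end_tz):
    """Seconds between two wall-clock strings, each read with the offset in force."""
    begin = datetime.strptime(start, FMT).replace(tzinfo=start_tz)
    finish = datetime.strptime(end, FMT).replace(tzinfo=end_tz)
    return int((finish - begin).total_seconds())


# New transition date (March 11, 2007): only a patched OS sees the jump to EDT.
march_unpatched = elapsed_seconds("2007-03-11 01:59:00", EST, "2007-03-11 03:01:00", EST)
march_patched = elapsed_seconds("2007-03-11 01:59:00", EST, "2007-03-11 03:01:00", EDT)

# Old transition date (April 1, 2007): only an unpatched OS sees the jump;
# a patched OS is already on EDT for both times.
april_unpatched = elapsed_seconds("2007-04-01 01:59:00", EST, "2007-04-01 03:01:00", EDT)
april_patched = elapsed_seconds("2007-04-01 01:59:00", EDT, "2007-04-01 03:01:00", EDT)

print(march_unpatched, march_patched)
print(april_unpatched, april_patched)
```
&lt;/pre&gt;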
&lt;p&gt; &lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt;Particularly old installations of Texis may not have texis.exe in the installation directory but only in the webserver's CGI directory. In that case use the full path to texis.exe to run it.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Confirming updates on the Search Appliance&lt;/u&gt;&lt;/p&gt;
&lt;p&gt;After installing the timezone-1.0.0 package via "Check For Updates" confirm the installation by going to "Maintenance-&amp;gt;Manage Logs" and clicking on "messages". You should see something similar to the following, but with your machine name and timezone (note that the lines are in reverse chronological order).&lt;/p&gt;
&lt;pre&gt;Feb 27 14:12:58 host logger: timezone finished
Feb 27 14:12:58 host logger: patch ok
Feb 27 14:12:57 host logger: Your timezone is America/New_York
Feb 27 14:12:57 host logger: Installing updated timezone info
Feb 27 14:12:57 host logger: timezone-1.0.0-1
Feb 27 14:12:57 host logger: Preparing packages for installation...
&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;If necessary you can adjust your timezone setting via "Maintenance-&amp;gt;Webmin Interface-&amp;gt;Change Time Zone".&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Further questions&lt;/u&gt;&lt;/p&gt;
&lt;p&gt;Please contact Thunderstone &lt;a href="http://www.thunderstone.com/texis/site/pages/Support.html"&gt;Support&lt;/a&gt; if you have questions.&lt;/p&gt;</description>
      <pubDate>Fri, 28 Sep 2007 10:44:15 -0400</pubDate>
      <a10:updated>2007-09-28T10:44:15-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1382</guid>
      <link>https://www.thunderstone.com/blog/archive/webinator-as-an-indexing-and-retrieval-tool-for-creating-vertical-search-portals-on-network-hubs/</link>
      <category>Webinator</category>
      <category>Main</category>
      <category>case study</category>
      <title>Webinator as an indexing and retrieval tool for creating vertical search portals on network hubs</title>
      <description>&lt;p&gt;Ecological Internet (EI) maintains up-to-date climate, forests and environment portals that serve more than 35,000 visitors a day. By implementing Thunderstone's Webinator, EI enables its website users to search the indexed content of five million URLs and quickly retrieve the desired information.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why Ecological Internet?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Having earned his B.A. degree in Political Science at Marquette University, Glen R. Barry joined the Peace Corps and went to Papua New Guinea, where he fell in love with the rainforests while witnessing the tragedy of their extensive destruction for the sake of making cardboard boxes and other such products.&lt;/p&gt;
&lt;p&gt;According to him, “During my Peace Corps service in Papua New Guinea from about 1990 I became an early adopter of the Internet and began looking seriously at how networking technologies could be used to facilitate environmental conservation. In the early days of the Internet I was struck by the fact that communication between people anywhere in the world could be used to spread information that would lead to better resource management decisions and better conservation decisions.”&lt;/p&gt;
&lt;p&gt;After returning from the Peace Corps he completed an M.S. degree in Conservation Biology and Sustainable Development, as well as a Ph.D. in Land Resources, both from the University of Wisconsin-Madison. His primary research revolved around the creation and maintenance of environmental web portals such as Forests.org, which became one of the first 10,000 web sites on the Internet. Dr. Barry's Ph.D. dissertation was entitled "Global Forests and the Internet: Assessing the Reach and Usefulness of the Forest Conservation Portal."&lt;/p&gt;
&lt;p&gt;In 1999 he decided to add search capabilities to Forests.org, while also launching a climate site and an environmental sustainability site.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customized Search Engine for Web Sites&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dr. Barry explained, “We wanted to be able to make our own customized search engine. We preferred an off-the-shelf solution that we could easily install to crawl, index, search and retrieve content from more than 4,000 reviewed scientific-content sites of interest to our target audience of conservation professionals. I remember searching on the Internet and finding a huge list of spidering and robot software that had about a hundred products on it. A lot of them were ‘open source,’ with little snippets of code. I was more concerned with having a fully implemented product that does what you need it to do. I wasn't interested in doing an open source sort of thing. Where do you go for technical support in those situations? Going through the list, most of them weren't fully implemented packages. Many of them were free, but the amount of time that a small organization would need to spend getting them operational would have offset any cost benefits. There were a few other options, but they were going to be much more expensive than Webinator.&lt;/p&gt;
&lt;p&gt;“At that time our entire budget was like fifteen thousand dollars a year (even now it's only about seventy-five thousand dollars a year in mostly $25 - $100 donations). So, we're a really small organization. We chose Webinator. I think our initial license with Thunderstone was eight thousand dollars, which was a major purchase for us. It was a big deal. We were trying to do something that hadn't been done before. We had a vision that we wanted to create a specialized search engine on forests content, on climate change content and on water conservation content. The whole purchasing and installation process was straightforward. And Webinator was very, very stable. It just ran. I'm using it on a Windows platform. My operating system is Windows.&lt;/p&gt;
&lt;p&gt;“We wanted to walk about four thousand sites we were feeding, and then we also wanted to do off-site pages. Here's where I think customized search is so good. Not only are we getting the content of the reviewed four thousand sites that I as a scientist have identified, but also each of those sites has links to other sites that are included in our index. So, you have some synergy where you find unexpected things at other good sites. Webinator is a really well thought-out product that has a lot of different tools built into it. It's a full-functioning web indexing and retrieval package. You can even include or exclude specified external links. For instance, we don't want Greenpeace's online store and merchandise in our search engine.”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Network “Hubs” to Support Environmental Professionals&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ecological Internet (EI) does not directly focus on the general audience that's looking for fluffy pictures of panda bears. There are other web sites that do that very well. EI's target audience is primarily conservation professionals who need information retrieval tools and who seek useful data to factually support their own work. These people tend to be already highly motivated on the issues, and what they get from Ecological Internet are practical tools to do their work better.&lt;/p&gt;
&lt;p&gt;Dr. Barry had been employed in the biology department at the University of Wisconsin as their ‘bioinformatics person’ until he left several years ago to run Denmark, Wisconsin-based Ecological Internet, Inc. (&lt;a href="http://www.ecologicalinternet.org/"&gt;http://www.ecologicalinternet.org&lt;/a&gt;) on a full-time basis.&lt;/p&gt;
&lt;p&gt;“There's a whole branch of science, network science, that over the last decade has studied how diseases spread or how the Internet's organized in a ‘hub’ design comprised of nodes with disproportionately high numbers of links to them. It's like the whole Kevin Bacon ‘six degrees of separation.’ We're all networked, and there are hubs. The Internet is a good demonstration of a lot of these networks. What we tried to do with Ecological Internet was to make a network hub on climate change, a network hub on forests, etc. where all of the best content is linked, indexed and made available in support of intelligent activities to protect the environment. Part of this is awareness, but it's awareness with a purpose to actually achieving something. There is reason to be hopeful. The forces of ignorance and corruption are ominous, but we have new tools - like Webinator - that we've never had before,” said Dr. Barry.&lt;/p&gt;
&lt;p&gt;He continued, “I went up there to Thunderstone's headquarters in Cleveland, Ohio to participate in a Webinator training program two years ago. I had already been using the product for six years. During this whole time I think that the Thunderstone Software team has always been very responsive. I don't know of any other comparable product that brings full-text customized search to non-profits at a reasonable price. We wholeheartedly support Thunderstone and would recommend the Webinator search platform highly.”&lt;/p&gt;
&lt;p&gt;Ecological Internet (EI) now maintains up-to-date climate, forests and environment portals that serve more than 35,000 visitors a day. By implementing Webinator, EI enables its website users to search the indexed content of five million URLs and quickly retrieve the desired information.&lt;/p&gt;
&lt;p&gt;The nonprofit's conservation portals currently include:&lt;/p&gt;
&lt;p&gt;EcoEarth.Info (&lt;a href="http://www.ecoearth.info/"&gt;http://www.ecoearth.info&lt;/a&gt;)&lt;br /&gt;ClimateArk.org (&lt;a href="http://www.climateark.org/"&gt;http://www.climateark.org&lt;/a&gt;)&lt;br /&gt;WaterConserve.org (&lt;a href="http://www.waterconserve.org/"&gt;http://www.waterconserve.org&lt;/a&gt;)&lt;br /&gt;Forests.org (&lt;a href="http://www.forests.org/"&gt;http://www.forests.org&lt;/a&gt;)&lt;br /&gt;OceanConserve.org (&lt;a href="http://www.oceanconserve.org/"&gt;http://www.oceanconserve.org&lt;/a&gt;)&lt;br /&gt;My.EcoEarth.Info (&lt;a href="http://my.ecoearth.info/"&gt;http://my.ecoearth.info&lt;/a&gt;)&lt;/p&gt;</description>
      <pubDate>Fri, 21 Sep 2007 00:00:00 -0400</pubDate>
      <a10:updated>2007-09-21T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1447</guid>
      <link>https://www.thunderstone.com/blog/archive/ecological-internet-portals-tap-thunderstone-to-search-5-million-urls/</link>
      <category>Main</category>
      <category>Webinator</category>
      <title>Ecological Internet Portals Tap Thunderstone To Search 5 Million Urls</title>
      <description>&lt;p&gt;Nonprofits' Conservation Websites Inform More than 35,000 Visitors a Day&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;CLEVELAND, OH — August 24, 2007 — Thunderstone Software LLC announced today that Denmark, Wisconsin-based Ecological Internet, Inc. has renewed its license of Thunderstone's Webinator Web Index &amp;amp; Retrieval System to continue offering industry-leading search capabilities on all its environment conservation websites, including the highly popular &lt;a href="http://www.ecoearth.info/"&gt;http://www.ecoearth.info&lt;/a&gt; and &lt;a href="http://www.climateark.org/"&gt;http://www.climateark.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.ecologicalinternet.org/"&gt;Ecological Internet, Inc.&lt;/a&gt; is a non-profit organization specializing in the use of the Internet to achieve conservation outcomes. As part of its mission it seeks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;to empower the global movement for environmental sustainability by working to conserve climate, forest, ocean and water ecosystems&lt;/li&gt;
&lt;li&gt;to commence the age of ecological sustainability and restoration&lt;/li&gt;
&lt;li&gt;to provide forest/rainforest, climate, water, ocean and environment conservation websites—presented as a free service to the environmental community.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Ecological Internet maintains up-to-date climate, forests and environment portals that serve more than 35,000 visitors a day. It enables its website users to quickly search the indexed content of five million URLs and retrieve the desired information.&lt;/p&gt;
&lt;p&gt;Thunderstone's Webinator Web Index &amp;amp; Retrieval System allows a website administrator to easily create and provide a high quality retrieval interface to collections of HTML documents. Webinator serves as an example of the type of applications that can be built around Thunderstone's Texis RDBMS and Web Script.&lt;/p&gt;
&lt;p&gt;"Thunderstone's Webinator program has been the foundation of our efforts to build web sites that facilitate environmental conservation by providing accurate information for action," explains Ecological Internet President Dr. Glen Barry. "There is no other comparable product that brings full-text customized search to non-profits at a reasonable price. We wholeheartedly support Thunderstone and would recommend the Webinator search platform highly."&lt;/p&gt;
&lt;p&gt;About Thunderstone&lt;br /&gt;&lt;a href="http://www.thunderstone.com/"&gt;Thunderstone Software LLC&lt;/a&gt; pioneered simultaneous searching of both structured and unstructured data with the Texis relational database optimized for full text search. Since 1981 Thunderstone has continued to develop its global reputation as a provider of the world's most powerful, scalable and flexible enterprise search solutions.&lt;/p&gt;
&lt;p&gt;Sales Contact: Mark Bacho&lt;br /&gt;&lt;img src="https://www.thunderstone.com/site/images/markbachoemail.gif" alt="" /&gt;&lt;br /&gt;+1 216 820 2200 ext.105&lt;/p&gt;
&lt;p&gt;Media Contact: Peter Thusat&lt;br /&gt;&lt;img src="https://www.thunderstone.com/site/images/peterthusatemail.gif" alt="" /&gt;&lt;br /&gt;+1 216 820 2200 ext.118&lt;/p&gt;</description>
      <pubDate>Fri, 24 Aug 2007 00:00:00 -0400</pubDate>
      <a10:updated>2007-08-24T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1455</guid>
      <link>https://www.thunderstone.com/blog/archive/thunderstone-to-power-hollywoodcom-search-engine/</link>
      <category>Main</category>
      <category>Webinator</category>
      <title>Thunderstone To Power Hollywood.com Search Engine</title>
      <description>&lt;p&gt;CLEVELAND, OH -- (MARKET WIRE) -- 06/15/2004 -- Thunderstone Software announced its sale of a search system for the Hollywood.com website. Thunderstone's Webinator product will provide an up-to-date indexing of around one million movie reviews, entertainment listings, and related information appearing on the re-designed new and improved Hollywood.com web site.&lt;/p&gt;
&lt;p&gt;Webinator is one of the original web-search software products, introduced in 1995 and now in major release 5. Webinator powers search features for many hundreds of web sites in many languages around the world.&lt;/p&gt;
&lt;p&gt;Hollywood.com is one of the leading movie-related sites on the Internet, featuring movie reviews, showtimes listings, entertainment news, and an extensive multimedia library. Hollywood.com serves more than one billion web pages annually.&lt;/p&gt;
&lt;p&gt;"We found Webinator the best technology for searching our content under our requirements," said Laurie S. Silvers, President of Hollywood.com. "Its many advanced features were crucial, such as its ability to reindex fast-changing subsets of the site quickly, and its ability to crawl through JavaScript links. Our site structure is somewhat complicated but Webinator handles it beautifully."&lt;/p&gt;
&lt;p&gt;Thunderstone's general manager, John Turnbull, said: "We built many features into Webinator to optimize it for indexing large complex web sites. We're happy to see that effort put to good use on Hollywood.com."&lt;/p&gt;
&lt;p&gt;Thunderstone's Webinator is available in a variety of configurations including a free version. For more information, see http://www.webinator.com.&lt;/p&gt;
&lt;p&gt;ABOUT THUNDERSTONE&lt;/p&gt;
&lt;p&gt;Thunderstone Software LLC is a pioneer of search engine technology, providing text retrieval products to industry, government, and educational institutions since 1981. Thunderstone's flagship Texis software is the most versatile platform for search application development. Texis is at the heart of the Webinator and Search Appliance products. Texis uniquely integrates natural language and relevance ranking functionality with structured SQL relational database indexing. Applications of Thunderstone technologies include: online publishing, product catalogs, classified advertising, document management, text mining, web searching, and intelligence. Thunderstone's products are used on thousands of web sites worldwide. Major customers include eBay, Corbis, QVC, About.com, ZDNet, and HotJobs. For more information, visit http://www.thunderstone.com or call +1 216-820-2200.&lt;/p&gt;</description>
      <pubDate>Tue, 15 Jun 2004 00:00:00 -0400</pubDate>
      <a10:updated>2004-06-15T00:00:00-04:00</a10:updated>
    </item>
    <item>
      <guid isPermaLink="false">1461</guid>
      <link>https://www.thunderstone.com/blog/archive/search-engine-breakthrough-knows-to-ignore-junk/</link>
      <category>Main</category>
      <category>Webinator</category>
      <title>Search Engine Breakthrough Knows To Ignore 'junk'</title>
      <description>&lt;p&gt;Cleveland, Ohio, Nov. 29, 2001 - A leading search-engine software vendor is the first to solve the junk results problem. Junk results are search hits that match words in web page headers, footers, and navigation menus, as opposed to the page's actual unique content.&lt;/p&gt;
&lt;p&gt;The Webinator site-indexing product, from Thunderstone Software, now automatically ignores text repeated across multiple pages of a site. Users thus receive more precise search results, without spurious matches on boilerplate text.&lt;/p&gt;
&lt;p&gt;The breakthrough enhancement is part of the new Webinator 4 release, available immediately. Webmasters may download a free version of Webinator from Thunderstone's web site.&lt;/p&gt;
&lt;p&gt;Many additional features are new in Webinator 4. They include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Categorization. Pages may be automatically assigned to categories that are searchable separately, even if the site itself is not organized by category.&lt;/li&gt;
&lt;li&gt;Web-based administration. Settings such as the indexing schedule may be controlled by an authorized user, from a browser anywhere on the internet.&lt;/li&gt;
&lt;li&gt;Source code. Programmers now may modify any aspect of the program, by means of familiar SQL commands and CGI scripting.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Webinator is an application of Thunderstone's sophisticated Texis system, a combination search engine and database. Texis serves more than 30 million searches a day at eBay, and additional millions a day for other popular web sites such as HotJobs, About.com, and Master.com.&lt;/p&gt;
&lt;p&gt;Essentially the same software powering those sites is available free to web site operators. "Most webmasters don't need to pay for a site-search function," said Doran Howitt, Thunderstone spokesman. "Free Webinator will let you search up to 10,000 pages per index, which is more than sufficient for the majority of sites."&lt;/p&gt;
&lt;p&gt;Webinator is subject to acceptable-use license provisions. Thunderstone offers expanded support and capacity for purchase, if needed. Promotional discounts for upgrades or expanded capacity are in effect until Dec. 31.&lt;/p&gt;
&lt;p&gt;About Thunderstone Software&lt;/p&gt;
&lt;p&gt;Thunderstone Software LLC provides high-performance solutions for text-searching in conjunction with relational database applications. Uses of this technology include publishing, catalogs, classified advertising, corporate portals, and web-searching. Thunderstone is a 20-year-old, profitable company, whose products are used on thousands of web sites worldwide.&lt;/p&gt;</description>
      <pubDate>Thu, 29 Nov 2001 00:00:00 -0500</pubDate>
      <a10:updated>2001-11-29T00:00:00-05:00</a10:updated>
    </item>
  </channel>
</rss>