
May 2009 Newsletter

May 31, 2009

May 2009 - Archive



HAPPENINGS

SUPERIOR TECH SUPPORT LEADS TO DEVELOPMENT OF THUNDERSTONE SEARCH APPLIANCE VERSION 7

In direct response to practical feedback from customers who work regularly with our tech support engineers, Thunderstone Software will release Version 7 of the Thunderstone Search Appliance on June 8, 2009. The new features also apply to our Parametric Search Appliances and to Thunderstone's entire line of Appliance products. These enhancements include:

 

    • Faster Category Searching
      We've provided a new setting to improve category search speed when categories are distinct/non-overlapping (i.e., when no URL belongs to more than one category).

    • Enhanced Federated Search (Thunderstone Meta Search)
      Search users can now select which back-end profiles to search, from a list configured by their Appliance administrator.

    • Character Match Mode
      This setting greatly improves non-English/Unicode character support. It allows case-insensitive searching of foreign characters and also enables ignore-accents ("e" matches "é"), ligature expansion ("oe" matches "œ") and more.

    • Language Analysis Module
      An optional feature for Appliance owners who pay to have it activated, this setting improves CJK (Chinese/Japanese/Korean) searches. Extra processing puts spaces between words so that each word can be found without wildcards even when it is adjacent to others.

    • MySQL in DB Walker
      We've added support for crawling MySQL databases to our DB Walker.

    • Updated "Look and Feel"
      The administrative interface has had its text slightly reorganized, and it now employs Cascading Style Sheets (CSS) for more modern HTML usage.

 

All Thunderstone customers with current Appliance maintenance agreements in place automatically qualify to receive the downloadable Version 7 software update. Please phone +1 216 820 2200 if you have any questions (business days, 10 a.m. - 6 p.m. Eastern Time.)


CUSTOMER QUOTE OF THE MONTH

 

“We use our Thunderstone Search Appliance 1000 not for web searching but for internal directory searches of call logs, packing lists, sales lists, invoices, load sheets, etc. People would formerly request needed documents, and we'd have to manually make photocopies and deliver them — which required lots of time and effort. Now that we have a Thunderstone Search Appliance, everybody can search and find what they want quickly and easily. It seems to work very well for what we need.”

 

Michael Lee
Director of I.T.
Carry-On Trailer Corporation
http://www.carry-ontrailer.com


UPCOMING EVENTS

Thunderstone's R & D team has several other exciting enterprise search development projects in the 2009 pipeline, as our staff continues to work on:

 

    • TEXIS Version 6

    • Webinator Version 6

    • TEXIS Catalog, our newest eCommerce search engine for online catalogs

 

Plus, we will continue adding even more features to the Search Appliance products this year. Look for details on all these scheduled releases in future issues of Thunderstone News.


TECH TIPS: USING KEEP/IGNORE TAGS ON YOUR SEARCH APPLIANCE OR WEBINATOR SOFTWARE

Just the Facts, Ma'am — Specifying the content with Keep/Ignore tags

As websites grow larger and more complex, they contain more and more "cruft" that search engines stumble upon. Things that may be small or even hidden by JavaScript and CSS are front and center to search engines. Standard headers, menus and breadcrumbs are just some of the things that may be polluting your search data.

Keep Tags and Ignore Tags allow your page authors to indicate what should and shouldn't be used from a webpage. They let you trim the fat from your pages so that only the content gets searched, not the surrounding fluff.

    • Keep Tags specify a beginning and ending expression, and only the content between the beginning and end is kept. This is performed on the HTML source, so it's common to use comments as the begin/end tags.

    • Ignore Tags specify a beginning and ending expression, and the content between the beginning and end tags is DISCARDED. This is also performed on the HTML source, so HTML tags/comments are fair game.

 

Which to use? Both can accomplish the same goal. It's just a question of which will be logistically easier for you. If you have mostly content with just a little bit of extra info, Ignore Tags will probably be easier. If you have a small amount of content awash in a sea of cruft, then putting Keep Tags around the content may be easiest. Plus, you're not limited to using just one or the other. You can also use a combination of Keep Tags and Ignore Tags on your content as you see fit.
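To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is an illustration only, not the Appliance's or Webinator's actual implementation: the function names and the comment markers (such as "<!-- begin content -->") are hypothetical, and literal string markers stand in for the begin/end expressions you would configure.

```python
import re


def apply_keep_tags(html: str, begin: str, end: str) -> str:
    """Keep only the text found between each begin/end marker pair."""
    pattern = re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)
    return "".join(pattern.findall(html))


def apply_ignore_tags(html: str, begin: str, end: str) -> str:
    """Discard each begin/end marker pair and everything between them."""
    pattern = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.sub("", html)


# Hypothetical page: a menu, the real content, and a footer.
PAGE = (
    "<div id='menu'>Home | Products</div>"
    "<!-- begin content --><p>Trailer specs</p><!-- end content -->"
    "<!-- begin ignore --><p>Footer links</p><!-- end ignore -->"
)

# Keep Tags: only the marked content survives.
kept = apply_keep_tags(PAGE, "<!-- begin content -->", "<!-- end content -->")

# Ignore Tags: everything survives except the marked footer.
trimmed = apply_ignore_tags(PAGE, "<!-- begin ignore -->", "<!-- end ignore -->")
```

In this sketch the Keep Tags result contains only the marked paragraph, while the Ignore Tags result still contains the menu, which is why Keep Tags tend to suit pages where the cruft outweighs the content.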


Feedback, suggestions and questions are welcome. Send your email to editor@thunderstone.com.

March 2009 Newsletter

March 31, 2009

March 2009 - Archive



CUSTOMER QUOTE OF THE MONTH

 

"We use the Thunderstone Search Appliance to crawl, index and search Word files, PDFs and other content in our law firm's internal document management system. The Appliance gives us a lot of customization options in the way it operates, with excellent control over precisely what we want to make searchable and what we don't want included. It does everything we need it to do. You can just plug it in and forget about it. It works great. After years of trouble-free performance, when we finally did have a hardware failure — Thunderstone had us quickly up and running again on the same day we received our replacement unit. Their level of customer support is almost unheard of in the I.T. industry."

 

Michael E. Salopek
I.T. Manager
Janik, Dorman & Winter, L.L.P.
http://www.janiklaw.com


TECH TIPS: CONTROLLING YOUR CRAWL WITH WEBINATOR OR THUNDERSTONE SEARCH APPLIANCES — EXCLUDE BY FIELD

Last time we discussed exclusions and requirements for managing what pages your crawler gets, but there's one setting that gets a Tech Tips all to its own: Exclude by Field. It gives you extra power in how you're excluding and what exactly is being excluded.

    • "Metamorph query" matching

      Rather than a prefix or substring match, Exclude by Field uses a "Metamorph query", which is the full-text matching engine used for our normal searches. You can simply type in words to match, or if you begin with a slash (/) then it is treated as a REX expression (our RegEx-like pattern matching language; see the "REX" section in the Vortex documentation on our website for more details).

    • Multiple fields for exclusion

      All previously discussed exclusion & requirement options operate only on the URL itself. Exclude by Field allows you to exclude based on a number of different other areas:

      • HTML — Matches against the raw HTML of the page. Useful if there's something in an HTML comment that you'd like to base the match on.
      • Text — The formatted text of the URL. This is the same text you'd see if you looked at the list/edit info of a page or at the "Match Info" in the search results. Useful if you want to match text but ignore any HTML markup that may or may not be present.
      • All Meta — The contents of all available meta fields are put together and then matched against.
      • Meta Field -> — Matches against the contents of a specific meta field, which you specify in the next column "From Meta Field".
      • Keywords, Description, & Mime Type — Matches against the text of these common meta fields.
      • URL — Matches against the URL, just like Exclusion REX. You may want to use this to get the extra Exclude options, listed below.

 

    • What to exclude

      Beyond more power in specifying what to match, Exclude by Field also gives you more control with what to do when you get a match.

      • Pages and Links — This acts like any other exclusion rule. The page and its links are kept completely out of the walk data.
      • Pages only — The content of the page is not included in the walk, but the links from the page ARE followed.
      • Links only — The page is included in the walk, but the links from the page are not followed.

 

    • A word on efficiency

      A disadvantage of Exclude by Field with any Field except URL is that the page must be fully fetched before the rule can be applied.

      With all other exclusion rules (and Exclude by Field on URL), the URL can be thrown out before the page is fetched and processed.

      When performing Exclude by Field on the content of the page, though, the page must be downloaded and fully processed before we can know if it has HTML or a Body that matches the rules specified.

      When possible, it's better to use other exclusion rules or the URL target for Exclude by Field, as this will allow you to prune URLs before they are fetched. Still, there are many things that Exclude by Field can do that the other settings simply can't (as the example below shows).

    • Example — Excluding directories from a file crawl

      A perfect example of Exclude by Field is directories when performing a file crawl — we can't fully exclude directories because they are what link to all the files, and without them we'd have nothing. Still, we might want them not to show up in the search. We can get this with Exclude by Field.

      • Metamorph Query "//=>>=" (without the quotes) — This is a REX expression for "match anything that ends in a slash". Please see the REX section of the Vortex documentation if you'd like more details on REX syntax.
      • Field - URL
      • Exclude - Pages only — This will keep the contents of the directory "pages" out of the crawl but will still follow the links to get the actual files and use them in the search.
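The three Exclude options, applied to the directory example above, can be sketched in Python as follows. This is a hedged illustration rather than the crawler's real logic: the names Exclude, Decision and decide are hypothetical, and the Python regular expression "/$" is only a stand-in for the REX expression, since REX syntax differs from Python's.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Exclude(Enum):
    PAGES_AND_LINKS = "Pages and Links"
    PAGES_ONLY = "Pages only"
    LINKS_ONLY = "Links only"


@dataclass
class Decision:
    index_page: bool    # is the page content included in the walk?
    follow_links: bool  # are the links on the page followed?


# Python regex stand-in (not REX) for "match anything that ends in a slash".
ENDS_IN_SLASH = re.compile(r"/$")


def decide(url: str, pattern: re.Pattern, action: Exclude) -> Decision:
    """Apply one Exclude by Field rule whose Field is the URL."""
    if not pattern.search(url):
        # The rule does not match: the page is crawled normally.
        return Decision(index_page=True, follow_links=True)
    if action is Exclude.PAGES_ONLY:
        return Decision(index_page=False, follow_links=True)
    if action is Exclude.LINKS_ONLY:
        return Decision(index_page=True, follow_links=False)
    return Decision(index_page=False, follow_links=False)


# A directory listing is kept out of the index, but its links are followed.
directory = decide("file:///docs/reports/", ENDS_IN_SLASH, Exclude.PAGES_ONLY)

# A file inside the directory does not match the rule and is indexed.
document = decide("file:///docs/reports/q1.pdf", ENDS_IN_SLASH, Exclude.PAGES_ONLY)
```

Because this rule looks only at the URL, the decision can be made before the page is fetched, which is the efficiency point made above.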

If you have any questions about how to use Exclude by Field, please feel free to contact Thunderstone Support — and we'll discuss it.


HAPPENINGS

The February 2009 issue of CRN, a publication of Everything Channel and ChannelWeb.com, recognized the "top Channel Chiefs in the industry based upon their record of business innovation and dedication to the partner community." This annual list, which CRN calls "Our definitive guide to the movers and shakers of I.T. channel management," included Frederick A. Harmon (Thunderstone's Channel Director & CSO.)

You can visit the CRN website (http://www.crn.com/crn/chiefs/2009cc.jhtml?chief=136) to view pertinent information about Fred Harmon in the 2009 Channel Chiefs list.

 

UPCOMING

Thunderstone's John Turnbull (President and CEO) will present a workshop session entitled The Next Generation in Search: Today's Best Practices on Friday, April 17, 2009, (2:00 p.m. - 3:30 p.m.) during the DigitalNow 2009 Conference at Disney's Yacht and Beach Club Resorts in Lake Buena Vista, Florida.

Session Description
Search has progressed from a complex tool used by librarians through simple tools that let users perform a keyword search, to today's information access tools that can still provide users a simple interface but make use of much of an association's collective knowledge. In this workshop participants will learn what sorts of information can be behind a search engine and how to make it more valuable to users. The session includes a case study from IEEE, the world's largest technical membership association that significantly improved their business by focusing on their customers and helping them access content in new ways.

DigitalNow (http://www.fusionproductions.com/digitalnow/) is an annual conference that brings together senior-level executives and volunteer leaders from some of the most influential professional and trade associations in America. Produced by Fusion Productions and Disney Institute, two of the foremost authorities in adult educational design, with input from registered attendees and a conference advisory board, DigitalNow addresses the critical issues facing association leaders in the digital age.


GET YOUR FREE FLOOR PASS (A $75 VALUE) TO THE MARCH 30 - APRIL 2, 2009 AIIM INTERNATIONAL EXPOSITION + CONFERENCE IN PHILADELPHIA

The AIIM International Exposition + Conference, the yearly gathering for information management professionals across industries and lines of business, will take place Monday, March 30, through Thursday, April 2, 2009, at the Pennsylvania Convention Center in Philadelphia, PA. With 19 tracks, more than 135 conference sessions featuring more than 100 real-world case studies, and an Expo floor showcasing 200+ information management technology solution providers, the event aims to provide attendees with actionable insight they can use.

REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS
and get access to all keynotes, general sessions,
Expo floor education and the co-located ON DEMAND Expo!

To receive your free pass, use Registration Code: 615M
when you register at WWW.AIIMEXPO.COM
or call +1 888 824 3004.

Your FREE pass comes to you compliments of Thunderstone Software. Please stop by and visit Fred Harmon (Channel Director & CSO) and Peter Thusat (Communication Director & CMO) at Booth 1045.


 

February 2009 Newsletter

February 28, 2009

February 2009 - Archive




TECH TIPS: CONTROLLING YOUR CRAWL WITH WEBINATOR OR THUNDERSTONE SEARCH APPLIANCES — REQUIREMENTS & EXCLUSIONS

The crawler provides many ways of controlling what you do and don't crawl. Note that URLs manually specified by you (Base URLs, URL URLs, Single Pages, etc.) are exempt from all inclusion/exclusion rules — they will always be used.

  • Exclusions

    The shotgun approach: any URL that contains any of the text listed in an exclusion line, anywhere in the URL, will not be included in the walk. The text doesn't need to be a full path or filename; sub-matches are okay.

    If you specify "archive" as an exclusion, then "http://www.example.com/archive/index.htm" will be excluded and "http://www.example.com/site/newsarchivefrom2004" will also be excluded.

  • Exclusion prefix

    Like Exclusion, except the text must match starting from the beginning of the URL. This gives a bit more control over what exactly matches.

    If you specify "http://www.example.com/archive" as an exclusion prefix, then "http://www.example.com/archive/index.htm" will be excluded and "http://www.example.com/archivePages.htm" will be excluded, but "http://www.example.com/site/newsarchivefrom2004" will be allowed.

  • Required prefix

    The opposite of Exclusion prefix: instead of rejecting URLs that DO match the prefix, it rejects URLs that DON'T match it. Both settings weed out URLs; they simply swap which URLs are used and which aren't. Multiple Required prefixes can be specified, and URLs are allowed if they match at least one.

    If you specify "http://www.example.com/archive" as a required prefix, then "http://www.example.com/archive/index.htm" will be used and "http://www.example.com/archivePages.htm" will be used, but "http://www.example.com/site/newsarchivefrom2004" will be excluded.

  • Exclusion REX & Required REX

    Similar ideas to Exclusion prefix & Required prefix, except you use our powerful REX pattern matcher to specify what should match instead of just a prefix. It's similar to regular expressions but much faster. Please see the REX pages in the Vortex manual on our website (http://www.thunderstone.com/site/vortexman/rex_split.html) for more details on the exact syntax.

If you specify both requirements and exclusions, then URLs must satisfy both to be used — they must not match any Exclusion Prefix, AND they must match at least one Required Prefix (if specified).
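The way these rules combine can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the crawler's actual code: the function name url_allowed and its parameters are hypothetical, matching is assumed to be case-sensitive, the REX variants are left out, and the exemption for manually specified URLs is not modeled.

```python
def url_allowed(url, exclusions=(), exclusion_prefixes=(), required_prefixes=()):
    """Return True if a discovered URL passes every rule that is set."""
    # Exclusion: reject if the text appears anywhere in the URL.
    if any(text in url for text in exclusions):
        return False
    # Exclusion prefix: reject if the URL starts with the prefix.
    if any(url.startswith(prefix) for prefix in exclusion_prefixes):
        return False
    # Required prefix: if any are given, the URL must start with at least one.
    if required_prefixes and not any(url.startswith(p) for p in required_prefixes):
        return False
    return True


ARCHIVE = "http://www.example.com/archive"
NEWS = "http://www.example.com/site/newsarchivefrom2004"

# Exclusion "archive" is a substring match, so the news page is rejected.
by_exclusion = url_allowed(NEWS, exclusions=["archive"])

# As an exclusion prefix, the same news page is allowed.
by_exclusion_prefix = url_allowed(NEWS, exclusion_prefixes=[ARCHIVE])

# As a required prefix, only URLs under the prefix are used.
by_required_prefix = url_allowed(ARCHIVE + "/index.htm", required_prefixes=[ARCHIVE])
```

The URLs are the ones used in the examples above, so the three results mirror the outcomes described for each setting.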

There's an even more powerful way to exclude pages with Exclude by Field, but that's for another Tech Tips article. (Watch for it in next month's newsletter.)

If you have questions about how any of these operate, feel free to contact Thunderstone Support.


HAPPENINGS

Steve Kolowich, a reporter for THE CHRONICLE OF HIGHER EDUCATION, noted what he referred to as Thunderstone's determined efforts at "Out-Googling Google" in his article entitled In Search of a Better Search Engine (http://chronicle.com/free/v55/i24/24a01501.htm) for the February 20, 2009 issue of The Chronicle. He wrote, in part:

The Virginia Bioinformatics Institute at Virginia Tech, facing a thickening swamp of digital documents, opted for Thunderstone's search appliance, which starts at $13,000, about six months ago. The institute uses the device to index reams of unpublished data and notes stored on its intranet. James E. Stoll, who leads Internet projects at the institute, said the appliance allowed research collaborators and other authorized users to retrieve items from across the institute's network of repositories without exposing those documents to the public Web, as basic site-search software would require. Researchers "don't want to be scooped," Mr. Stoll said. "This is their livelihood."

 

UPCOMING

Thunderstone's Fred Harmon (Channel Director and CSO) and Peter Thusat (Communication Director and CMO) will participate as exhibitors (Thunderstone Software Booth: 1045) during the AIIM International Exposition and Conference March 30 - April 2, 2009 at the Pennsylvania Convention Center in Philadelphia, PA.

Conference: March 30 - April 2, 2009
Exhibits: March 31 - April 2, 2009


GET YOUR FREE FLOOR PASS (A $75 VALUE) TO THE MARCH 31 - APRIL 2, 2009 AIIM INTERNATIONAL EXPOSITION IN PHILADELPHIA

REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS
and get access to all keynotes, general sessions,
Expo floor education and the ON DEMAND Expo!

To receive your free pass, use Registration Code: 615M
when you register at WWW.AIIMEXPO.COM
or call +1 888 824 3004.

Your FREE pass comes to you compliments of Thunderstone
Software. Please stop by and visit us at Booth 1045.


CUSTOMER SUCCESS STORY: USING WEBINATOR TO SEARCH ONLINE COLLECTIONS OF EURASIAN AND EAST EUROPEAN RESEARCH

The Center for Russian & East European Studies, a sub-unit of the larger University Center for International Studies (UCIS) at the University of Pittsburgh, won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library — a collection that includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. Thunderstone's Webinator indexing and retrieval software enabled the responsible Informatics team to accomplish this goal in an efficient and affordable manner.

The University Center for International Studies (UCIS) provides the organizational framework that supports the University of Pittsburgh's mission to integrate and reinforce all its strands of international scholarship in research, teaching and public service. UCIS includes — in addition to many other highly-acclaimed programs and component units — a Center for Russian & East European Studies, an Asian Studies Center, a Center for Latin American Studies, a European Studies Center, an International Business Center (jointly sponsored with the Katz School of Business) and a European Union Center of Excellence (funded by the European Union.)

As a thin layer on top of the whole UCIS structure, Central Administration handles all business-related core functions and technology issues. When individuals in any of the sub-units need advice or consulting related to I.T. Services, Knowledge Management, database planning, upgrading of their websites or anything else that would fall into technology-mediated information, they call upon Mark J. Weixel, Director of Informatics at UCIS.

Discovering Webinator and Getting Started With Using It as an Easily Customizable Development Tool

Weixel recalled, "Back in I guess it was '98, I found out about Webinator from a friend of mine who was at Princeton at the time. We had a particular niche here in International Studies, and we wanted to create mini search engines for web content that was specific to certain world regions. We were hoping to create search engines like AltaVista, since Google wasn't even around then, that would allow people to do full-text searching of those websites. But, because we were vetting the list of sites, we thought we could increase the probability that searchers would come across something really relevant to the part of the world we were focusing on.

"We used Webinator to index and search collections of websites that were in and dealt with Russia and Eastern Europe.

"So, that was my original introduction to Webinator. We bought the entry-level product to begin with, and we currently have the Enterprise version. What I really like about it, still, is the fact that it's relatively easy to configure. It's much easier to configure than it was back when we bought the original product, when everything was run through command lines. I like the notion of relevance in terms of returned hits. It seems to make a lot more sense to me than, for example, Google page ranking, which places a much higher priority on popularity than it does on the actual content of the pages where text matches.

"Another thing that has been nice is the fact there is support for synonym matching within the server. And I think Vortex as a scripting language is very powerful. Even though I haven't used it to its fullest ability, it's proven to be quite flexible when we've needed to make modifications."


Feedback, suggestions and questions are welcome. Send your email to editor@thunderstone.com.

Customer Success Story: Using Webinator To Search Online Collections Of Eurasian And East European Research

February 24, 2009

The Center for Russian & East European Studies, a sub-unit of the larger University Center for International Studies (UCIS) at the University of Pittsburgh, won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library — a collection that includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. Thunderstone's Webinator indexing and retrieval software enabled the responsible Informatics team to accomplish this goal in an efficient and affordable manner.

The University Center for International Studies (UCIS) provides the organizational framework that supports the University of Pittsburgh's mission to integrate and reinforce all its strands of international scholarship in research, teaching and public service. UCIS includes — in addition to many other highly-acclaimed programs and component units — a Center for Russian & East European Studies, an Asian Studies Center, a Center for Latin American Studies, a European Studies Center, an International Business Center (jointly sponsored with the Katz School of Business) and a European Union Center of Excellence (funded by the European Union.)

As a thin layer on top of the whole UCIS structure, Central Administration handles all business-related core functions and technology issues. When individuals in any of the sub-units need advice or consulting related to I.T. Services, Knowledge Management, database planning, upgrading of their websites or anything else that would fall into technology-mediated information, they call upon Mark J. Weixel, Director of Informatics at UCIS. 

Discovering Webinator and Getting Started With Using It as an Easily Customizable Development Tool

Weixel recalled, "Back in I guess it was '98, I found out about Webinator from a friend of mine who was at Princeton at the time. We had a particular niche here in International Studies, and we wanted to create mini search engines for web content that was specific to certain world regions. We were hoping to create search engines like AltaVista, since Google wasn't even around then, that would allow people to do full-text searching of those websites. But, because we were vetting the list of sites, we thought we could increase the probability that searchers would come across something really relevant to the part of the world we were focusing on.

"We used Webinator to index and search collections of websites that were in and dealt with Russia and Eastern Europe.

"So, that was my original introduction to Webinator. We bought the entry-level product to begin with, and we currently have the Enterprise version. What I really like about it, still, is the fact that it's relatively easy to configure. It's much easier to configure than it was back when we bought the original product, when everything was run through command lines. I like the notion of relevance in terms of returned hits. It seems to make a lot more sense to me than, for example, Google page ranking, which places a much higher priority on popularity than it does on the actual content of the pages where text matches.

"Another thing that has been nice is the fact there is support for synonym matching within the server. And I think Vortex as a scripting language is very powerful. Even though I haven't used it to its fullest ability, it's proven to be quite flexible when we've needed to make modifications."

Implementing a Sophisticated Indexing and Retrieval Package with an Attractive ROI Track Record

Did they look at any competing products? According to Weixel, no, they didn't — for a couple of reasons. One, they're a small shop and they have to ask, "How much is this going to cost?" And, he said, the ROI for a one-time investment in a perpetual Webinator license was always pretty clear. It was a known quantity to them. Plus, Weixel strongly believed, as the person in charge of actually setting up and administering it, Webinator provided an affordable and high-quality solution for his specific application requirements. The business manager trusted Weixel's judgment, and by all accounts Webinator has delivered excellent results.

As to future expansion beyond the Center for Russian & East European Studies, discussions have begun with several of the other sub-units within UCIS. The Center for Latin American Studies and the European Studies Center also seem interested in putting more and more of their materials online — newsletters, conference reports, etc.

Webinator offers UCIS sub-units the possibility of acquiring a well-proven search engine that they could customize as desired and manage on their own.

Digitizing, Capturing and Making Searchable the Publications that Comprise the Vladimir I. Toumanoff Virtual Library

Weixel said their Webinator-powered search implementation getting the heaviest use right now is a project that the University of Pittsburgh's Center for Russian and East European Studies (REES) has done in conjunction with The National Council for Eurasian & East European Research (NCEEER, frequently pronounced 'Nickser') — a federally funded organization charged with supporting research, typically in social sciences, focusing on the former Soviet Union and Eastern Europe.

REES won a competition a number of years ago to create the Vladimir I. Toumanoff Virtual Library comprised of research reports and working papers submitted to NCEEER by scholars under their grants over the last two decades. This collection includes searchable online documents from many top U.S. researchers and analysts who write about politics, history, sociology, economics and foreign policy related to the states of the former Soviet Union and Central and Eastern Europe. NCEEER continues adding to the collection as its funded researchers prepare new papers.

"We proposed scanning and digitizing more than 20 years' worth of reports and then taking it and essentially pointing Webinator at it and, using the documents plug-in, doing a full-text index of the entire corpus. And I think one of the reasons that we won the competition is because, once we had done the really hard work of creating PDFs out of all the printed documents, we were going to be able to put it in one place and, overnight, have a full-text search index. It's my understanding that that was not a component of the other proposals," said Weixel.

He continued, "We successfully contended for that particular project, got it, spent the better part of nine months digitizing the materials and, I kid you not, it took, I think, less than 24 hours, and we had a fully searchable index of the entire corpus of research products. And it worked out well. We have this nice, targeted archive of material. We've got it set to re-index on a regular schedule, so anytime NCEEER gets a new batch of project reports — they upload them, they get caught in the next cycle of indexing, and it makes us very happy.

"The search interface for the archive materials of NCEEER is available through the Vladimir I. Toumanoff Virtual Library at the website of The National Council for Eurasian & East European Research. You kick off the search there, and then you're transported to Pittsburgh for the actual results set.

"Recently we put the server housing Webinator behind the firewall as part of our new increased security policy at the University of Pittsburgh. The fact that the folks at Thunderstone — John, in particular, in the Support Group — were able to work with me in coming up with a way to take a search query and pipe it through a back door into Webinator and then take the result set and present that to users in an accessible front-end, was just fantastic. It took me about two weeks once I had access to the beta version of the code, and that worked out really well. It was satisfying for me on a number of levels, not just because the product did what it was supposed to, but because I had support from people who could actually help me efficiently accomplish what I needed to do. That worked out very, very well."

Weixel added, "Our audience is interesting. Of course, we're housed within a major research university. So, we do have a number of our projects where we're trying to target our students and our faculty. But the area studies centers, these sub-units underneath the University Center for International Studies, most of them have federal funding that mandates what they call 'outreach' — trying to bring the message of international studies to a larger community, whether it's a local business community or whether it's local educators at the Kindergarten through high school level. Most of them probably have some kind of academic interest in one of the regions of focus. However you look at it, it's a pretty large and diverse audience.

"Being in an international studies environment, one thing that is important to us is foreign language support. I will admit to not having tried this yet with any of the CJK languages. But, in terms of the European and Cyrillic-based languages that we've indexed, Webinator has been a really good performer. And we've been quite happy with that."

For more information about UCIS or any of its area studies centers, you may contact UCIS by mail or email at:

University of Pittsburgh
University Center for International Studies
4400 Wesley W. Posvar Hall
Pittsburgh, PA 15260

January 2009 Newsletter

January 31, 2009

January 2009 - Archive



IMPROVING EFFICIENCY IN THE NEW YEAR

Was one of your New Year's resolutions to improve the efficiency of your I.T. infrastructure? Our expert engineering staff is available to clients with a current maintenance contract, and for the next month you can schedule a free 15-minute consultation on a first-come, first-served basis to discuss your architecture and get immediate advice on improving your solution. If you need more time, that can also be arranged. There are a limited number of time slots available, so make sure to call today at +1 216 820 2200.


TECH TIPS: THE MANY WAYS OF SPECIFYING URLS FOR THE CRAWL WITH WEBINATOR OR YOUR SEARCH APPLIANCE

There are a number of ways to specify what URLs you'd like the software to crawl, and which will be easiest to use can depend on your situation.

 

    • Base URL
      The old standby: URLs listed in the Base URL will be crawled, and the entirety of all pages they link to will be included. If you only have one or two sites and start from the top, this is definitely the way to go.

    • URL URL
      Sometimes you may have dozens or hundreds of base URLs, perhaps to cover many (but not all) of the folders on a site. If putting them all in a text box is starting to feel unwieldy, you can use the URL URL instead.

      Create a text file somewhere on your website that contains the URLs you want to use as Base URLs, each on its own line. Then specify the URL of that file in the URL URL setting. The crawler fetches the file and treats every URL in it as a Base URL. This can make it easier to manage a frequently-changing list of URLs.

      This is the only benefit of URL URL. The pages will still be crawled EXACTLY as if they were all listed as Base URLs. It exists only to make the list easier for you to manage.

    • Single Page
      Sometimes you have a single page, or a handful of pages, where you just want that page crawled, but none of its links. This is exactly what Single Page is for.

      The URLs listed in Single Page are fetched, and their links are ignored.

 

  • Page URL
    Just as "URL URL" is a list of URLs for "Base URL", "Page URL" is a list of URLs for "Single Page". URLs listed here should point to a plain text file on your server, each URL on its own line. Every one of those pages is fetched, and their links are completely ignored.
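The relationship between the four settings can be sketched in Python. This is an illustrative model only, with several assumptions: the function names are hypothetical, the list files are passed in as already-fetched text, blank lines are skipped, and a URL that appears in both groups is treated as a Base URL. None of these details are confirmed behavior of the crawler.

```python
def parse_url_list(text):
    """Return the non-empty lines of a URL list file, one URL per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_crawl_plan(base_urls, single_pages, url_url_text="", page_url_text=""):
    """Map each starting URL to True (follow its links) or False (page only)."""
    plan = {}
    # Single Page entries, plus every URL in the Page URL file: links ignored.
    for url in list(single_pages) + parse_url_list(page_url_text):
        plan[url] = False
    # Base URL entries, plus every URL in the URL URL file: links followed.
    for url in list(base_urls) + parse_url_list(url_url_text):
        plan[url] = True
    return plan


# Hypothetical contents of the two list files.
URL_URL_FILE = "http://www.example.com/docs/\n\nhttp://www.example.com/news/\n"
PAGE_URL_FILE = "http://www.example.com/contact.htm\n"

plan = build_crawl_plan(
    base_urls=["http://www.example.com/"],
    single_pages=["http://www.example.com/about.htm"],
    url_url_text=URL_URL_FILE,
    page_url_text=PAGE_URL_FILE,
)
```

The sketch shows why the list-file settings add no crawling behavior of their own: each URL from a list file ends up in the plan exactly as if it had been typed into the corresponding Base URL or Single Page box.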

     


QUOTE OF THE MONTH

"I should take a moment to let you know how much we appreciate the Webinator product. For us, it's very fast, easy to configure and meets all our needs. Thanks for such a great product!"

David Arbuthnot
VP IT
MS Society of Canada
http://www.mssociety.ca


MORE CUSTOMER SUCCESS STORIES COMING THIS YEAR

In the coming months you'll find a number of interesting new case studies at the Thunderstone.com website. As usual, we'll also feature links to them here in this newsletter. Keep an eye out for them. You won't want to miss these case studies of TEXIS, Webinator and Thunderstone Search Appliances "in action".


Feedback, suggestions and questions are welcome. Send your email to editor@thunderstone.com.
