|
In Webinator version 4 and earlier, the refresh walk checked every
page in the database to determine whether it needed updating. Because changed
pages are typically a small percentage of the site, checking for changes
is faster than doing a complete new walk. However, it is still time-consuming:
the web server must be contacted for every page on the site, because only the
web server can tell Webinator whether a page has changed.
Webinator version 5 and later improve the refresh process by focusing the walk
on the small but important group of pages that change. As each page
is walked, Webinator calculates a refresh period for that
individual page. The calculation is based on whether the page has changed since the
last time it was fetched, and how long ago that fetch was. This
refresh information determines when the page should be checked
again. In this way, the walk prioritizes pages that are new or change often, and it delays fetching pages that seldom change.
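
The per-page calculation described above can be illustrated with a small sketch. The source does not give Webinator's actual formula, so everything here is an assumption made for illustration: the halve-on-change and double-on-no-change rule, the MIN_PERIOD and MAX_PERIOD bounds, and the function names next_refresh_period and next_check_time. It only shows how the two stated inputs (whether the page changed, and how long ago it was last fetched) could produce a next check time.

```python
from datetime import datetime, timedelta

# Hypothetical bounds; Webinator's real limits are not stated in the source.
MIN_PERIOD = timedelta(hours=1)
MAX_PERIOD = timedelta(days=30)


def next_refresh_period(changed: bool, since_last_fetch: timedelta) -> timedelta:
    """Illustrative heuristic only, not Webinator's actual calculation."""
    if changed:
        # The page changed within the interval, so check again sooner.
        candidate = since_last_fetch / 2
    else:
        # The page was stable over the interval, so wait longer next time.
        candidate = since_last_fetch * 2
    # Keep the period within the assumed bounds.
    return max(MIN_PERIOD, min(candidate, MAX_PERIOD))


def next_check_time(fetched_at: datetime, changed: bool,
                    since_last_fetch: timedelta) -> datetime:
    """Return when the page should next be checked."""
    return fetched_at + next_refresh_period(changed, since_last_fetch)
```

Under these assumptions, a page that keeps changing drifts toward the minimum period, while a page that never changes drifts toward the maximum.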
Thus, when a walk (scheduled or manual) takes place, only the pages
that are due for a refresh are actually fetched, not the entire
database. The result is a database that is kept up to date by a process that
consumes fewer server resources.
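
As a rough sketch of how a walk might pick only the pages that are due, the example below filters stored records by their next check time. The PageRecord type, its fields, and the pages_due function are hypothetical names invented for this illustration; they do not describe Webinator's database schema or its internal selection logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PageRecord:
    """Hypothetical per-page record; not Webinator's actual schema."""
    url: str
    next_check: datetime


def pages_due(records: List[PageRecord], now: datetime) -> List[PageRecord]:
    """Return only the pages whose next check time has arrived,
    ordered so the longest-overdue pages come first."""
    due = [r for r in records if r.next_check <= now]
    return sorted(due, key=lambda r: r.next_check)
```

With this shape, the work done by a walk grows with the number of due pages and not with the total number of pages stored.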
Copyright © Thunderstone Software Last updated: Tue Nov 6 10:55:12 EST 2007
|