In Webinator version 4 and earlier, the refresh walk checked every
page in the database to determine whether it needed updating. Because only
changed pages are re-indexed, and those are typically a small percentage of
the site, a refresh walk is faster than a complete new walk. It is still
time-consuming, however: only the web server can tell Webinator whether a
page has changed, so the server must be contacted for every page on the
site.

In Webinator version 5 and later, there is an improved refresh
process. The walk is adapted to focus on the small but important group
of changing pages. As each page is walked, a refresh period is
calculated for that individual page. The calculation is based on
whether the page has changed since the last time it was fetched, and
how long ago that fetch was. This refresh information is used to
determine when the page should be checked again. In this way, the walk
prioritizes the walking of pages that change often or are new, and it
delays the fetch of pages that seldom change.
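
As a rough illustration of the idea, the sketch below derives a refresh period from the two inputs named above: whether the page changed, and how long ago it was last fetched. The halving and doubling rule, the clamp limits, and the names `next_refresh_period`, `MIN_PERIOD` and `MAX_PERIOD` are all assumptions made for this example; the actual formula Webinator uses is not described here.

```python
# Hypothetical limits, chosen only for illustration.
MIN_PERIOD = 3600.0           # never re-check more often than hourly
MAX_PERIOD = 30 * 86400.0     # never wait longer than thirty days


def next_refresh_period(changed: bool, elapsed: float) -> float:
    """Return the number of seconds to wait before checking a page again.

    changed -- True if the page differed from the stored copy
    elapsed -- seconds since the page was previously fetched
    """
    if changed:
        # The page changed within `elapsed` seconds, so look sooner next time.
        period = elapsed / 2
    else:
        # The page stayed the same for `elapsed` seconds, so back off.
        period = elapsed * 2
    # Keep the result inside the allowed range.
    return max(MIN_PERIOD, min(MAX_PERIOD, period))
```

Under this assumed rule, a page that keeps changing converges on the minimum period, while a static page drifts toward the maximum, which matches the prioritization described above.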

Thus, when a walk (scheduled or manual) takes place, only the pages
that are due for a refresh are actually fetched, not every page in the
database. The result is a database that stays current while the refresh
process consumes fewer server resources.
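
The selection step can be pictured as a filter over the stored pages. This is a minimal sketch, not Webinator's implementation: the `Page` record, its field names, and the `pages_due` helper are hypothetical, and new pages are modeled here simply as having a refresh period of zero.

```python
from typing import List, NamedTuple


class Page(NamedTuple):
    url: str
    last_fetch: float      # time of the previous fetch, in seconds
    refresh_period: float  # seconds to wait before the next check


def pages_due(pages: List[Page], now: float) -> List[Page]:
    """Return the pages whose refresh time has arrived, most overdue first."""
    due = [p for p in pages if p.last_fetch + p.refresh_period <= now]
    # Earliest due time first, so long-overdue pages are fetched before others.
    return sorted(due, key=lambda p: p.last_fetch + p.refresh_period)


# Example: only three of the four stored pages are fetched in this walk.
stored = [
    Page("a", 0.0, 100.0),
    Page("b", 0.0, 50.0),
    Page("c", 0.0, 500.0),   # seldom changes, not due yet
    Page("new", 0.0, 0.0),   # new page, due immediately
]
to_fetch = pages_due(stored, now=120.0)
```

Pages that are not yet due, such as `c` in the example, are skipped entirely, so the web server is not contacted for them during this walk.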

Copyright © Thunderstone Software. Last updated: Thu Mar 11 16:13:32 EST 2010