Note: This documentation is for an old version of Webinator. The latest documentation is here.

Rewalk Type

Syntax: select from drop-down box

This setting determines how rewalks are performed. The default type is New, which creates a new database and does a complete walk of everything without disturbing the existing database.

The rewalk type Refresh works on the existing database and downloads only files that have been modified or created since the last walk. Pages that are no longer present on the server are removed from the database. A page that was referenced but missing during the initial walk and appears later will be missed by Refresh if its parent page has not been modified. If you change your settings to be more inclusive (for example, adding extensions, ignoring robots, or adding domains), you should do a New walk once, because a Refresh is unlikely to find the newly allowed data unless all of the pages leading to that data have been modified.
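The behavior described above, including the case where a late-appearing page is missed, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not Webinator's actual implementation: the function name `refresh_walk`, the dictionary layout, and the integer timestamps are all hypothetical.

```python
def refresh_walk(db, site):
    """Simulate a refresh-style walk (illustrative only).

    db:   {url: (modified_time, [links])} as stored by the previous walk
    site: {url: (modified_time, [links])} as the server has it now
    Returns the updated database as a new dict.
    """
    updated = {}
    queue = list(db)  # start from the pages already in the database
    seen = set()
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        if url not in site:
            continue  # no longer on the server: dropped from the database
        modified, links = site[url]
        old = db.get(url)
        if old is not None and old[0] >= modified:
            updated[url] = old  # unchanged: not downloaded, links not re-examined
            continue
        updated[url] = (modified, links)  # new or modified: download, follow links
        queue.extend(links)
    return updated


# "/late" was linked from "/" but did not exist during the first walk.
db = {"/": (1, ["/a", "/late"]), "/a": (1, []), "/old": (1, [])}
site = {"/": (1, ["/a", "/late"]), "/a": (2, []), "/late": (2, [])}
result = refresh_walk(db, site)
```

In this example `/a` is updated and `/old` is removed, but `/late` is not picked up because its parent `/` is unchanged and so is never re-read. If `/` had a newer modification time, its links would be followed and `/late` would be found.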

If more than 30% to 50% of your site changes between walks, you may be better off using a New walk instead of Refresh. Also, many dynamic content generators do not supply modified dates, which causes every page to be rewalked. In that case, use New instead of Refresh.
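The rule of thumb above can be written down as a small decision helper. This is a hedged sketch, not part of Webinator: `suggest_rewalk_type` and its parameters are hypothetical names, and the default threshold of 0.3 is simply the low end of the 30% to 50% range mentioned above.

```python
def suggest_rewalk_type(changed_fraction, server_sends_modified_dates, threshold=0.3):
    """Suggest "New" or "Refresh" from the guidelines above (illustrative only).

    changed_fraction: estimated share of the site that changes between walks (0.0 to 1.0)
    server_sends_modified_dates: False if pages come without modified dates
    threshold: assumed cutoff, taken from the 30%-50% rule of thumb
    """
    if not server_sends_modified_dates:
        return "New"  # every page would be rewalked anyway
    if changed_fraction > threshold:
        return "New"  # too much of the site changes for Refresh to pay off
    return "Refresh"
```

For a mostly static site whose server reports modified dates, `suggest_rewalk_type(0.1, True)` returns `"Refresh"`; a site built by a dynamic generator without modified dates gets `"New"` regardless of how little changes.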

Method   Advantages                                                 Disadvantages
New      Guarantees most accurate representation of current site.   Uses more bandwidth.
         Does not disturb an existing search database.              Uses more temporary disk space.
Refresh  Faster.                                                    Could get out of sync with actual site under rare circumstances.
         Uses less bandwidth.                                       A lot of changed pages could substantially slow searches during the walk.
         Uses less temporary disk space.                            Requires "if-modified-since" support on walked web server.
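The "if-modified-since" requirement refers to the standard HTTP `If-Modified-Since` request header. As a rough illustration of what such a conditional request looks like (this is not Webinator's code; the URL and the `last_walk` timestamp are placeholders), a client could build one with the Python standard library like this:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Placeholder: the time of the previous walk, in UTC.
last_walk = datetime(2007, 11, 6, 15, 58, 37, tzinfo=timezone.utc)

# HTTP dates use the fixed "Day, DD Mon YYYY HH:MM:SS GMT" form.
header_value = format_datetime(last_walk, usegmt=True)

# The request is only constructed here, not sent.
req = Request("http://www.example.com/", headers={"If-Modified-Since": header_value})
```

A server that supports the header answers an unchanged page with status 304 (Not Modified) and no body, which is what saves bandwidth. A server that ignores the header returns the full page every time.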


Copyright © Thunderstone Software     Last updated: Tue Nov 6 10:58:37 EST 2007