Syntax: select from drop down box
This determines how rewalks are performed. The default type is New,
which creates a new database and does a complete walk of everything
without disturbing the existing database.
The rewalk type Refresh works on the existing database and only
downloads files that have been modified or created since the last walk.
Pages that are no longer present on the server are removed from the database.
Pages that were referenced but missing during the initial walk, and that
appear later, will be missed by Refresh if their parent page has not been modified.
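The Refresh behavior described above can be pictured as a conditional HTTP request per page. The following is a minimal sketch only, not the product's actual implementation: the function names `conditional_headers` and `refresh_action`, the action labels, and the exact status-code mapping are all assumptions made for illustration.

```python
from email.utils import formatdate


def conditional_headers(last_walk_epoch):
    """Build headers asking the server to send a page only if it changed.

    last_walk_epoch: time of the previous walk, in seconds since the epoch
    (hypothetical parameter for this sketch).
    """
    return {"If-Modified-Since": formatdate(last_walk_epoch, usegmt=True)}


def refresh_action(status_code):
    """Map a response status to what a refresh-style walk might do (assumed mapping)."""
    if status_code == 304:
        return "keep"      # not modified: keep the stored copy
    if status_code in (404, 410):
        return "remove"    # page is gone: drop it from the database
    if status_code == 200:
        return "refetch"   # modified, or server ignores the header
    return "skip"          # anything else: leave the entry alone for now


print(conditional_headers(0))
print(refresh_action(304), refresh_action(404), refresh_action(200))
```

A server that ignores the header answers 200 for every page, so under this sketch every page would be fetched again.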
If you change your settings to be more inclusive (e.g., add extensions, ignore
robots, or add domains) you should do a New walk once, because a Refresh
is unlikely to find the newly allowed data unless all of the pages leading
to that data have been modified.
If more than 30%-50% of your site changes between walks, you may be better
off using a New walk instead of Refresh. Also, many dynamic content
generators do not give modified dates, which will cause every page
to be rewalked. In that case you should use New instead of Refresh.
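The rules of thumb above can be summarized as a small decision helper. This is a sketch under stated assumptions, not part of the product: the function `choose_rewalk_type` and its parameters are hypothetical, and the default threshold of 0.3 simply takes the lower end of the 30%-50% range mentioned above.

```python
def choose_rewalk_type(changed_fraction,
                       settings_widened=False,
                       has_modified_dates=True,
                       threshold=0.3):
    """Suggest "New" or "Refresh" from the guidelines in this section.

    changed_fraction:   estimated share of the site that changed (0.0 to 1.0)
    settings_widened:   True if walk settings were made more inclusive
    has_modified_dates: False if the server gives no modified dates
    threshold:          assumed cutoff, lower end of the 30%-50% guideline
    """
    if settings_widened:
        return "New"       # Refresh is unlikely to find newly allowed data
    if not has_modified_dates:
        return "New"       # every page would be rewalked anyway
    if changed_fraction >= threshold:
        return "New"       # too much of the site changed
    return "Refresh"


print(choose_rewalk_type(0.05))                          # Refresh
print(choose_rewalk_type(0.6))                           # New
print(choose_rewalk_type(0.05, settings_widened=True))   # New
```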
Method | Advantages | Disadvantages |
New | Guarantees most accurate representation of current site. | Uses more bandwidth. |
 | Does not disturb an existing search database. | Uses more temporary disk space. |
Refresh | Faster. | Could get out of sync with actual site under rare circumstances. |
 | Uses less bandwidth. | A lot of changed pages could substantially slow searches during the walk. |
 | Uses less temporary disk space. | Requires "if-modified-since" support on walked web server. |