Syntax: -unique
This option enables extra checking for duplicate documents. Documents
with the same content will be stored only once, even if their URLs are
different. This is accomplished by placing a unique index on the id field
of the html table and storing a hash code of the document's HTML source
there instead of the normal counter variable. All subsequent walks will
perform hashing. This option should only be used on an empty database,
since any existing counter ids would not be proper hash codes. After
wiping the database with -wipe, this option must be specified again if
duplicate checking is still desired.
NOTE: Dynamically generated debugging insertions in the HTML source, such
as the current time, whether visible or in comments, will change the hash,
thereby defeating this feature.
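
The mechanism described above can be illustrated with a minimal sketch. This is not the crawler's actual implementation: the hash algorithm (SHA-256), the SQLite backend, the Url and Body column names, and the helper names make_db, content_id, and store are all assumptions chosen for illustration. Only the html table, its id field, and the unique index come from the description above.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the html table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE html (id TEXT, Url TEXT, Body TEXT)")
    # The unique index on id is what rejects a second copy of the same content.
    conn.execute("CREATE UNIQUE INDEX html_id ON html (id)")
    return conn


def content_id(html_source: str) -> str:
    """Hash the HTML source; SHA-256 is an assumption, not the documented hash."""
    return hashlib.sha256(html_source.encode("utf-8")).hexdigest()


def store(conn: sqlite3.Connection, url: str, html_source: str) -> bool:
    """Insert a page keyed by its content hash.

    Returns False when a page with identical source is already stored.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO html (id, Url, Body) VALUES (?, ?, ?)",
                (content_id(html_source), url, html_source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    page = "<html><body>Hello</body></html>"
    print(store(db, "http://example.com/a", page))  # first copy is stored
    print(store(db, "http://example.com/b", page))  # same content, other URL
    # A changing comment (for example a timestamp) alters the hash, so the
    # page is stored again even though the visible content is the same.
    print(store(db, "http://example.com/c", page + "<!-- 10:58:47 -->"))
```

The third call shows why the NOTE matters: any byte that differs between fetches, even inside a comment, produces a different id, so the unique index no longer catches the duplicate.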
Copyright © Thunderstone Software Last updated: Tue Nov 6 10:58:47 EST 2007