Syntax: -unique
This option will enable extra checking for duplicate documents. Documents with the same content will only be stored once, even if their URLs are different. This is accomplished by placing a unique index on the id field of the html table and storing a hash code for the HTML source of the document there instead of the normal counter variable. All subsequent walks will perform hashing.
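The mechanism described above can be sketched in a few lines. This is only an illustration, not the tool's actual implementation: SQLite stands in for the real database, SHA-256 is an assumed hash function (the hash actually used is not stated here), and the `url` and `body` columns and the helper names `make_db`, `content_id`, and `store` are hypothetical.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an empty in-memory database with a unique index on html.id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only the id field and the html table name come
    # from the text above; the other columns are assumptions.
    conn.execute("CREATE TABLE html (id TEXT, url TEXT, body TEXT)")
    conn.execute("CREATE UNIQUE INDEX html_id ON html (id)")
    return conn


def content_id(html_source: str) -> str:
    """Hash the HTML source; SHA-256 is an assumption, not the tool's hash."""
    return hashlib.sha256(html_source.encode("utf-8")).hexdigest()


def store(conn: sqlite3.Connection, url: str, html_source: str) -> bool:
    """Insert a document; return False if identical content is already stored."""
    try:
        conn.execute(
            "INSERT INTO html (id, url, body) VALUES (?, ?, ?)",
            (content_id(html_source), url, html_source),
        )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a second row with the same content hash.
        return False
```

With this sketch, storing the same HTML source under two different URLs succeeds the first time and is rejected the second time, so only one row is kept.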
This option should only be used on an empty database, since any existing counter ids would not be valid hash codes. After wiping the database with -wipe, -unique must be specified again if it is still desired.
NOTE: Dynamic debugging insertions into the HTML source, such as the current time, whether visible or in comments, will change the hash, thereby defeating this feature.
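The effect in the note can be seen with a small sketch. SHA-256 is assumed here purely for illustration (the hash the tool actually uses is not stated), and `content_hash` and the sample page are hypothetical.

```python
import hashlib


def content_hash(html_source: str) -> str:
    """Hash the full HTML source, comments included (SHA-256 is an assumption)."""
    return hashlib.sha256(html_source.encode("utf-8")).hexdigest()


# Two fetches of the same page that differ only in a generated comment.
page = "<html><body>Hello</body>{stamp}</html>"
first_fetch = page.format(stamp="<!-- generated 10:00:01 -->")
second_fetch = page.format(stamp="<!-- generated 10:00:02 -->")

# The hashes differ, so both copies would be stored as distinct documents.
hashes_match = content_hash(first_fetch) == content_hash(second_fetch)
print(hashes_match)  # prints False
```

Because the hash covers the entire source, a change that is invisible to the reader is enough to make two copies of a page look like different documents.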