Note: This documentation is for an old version of Webinator.

Plugin Split

A group of settings that controls whether and how anytotx plugin output is split into multiple sub-URLs in the table. Non-text files such as PDFs that anytotx processes are often very large or composed of sub-files. The Plugin Split settings allow these files to be split up for finer-grained searching. A split file causes more than one URL to be entered in the html table (and thus in potential search results) for the original URL. These additional URLs have an anchor appended to distinguish them from each other; usually this is the sub-file name, but it may be generic, e.g. "#part5", if there are no sub-files.

Note: changing any of these settings may affect the ability of Refresh-type rewalks to complete successfully (New walks operate as usual).
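The anchor naming described above can be illustrated with a short Python sketch. This is not Webinator's code: the helper name anchored_url, its parameters, and the example URLs are hypothetical, and the 1-based part numbering is an assumption inferred from the "#part5" example.

```python
def anchored_url(base_url, part_number, subfile_name=None):
    """Build the URL for one additional part of a split file.

    Hypothetical helper for illustration only; not part of Webinator.
    """
    # Prefer the sub-file name; fall back to a generic "partN" anchor
    # (assumption: parts are numbered starting from 1).
    anchor = subfile_name if subfile_name else f"part{part_number}"
    return f"{base_url}#{anchor}"


print(anchored_url("http://example.com/docs.zip", 2, "readme.txt"))
print(anchored_url("http://example.com/big.pdf", 5))
```

The first call uses a sub-file name as the anchor; the second falls back to the generic form because no sub-file name is available.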

The Depth setting controls the depth at which anytotx output is split. Each time anytotx unpacks a multi-file archive, the depth increases. Depth 0 (the default) means split at the top level, i.e. do not split. Depth 1 would therefore insert each file of a ZIP file as a separate URL in the table.
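As a rough model of the Depth behavior, the Python sketch below represents an archive as a nested dict and a plain file as a string. The function names and the data layout are hypothetical. In particular, merging everything below the split depth into one part is an assumption, since the text above only states what Depth 0 and Depth 1 do.

```python
def flatten(node):
    """Concatenate all text beneath a node into a single string."""
    if isinstance(node, str):
        return node
    return "\n".join(flatten(child) for child in node.values())


def split_at_depth(node, depth, name=""):
    """Return (name, text) parts, unpacking archives down to `depth` levels.

    Hypothetical model: a dict is an archive, a str is a file's text.
    Assumption: anything nested deeper than `depth` is merged into one part.
    """
    if depth == 0 or isinstance(node, str):
        return [(name, flatten(node))]
    parts = []
    for child_name, child in node.items():
        parts.extend(split_at_depth(child, depth - 1, child_name))
    return parts


archive = {"a.txt": "alpha", "inner.zip": {"b.txt": "beta", "c.txt": "gamma"}}
for depth in (0, 1, 2):
    print(depth, split_at_depth(archive, depth))
```

In this model, depth 0 yields a single part, depth 1 yields one part per top-level file (with the nested archive kept whole), and depth 2 also splits the nested archive.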

The Bytes setting controls how many bytes each part is split into. The default of 0 does not split at all. Byte-based splitting is useful for large monolithic files that have no detectable sub-file or page structure.

The AtPage setting controls whether to force Bytes-controlled splitting to occur at a page boundary (a form feed, Ctrl-L). Checking it may make a part arbitrarily larger than the Bytes setting, as the part may extend to the next page break. With it unchecked, a part may be up to 50% larger than the Bytes setting, as the page-break check only looks that far past the limit.
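The interaction of Bytes and AtPage can be sketched in Python as follows. This is an illustration under stated assumptions, not Webinator's implementation: the function split_bytes and its parameters are hypothetical, the form feed is assumed to stay with the part it ends, and when AtPage is off and no page break lies within 50% past the limit, the sketch assumes the cut falls exactly at the limit.

```python
FORM_FEED = b"\x0c"  # Ctrl-L, the page-break character


def split_bytes(data, limit, at_page=False):
    """Split `data` into parts of roughly `limit` bytes (hypothetical sketch)."""
    if limit <= 0:
        return [data]  # 0 means "do not split"
    parts = []
    start = 0
    while start < len(data):
        if len(data) - start <= limit:
            parts.append(data[start:])  # remainder fits in one part
            break
        target = start + limit
        if at_page:
            # Search as far as needed for the next page break.
            brk = data.find(FORM_FEED, target)
        else:
            # Only look up to 50% past the limit for a page break.
            brk = data.find(FORM_FEED, target, target + limit // 2)
        if brk == -1:
            # Assumption: no break found means "rest of data" when forced
            # to a page boundary, else a hard cut at the byte limit.
            end = len(data) if at_page else target
        else:
            end = brk + 1  # assumption: the form feed stays with this part
        parts.append(data[start:end])
        start = end
    return parts


data = b"a" * 10 + FORM_FEED + b"b" * 10
print([len(p) for p in split_bytes(data, 8)])
print([len(p) for p in split_bytes(data, 4, at_page=True)])
```

With a limit of 8 the first part stretches to the page break at 11 bytes, within the 50% allowance. With a limit of 4 and at_page=True the first part again runs to the page break, which is well over the limit.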

The Pages setting controls how many pages to group in a part. The default of 0 does not split at all. If both Pages and Bytes are set, the first limit reached is used for each part. For example, setting Pages to 10 and Bytes to 100000 would break at 10 pages or 100 KB, whichever comes first. Using both limits is useful to catch page-bounded documents like PDFs without generating huge text for non-paged documents.
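The "whichever comes first" rule can be sketched in Python as below. The function split_pages_or_bytes and its parameters are hypothetical, pages are assumed to be delimited by form feeds (Ctrl-L), and AtPage handling is deliberately left out to keep the sketch short.

```python
FORM_FEED = b"\x0c"  # Ctrl-L, assumed page delimiter


def split_pages_or_bytes(data, pages=0, max_bytes=0):
    """Cut each part at `pages` pages or `max_bytes` bytes, whichever is first.

    Hypothetical sketch; a limit of 0 disables that limit.
    """
    if pages <= 0 and max_bytes <= 0:
        return [data]
    parts = []
    start = 0
    while start < len(data):
        # Candidate cut from the byte limit.
        byte_end = start + max_bytes if max_bytes > 0 else len(data)
        # Candidate cut from the page limit: just past the Nth form feed.
        page_end = len(data)
        if pages > 0:
            pos = start
            for _ in range(pages):
                brk = data.find(FORM_FEED, pos)
                if brk == -1:
                    pos = len(data)  # fewer than N pages remain
                    break
                pos = brk + 1
            page_end = pos
        end = min(byte_end, page_end, len(data))
        parts.append(data[start:end])
        start = end
    return parts


paged = b"p1" + FORM_FEED + b"p2" + FORM_FEED + b"p3" + FORM_FEED + b"p4"
print(split_pages_or_bytes(paged, pages=2))
print(split_pages_or_bytes(b"x" * 10, pages=10, max_bytes=4))
```

The first call breaks every two pages. The second shows the byte limit taking over for data with no page breaks, which is the case the combined setting is meant to guard against.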

Plugin Split was added in version 4.03.1049838346 Apr 8 2003.


Copyright © Thunderstone Software     Last updated: Tue Nov 6 10:58:37 EST 2007