The following urlcp settings control how links from formatted
documents are processed:
-
contentlocationasbaseurl (boolean)
Whether to interpret the Content-Location header (if
present) as the base URL for the document. A <base> tag in the
document overrides it. The default is on. Added in version
5.01.1249455000 20090805.
-
eatlinkspace (boolean)
Whether to strip leading and trailing whitespace from links before
converting them into absolute links. The default is on. Added in
version 3.01.968173351 20000905.
-
refs (2, 4 or 6 list arguments)
Which HTML tag/attribute pairs are considered links, images,
frames or iframes when formatting or reparenting HTML. Adding or
removing a tag/attribute pair adds or removes its values from the
lists returned by <urlinfo links> etc. This setting takes several
parallel argument lists, in the form: refs action
"flags,..." tag attr [attr2 val2]:
-
action (single value)
A single value of add, del, or set,
indicating how to apply the remaining arguments as a whole.
With add, the arguments are added to the existing values (flags
ORed in); with del, they are deleted (flags cleared); with set,
the existing values are cleared first and replaced with the
arguments.
-
flags,... (list)
Each value is a comma-separated list of one or more flags to
apply:
- link: The tag's attr value should be considered a link.
- image: The value should be considered an image.
- frame: The value should be considered a frame.
- iframe: The value should be considered an iframe.
The above flags apply to both formatting (<urlinfo links>)
and reparenting. If format or reparent is appended to a
flag, that flag applies only to the named action.
-
tag (list)
The HTML tag referred to.
-
attr (list)
The attribute whose value(s) the flags apply to.
-
attr2 (list, optional)
An optional second attribute that, if specified, must be
present and have the value val2 for the flags to take
effect. For example, by default <INPUT SRC=...>
values are considered images, but only if the attribute
TYPE is also present with the value IMAGE.
This value can be empty or unspecified if not needed.
-
val2 (list, optional)
Value for attr2. Required only if an attr2 value is given.
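To make the add/del/set and attr2/val2 rules concrete, here is a minimal Python sketch of a refs table. It is not Webinator's implementation: the names RefFlag, parse_flags, apply_refs and flags_for are hypothetical, matching is assumed to be case-insensitive, and the format/reparent flag suffixes are not modeled. Each entry is keyed by (tag, attr, attr2, val2) and holds a set of flags; add ORs flags in, del clears only the named flags, and set empties the table before adding. apply_refs also returns the prior entries so they can be passed back with set.

```python
from enum import Flag


class RefFlag(Flag):
    # Hypothetical bit values; only the flag names come from the documentation.
    LINK = 1
    IMAGE = 2
    FRAME = 4
    IFRAME = 8


def parse_flags(text):
    """Turn a comma-separated flag list such as "link,image" into a RefFlag."""
    result = RefFlag(0)
    for name in text.split(","):
        result |= RefFlag[name.strip().upper()]
    return result


def apply_refs(table, action, entries):
    """Apply one refs action to table, a dict (tag, attr, attr2, val2) -> RefFlag.

    entries is a list of (flags, tag, attr, attr2, val2) tuples; attr2/val2 may
    be None. Returns the previous contents in the same entry form, so that
    apply_refs(table, "set", previous) restores them.
    """
    if action not in ("add", "del", "set"):
        raise ValueError(f"unknown action: {action}")
    previous = [(flags, *key) for key, flags in table.items()]
    if action == "set":
        table.clear()  # set: drop all existing values before adding
    for flags, tag, attr, attr2, val2 in entries:
        # Assumption: names and val2 compare case-insensitively.
        key = (tag.lower(), attr.lower(), (attr2 or "").lower(), (val2 or "").lower())
        current = table.get(key, RefFlag(0))
        if action == "del":
            remaining = current & ~flags  # del: clear only the given flags
            if remaining:
                table[key] = remaining
            else:
                table.pop(key, None)
        else:
            table[key] = current | flags  # add/set: OR the flags in
    return previous


def flags_for(table, tag, attrs, attr):
    """Flags that apply to attribute attr of element tag with attributes attrs."""
    attrs = {k.lower(): v for k, v in attrs.items()}
    result = RefFlag(0)
    for (t, a, a2, v2), flags in table.items():
        if t != tag.lower() or a != attr.lower() or a not in attrs:
            continue
        # attr2, if given, must be present and have the value val2.
        if a2 and (a2 not in attrs or attrs[a2].lower() != v2):
            continue
        result |= flags
    return result


if __name__ == "__main__":
    table = {}
    apply_refs(table, "set", [(parse_flags("image"), "input", "src", "type", "image"),
                              (parse_flags("link"), "a", "href", None, None)])
    # Mirrors <urlcp refs del image input src type image>
    apply_refs(table, "del", [(parse_flags("image"), "input", "src", "type", "image")])
    # Mirrors <urlcp refs add link,image link src>
    apply_refs(table, "add", [(parse_flags("link,image"), "link", "src", None, None)])
    print(flags_for(table, "LINK", {"SRC": "style.css"}, "src"))
```

Keying on the full (tag, attr, attr2, val2) tuple is what lets del remove the conditional <INPUT ... TYPE=IMAGE> rule without touching an unconditional rule for the same tag and attribute.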
The return value is the previous setting, as a single list with
the tag, attr, attr2 and val2
arguments space-separated in each value. The return value may
be given as a single refs set argument to restore the previous
settings. Likewise, giving refs set the single argument defaults
restores the built-in default settings. Note that HTML
tags/attributes not currently known by the internal parser cannot
be specified. Added in version 5.01.1159397148 20060927.
This example stops treating <INPUT
SRC=... TYPE=IMAGE> values as images, and adds
<LINK SRC=...> values as both images and links:
<urlcp refs del image input src type image>
<urlcp refs add link,image link src>
-
scriptstrlinks (string)
Which types of JavaScript string links to return. String links
are those found by scanning all JavaScript strings, as opposed to
known true JavaScript links. One or more of the values none (no
strings at all), file (strings that resemble files),
protocol (strings that resemble URL protocols), or
all (all strings) may be specified. Note that script
string links are unreliable: they are not guaranteed to be
legitimate or even syntactically correct. This setting attempts
to obtain links that the JavaScript module would otherwise miss.
The strings are returned via <urlinfo strlinks>.
Returns the previous setting.
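How the values combine can be illustrated with a small Python filter. This is a hypothetical sketch, not how Webinator's JavaScript module classifies strings: the documentation only says that file strings "resemble files" and protocol strings "resemble URL protocols", so the two regular expressions below are assumptions, as is the rule that none overrides any other value. The default mode set (protocol and file) is the documented one.

```python
import re

# Assumed heuristics: illustrative guesses, not Webinator's actual rules.
PROTOCOL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # e.g. "http:", "mailto:"
FILE_RE = re.compile(r"\.[A-Za-z0-9]{1,5}$")             # e.g. "page.html"

DEFAULT_MODES = frozenset({"protocol", "file"})  # documented default


def select_string_links(strings, modes=DEFAULT_MODES):
    """Keep the script strings allowed by a set of scriptstrlinks values."""
    modes = set(modes)
    unknown = modes - {"none", "file", "protocol", "all"}
    if unknown:
        raise ValueError(f"unknown scriptstrlinks values: {sorted(unknown)}")
    if "none" in modes:  # assumption: none overrides any other value
        return []
    if "all" in modes:
        return list(strings)
    kept = []
    for s in strings:
        looks_like_protocol = "protocol" in modes and PROTOCOL_RE.match(s)
        looks_like_file = "file" in modes and FILE_RE.search(s)
        if looks_like_protocol or looks_like_file:
            kept.append(s)
    return kept
```

Because each mode only widens the set of accepted strings, the values behave as a union; a caller wanting fewer false positives would drop file and keep only protocol.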
Added in version 5.00.1087588168 20040618. The default is
protocol and file.
-
scriptstrlinkabs or scriptstrlinksabs (boolean)
Whether to make the URLs from JavaScript string links absolute.
If on (the default), these URLs are made absolute. If off, they
are left as-is, so that the caller can perform additional scans
or cleanup.
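Making a string link absolute means resolving it against the document's base URL. A minimal Python sketch using urllib.parse.urljoin (which resolves relative references per RFC 3986) is shown below; the function names are hypothetical. Base selection follows contentlocationasbaseurl from the start of this list: a <base> tag wins, then Content-Location if enabled and present, then the page's own URL. Resolving a relative <base> href against the base chosen so far is an assumption.

```python
from urllib.parse import urljoin


def document_base(page_url, content_location=None, base_href=None,
                  use_content_location=True):
    """Pick a base URL: <base href> first, then Content-Location (if enabled
    and present), else the page's own URL."""
    base = page_url
    if use_content_location and content_location:
        # Content-Location may itself be relative to the requested URL.
        base = urljoin(page_url, content_location)
    if base_href:
        # A <base> tag overrides Content-Location; a relative href is
        # resolved against the base chosen so far (assumption).
        base = urljoin(base, base_href)
    return base


def string_links(base_url, links, make_absolute=True):
    """Return script string links absolute (scriptstrlinkabs on) or as-is (off)."""
    if not make_absolute:
        return list(links)  # left untouched for the caller's own cleanup
    return [urljoin(base_url, link) for link in links]
```

Leaving links as-is is useful when a string such as "page" + ".html" was only partially recovered and the caller wants to repair it before resolving.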
Returns the previous setting. Added in version 5.00.1087588646 20040618.
-
urlnonprint (string)
How to treat non-printable characters (those outside the range
space through tilde) encountered in URL links of fetched pages:
-
asis
Default: leave non-printable characters alone.
-
strip
Remove non-printable characters.
-
encode
URL-encode non-printable characters.
Added in version 4.00.1006200000 20011119.
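The three urlnonprint modes can be sketched in Python as follows. The function name is hypothetical, and encode assumes each character is percent-encoded as its UTF-8 bytes (the default of urllib.parse.quote); Webinator may instead encode the raw bytes of the fetched page. "Printable" is the documented range, space (0x20) through tilde (0x7E).

```python
from urllib.parse import quote


def apply_urlnonprint(url, mode="asis"):
    """Treat characters outside space..tilde according to a urlnonprint mode."""
    if mode not in ("asis", "strip", "encode"):
        raise ValueError(f"unknown urlnonprint mode: {mode}")
    out = []
    for ch in url:
        if " " <= ch <= "~" or mode == "asis":
            out.append(ch)  # printable, or asis: keep unchanged
        elif mode == "encode":
            out.append(quote(ch, safe=""))  # e.g. tab -> %09
        # mode == "strip": the character is dropped
    return "".join(out)
```

For example, apply_urlnonprint("a\tb", "encode") yields "a%09b", while "strip" yields "ab".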
Copyright © Thunderstone Software Last updated: Mon Feb 18 10:28:15 EST 2013