The following urlcp settings control whether and how pages and
related URLs (such as frames and iframes) are fetched:
-
encodings [add|del|set] [$encodings ...]
Sets the list of allowed content/transfer encodings for pages
fetched. The $encodings argument(s) are zero or more of
the values identity, chunked, gzip,
deflate or compress. The chunked encoding
only applies to transfer encodings; the remainder apply to both
content and transfer encodings. If the first value of the first
argument is add, the given encoding(s) will be added to the
allowed list; if del, deleted from it; if set, the
list is cleared and set to $encodings (this is the default
action if no add/del/set action is given).
The keyword all may be used to refer to all encodings, and
default may be used (with set) to re-set the default
(which is identity, chunked, gzip,
deflate and compress).
The Vortex fetch library will declare the list of encodings it
allows in Accept-Encoding and TE request headers, if
httpversion is set to 1.1 (these are 1.1 headers,
and some servers do not handle them as expected in a 1.0 request;
httpversion is 1.1 by default in version 6 and later).
It is up to the remote server to then choose encoding(s) from the
declared list(s). The content encoding(s) (if any) of the returned
document should be declared by the server in the
Content-Encoding header, and transfer encoding(s) in the
Transfer-Encoding header. Both types of encodings will be
decoded before the document is returned from <fetch> or
<urlinfo rawdoc>. If an encoding that is not allowed is
encountered, a "Disallowed Content- or Transfer-Encoding"
error is generated.
Added in version 5.01.1249073000 20090731. Returns previous list
of allowed encodings. See also the maxpgsize and
maxdownloadsize settings for how they interact with
encodings. -
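As a rough illustration of the decoding step described for encodings, the following Python sketch undoes a list of content encodings and rejects any encoding that is not in the allowed list. It is not Vortex code and not the fetch library's implementation: the names `decode_body` and `DisallowedEncoding` are hypothetical, and handling only identity, gzip and deflate is a simplification made for the example.

```python
import gzip
import zlib


class DisallowedEncoding(Exception):
    """Hypothetical error type standing in for the library's error."""


def decode_body(body, content_encodings, allowed):
    """Undo content encodings, last-applied first.

    content_encodings: encodings in the order the server applied them.
    allowed: the set of encodings permitted by the (modelled) setting.
    """
    for enc in reversed([e.strip().lower() for e in content_encodings]):
        if enc not in allowed:
            raise DisallowedEncoding("Disallowed Content- or Transfer-Encoding")
        if enc == "identity":
            continue  # nothing to undo
        if enc == "gzip":
            body = gzip.decompress(body)
        elif enc == "deflate":
            body = zlib.decompress(body)  # assumes a zlib-wrapped deflate stream
        else:
            # e.g. compress: allowed, but not implemented in this sketch
            raise NotImplementedError(enc)
    return body
```

With an allowed list of only identity, a gzip-encoded response raises the error instead of being decoded, which mirrors the behavior the text describes.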
fileexclude (list)
List of file trees to exclude (disallow) when fetching a local
file:// URL. The default is none (no restrictions) for
Windows, and "/dev/", "/proc/" and
"/debug/" for Unix. After fileroot is applied, if
the resulting local file path from a file:// URL has one of
these paths as a prefix, the URL will not be fetched. This can be
used to protect certain unsafe or private directories on a local
filesystem from being inadvertently walked. Does not apply to
FTP-mapped non-localhost file:// URLs. Added in version
4.02.1048785087 20030327. Aka fileexcludes. Returns
previous setting. -
fileinclude (list)
List of file trees to include (require) when fetching a local
file:// URL. The default is none (no restrictions). After
fileroot is applied, if the resulting local file path from
a file:// URL does not have one of these paths as a
prefix, the URL will not be fetched. This can be used to keep a
local filesystem walk within certain directories. Does not apply
to FTP-mapped non-localhost file:// URLs. Added in version
4.02.1048785087 20030327. Aka fileincludes. Returns
previous setting. -
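The prefix tests described for fileexclude and fileinclude can be modelled in a few lines of Python. This is only a sketch of the documented rule, assuming the path passed in is the local file path after fileroot has been applied; the function name `local_file_allowed` is hypothetical, and the default exclude list is the Unix default quoted above.

```python
def local_file_allowed(path, includes=(), excludes=("/dev/", "/proc/", "/debug/")):
    """Model of the fileexclude/fileinclude checks (plain string prefixes)."""
    # fileexclude: any matching prefix disallows the fetch
    if any(path.startswith(tree) for tree in excludes):
        return False
    # fileinclude: if the list is non-empty, some prefix must match
    if includes and not any(path.startswith(tree) for tree in includes):
        return False
    return True
```

For example, `local_file_allowed("/dev/zero")` is rejected by the default exclude list, while a non-empty include list such as `("/srv/docs/",)` (a made-up directory) keeps a walk inside that tree.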
filenonlocal (string)
How to handle non-localhost file:// URLs, i.e. ones
with a specific host other than empty string or
"localhost". The value can be one of:
-
off
Default: do not allow non-localhost file:// URLs. This
ensures that no FTP or UNC paths are used. -
unc
Map non-localhost file:// URLs to their UNC paths and
attempt to open as a local file. E.g. the URL
"file://myhost/mydir/myfile" would map to the file
"\\myhost\mydir\myfile" under Windows and
"//myhost/mydir/myfile" under Unix (but see
modifications under fileroot below). This allows the
behavior of web browsers that support UNC paths to be emulated
on operating systems that support UNC, for consistency with
browser views. -
ftp
Map non-localhost file:// URLs to FTP. E.g. the URL
"file://myhost/mydir/myfile" would map to the URL
"ftp://myhost/mydir/myfile" and be fetched as such.
This allows the behavior of some browsers/operating systems
that do not support UNC paths to be emulated.
Added in version 4.02.1048785087 20030327. Returns previous
setting. -
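The three filenonlocal values can be summarized with a small Python model of the mappings given in the examples above. This is an illustrative sketch only: `map_nonlocal_file_url` and its `windows` flag are hypothetical names, and the sketch ignores fileroot.

```python
from urllib.parse import urlsplit


def map_nonlocal_file_url(url, mode="off", windows=False):
    """Return the mapped path/URL for a non-localhost file:// URL,
    or None if the URL is disallowed under the given mode."""
    parts = urlsplit(url)
    host, path = parts.netloc, parts.path
    if host in ("", "localhost"):
        raise ValueError("not a non-localhost file:// URL")
    if mode == "off":
        return None  # default: non-localhost file:// URLs are not allowed
    if mode == "unc":
        if windows:
            return "\\\\" + host + path.replace("/", "\\")
        return "//" + host + path
    if mode == "ftp":
        return "ftp://" + host + path
    raise ValueError("unknown mode: " + mode)
```

Called with `"file://myhost/mydir/myfile"`, the unc mode yields the UNC forms shown above and the ftp mode yields `"ftp://myhost/mydir/myfile"`.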
fileroot (string)
Sets the root directory to prepend to local file:// URL
paths; default none. E.g. with fileroot set to
"/docs", the URL "file://localhost/dir/file.txt"
would be read from the file
"/docs/localhost/dir/file.txt". Also applies to
non-localhost URLs when filenonlocal is set to
unc, e.g. the URL "file://myhost/mydir/myfile" is
read from the file "/docs/myhost/mydir/myfile". This
allows both localhost and non-localhost
file:// URLs to be mapped to a single directory hierarchy,
perhaps where network filesystems corresponding to individual
host(s) are mounted. Added in version 4.02.1048785087 20030327.
Returns previous setting. -
filetypes [add|del|set] [file|dir|device|symlink|other ...]
Sets the list of allowed file types for local file:// URLs.
The possible values are file for ordinary files, dir
for directories, device for devices, symlink for
symbolic links (if supported by operating system), and
other for other types (sockets etc.). If the first value
of the first argument is add, the given list will be added
to the allowed list; if del, deleted from; if set,
cleared and set (the default). The default list is
file, dir and symlink. If the file derived
from a local file:// URL is not one of these types, it is
disallowed. This prevents links to URLs like
"file://localhost/dev/zero" from hampering a walk. Added
in version 4.02.1048785087 20030327. Returns previous
setting. -
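To make the filetypes categories concrete, the Python sketch below classifies a local path into the five values listed above and checks it against the default allowed list. It is an approximation, not the library's logic: the function names are hypothetical, and it is an assumption of this sketch that a symbolic link is classified as symlink rather than by the type of its target.

```python
import os
import stat

DEFAULT_FILETYPES = {"file", "dir", "symlink"}


def classify_file_type(path):
    """Map a local path to one of: file, dir, device, symlink, other."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"  # sockets, FIFOs, etc.


def file_type_allowed(path, allowed=DEFAULT_FILETYPES):
    """True if the path's type is in the allowed list."""
    return classify_file_type(path) in allowed
```

On a Unix system a path such as /dev/zero would typically classify as device and so be rejected under the default list.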
ftpactivepassivefallback (boolean)
If on (the default), FTP passive mode fetches will fall back to
active mode on failure, and vice-versa. This may help resolve a
fetch to an FTP server that does not support the current mode (or
is firewalled), i.e. in cases where ftppassive is not set
properly for the given situation. Only failures of the
PORT or PASV command, or a temporary (4nn) error
response to the main (RETR/STOR/etc.) command will
trigger the mode switch. Added in version 6.00.1304040000
20110428. Returns previous setting.
Note that if the correct mode (active or passive) is already known
in advance, it is preferable to set it from the outset via the
ftppassive setting, to avoid potential delays and/or errors
from relying on this fallback switchover. -
ftppassive (boolean)
If on (the default), FTP passive mode is used first for FTP
protocol fetches. If off, FTP active mode is used first. Passive
mode can be useful in situations where a firewall on the client
(Vortex) side of the network prevents an FTP transfer
(e.g. timeout). This is due to the nature of active-mode data
transfers, where the remote (server) side is required to initiate
a separate socket connection back to the client (even though the
client initiates the original control connection). Many firewalls
will block such incoming connections, causing the transfer to
time out. Passive mode allows the client to initiate both the
control and data connections, which is often permitted by the
client's firewall. Added in version 5.01.1121350905 20050714.
Note: Prior to version 6.00.1304040000 20110428 this setting was
off by default. Returns previous setting.
Note that if ftpactivepassivefallback is on (the default),
the alternate mode may be used if the first mode (set by this
setting) fails. -
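The interaction between ftppassive and ftpactivepassivefallback amounts to "try the preferred mode, then the other one". A minimal Python sketch of that control flow follows. The `fetch` callable is a hypothetical stand-in for an FTP transfer, and as a simplification any OSError triggers the switch, whereas the text above limits the switch to specific failures.

```python
def fetch_with_mode_fallback(fetch, passive_first=True, fallback=True):
    """Try the preferred FTP mode first, then the alternate mode.

    fetch: hypothetical callable taking passive=<bool>; it returns the
    fetched data or raises OSError on failure.
    """
    modes = [passive_first]
    if fallback:
        modes.append(not passive_first)  # the alternate mode
    last_error = None
    for passive in modes:
        try:
            return fetch(passive=passive)
        except OSError as err:
            last_error = err  # remember the failure, try the next mode
    raise last_error
```

With Python's ftplib, such a callable could call `FTP.set_pasv(passive)` before starting the transfer. When the correct mode is known in advance, passing it as `passive_first` avoids the extra failed attempt.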
ftprelativepaths (boolean)
If on (the default), FTP paths are assumed to be
login-dir-relative, so the URL "ftp://host/dir/file.txt"
would be fetched with "RETR home/dir/file.txt"
instead of "RETR /dir/file.txt" (where home is the FTP
user's login directory). For most (i.e. anonymous) FTP URLs this
makes no difference, as the FTP login dir is typically at the root
of the FTP-accessible tree. However, for many FTP URLs that
require a true login, the FTP login dir is not the root dir, but
the user's home directory. Thus, with ftprelativepaths on,
the above URL would fetch "dir/file.txt" from the user's
home directory - not "/dir/file.txt" from the root dir,
where it may not exist. With ftprelativepaths off, the
user's home directory - which may be unknown or vary from user to
user - would have to be specified in the FTP URL in order to get
back to the FTP login dir.
Dirs outside the FTP login dir may still be accessed when
ftprelativepaths is on, however, by encoding an extra slash
in the URL, e.g. "ftp://host/%2Fdir/file.txt". Added in
version 6. In previous versions the setting was effectively off. -
ftpsendrelativepathsasabsolute (boolean)
If on (the default), relative FTP paths (i.e. due to
ftprelativepaths) are changed to absolute paths when sent
to the server, by prefixing the login directory (obtained with a
PWD after login). This avoids the occasional need for a
no-argument "CWD" command to go back to the login
directory (which some servers do not support), while still
supporting the functionality of ftprelativepaths (no home
dir needed in URLs). If off, the login directory is not prefixed;
i.e. the URL "ftp://host/dir/file.txt" is fetched with
"RETR dir/file.txt". This setting has no effect if
ftprelativepaths is off. Added in version 6.00.1301360000
20110328. -
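The combined effect of ftprelativepaths and ftpsendrelativepathsasabsolute on the path sent to the server can be modelled as below. This Python sketch only restates the examples given in the text: the function name and the login directory "/home/user" used in the usage note are hypothetical, and the case of an encoded slash with ftprelativepaths off is not specified above and is left as whatever the simple decoding produces.

```python
from urllib.parse import unquote


def ftp_retr_path(url_path, login_dir, relative_paths=True, send_absolute=True):
    """Return the path argument for RETR, given the URL's path part."""
    if not relative_paths:
        return unquote(url_path)  # e.g. "/dir/file.txt" is sent as-is
    # Drop the slash that separates host from path, then decode.
    path = unquote(url_path[1:] if url_path.startswith("/") else url_path)
    if path.startswith("/"):
        return path  # "%2F" prefix: explicitly outside the login dir
    if send_absolute:
        # Prefix the login directory (as obtained with PWD after login).
        return login_dir.rstrip("/") + "/" + path
    return path  # sent relative to the current directory
```

For the URL "ftp://host/dir/file.txt" and a login directory of "/home/user", the defaults give "/home/user/dir/file.txt"; with `send_absolute=False` the result is "dir/file.txt"; and "ftp://host/%2Fdir/file.txt" gives "/dir/file.txt".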
getframes (boolean)
If on, text frames are fetched for framed documents. The raw
HTML returned will remain the same (the original document), but
the formatted text from <urlinfo text>
will be replaced and instead contain each frame in sequence. The
links returned by <urlinfo links> will be the list of all the
frames' links. The default is false, i.e. frames are not fetched.
Only applies to HTTP URLs. -
getiframes (boolean)
If on, inline <IFRAME> documents are fetched. The raw
HTML returned will remain the same (the original document). The
formatted text from <urlinfo text> will
also remain, except that <IFRAME> blocks will be replaced
with their referenced document text in-line. The <IFRAME>s
are removed from the iframes and links lists
returned by the urlinfo function. Only applies to HTTP
URLs. The default is false. Added in version 3.01.963000000 20000707. -
getscripts (boolean)
If on, and javascript is on, <SCRIPT SRC=...></SCRIPT>
URLs on a page will also be fetched and run if they refer to
JavaScript. If off (default), such URLs are not fetched, and only
inline <SCRIPT>...</SCRIPT> scripts are run (if
javascript is on). Returns previous value. Added in version
4.01.1023800000 20020611. -
httpversion $version
Sets the HTTP version to use for requests. The $version
argument is one of 0.9, 1.0 or 1.1. HTTP/1.0
is the default for Texis/Webinator version 5 and earlier; HTTP/1.1
is the default for Texis/Webinator version 6 and later (and is
only conditionally supported). It may be necessary to set
1.1 to fully utilize some features, e.g. content/transfer
encodings (see the encodings setting). Added in version
5.01.1249039000
20090731. Returns previous version. -
ignoreanchorframes (boolean)
Whether to ignore frames and IFRAMEs that are just anchors, e.g.
src="#". These usually just contain JavaScript, and fetching
them just doubles up the content, links etc. of the parent URL.
On by default. Added in version 6. -
inputfileroot (string)
If set, all set/non-empty <input type="file"> values must
be within this local directory tree (and not contain
"../" components to get out of it), when <urlcp
domvalue "...submitContent"> or variants are called. Value(s)
that are outside this setting will cause an error such as
"Will not add form input `...' file `...' to submit content: Not in
inputfileroot directory or contains `../'", and will be treated
as empty (i.e. sent as empty value with no file). This is for
security, to ensure all to-be-uploaded files are from a known
directory. Added in version 6.00.1335222312 20120423. Default is
unset (i.e. no check is performed). Returns 1 on success, 0 on
error. -
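The containment rule described for inputfileroot (inside the directory tree, and no "../" components) can be sketched in Python as follows. This is a simplified model assuming Unix-style paths; the function name is hypothetical and the directory "/uploads" in the usage note is made up for illustration.

```python
def within_input_file_root(file_path, root):
    """True if file_path is under root and has no ".." components."""
    if ".." in file_path.split("/"):
        return False  # would allow escaping the tree
    prefix = root.rstrip("/") + "/"
    return file_path.startswith(prefix)
```

With a root of "/uploads", the path "/uploads/report.txt" passes, while "/uploads/../etc/passwd", "/etc/passwd" and "/uploads2/report.txt" are all rejected.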
linkprotocols [add|del|set] [$protocols|allowed ...]
Sets the list of protocols allowed to be returned in links from a
page (i.e. the links value of the urlinfo function). Note that
this setting does not
control what can be fetched, only the list of links returned from a
page. It can be used as a filter to remove invalid-protocol links
returned by a page. The $protocols argument(s) are a list
of zero or more values, each of which is either a recognized
protocol (see protocols below), the value unknown
for unknown protocols, or the value allowed for just
protocols permitted by the protocols setting. The default
is all protocols plus unknown.
If the first value of the first argument is add, the given
list will be added to the allowed list; if del, deleted
from; if set, cleared and set (the default). Returns
previous setting. Added in version 4.01.1029180431 20020812. -
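Since linkprotocols filters the links returned from a page rather than what is fetched, its effect can be pictured as a simple filter over a link list. The Python sketch below is an illustration under assumptions: `filter_links` is a hypothetical name, the set of recognized protocols is taken from the values listed for the protocols setting, links are assumed to be absolute URLs, and the allowed keyword is not modelled.

```python
from urllib.parse import urlsplit

# Recognized protocols, per the values listed for the protocols setting.
KNOWN_PROTOCOLS = {"http", "ftp", "gopher", "javascript", "https", "file"}


def filter_links(links, allowed):
    """Keep only links whose protocol (or "unknown") is in allowed."""
    kept = []
    for link in links:
        scheme = urlsplit(link).scheme.lower()
        label = scheme if scheme in KNOWN_PROTOCOLS else "unknown"
        if label in allowed:
            kept.append(link)
    return kept
```

Dropping "unknown" from the allowed set removes links such as mailto: URLs from the result while leaving fetch permissions untouched.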
methods [add|del|set] [$methods ...]
Sets the list of request methods allowed for page fetching
(default all). The $methods argument(s) are zero or more
of the values OPTIONS, GET, HEAD,
POST, PUT, DELETE, TRACE,
MKDIR, RENAME, SCHEDULE, COMPILE or
RUN. Not all methods are supported by all protocols;
e.g. MKDIR is only supported by FTP. If the first value of
the first argument is add, the given method list will be
added to the allowed list; if del, deleted from; if
set, cleared and set (the default). Alternately, the
default methods may be restored with set default. Returns
previous setting. Added in version 5.01.1232696000 20090123. -
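The add/del/set convention shared by methods, encodings, filetypes, protocols and linkprotocols can be modelled as a small list operation that also hands back the previous value. The Python sketch below uses the methods list as its example; `apply_list_setting` is a hypothetical name and the function only models the argument handling described above, not the Vortex implementation.

```python
ALL_METHODS = ["OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE",
               "MKDIR", "RENAME", "SCHEDULE", "COMPILE", "RUN"]


def apply_list_setting(current, args, default=ALL_METHODS):
    """Apply [add|del|set] values...; return (new_list, previous_list)."""
    previous = list(current)
    action, values = "set", list(args)  # "set" when no action is given
    if values and values[0] in ("add", "del", "set"):
        action, values = values[0], values[1:]
    if action == "set" and values == ["default"]:
        new = list(default)  # "set default" restores the default list
    elif action == "set":
        new = values  # list is cleared and set to the given values
    elif action == "add":
        new = previous + [v for v in values if v not in previous]
    else:  # "del"
        new = [v for v in previous if v not in values]
    return new, previous
```

For instance, `apply_list_setting(ALL_METHODS, ["del", "PUT", "DELETE"])` yields a list without those two methods, together with the previous list so that it can be restored later.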
netmode (string)
Sets the routines to use for page fetching. The default is
int, which uses Texis' internal routines. For Windows
versions, netmode may be set to sys, which uses the
system routines. This may allow certain authenticated sites
to be accessed, if the internal routines' NTLM authentication
is not sufficient for example. However, parallelization
and certain other features are disabled. Added in version
4.04.1068000000 20031104. -
offsiteok (boolean)
If on (default), documents that are off-site from the original URL
(e.g. redirects) will be fetched if needed. If off, such
redirects will not be fetched. -
protocols [add|del|set] [$protocols ...]
Sets the list of URL protocols allowed to be fetched.
$protocols is a list of zero or more of the values
http, ftp, gopher, javascript,
https or file. The default list is http,
ftp, gopher, javascript and
https. (Note that <urlcp javascript> must also be on
if JavaScript URLs are to work.) If the first value of the first
argument is add, the given list will be added to the
allowed list; if del, deleted from; if set, cleared
and set (the default). Returns the previous list of allowed
protocols. Added in version 4.01.1024300000 20020617.
file support added in version 4.02.1048785087 20030327. -
proxy (string)
Takes a URL as an argument. This URL will be used as the proxy
server to fetch documents. All future page fetches will go
through this server, instead of being fetched directly. Must
be an HTTP or HTTPS (in version 4.02.1048785087 20030327 and later) URL.
In version 4.04.1077500000 20040222 and later, an empty string
value will clear the proxy, i.e. turn it off.
Copyright © Thunderstone Software Last updated: Mon Feb 18 10:28:15 EST 2013