Miscellaneous

The following urlcp settings control miscellaneous page fetching behaviors:

  • allowbadchunkedinfo (boolean)

    If on (the default), tolerate certain malformed chunked Transfer-Encoding information in a response (e.g. a missing chunk size) by passing the remaining data through as-is, since chunked coding is mostly clear text anyway. If off, fail the transaction. This may help recover a fetch when the server erroneously reports chunked coding in the response. Note that regardless of the setting's value, bad chunked information may indicate corrupt data in the response. Added in version 6.00.1315620000 20110909; previous versions behaved as if the setting were off.

  • alarmclose (boolean) Whether to use an alarm() to terminate a blocking connection close(). The default is off. Added in version 4.04.1050700000 20030418.

  • badhdrmidok or badheadermidok (boolean) Whether to accept malformed headers in the middle (i.e. not last) of the response headers. If on, malformed headers will be discarded if followed by at least one valid header. If off, or no valid header follows, the malformed header will be considered the start of the body (which it might very well be). Added (and defaults to on) in version 5.01.1102538511 20041208. Returns previous setting.

  • checkidleconneof (boolean) If on (the default), idle connections in the Keep-Alive cache are checked for EOF (i.e. server closure) before being reused. The server may have timed out the connection while the socket was idle, which would otherwise cause the next fetch to fail. Added in version 5.01.1115130972 20050503 (default off in previous versions). Returns previous setting.

  • clearproxycache (no arguments)

    Clears the cache of "bad" (non-responsive) proxies, which are set to lower priority in PAC responses when other proxies are listed (see proxyretrydelay). Also clears the last PAC fetch timestamp (see pacfetchretrydelay). Added in version 7.05.

  • closeidleconn (no arguments) Closes any currently idle connections in the Keep-Alive cache. Returns 1 if successful, 0 on error. Added in version 5.00.1093895662 20040830.

  • defaults (no arguments) Resets all urlcp settings to their default values.

  • delaysave (boolean) Whether to delay saving output when using <submit TOFILE=$file> until the connection starts returning data. The default is off, i.e. the file is opened immediately (always deleting any previous copy). Turning this setting on is useful when repeatedly downloading to the same file, e.g. obtaining a periodic update of a large data file: the original file is preserved if the new fetch fails immediately (e.g. remote server down), without the disk space cost of a separate backup copy. Added in version 4.0.997840000 20010814.

  • domvalue $dompath $value

    Sets the value of the DOM item indicated by $dompath to $value. Note that this does not affect the JavaScript DOM, but the near-parallel page DOM. This can be used to set form input values, etc. and then obtain the submit URL and content via <urlinfo domvalue>. Added in version 5. Returns 0 on error.

  • emptyhttp09ok (boolean) Whether to accept empty HTTP/0.9 responses, i.e. a 0-byte response with no headers. Such responses are technically legal (an empty HTTP/0.9 document), but since few pre-HTTP/1.0 servers exist, they more likely indicate a server error. If off, an error message is issued and an error is set. Added (and defaults to off) in version 5.01.1097502096 20041011. Returns previous setting.

  • linger (boolean) Whether to set an SO_LINGER time of 4 seconds on sockets. Added (and defaults to off) in version 5.01.1105153893 20050107. Returns previous setting.

  • reparent (string) If given a full path (e.g. "/local/tree"), sets reroot reparent mode and uses that path as the local tree root. If given a full URL (e.g. "http://somesite.com/dir/page.html"), sets abs reparent mode and uses that URL as the page's URL (this is not recommended; links may become incorrect).

  • reparentimg (boolean) If true (default), image links will be reparented; if false, they will not. Only significant if reparentmode is not off.

  • reparentmode (string) The returned HTML from a page will be reparented: all the links will be changed in the raw document returned. How the links are modified depends on the mode:

    • abs or 1 Make all links absolute. With this mode, a page can be fetched from a remote site and the returned document placed directly in a local source tree, and even relative links will correctly point to the original locations. If a URL is set with reparent, the page is reparented as if it were fetched from there, instead of its actual location (the default). (Setting a URL is not recommended, as the links may become incorrect.)

    • reroot or 2 Re-path same-site links as if the entire remote site were being copied locally to a subtree rooted at the URL path given with reparent. For example, with a reparent path of /local/tree, if the URL http://somesite.com/dir/page.html is fetched, it is assumed it will be saved to /local/tree/dir/page.html. Thus a link such as /top/list.html will become /local/tree/top/list.html. The link ../upone.html would become /local/tree/upone.html. If no path is set with reparent, links become relative, as if the root were /.

    • mirror or 3 Make all links absolute, URL-encode them, and prefix them with the reparent URL.

    • relatedfiles or 4

      For an email message, change all internal links (to other parts of the message) to their safe filenames, as same-directory relative links. All other (external) links will be made absolute. This mode is used internally by the mimeEntityGetBody() function when reparenting. Note: since it requires additional parsed email message information, it currently cannot be used explicitly by Vortex scripts.

    • hideexternal or 5

      Change all external links (those referring to locations outside the page's current directory and its subdirectories) to have a prefix of "thismessage:", to prevent their access. This can be used to hide/disable external references in HTML email message bodies during web display, after the HTML has been message-reparented by the mimeEntityGetBody() function.

    • off or 0 Turn off all reparenting (the default). Added in version 3.01.968705387 20000911.

    Note that the reparent and reparentmode settings do not affect the links returned by <urlinfo links>.

  • sendemptycontent (boolean) Whether to set Content-Length: 0 for empty requests. Some servers will time out, expecting an EOF from the client, if an empty request is sent with no Content-Length. Added (and defaults to on) in version 5.01.1097006042 20041005. Returns previous setting.

  • shutdownwr (boolean) Turns on or off the use of shutdown(SHUT_WR) on HTTP or Gopher sockets when all data has been sent and the connection is not to be re-used (e.g. Keep-Alive has expired or is not in use). This sends an EOF to the server to indicate that the client has finished sending data. Some broken servers may expect such an EOF even if Content-Length is set properly in the request, and may thus time out the request waiting for one. (Note that for shutdown() to actually be used, it may be necessary to disable Keep-Alive via <urlcp maxconnrequests 1>.) Added (and defaults to on) in version 5.01.1105300267 20050109. Returns previous setting.

  • urlcanonslash (boolean) Whether to canonicalize backslashes ("\") to forward slashes ("/") in URLs. On by default. Turning this off may impair URL parsing.

  • urlcollapseslashes (boolean)

    Whether to collapse multiple forward slashes (e.g. "//") to a single forward slash in the path part of URLs. Note: does not affect the double-slash that immediately follows the protocol plus colon in some URL protocols. Off by default. Added in version 7.03.1434400000 20150615.
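
As an illustration of the reparenting settings above, the following sketch fetches a page with reroot reparenting, reusing the /local/tree path and somesite.com URL from the reroot description. It assumes that boolean settings accept "off" as a value, which this section does not confirm; substitute whatever boolean spelling your version accepts.

```vortex
<!-- giving reparent a full path selects reroot mode with that local tree root -->
<urlcp "reparent" "/local/tree">
<!-- assumption: "off" is an accepted boolean value; leaves image links as-is -->
<urlcp "reparentimg" "off">
<fetch "http://somesite.com/dir/page.html">
```

Per the reroot description, a link such as /top/list.html in the returned HTML would then become /local/tree/top/list.html, while the links reported by <urlinfo links> are unaffected.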


DIAGNOSTICS
urlcp returns 1 on success, or 0 or nothing on error, except as noted under specific options.


EXAMPLE

<urlcp "maxpgsize" "1MB">
<urlcp "timeout" 300>
<fetch "http://www.somesite.com/bigpage.html">
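
A further sketch, combining settings described under Miscellaneous: when a server appears to wait for an EOF before responding, the shutdownwr description suggests disabling Keep-Alive so that shutdown() is actually used. The URL is the same placeholder as in the example above, and the "on" spelling for boolean values is an assumption not confirmed by this page.

```vortex
<!-- one request per connection, so the socket is not kept for re-use -->
<urlcp "maxconnrequests" 1>
<!-- assumption: "on" is an accepted boolean value; on is already the documented default -->
<urlcp "shutdownwr" "on">
<!-- send Content-Length: 0 for empty requests (also the documented default) -->
<urlcp "sendemptycontent" "on">
<fetch "http://www.somesite.com/bigpage.html">
```

Since both boolean settings default to on in version 5.01 and later, only the maxconnrequests line changes behavior there; the other two are shown to make the intent explicit.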


CAVEATS
The urlcp function was added Mar. 26 1997. Various settings were added later.


SEE ALSO
fetch, submit, urlinfo, nslookup


Copyright © Thunderstone Software     Last updated: Apr 15 2024