urlutil - URL/network utility

SYNOPSIS

<urlutil $action [$arg ...]>


DESCRIPTION
The urlutil function provides URL and other network-related utility functions. The $action argument determines what it does:

  • abs $absurl $relurl or absurl $absurl $relurl

    Makes URLs absolute (fully specified). The $absurl values are one or more absolute page URLs. The $relurl values are the corresponding links, relative or not, from those pages. For each $relurl value, its absolute form is returned, as if it were a link on the corresponding page. If there are fewer $absurl values than $relurl values, the last $absurl value is re-used. The protocol and hostname (if any) in each returned value will be lowercase.

  • charsetcanon $charset

    Returns the canonical name for charset name $charset, according to the current Charset Config file. Can be used to map charset aliases to canonical names.

  • charsetconv $buf $from [$to]

    Converts text buffer $buf from charset $from to charset $to. The default for $to, if unspecified or empty, is the current <urlcp charsettxt> setting. Some character sets may require the use of an external charset converter (the default is iconv; see <urlcp charsetconverter> to change it), which is automatically executed when needed. Added in version 5.00.1090598954 20040723.

  • charsetdetect $buf

    Returns a guess at the charset for text buffer $buf, or "Unknown" if the charset is unknown. Only limited charset detection is supported, primarily UTF-8, UTF-16BE/UTF-16LE, and all-7-bit ISO-8859-1. Added in version 7.02.1398457000 20140425.

  • filepath $u

    Takes $u, which must be a file:// URL, and returns the local file path that would be used to read the file, as determined by the current <urlcp fileroot> etc. settings.

  • pacinit

    Initializes proxy auto-config by fetching the PAC script (if configured) and running it. Returns 1 if successful, 0 if not. The error from the fetch, messages from the fetch and script execution, and the body of the script (if fetched) are available afterwards via <urlinfo>. If neither a PAC script nor a URL is configured, or the script was already initialized, no action is taken, and 1 (success) is returned.

    Calling <urlutil pacinit> when using a PAC script is not necessary: the PAC script is automatically fetched and run when needed, i.e. at the first <fetch> or <submit>, and any messages at PAC initialization are reported. However, any PAC failure during such automatic initialization merely translates into a Proxy auto-config error for the <fetch>. The <urlutil pacinit> action provides a way to get more detailed information about the PAC script, if desired for diagnostic purposes.

  • split $u $part

    Splits a URL into parts. The $u value is the URL to split. The $part value is a single part to return. The part can be any of protocol, user, pass, authority, host, hostIsIPv6, port, path, type, query, anchor, or allpartnames.

    In Texis version 8 and later, authority and hostIsIPv6 were added. The authority part is a composite/alias of user, pass, host, and port: it is the part of the URL after the trailing // of the protocol and before the path, including all separators therein. Thus if present, it contains the host (with any IPv6 brackets), optional user/pass info, and optional port (with colon). The hostIsIPv6 value is 1 if the host looks like a bracketed IPv6 address - the host value will have the brackets stripped then - or 0 if not; in version 8.00.1637010861 20211115 and later, it is a long value, in earlier versions, a string.

    In version 8.00.1637010861 20211115, user and pass support was added, and allpartnames was added. Also in this version, support for multiple parts in $part was removed (now gives an error message). This allows a missing part (zero return values) to be distinguished from a present but empty part (one empty string return value). In previous versions, multiple parts could be requested, and thus the return values were in sync with $part, which required missing part(s) to be returned as empty string instead; user/pass were also always silently returned as empty. allpartnames will return a list of the names of the zero or more part(s) that are present in the URL.

  • sslcertificate $pem tostring

    Parses an SSL certificate string buffer $pem (in PEM format). The tostring sub-action returns a human-readable string version of the certificate, with subject, issuer, expiration etc. printed. This can be used to view a server certificate returned from <urlinfo sslservercertificate>.
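
The split behavior described above, where a missing part yields zero values and a present-but-empty part yields one empty string, can be illustrated with a small Python model. This is only a sketch and not the Texis implementation: split_url and get_part are hypothetical names, the regular expression is the generic one from RFC 3986 Appendix B, an empty path is assumed to count as absent, and the type part is not modeled.

```python
import re

# Generic URI regex from RFC 3986 Appendix B; unmatched groups stay None,
# which lets a missing part be told apart from an empty one.
_URL_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def split_url(url):
    """Return a dict holding only the parts that are present in url."""
    match = _URL_RE.match(url)
    if match is None:  # e.g. the URL contains a newline
        raise ValueError("cannot split URL")
    scheme, authority, path, query, anchor = match.groups()
    parts = {}
    if scheme is not None:
        parts["protocol"] = scheme
    if authority is not None:
        parts["authority"] = authority
        userinfo, at, hostport = authority.rpartition("@")
        if at:
            user, colon, password = userinfo.partition(":")
            parts["user"] = user
            if colon:
                parts["pass"] = password
        if hostport.startswith("[") and "]" in hostport:
            # Bracketed IPv6 host: strip the brackets from the host value.
            host, _, rest = hostport[1:].partition("]")
            parts["hostIsIPv6"] = 1
        else:
            host, colon, port = hostport.partition(":")
            rest = colon + port
            parts["hostIsIPv6"] = 0
        parts["host"] = host
        if rest.startswith(":"):
            parts["port"] = rest[1:]
    if path:  # ASSUMPTION: an empty path is treated as absent
        parts["path"] = path
    if query is not None:
        parts["query"] = query
    if anchor is not None:
        parts["anchor"] = anchor
    return parts


def get_part(url, part):
    """Zero values if the part is missing, one value if it is present."""
    parts = split_url(url)
    return [parts[part]] if part in parts else []


print(get_part("http://example.com/", "query"))   # → []
print(get_part("http://example.com/?", "query"))  # → ['']
```

Returning a list keeps the two cases distinct: an empty list stands for a missing part, a one-element list holding an empty string for a part that is present but empty.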

Several actions take inet style argument(s). This is an IPv4 or IPv6 address string, optionally followed by a netmask.

For IPv4, the format is dotted-decimal, i.e. N[.N[.N[.N]]] where N is a decimal, octal or hexadecimal integer from 0 to 255 (the last N may be larger, as described next). If fewer than 4 values of N are given (x < 4), the last N may span the remaining 5-x bytes instead of 1 byte, with any missing bytes zero-padded to the right. E.g. 192.258 is valid and equivalent to 192.1.2.0: the last N is 2 bytes in size, and covers 5 - 2 = 3 needed bytes, including 1 zero pad to the right. Conversely, 192.168.4.1027 is not valid: the last N is too large.
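
The short-form rule above can be modeled in a few lines of Python. This is a sketch rather than the Texis parser: parse_component and parse_ipv4 are hypothetical names, C-style prefixes (0x for hexadecimal, a leading 0 for octal) are assumed for the non-decimal forms, and the last value is assumed to occupy the fewest bytes that can hold it.

```python
def parse_component(token):
    """Parse one N. ASSUMPTION: 0x prefix = hex, leading 0 = octal."""
    t = token.lower()
    if t.startswith("0x"):
        return int(t, 16)
    if len(t) > 1 and t.startswith("0"):
        return int(t, 8)
    if not t.isdigit():
        raise ValueError(f"bad component: {token!r}")
    return int(t, 10)


def parse_ipv4(text):
    """Return the 4 address bytes for a (possibly short) dotted IPv4 string."""
    tokens = text.split(".")
    if not 1 <= len(tokens) <= 4:
        raise ValueError("expected 1 to 4 dot-separated values")
    *lead, last = [parse_component(t) for t in tokens]
    if any(v > 255 for v in lead):
        raise ValueError("only the last value may exceed 255")
    room = 5 - len(tokens)                        # bytes the last N may cover
    width = max(1, (last.bit_length() + 7) // 8)  # ASSUMPTION: minimal size
    if width > room:
        raise ValueError("last value is too large")
    # The last N is left-aligned in the remaining bytes, zero-padded right.
    return lead + list(last.to_bytes(width, "big")) + [0] * (room - width)


print(parse_ipv4("192.258"))  # → [192, 1, 2, 0]
```

With four values given, room is 1, so 192.168.4.1027 raises ValueError because 1027 needs two bytes.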

An IPv4 address may optionally be followed by a netmask, either of the form /B or :IPv4, where B is a decimal, octal or hexadecimal netmask integer from 0 to 32, and IPv4 is a dotted-decimal IPv4 address of the same format described above. If an :IPv4 netmask is given, only the largest contiguous set of most-significant 1 bits are used (because netmasks are contiguous). If no netmask is given, it will be calculated from standard IPv4 class A/B/C/D/E rules, but will be large enough to include all given bytes of the IP. E.g. 1.2.3.4 is Class A which has a netmask of 8, but the netmask will be extended to 32 to include all 4 given bytes.
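
The two netmask rules above, contiguous leading bits for an :IPv4 netmask and a class-based default widened to cover the given bytes, might be sketched as follows. The names prefix_from_mask and default_prefix are hypothetical, given_bytes stands for the number of address bytes covered by the values actually written, and the default of 32 for class D and E addresses is an assumption, since the text does not state it.

```python
def prefix_from_mask(mask_octets):
    """Count the leading 1 bits of a dotted netmask given as 4 byte values."""
    value = int.from_bytes(bytes(mask_octets), "big")
    length = 0
    for bit in range(31, -1, -1):
        if not (value >> bit) & 1:
            break  # any later 1 bits are ignored: netmasks are contiguous
        length += 1
    return length


def default_prefix(first_octet, given_bytes):
    """Class-based netmask length, widened to include all given bytes."""
    if first_octet < 128:
        classful = 8       # class A
    elif first_octet < 192:
        classful = 16      # class B
    elif first_octet < 224:
        classful = 24      # class C
    else:
        classful = 32      # ASSUMPTION: class D/E default is not documented
    return max(classful, 8 * given_bytes)


print(prefix_from_mask([255, 255, 0, 255]))  # → 16
print(default_prefix(1, 4))                  # → 32, as for 1.2.3.4 above
```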

In version 8 and later, IPv6 addresses are supported as well. These are given in standard IPv6 hex format, i.e. H:H:H:H where H is a 16-bit hexadecimal number, with :: supported for a single span of zero bits, as per canonical IPv6 text representation.

An IPv6 address may optionally be followed by a netmask, of the form /B, where B is a decimal, octal or hexadecimal netmask integer from 0 to 128. If no netmask is given, it defaults to the host-only network (i.e. 128).
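
For comparison, the same conventions (:: compression and a host-only default netmask of 128) can be reproduced with Python's standard ipaddress module. This is an illustrative sketch only: parse_ipv6_inet is a hypothetical name, and for brevity it accepts just a decimal /B, whereas the text above also allows octal and hexadecimal.

```python
import ipaddress


def parse_ipv6_inet(text):
    """Return (address, netmask_length) for 'ADDR' or 'ADDR/B'."""
    addr_text, slash, bits = text.partition("/")
    address = ipaddress.IPv6Address(addr_text)  # raises ValueError if invalid
    if not slash:
        return address, 128                     # host-only default
    if not bits.isdigit() or int(bits) > 128:
        raise ValueError("netmask must be an integer from 0 to 128")
    return address, int(bits)


address, length = parse_ipv6_inet("2000::a:000b:c:d")
print(address, length)  # → 2000::a:b:c:d 128
```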

In version 7.07.1554395000 20190404 and later, error messages are reported.

The inet actions were added in version 5.01.1112986377 20050408, and include the following (see also the SQL equivalents):

  • inetabbrev $inet

    Returns a possibly shorter-than-canonical representation of $inet, where trailing zero byte(s) of an IPv4 address may be omitted. All bytes of the network, and leading non-zero bytes of the host, will be included. E.g. <urlutil inetabbrev "192.100.0.0/24"> returns 192.100.0/24. The /B netmask is included, except if (in version 7.07.1554840000 20190409 and later) the network is host-only (i.e. netmask is the full size of the IP address). Empty string is returned on error.

  • inetcanon $inet

    Returns canonical representation of $inet. For IPv4, this is dotted-decimal with all 4 bytes. For IPv6, this is 8 16-bit hexadecimal integers (no leading zeroes), colon-separated, possibly with a :: for zero bits. The /B netmask is included, except if (in version 7.07.1554840000 20190409 and later) the network is host-only (i.e. netmask is the full size of the IP address). Empty string is returned on error.

  • inetnetwork $inet

    Returns string IP address with the network bits of $inet, and the host bits set to 0. Empty string is returned on error.

  • inethost $inet

    Returns string IP address with the host bits of $inet, and the network bits set to 0. Empty string is returned on error.

  • inetbroadcast $inet

    Returns string IP broadcast address for $inet, i.e. with the network bits, and host bits set to 1. Empty string is returned on error.

  • inetnetmask $inet

    Returns string IP netmask for $inet, i.e. with the network bits set to 1, and host bits set to 0. Empty string is returned on error.

  • inetnetmasklen $inet

    Returns integer netmask length of $inet. -1 is returned on error.

  • inetcontains $inetA $inetB

    Returns 1 if $inetA contains $inetB, i.e. every address in $inetB occurs within the $inetA network. 0 is returned if not, or -1 on error. Note that an IPv4 address is not considered to be contained within the equivalent IPv4-mapped IPv6 address, nor vice-versa (e.g. ::ffff:1.2.3.4 is considered different from 1.2.3.4). To treat IPv4 addresses the same as their IPv4-mapped IPv6 equivalents, promote both arguments to IPv6 with inetToIPv6.

  • inetclass $inet

    Returns class of $inet, e.g. A, B, C, D, E or classless if a different netmask is used (or the address is IPv6). Empty string is returned on error.

  • inet2int $inet

    Returns integer representation of IP network/host bits of $inet (i.e. without netmask); useful for compact storage of address as integer(s) instead of string. Returns a varint with 1 value for IPv4 addresses, 4 for IPv6 addresses, or 0 values on error (i.e. return compares equal to empty string on error). Note that in version 7 and earlier, a single int was always returned, with -1 for error (or 255.255.255.255).

  • int2inet $i

    Returns inet string for integer $i taken as an IP address. Since no netmask can be stored in the integer form of an IP address, the returned IP string will not have a netmask. Empty string is returned on error.

  • inetToIPv4 $inet

    Converts $inet to IPv4 (including netmask), iff IPv4-mapped IPv6. Returns the equivalent IPv4 address for $inet iff it is an IPv4-mapped IPv6 address; e.g. ::ffff:1.2.3.4 would return 1.2.3.4. Otherwise, returns canonical version of $inet iff it is some other IPv6 address; e.g. 2000::a:000b:c:d would return 2000::a:b:c:d. Otherwise returns empty string (i.e. on error). May be useful when storing both IPv4 and IPv6 addresses in a common compact int(4) field from inet2int, in order to recover original IP family format on display (after int2inet reconversion). Added in version 8.

  • inetToIPv6 $inet

    Converts $inet to IPv4-mapped IPv6 (including netmask), iff IPv4. Returns the equivalent IPv4-mapped IPv6 address for $inet iff it is IPv4; e.g. 1.2.3.4 would return ::ffff:1.2.3.4. Otherwise, returns canonical version of $inet iff it is IPv6; e.g. 2000::a:000b:c:d would return 2000::a:b:c:d. Otherwise returns empty string (i.e. on error). May be useful when storing both IPv4 and IPv6 addresses in a common compact int(4) field from inet2int, in order to convert potential IPv4 addresses to IPv6 before inet2int conversion. Added in version 8.

  • inetAddressFamily $inet

    Returns IP address family for $inet: IPv4 iff IPv4 address, IPv6 iff IPv6 address, otherwise empty string. Added in version 8.
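
The bit arithmetic behind several of the inet actions above can be modeled with Python's standard ipaddress module. This is a sketch for illustration, not the Texis implementation: inet_parts, inet_contains and to_ipv6 are hypothetical names, an explicit /B netmask is expected (ipaddress does not apply the class-based defaults described earlier), and adding 96 to the netmask length when promoting IPv4 to IPv4-mapped IPv6 is an assumption.

```python
import ipaddress


def inet_parts(inet):
    """Network, host, broadcast and netmask values for 'ADDR/B'."""
    iface = ipaddress.ip_interface(inet)
    net = iface.network
    host_bits = int(iface.ip) & int(net.hostmask)
    return {
        "network": str(net.network_address),        # host bits set to 0
        "host": str(type(iface.ip)(host_bits)),     # network bits set to 0
        "broadcast": str(net.broadcast_address),    # host bits set to 1
        "netmask": str(net.netmask),
        "netmasklen": net.prefixlen,
    }


def inet_contains(inet_a, inet_b):
    """1 if every address of inet_b lies in inet_a, 0 if not, -1 on error."""
    try:
        net_a = ipaddress.ip_network(inet_a, strict=False)
        net_b = ipaddress.ip_network(inet_b, strict=False)
    except ValueError:
        return -1
    if net_a.version != net_b.version:
        return 0  # IPv4 and IPv4-mapped IPv6 are treated as different
    return 1 if net_b.subnet_of(net_a) else 0


def to_ipv6(inet):
    """Promote IPv4 to IPv4-mapped IPv6; return IPv6 input unchanged."""
    iface = ipaddress.ip_interface(inet)
    if iface.version == 6:
        return iface
    mapped = ipaddress.IPv6Address(0xFFFF00000000 | int(iface.ip))
    # ASSUMPTION: the netmask length grows by the 96 prefix bits.
    return ipaddress.IPv6Interface(f"{mapped}/{iface.network.prefixlen + 96}")


print(inet_parts("192.168.4.77/24")["broadcast"])   # → 192.168.4.255
print(inet_contains("::ffff:1.2.3.4", "1.2.3.4"))   # → 0
```

Promoting both arguments with to_ipv6 before calling inet_contains makes the mixed-family comparison succeed in this model, which mirrors the advice given for inetcontains.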


EXAMPLE

<urlutil abs "http://example.com/dir/page.html" "other.html">

The return value in $ret would be http://example.com/dir/other.html.
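
The same resolution rules (relative links resolved against their page, the last page URL re-used when there are fewer pages than links, protocol and hostname lowercased) can be approximated in Python with urllib.parse. This is a sketch under those stated rules, not the Texis implementation; make_absolute and lowercase_scheme_and_host are hypothetical names.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def lowercase_scheme_and_host(url):
    """Lowercase the scheme and host; leave user info, path, etc. as-is."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


def make_absolute(abs_urls, rel_urls):
    """Resolve each link against its page URL, re-using the last page URL."""
    results = []
    for i, rel in enumerate(rel_urls):
        base = abs_urls[min(i, len(abs_urls) - 1)]
        results.append(lowercase_scheme_and_host(urljoin(base, rel)))
    return results


print(make_absolute(["http://example.com/dir/page.html"], ["other.html"]))
# → ['http://example.com/dir/other.html']
```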


CAVEATS
The urlutil function was added in version 3.0.957600000 20000505.


SEE ALSO
fetch, urlinfo


Copyright © Thunderstone Software     Last updated: Oct 24 2023
Copyright © 2024 Thunderstone Software LLC. All rights reserved.