gw maintains a database that contains text from HTML pages,
links to other pages, and a list of pages yet to retrieve.
The list of pages yet to retrieve is called the ``todo'' list.
When gw runs, it inserts any specified URL into the todo list. It
then takes URLs from the todo list one at a time, retrieving and storing
each HTML page and its references. Each reference not seen before is also
placed into the todo list. Processing continues until the todo list is
empty.
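
The loop described above can be sketched as follows. This is a minimal
in-memory illustration, not gw's actual implementation: the names crawl,
fetch, seen, and pages are hypothetical, a dictionary stands in for the
database, and fetch is assumed to be a caller-supplied function that
returns the page text and the list of links it references.

```python
from collections import deque


def crawl(start_urls, fetch):
    """Drain a todo list of URLs, storing each page and queuing unseen links.

    fetch(url) is a hypothetical callable returning (text, links).
    """
    todo = deque()   # pages yet to retrieve
    seen = set()     # every URL that has ever been placed on the todo list
    pages = {}       # url -> (text, links); stands in for the database

    # Insert the URLs given by the caller into the todo list.
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            todo.append(url)

    # Take URLs from the todo list until nothing is left.
    while todo:
        url = todo.popleft()
        text, links = fetch(url)
        pages[url] = (text, links)
        # Each reference not seen before is also queued.
        for link in links:
            if link not in seen:
                seen.add(link)
                todo.append(link)

    return pages
```

Tracking seen URLs separately from the todo list is what keeps pages that
link to each other from being retrieved more than once.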
If gw is killed, it finishes the page it is working on and exits.
When run again with no URL, it picks up where it left off, taking URLs
from the todo list.
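
One way to get this finish-the-current-page behaviour is to have the signal
handler only set a flag that the main loop checks between pages. The sketch
below assumes this approach; the text does not say which signals gw handles
or how it handles them, so the class Crawler, its methods, and the choice of
SIGINT and SIGTERM are all hypothetical.

```python
import signal


class Crawler:
    """Hypothetical crawler that stops only between pages."""

    def __init__(self, todo):
        self.todo = list(todo)       # pages yet to retrieve
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Signal handler: record the request and return immediately,
        # so the page currently being processed is not interrupted.
        self.stop_requested = True

    def install_handlers(self):
        # Assumption: SIGINT and SIGTERM are the signals of interest.
        signal.signal(signal.SIGINT, self.request_stop)
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, process):
        # The flag is checked only here, between pages.
        while self.todo and not self.stop_requested:
            url = self.todo.pop(0)
            process(url)
        # Whatever remains is where a later run would pick up.
        return self.todo
```

In this sketch the remaining todo list is simply returned. A program that
keeps the todo list in its database, as gw does, would have it available on
the next run without any extra step.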
By default, gw operates on the database in the current directory
(if there is one) or on the default database configured during
installation. This may be overridden with the -d option, discussed later.
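
The order of precedence described above can be sketched as a small lookup
function. The function name choose_database, its parameters, and the marker
file name gw.db are hypothetical, chosen only for illustration; the text
does not say how gw recognises a database in a directory.

```python
import os


def choose_database(override, cwd, installed_default, marker="gw.db"):
    """Return the database directory to use.

    override          -- value given with -d, or None if the option is absent
    cwd               -- the current directory
    installed_default -- the default configured during installation
    marker            -- hypothetical file whose presence indicates a database
    """
    if override is not None:
        return override                      # -d wins over everything
    if os.path.exists(os.path.join(cwd, marker)):
        return cwd                           # database in current directory
    return installed_default                 # fall back to installed default
```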