Database Usage

Note: This documentation is for an old version of Webinator. The latest documentation is here.

gw maintains a database that contains text from HTML pages, links to other pages, and a list of pages yet to retrieve. The list of pages yet to retrieve is called the "todo" list.

When gw runs, it inserts any specified URL into the todo list. It then begins taking URLs from the todo list one at a time. For each URL, it retrieves the HTML page and stores the page together with its references (links to other pages). Each reference not seen before is also placed into the todo list. Processing continues until there is nothing left in the todo list.
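The loop described above can be sketched in a few lines of Python. This is only an illustration of the todo-list idea, not gw's actual implementation: the names `crawl`, `fake_fetch`, and `SITE` are hypothetical, and the "web" is a small in-memory table so that the example runs without network access.

```python
from collections import deque

def crawl(start_urls, fetch):
    """Todo-list crawl. `fetch` is a hypothetical callable: url -> (text, links)."""
    todo = deque()   # pages yet to retrieve
    seen = set()     # every URL ever placed on the todo list
    pages = {}       # stored pages: url -> (text, references)

    # Insert any specified URL into the todo list.
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            todo.append(url)

    # Take URLs from the todo list until nothing is left.
    while todo:
        url = todo.popleft()
        text, links = fetch(url)
        pages[url] = (text, list(links))   # store the page and its references
        for link in links:
            if link not in seen:           # only references not seen before
                seen.add(link)
                todo.append(link)
    return pages

# A tiny in-memory "site" standing in for real HTTP retrieval (assumption).
SITE = {
    "/a": ("page a", ["/b", "/c"]),
    "/b": ("page b", ["/a", "/c"]),
    "/c": ("page c", []),
}

def fake_fetch(url):
    return SITE.get(url, ("", []))

if __name__ == "__main__":
    for url in crawl(["/a"], fake_fetch):
        print(url)
```

Tracking every URL that has ever been queued, rather than only the pages already stored, is what keeps a page that links back to an earlier page from being queued twice.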

If gw is killed, it finishes the page it is working on and then exits. When run again with no URL, it picks up where it left off, taking URLs from the todo list.
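Resuming works because the todo list lives in the database rather than in memory. The following Python sketch shows one way such behavior could be arranged, using SQLite purely for illustration. The table layout, the function names, and the `max_pages` parameter (used here to simulate an interrupted run) are all assumptions and say nothing about how gw stores its data.

```python
import sqlite3

def open_db(path):
    """Open (or create) a hypothetical crawl database with todo and done tables."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS todo (url TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE IF NOT EXISTS done (url TEXT PRIMARY KEY, body TEXT)")
    return db

def enqueue(db, url):
    """Add url to the todo list unless it is already stored or already queued."""
    if db.execute("SELECT 1 FROM done WHERE url = ?", (url,)).fetchone() is None:
        db.execute("INSERT OR IGNORE INTO todo (url) VALUES (?)", (url,))

def run(db, fetch, start_url=None, max_pages=None):
    """Process the todo list; returns the number of pages handled in this run."""
    if start_url is not None:
        enqueue(db, start_url)
        db.commit()
    count = 0
    while max_pages is None or count < max_pages:
        row = db.execute("SELECT url FROM todo ORDER BY rowid LIMIT 1").fetchone()
        if row is None:
            break                      # nothing left in the todo list
        url = row[0]
        body, links = fetch(url)
        db.execute("INSERT OR REPLACE INTO done (url, body) VALUES (?, ?)", (url, body))
        db.execute("DELETE FROM todo WHERE url = ?", (url,))
        for link in links:
            enqueue(db, link)
        db.commit()                    # the page is fully finished before any exit
        count += 1
    return count

# In-memory stand-in for real page retrieval (assumption).
SITE = {"/a": ("page a", ["/b", "/c"]), "/b": ("page b", ["/a", "/c"]), "/c": ("page c", [])}

def fake_fetch(url):
    return SITE.get(url, ("", []))

if __name__ == "__main__":
    db = open_db(":memory:")
    print(run(db, fake_fetch, "/a", max_pages=1))  # first run stops after one page
    print(run(db, fake_fetch))                     # second run, no URL: resumes
```

Committing once per page means a stop between pages never leaves a half-recorded page behind, which mirrors the "finish the current page, then exit" behavior described above.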

By default, gw operates on the database in the current directory if there is one; otherwise it uses the default database configured during installation. This may be overridden with the -d option, discussed later.