On the first access to a site, the file /robots.txt will be retrieved, if it exists, and the settings in it will be respected. Any URL in the todo list that is disallowed by robots.txt will be discarded.
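
The filtering step can be illustrated with a minimal sketch. It is not this program's actual implementation. It assumes Python's standard urllib.robotparser module, and the function name filter_todo, the user agent "examplebot", and the example.com URLs are all hypothetical. The robots.txt content is supplied as a list of lines so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser


def filter_todo(todo, robots_lines, user_agent="examplebot"):
    """Return the URLs from todo that robots.txt allows for user_agent.

    filter_todo and "examplebot" are hypothetical names used only here.
    """
    parser = RobotFileParser()
    # parse() takes the lines of an already retrieved robots.txt file.
    parser.parse(robots_lines)
    return [url for url in todo if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content for a site.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

# Hypothetical todo list of URLs queued for the same site.
todo = [
    "http://example.com/index.html",
    "http://example.com/private/secret.html",
    "http://example.com/docs/",
]

kept = filter_todo(todo, robots_lines)
print(kept)
```

In this sketch the URL under /private/ is dropped from the list and the other two are kept.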