The 404 Checker User Agent
Whenever one of the scripts on this website is used (whether to check HTTP status codes, inspect headers, or run a full link check), the script identifies itself to the server it is connecting to. It does this by setting the "User-Agent" field in the header of the HTTP request sent to the server.
The User Agent for all scripts run from this site is "404 Checker".
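As an illustration (the host and path here are hypothetical), the start of a request from one of these scripts might look like this:

```http
GET /some/page HTTP/1.1
Host: www.example.com
User-Agent: 404 Checker
```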
This means that the destination server knows which client is connecting to it and can tailor its handling of requests from that agent should it wish.
Additionally, any server can pre-empt requests from a particular user agent by means of a robots.txt file, which implements the Robots Exclusion Protocol for that server (or website). A robots.txt file can specify which user agents can and cannot access specific areas of a website.
Both the HTTP Status Checker and the Full Header Checker consult the target domain's robots.txt file (if one exists) before making a request. The result of this check is cached for 24 hours, so if a domain is blocked it will remain blocked for at least 24 hours, even if the robots exclusion policy for that domain changes in the meantime.
The Link Checker checks robots.txt for both the target page and every link that is checked. Pages that are blocked by a robots exclusion policy cannot be checked, nor can individual links that point to resources blocked by a robots exclusion policy.
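The kind of check described above can be sketched with Python's standard-library robots.txt parser. This is an illustration only (the rules and URLs are made up), not the site's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for a site that blocks
# "404 Checker" from its /private/ area only.
rules = [
    "User-agent: 404 Checker",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A blocked page cannot be checked...
print(parser.can_fetch("404 Checker", "https://example.com/private/page"))  # False
# ...while the rest of the site can be.
print(parser.can_fetch("404 Checker", "https://example.com/public/page"))   # True
```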
If you want the Link Checker to access your site, please allow (or do not exclude) the user agent "404 Checker" in your robots.txt file.
If you do not want the Link Checker to access your site, please exclude the user agent "404 Checker" in your robots.txt file.
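As a sketch, a robots.txt entry that blocks "404 Checker" from a hypothetical /private/ directory while leaving the rest of the site open might look like this (the commented-out alternative excludes the agent from the entire site):

```
# Allow "404 Checker" everywhere except /private/
User-agent: 404 Checker
Disallow: /private/

# Or, to exclude "404 Checker" from the whole site:
# User-agent: 404 Checker
# Disallow: /
```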