- HTTP HEAD
- An HTTP method that requests only response headers, not the body — used for efficient existence checks without downloading the full resource.
- HTTP GET
- An HTTP method that requests both headers and body — used by the crawler on the seed page to fetch HTML and extract links.
- Status code 2xx
- Success responses (200 OK, 201 Created, 204 No Content) — the link is reachable and working.
- Status code 3xx
- Redirection responses (301 Moved Permanently, 302 Found). The link points elsewhere, and the crawler follows the redirect automatically.
- Status code 4xx
- Client error responses — 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 410 Gone. Indicates the link is broken or restricted.
- Status code 5xx
- Server error responses — 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout. Indicates a server-side problem at the target.
- Redirect chain
- A sequence of 3xx responses that ends in a non-redirect response (2xx, 4xx, or 5xx). Long chains slow page loads and can hurt SEO, since crawlers, including search engines, may stop following after a limited number of hops.
- 404 Not Found
- The most common 'broken link' status — the target server responded but the requested resource does not exist at that URL.
- robots.txt
- A file at /robots.txt that tells crawlers which paths to avoid; this tool does not consult robots.txt because it acts on user-supplied seed URLs only.
- Crawl depth
- How many link-hops deep the crawler follows from the seed URL — depth 1 means only direct outbound links from the seed.
- Soft 404
- A page that returns HTTP 200 but contains a 'not found' message in the body — these are not flagged as broken by status-only checkers.
- Crawl budget
- The total number of URLs the crawler will fetch in one job. This tool caps it at 200 so that server resources are shared fairly across users.
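The HEAD-versus-GET distinction can be sketched with Python's standard `urllib.request`. This is a minimal illustration, not this tool's implementation. The helper names `build_check_request` and `check_link`, the placeholder `User-Agent` string, and the rule of retrying with GET when a server rejects HEAD with 405 or 501 are all assumptions.

```python
import urllib.error
import urllib.request


def build_check_request(url: str, method: str = "HEAD") -> urllib.request.Request:
    """Build a link-check request; HEAD asks for headers only, GET also returns the body."""
    # The User-Agent value is a made-up placeholder.
    return urllib.request.Request(
        url, method=method, headers={"User-Agent": "link-checker-sketch/0.1"}
    )


def check_link(url: str, timeout: float = 10.0) -> int:
    """Return the status code of the final response for `url`.

    Tries HEAD first; if the server rejects HEAD (405 or 501, an assumed
    fallback rule), retries once with GET. urlopen follows redirects by default.
    """
    for method in ("HEAD", "GET"):
        try:
            with urllib.request.urlopen(
                build_check_request(url, method), timeout=timeout
            ) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # urlopen raises HTTPError for 4xx/5xx; err.code holds the status.
            if method == "HEAD" and err.code in (405, 501):
                continue  # server refused HEAD, so fall back to GET
            return err.code
    raise AssertionError("unreachable: the GET attempt always returns")
```

Network-level failures such as DNS errors or refused connections surface as `urllib.error.URLError` and are left to the caller in this sketch.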
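The status-code families above map onto a small classifier keyed on the first digit of the code. The category labels and the `is_broken` rule are illustrative choices, not this tool's documented behavior. In particular, treating every 5xx as broken is an assumption; a real checker might retry 5xx responses first, because they are often temporary.

```python
# Illustrative labels; not taken from this tool.
STATUS_FAMILIES = {
    2: "ok",            # 2xx: reachable and working
    3: "redirect",      # 3xx: points elsewhere
    4: "client_error",  # 4xx: broken or restricted (404, 410, 403, ...)
    5: "server_error",  # 5xx: problem on the target server
}


def classify_status(code: int) -> str:
    """Map a status code to its family by the first digit."""
    return STATUS_FAMILIES.get(code // 100, "unknown")


def is_broken(code: int) -> bool:
    """Assumed rule: any 4xx or 5xx counts as a broken link."""
    return classify_status(code) in ("client_error", "server_error")


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, classify_status(code), is_broken(code))
    # 200 ok False
    # 301 redirect False
    # 404 client_error True
    # 503 server_error True
```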
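Resolving a redirect chain amounts to following `Location` headers until a non-3xx response arrives. The sketch below replays responses from a hypothetical in-memory table instead of making network requests, resolves relative `Location` values with `urllib.parse.urljoin`, and applies an assumed 10-hop limit plus loop detection. The function name `resolve_redirects` is illustrative.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical response table: URL -> (status code, Location header or None).
Response = Tuple[int, Optional[str]]


def resolve_redirects(
    start: str, responses: Dict[str, Response], max_hops: int = 10
) -> Tuple[List[str], int]:
    """Follow 3xx responses from `start`; return (URLs visited, final status)."""
    chain = [start]
    while True:
        status, location = responses[chain[-1]]
        if status // 100 != 3 or location is None:
            return chain, status  # final non-redirect response (2xx, 4xx, 5xx)
        target = urljoin(chain[-1], location)  # Location may be relative
        if target in chain:
            raise ValueError(f"redirect loop back to {target}")
        if len(chain) > max_hops:  # following this hop would exceed the limit
            raise ValueError(f"more than {max_hops} redirect hops")
        chain.append(target)


if __name__ == "__main__":
    table = {
        "http://site.example/old": (301, "https://site.example/old"),
        "https://site.example/old": (302, "/new"),
        "https://site.example/new": (200, None),
    }
    chain, status = resolve_redirects("http://site.example/old", table)
    print(len(chain) - 1, "hops, final status", status)
```

The number of hops is `len(chain) - 1`, so a checker could report chains above some threshold even when the final status is 2xx.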
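Crawl depth and crawl budget combine naturally in a breadth-first traversal: depth limits how far links are expanded, and the budget caps the total number of fetches. The sketch below walks a hypothetical in-memory link graph (`links`) rather than fetching real pages. Counting the seed itself toward the 200-URL budget is an assumption, as is the name `crawl_order`.

```python
from collections import deque
from typing import Dict, List, Set


def crawl_order(
    seed: str, links: Dict[str, List[str]], max_depth: int = 1, budget: int = 200
) -> List[str]:
    """Return URLs in the order a breadth-first crawl would fetch them.

    Depth 0 is the seed; depth 1 adds the links found on the seed, and so on.
    The crawl stops once `budget` URLs have been fetched.
    """
    fetched: List[str] = []
    seen: Set[str] = {seed}
    queue = deque([(seed, 0)])
    while queue and len(fetched) < budget:
        url, depth = queue.popleft()
        fetched.append(url)  # stands in for an HTTP request
        if depth >= max_depth:
            continue  # check this URL, but do not expand its links
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched
```

With `links = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"]}`, `crawl_order("s", links, max_depth=1)` returns `['s', 'a', 'b']`: the seed plus its direct outbound links, matching the depth-1 definition above. Breadth-first order means that when the budget runs out, shallower links have already been checked.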
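Because a soft 404 returns HTTP 200, status-only checks cannot see it; detecting one means reading the body, which requires a GET rather than a HEAD. A rough heuristic is to scan a 2xx body for typical "not found" wording. The phrase list and the function name `looks_like_soft_404` are assumptions for illustration, and plain text matching like this will both miss some soft 404s and flag some legitimate pages.

```python
import re

# Hypothetical phrase list; real soft-404 detection usually needs more signals.
NOT_FOUND_PHRASES = re.compile(
    r"\b(page not found|404 not found|could not be found|no longer available|does not exist)\b",
    re.IGNORECASE,
)


def looks_like_soft_404(status: int, body: str) -> bool:
    """Flag a 2xx response whose body reads like a 'not found' page."""
    if status // 100 != 2:
        return False  # real error codes are handled by status checks, not here
    return NOT_FOUND_PHRASES.search(body) is not None
```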