- robots.txt directives
- The set of instructions in a robots.txt file: User-agent, Allow, Disallow, Crawl-delay, and Sitemap. Each directive is a name and a value separated by a colon. (A complete example file appears after this glossary.)
- User-agent
- An identifier string declared in robots.txt that specifies which crawler the rules that follow apply to. '*' matches all crawlers; a crawler that finds a group naming it specifically (Googlebot, Bingbot) follows that group instead of the wildcard group.
- Allow
- A directive that explicitly permits a URL pattern to be crawled, typically used to carve an exception out of a broader Disallow rule. Google and RFC 9309 honor Allow; some older crawlers do not.
- Disallow
- A directive that tells the matching user-agent not to fetch URLs matching the pattern. An empty value (Disallow:) disallows nothing and is effectively a no-op.
- Sitemap declaration
- A line of the form 'Sitemap: https://example.com/sitemap.xml' that tells crawlers where to find the XML sitemap. Independent of User-agent groups.
- Wildcard (*)
- A pattern character that matches any sequence of characters in a URL path. Google supports * in both Allow and Disallow values.
- End-of-URL ($)
- A pattern character that anchors the match to the end of the URL. 'Disallow: /*.pdf$' blocks URLs ending in .pdf but not /file.pdf?query=1, because the query string means the URL no longer ends in .pdf.
- Crawl-delay
- A directive specifying the minimum number of seconds between successive crawler requests. Honored by Bing and Yandex but ignored by Google, which adjusts its crawl rate automatically; it is not part of RFC 9309.
- RFC 9309
- The IETF standard published in September 2022 that formalized the robots.txt protocol after nearly three decades of de facto implementation. It defines parsing rules, longest-match precedence, and parsing limits; a sketch of the precedence rule follows the glossary.
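
As an illustration of how these directives fit together, here is a hypothetical robots.txt; the host, paths, and delay value are made up for this example and are not recommendations.

```
# Group that applies to every crawler not matched by a more specific group
User-agent: *
Disallow: /tmp/
Disallow: /private/
# Longer (more specific) Allow carves an exception out of the Disallow above
Allow: /private/press-kit/
# Block URLs that end in .pdf
Disallow: /*.pdf$
# Ignored by Google; honored by some other crawlers
Crawl-delay: 10

# Googlebot matches this group and follows only these rules
User-agent: Googlebot
Disallow: /no-google/

# Sitemap lines are independent of the groups above
Sitemap: https://example.com/sitemap.xml
```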
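
Precedence between Allow and Disallow follows a longest-match rule under RFC 9309: among all rules whose pattern matches the URL, the one with the longest pattern wins, and on a tie the less restrictive Allow wins. Below is a minimal Python sketch of that rule for a single, already-selected User-agent group; the function names and sample rules are illustrative assumptions, and production parsers handle many more edge cases (percent-encoding, file-size limits, group selection).

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex:
    '*' matches any character sequence, a trailing '$' anchors the
    match to the end of the URL, everything else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest-match precedence over one User-agent group.
    `rules` is a list of (directive, pattern) pairs; the matching rule
    with the longest pattern wins, Allow wins ties, and a URL that
    matches no rule is allowed."""
    best_len, best_directive = -1, "allow"
    for directive, pattern in rules:
        if not pattern:  # empty Disallow: matches nothing
            continue
        if rule_to_regex(pattern).match(url_path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, best_directive = length, directive
    return best_directive == "allow"

# Hypothetical group mirroring the example file above.
group = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed("/private/press-kit/logo.png", group))  # True: longer Allow wins
print(is_allowed("/private/report.pdf", group))          # False: Disallow patterns match
print(is_allowed("/file.pdf?query=1", group))            # True: query string defeats the '$' anchor
```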