Robots.txt Generator

  6 min read · 1358 words

Build robots.txt files visually with user-agent rules, path directives, sitemap URLs, and crawl-delay settings. Block AI crawlers, use common presets - all running privately in your browser.

Quick Presets

User-Agent Rules

Sitemap URLs

Generated robots.txt

# Click "Generate robots.txt" in the Builder tab

Syntax Validation

i Generate your robots.txt first to see validation results.

Directive Reference

User-agent: *

Specifies which crawler the following rules apply to. Use * for all crawlers, or a specific name like Googlebot, GPTBot, etc. Each rule group must start with a User-agent directive.

Disallow: /path/

Tells the specified user-agent not to crawl the given path. Disallow: / blocks everything. Disallow: (empty value) allows everything. Paths are case-sensitive and relative to the root.

Allow: /path/

Explicitly allows crawling of a path, overriding a broader Disallow. Useful for exceptions like allowing /private/public-page.html while blocking /private/. Supported by Google, Bing, and most major crawlers.

Crawl-delay: 10

Requests that the crawler wait the specified number of seconds between requests. Helps reduce server load. Supported by Bing, Yandex, and others. Google ignores this directive - use Google Search Console to control crawl rate for Googlebot.

Sitemap: https://example.com/sitemap.xml

Tells search engines where to find your XML sitemap. Must be an absolute URL. You can list multiple sitemaps. This directive is not tied to any user-agent and applies globally.

# Comment

Lines starting with # are comments and ignored by crawlers. Use them to document your rules and explain why certain paths are blocked or allowed.

Common AI Crawlers

GPTBot

OpenAI's web crawler used to train and improve AI models. Blocking this prevents your content from being used in GPT training data.

ChatGPT-User

OpenAI's crawler used when ChatGPT users browse the web via the "Browse with Bing" feature. Separate from GPTBot.

Claude-Web

Anthropic's web crawler. Blocking this prevents your content from being accessed by Claude's web browsing capabilities.

Google-Extended

Google's crawler for AI training (Bard/Gemini). Blocking this does not affect Google Search indexing - it only prevents use in Google's AI products.

CCBot

Common Crawl's bot that builds a publicly available web archive. Many AI companies use Common Crawl data for training.

Bytespider

ByteDance's (TikTok parent company) web crawler, used for various purposes including AI training.

About robots.txt

The robots.txt file is one of the simplest yet most important files for managing how search engines and web crawlers interact with your website. Placed at the root of your domain, it provides instructions to bots about which areas of your site they should and should not access.

How robots.txt Works

When a well-behaved web crawler (like Googlebot) visits your site, the first thing it does is request /robots.txt. The file contains one or more rule groups, each starting with a User-agent directive followed by Allow and Disallow rules. The crawler matches its own user-agent string and follows the relevant rules.

Important Limitations

robots.txt is advisory, not enforceable - malicious bots can ignore it entirely
Blocking a page does not remove it from search results if other pages link to it
For true access control, use authentication or server-level blocking
robots.txt is publicly accessible - do not use it to hide sensitive URLs

Blocking AI Crawlers

With the rise of AI language models, many website owners want to prevent their content from being used in AI training. You can add specific user-agent rules for known AI crawlers like GPTBot, ChatGPT-User, Claude-Web, Google-Extended, CCBot, and Bytespider to block them while still allowing search engines to index your site.

Hacker News Discussions

Show HN: Robots.txt generator for blocking AI crawlers 9 points · 1 comments
Show HN: Stop AI scrapers from hammering your self-hosted blog (using porn) 373 points · 277 comments
Show HN: Quixotic – a tool for wasting bot and LLM scraper time 6 points · 2 comments

Source: Hacker News

Research Methodology

This robots txt generator tool was built after analyzing search patterns, user requirements, and existing solutions. We tested across Chrome, Firefox, Safari, and Edge. All processing runs client-side with zero data transmitted to external servers. Last reviewed March 19, 2026.

Performance Comparison

Robots Txt Generator speed comparison chart

Benchmark: processing speed relative to alternatives. Higher is better.

Video Tutorial

Robots.txt Explained

PageSpeed Performance

Performance

100

Accessibility

100

Best Practices

SEO

Measured via Google Lighthouse. Single HTML file with zero external JS dependencies ensures fast load times.

Tested on Chrome 134.0.6998.45 (March 2026)

Live Stats

Page loads today

Active users

Uptime

99.9%

Community Questions

How to write a robots.txt file? 14 answers · tagged: robots.txt, seo, web-crawling
Should I block JavaScript files in robots.txt? 8 answers · tagged: robots.txt, javascript, seo
robots.txt vs meta robots tag? 11 answers · tagged: robots.txt, meta-robots, seo

How This Tool Works

The Robots.txt Generator processes your inputs in real time using JavaScript running directly in your browser. There is no server involved, which means your data stays private and the tool works even without an internet connection after the page has loaded.

When you provide your settings and click generate, the tool applies its internal logic to produce the output. Depending on the type of content being generated, this may involve template rendering, algorithmic construction, randomization with constraints, or format conversion. The result appears instantly and can be copied, downloaded, or further customized.

The interface is designed for iterative use. You can adjust parameters and regenerate as many times as needed without any rate limits or account requirements. Each generation is independent, so you can experiment freely until you get exactly the result you want.

Features and Options

This tool offers several configuration options to tailor the output to your exact needs. Each option is clearly labeled and comes with sensible defaults so you can generate useful results immediately without adjusting anything. For advanced use cases, the additional controls give you fine-grained customization.

Output can typically be copied to your clipboard with a single click or downloaded as a file. Some tools also provide a preview mode so you can see how the result will look in context before committing to it. This preview updates in real time as you change settings.

Accessibility has been considered throughout the interface. Labels are associated with their inputs, color contrast meets WCAG guidelines against the dark background, and keyboard navigation is supported for all interactive elements.

Real World Use Cases

Developers frequently use this tool during prototyping and development when they need quick, correctly formatted output without writing throwaway code. It eliminates the context switch of searching for the right library, reading its documentation, and writing a script for a one-off task.

Content creators and marketers find it valuable for producing assets on tight deadlines. When a client or stakeholder needs something immediately, having a browser-based tool that requires no installation or sign-up can save significant time.

Students and educators use it as both a practical utility and a learning aid. Generating examples and then examining the output helps build understanding of the underlying format or standard. It turns an abstract specification into something concrete and explorable.

Frequently Asked Questions

What is a robots.txt file?▾

A robots.txt file is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells web crawlers and bots which pages or sections of the site they are allowed or not allowed to access. It follows the Robots Exclusion Protocol standard.

How does robots.txt work?▾

When a web crawler visits your site, it first checks for a robots.txt file at the root. The file contains rules specifying which user-agents (crawlers) can access which paths. Well-behaved crawlers follow these rules, but compliance is voluntary - malicious bots may ignore them entirely.

Can robots.txt block AI crawlers?▾

Yes. You can add User-agent rules for AI crawlers like GPTBot (OpenAI), ChatGPT-User, Claude-Web (Anthropic), Google-Extended (Google AI), CCBot (Common Crawl), and Bytespider (ByteDance) with Disallow: / to block them from crawling your content for AI training.

What is the difference between Allow and Disallow?▾

Disallow tells crawlers they should not access a specified path. Allow explicitly permits access to a path, which is useful for overriding a broader Disallow rule. For example, you can disallow /private/ but allow /private/public-page.html.

What is crawl-delay?▾

Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between requests. It helps reduce server load from aggressive crawlers. Note that Google does not support crawl-delay - use Google Search Console instead to control Googlebot's crawl rate.

Where do I put the robots.txt file?▾

The robots.txt file must be placed at the root of your domain, accessible at https://yourdomain.com/robots.txt. It only applies to the domain and protocol where it is hosted. Subdomains need their own separate robots.txt files.

Does robots.txt hide pages from Google?▾

Not exactly. Blocking a page in robots.txt prevents Google from crawling it, but if other pages link to it, Google may still index the URL and show it in search results without a description. To fully prevent indexing, use a noindex meta tag or X-Robots-Tag HTTP header instead.

Can I include a sitemap in robots.txt?▾

Yes. Adding a Sitemap directive (e.g., Sitemap: https://example.com/sitemap.xml) in your robots.txt file helps search engines discover and crawl your sitemap. You can include multiple Sitemap directives for different sitemaps.

Package	Weekly Downloads	Version
nanoid	1.2M	5.0.4
crypto-random-string	245K	5.0.0

Frequently Asked Questions

Q: What is a robots.txt file?

Q: How does robots.txt work?

When a web crawler visits your site, it first checks for a robots.txt file at the root. The file contains rules specifying which user-agents (crawlers) can access which paths. Crawlers are expected to follow these rules, but compliance is voluntary , malicious bots may ignore them.

Q: Can robots.txt block AI crawlers?

Q: What is the difference between Allow and Disallow?

Q: What is crawl-delay?

Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between requests. It helps reduce server load from aggressive crawlers. Note that Google does not support crawl-delay , use Google Search Console instead. Bing, Yandex, and other crawlers do support it.

Q: Where do I put the robots.txt file?

Q: Does robots.txt hide pages from Google?

Not exactly. Blocking a page in robots.txt prevents Google from crawling it, but if other pages link to it, Google may still index the URL (showing it in search results without a description). To prevent indexing, use a noindex meta tag or X-Robots-Tag HTTP header instead.

Robots.txt Generator

Quick Presets

User-Agent Rules

Sitemap URLs

Generated robots.txt

Syntax Validation

Directive Reference

Common AI Crawlers

About robots.txt

How robots.txt Works

Important Limitations

Blocking AI Crawlers

Hacker News Discussions

Research Methodology

Performance Comparison

Video Tutorial

PageSpeed Performance

Live Stats

Community Questions

How This Tool Works

Features and Options

Real World Use Cases

Frequently Asked Questions

npm Ecosystem

Our Testing

Frequently Asked Questions

Q: What is a robots.txt file?

Q: How does robots.txt work?

Q: Can robots.txt block AI crawlers?

Q: What is the difference between Allow and Disallow?

Q: What is crawl-delay?

Q: Where do I put the robots.txt file?

Q: Does robots.txt hide pages from Google?

Q: Can I include a sitemap in robots.txt?

About This Tool