Tooldit

Robots.txt Generator

Generate a valid robots.txt with smart Allow/Disallow defaults, a sitemap reference, crawl-delay support and per-bot controls for Googlebot, Bingbot and Yandex. Free, instant, no signup.


What Is robots.txt?

robots.txt is a plain-text file that lives at the root of your domain and tells search engine crawlers which URLs they're allowed to fetch. It's the first thing Googlebot, Bingbot and other crawlers request when they visit your site. A clean, well-formed robots.txt steers crawlers away from private routes, points them at your sitemap, and keeps your crawl budget focused on the pages that actually matter for SEO.

Our free robots.txt generator writes a valid file in seconds. It applies smart defaults for common admin and system routes, lets you allow or block specific crawlers, and automatically adds your sitemap reference.

The format was proposed by Dutch engineer Martijn Koster in 1994 as the Robots Exclusion Protocol. It spent decades as a de facto standard and was formally published as RFC 9309 in 2022. Every major search-engine crawler (Googlebot, Bingbot, DuckDuckBot, YandexBot, Baiduspider) fetches /robots.txt before crawling anything else on a domain. The file uses simple User-agent: and Disallow: directives plus optional Allow: overrides, wildcard patterns, and Sitemap: pointers. Lines starting with # are comments.
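To make the directives concrete, here is a minimal sketch that parses a small robots.txt with Python's standard urllib.robotparser module and asks whether a few URLs may be fetched. The file contents, paths and sitemap URL are made up for illustration. The Allow line is placed before the broader Disallow so that first-match parsers (like this one) and longest-match parsers (like Google's) reach the same answer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file; the paths and sitemap URL are assumptions for this example.
ROBOTS_TXT = """\
# Lines starting with a hash are comments
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))       # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/users"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/admin/help/faq"))  # True
```

Note that urllib.robotparser does prefix matching only and does not interpret the * and $ wildcards, so treat it as a quick sanity check rather than a model of any particular crawler.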

A critical distinction many site owners miss: robots.txt controls crawling, not indexing. If you disallow a page in robots.txt but another site links to that URL, Google can still index the URL (showing the title but no description, since it never crawled the content). To truly keep a page out of search results, use a noindex meta tag. Google has to crawl the page to see that tag, so the page must not be disallowed in robots.txt at the same time. Use robots.txt for crawl-budget control and noindex for search-visibility control.
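The interaction between Disallow and noindex can be sketched as a small check. The function below is a hypothetical helper (the name diagnose and its return labels are assumptions, not part of any tool): it reports whether a noindex tag on a page can actually be seen by a crawler, given the robots.txt text and the page's HTML.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def diagnose(robots_txt: str, user_agent: str, url: str, html: str) -> str:
    """Classify a page by whether it is crawlable and whether it carries noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)

    finder = RobotsMetaFinder()
    finder.feed(html)
    has_noindex = "noindex" in finder.directives

    if has_noindex and not crawlable:
        return "noindex-hidden"            # tag exists, but the crawler never sees it
    if has_noindex:
        return "noindex-effective"         # crawler can fetch the page and read the tag
    if not crawlable:
        return "blocked-may-still-be-indexed"
    return "crawlable-and-indexable"
```

A "noindex-hidden" result is the case described above: the page is blocked from crawling, so the tag that would remove it from search results is never read.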

How to Use the Robots.txt Generator

  1. Enter your website URL (for example https://example.com).
  2. Optionally list any Allow paths (one per line). These are public routes you explicitly want crawled.
  3. List any Disallow paths you want blocked — admin, dashboard, internal search, etc.
  4. Add a Sitemap URL (defaults to /sitemap.xml) and an optional Crawl delay in seconds.
  5. Toggle Googlebot, Bingbot and Yandex on or off, then click Generate.
  6. Copy or download the generated robots.txt and upload it to the root of your site.
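The steps above map onto a simple text-building routine. The sketch below is not Tooldit's actual implementation; build_robots_txt and its parameters are hypothetical names chosen to mirror the form fields, and it only shows one plausible way such inputs could be turned into a file.

```python
from urllib.parse import urljoin


def build_robots_txt(site_url, allow=(), disallow=(), sitemap="/sitemap.xml",
                     crawl_delay=None, bots=("*",)):
    """Assemble robots.txt text from form-style inputs (hypothetical helper)."""
    lines = []
    for bot in bots:
        lines.append(f"User-agent: {bot}")
        lines += [f"Allow: {path}" for path in allow]
        lines += [f"Disallow: {path}" for path in disallow]
        if not allow and not disallow:
            lines.append("Disallow:")  # an empty Disallow means "nothing is blocked"
        if crawl_delay is not None:
            lines.append(f"Crawl-delay: {crawl_delay}")
        lines.append("")  # blank line separates groups
    # Resolve a relative sitemap path against the site URL so the output is absolute.
    lines.append(f"Sitemap: {urljoin(site_url, sitemap)}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    "https://example.com",
    allow=["/blog/"],
    disallow=["/admin/", "/search"],
    crawl_delay=10,
    bots=["Googlebot", "Bingbot"],
))
```

Resolving the sitemap with urljoin keeps the Sitemap: line absolute even when only a path such as /sitemap.xml is supplied.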

Smart Defaults Explained

When the Smart defaults option is on, the generator automatically blocks routes that should almost never be crawled:

  • /admin/, /dashboard/, /account/: protected admin areas.
  • /login, /signup: authentication pages that produce thin or duplicate content.
  • /api/, /cgi-bin/: service endpoints that shouldn't appear in search.
  • /private/, /tmp/, /.well-known/: internal directories.

The tool will also automatically reclaim accidentally blocked critical paths like / and /blog/, moving them from Disallow to Allow so you don't deindex your site by mistake.
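The default-and-reclaim behavior described above could look roughly like the following. This is a sketch under assumptions: the constant names and apply_smart_defaults are hypothetical, the default list is copied from the bullets above, and the set of "critical" paths is limited to the two examples mentioned.

```python
# Paths taken from the smart-defaults list above.
SMART_DEFAULTS = [
    "/admin/", "/dashboard/", "/account/",
    "/login", "/signup",
    "/api/", "/cgi-bin/",
    "/private/", "/tmp/", "/.well-known/",
]

# Assumption: only the two example paths from the text are treated as critical.
CRITICAL_PATHS = {"/", "/blog/"}


def apply_smart_defaults(allow, disallow):
    """Merge defaults into disallow, then move critical paths over to allow."""
    # dict.fromkeys removes duplicates while preserving order.
    merged = list(dict.fromkeys([*disallow, *SMART_DEFAULTS]))
    reclaimed = [path for path in merged if path in CRITICAL_PATHS]
    safe_disallow = [path for path in merged if path not in CRITICAL_PATHS]
    safe_allow = list(dict.fromkeys([*allow, *reclaimed]))
    return safe_allow, safe_disallow


allow, disallow = apply_smart_defaults(allow=[], disallow=["/", "/search"])
print(allow)         # ['/']
print(disallow[:2])  # ['/search', '/admin/']
```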

Common Use Cases

  • Block staging or dev subdomains: keep search engines from crawling staging.yourdomain.com or dev.yourdomain.com. Put a separate robots.txt at each subdomain root with Disallow: /.
  • Prevent crawling of admin and login URLs: /admin/, /login, /wp-admin/, /wp-includes/. This saves crawl budget, although a disallowed URL can still be indexed if other sites link to it.
  • Stop crawl waste on filtered URLs: e-commerce sites with millions of filter combinations (?color=red&size=L) burn through crawl budget. Disallow query-string patterns to keep Googlebot on canonical pages.
  • Allow specific bots, block others: for example, give AhrefsBot and SemrushBot their own User-agent: groups with Disallow: / and leave Googlebot unrestricted. This keeps third-party SEO crawlers out without affecting search visibility.
  • Block AI training crawlers: GPTBot, ClaudeBot, CCBot, anthropic-ai, Google-Extended. Each compliant crawler checks robots.txt before fetching content for AI training datasets.
  • Internal search result pages: site-search URLs (/search?q=...) produce infinite low-value pages. Disallowing them is standard practice.
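Per-bot blocking works by giving each unwanted crawler its own User-agent group. As a rough sketch (block_bots is a hypothetical helper, and the bot names are the ones listed above), the snippet below builds those groups and then uses urllib.robotparser to confirm that the named bots are shut out while Googlebot is not.

```python
from urllib.robotparser import RobotFileParser


def block_bots(bot_names):
    """Return robots.txt text that blocks the named bots and allows everyone else."""
    groups = [f"User-agent: {name}\nDisallow: /\n" for name in bot_names]
    groups.append("User-agent: *\nDisallow:\n")  # empty Disallow: no restrictions
    return "\n".join(groups)  # joining with a newline leaves a blank line between groups


robots_txt = block_bots(["AhrefsBot", "SemrushBot"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AhrefsBot", "https://example.com/pricing"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/pricing"))  # True
```

The same pattern applies to AI training crawlers: pass their user-agent tokens instead.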

SEO Best Practices for robots.txt

  • Don't block CSS or JavaScript. Google needs them to render your page correctly.
  • Always include a Sitemap: directive. It helps crawlers discover new URLs without waiting to find links to them.
  • Use noindex meta tags instead of Disallow for pages you want fully hidden from search. A crawler cannot read a noindex tag on a page it is not allowed to fetch.
  • Validate your file before deploying, then confirm Google fetched it in Search Console's robots.txt report, which replaced the retired robots.txt Tester.
  • Remember: Disallow blocks crawling, not indexing. A blocked URL can still appear in search if it's linked externally.

Robots.txt Generator vs Other Tools

Versus writing robots.txt by hand: manual editing works but is error-prone (typos in directives, wrong user-agent matching). Tooldit produces syntactically valid output every time.

Versus CMS plugins: WordPress (Yoast), Drupal, and others have robots.txt features built in, but lock you to that platform. Tooldit works for any site stack.

Versus Google's tooling: the robots.txt report in Google Search Console shows how Google fetched and parsed an existing file; it doesn't generate one. Use Tooldit to create the file, then check it in Search Console.

Versus Ahrefs / SEMrush: paid tools audit existing robots.txt for SEO issues. Tooldit creates the file from scratch with sane defaults.

Troubleshooting & Common Issues

  • Blocked good content by accident: a bare Disallow: / blocks your entire site, and a single stray or missing character in a path changes which URLs a rule matches. Validate the file before deploying; one bad rule can take a whole site out of the crawl overnight.
  • Sitemap not picked up: the Sitemap: directive needs the absolute URL (https://example.com/sitemap.xml), not a relative path. Re-check after the generator runs.
  • Bot still crawls disallowed pages: robots.txt is a request, not an enforced block. Malicious bots ignore it; legitimate crawlers respect it. For real protection, use authentication or other server-side access controls.
  • Wrong placement: robots.txt must live at the root of your domain (https://example.com/robots.txt). Subdirectory paths (/blog/robots.txt) are ignored.
  • Case sensitivity: paths in robots.txt rules are matched case-sensitively, so Disallow: /Admin/ does not cover /admin/. If your URLs use mixed case, add a rule for each variant.
  • Blocked pages still showing in Google: this is the classic robots.txt vs noindex confusion. If another site links to a URL you've disallowed, Google can index the URL (showing the title only, often with the "A description for this result is not available" message in place of a description). To genuinely de-index a page, allow it in robots.txt and add a <meta name="robots" content="noindex"> tag, because Googlebot needs to crawl the page to see the noindex directive.
  • Wildcard syntax not working: Google supports * (any sequence of characters) and $ (end of URL). For example, Disallow: /*.pdf$ blocks every URL that ends in .pdf. Not all crawlers support wildcards; older or simpler bots may ignore them.
  • Multiple sitemaps: you can list as many Sitemap: directives as you need, each on its own line. Useful for large sites with separate sitemap-index files for pages, products, images, and videos.
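The wildcard rules mentioned above can be approximated by translating a pattern into a regular expression. This is a simplified sketch (pattern_to_regex and rule_matches are hypothetical names); it handles only * and a trailing $ and ignores details such as percent-encoding and rule precedence that real crawlers also consider.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing $ pins the match to the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each * stand for any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """True if the URL path (plus query string) falls under the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/docs/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False: $ requires the URL to end in .pdf
print(rule_matches("/admin/", "/admin/users"))                 # True: plain rules are prefix matches
print(rule_matches("/Admin/", "/admin/users"))                 # False: matching is case-sensitive
```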

Frequently Asked Questions

+What is a robots.txt file?

robots.txt is a plain-text file at the root of your site that tells search engine crawlers which URLs they can fetch. It's the first file Googlebot, Bingbot and other crawlers request when they visit your domain.

+Where do I put my robots.txt file?

It must live at the root of your domain — for example https://example.com/robots.txt. Crawlers ignore robots.txt files placed in any other directory.
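Because the location is fixed, the robots.txt URL for any page can be derived from the page's scheme and host alone. A minimal sketch (robots_url_for is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain or port); replace the path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?page=2"))
# https://example.com/robots.txt
print(robots_url_for("https://staging.example.com/app/login"))
# https://staging.example.com/robots.txt
```

The second call shows why each subdomain needs its own file: the host is part of the lookup.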

+Does robots.txt prevent indexing?

Not exactly. Disallow blocks crawling, but a URL can still appear in Google's index if other sites link to it. To prevent indexing entirely, use a noindex meta tag or HTTP header instead.

+Should I include my sitemap in robots.txt?

Yes. Adding a Sitemap: directive is the canonical way to point all crawlers, including ones you haven't submitted to manually, at your XML sitemap.
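To double-check which sitemaps a file actually declares, Python's urllib.robotparser exposes site_maps() (available from Python 3.8). The sketch below wraps it in a hypothetical helper, sitemap_report, that also flags entries that are not absolute URLs; the sample file is made up.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemap_report(robots_txt: str) -> dict:
    """Map each declared sitemap to True if it is an absolute http(s) URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for url in parser.site_maps() or []:  # site_maps() returns None when none are listed
        parts = urlsplit(url)
        report[url] = parts.scheme in ("http", "https") and bool(parts.netloc)
    return report


sample = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: /sitemap-news.xml
"""
print(sitemap_report(sample))
# {'https://example.com/sitemap.xml': True, '/sitemap-news.xml': False}
```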

+Does Google honor crawl-delay?

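Googlebot ignores the Crawl-delay directive. Bingbot and some other crawlers honor it, but support varies, so check each crawler's documentation. The crawl rate setting that Google Search Console used to offer has been retired; to slow Googlebot temporarily, Google's documentation suggests returning 429, 500 or 503 status codes.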
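A well-behaved custom crawler can read the delay that applies to it. The sketch below uses the crawl_delay() method of Python's urllib.robotparser; the file contents and the delay_for helper name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: only the Bingbot group declares a delay.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 5
Disallow: /search

User-agent: *
Disallow: /search
"""


def delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) for a user agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(delay_for(ROBOTS_TXT, "Bingbot"))    # 5
print(delay_for(ROBOTS_TXT, "Googlebot"))  # None: falls back to the * group, which sets no delay
```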

+Should I block AI crawlers (GPTBot, ClaudeBot, etc.)?

If you don't want your content used for AI training, yes. Add a User-agent: GPTBot group with Disallow: /, then add similar groups for ClaudeBot, Bytespider, CCBot, and Google-Extended. Compliant bots respect these rules; some scrapers ignore them.

+Do I need a robots.txt file?

Technically no. Without one, all bots crawl freely. But you should have one to point to your sitemap and to block any admin/staging paths that shouldn't be crawled.

+Is my data uploaded?

No. The generator builds the file client-side. Your domain name and directives never reach a server.

© 2026 Tooldit. All tools run locally in your browser.