
About a month ago, I noticed something odd: a brand-new website wasn’t getting indexed by Google. It had backlinks, and everything looked fine technically. Yet weeks passed, and Google still hadn’t picked it up.
That led me to take a closer look. What I found was that Cloudflare had automatically created a robots.txt file for the site—even though I hadn’t uploaded one. After some testing, I started seeing this same pattern across roughly a dozen new sites. Each had the same automatically generated robots.txt file, and each was still waiting to be indexed.
I’ve now turned off Cloudflare’s auto-creation of the robots.txt file across these sites. I’ll be keeping an eye on them to see if Google begins to crawl and index them now that the feature is disabled. It could just be coincidence—but the timing and consistency are suspicious enough to warrant a closer look.
Why Cloudflare Is Doing This
Cloudflare recently added a feature that automatically creates a robots.txt file containing what it calls “content signals.” These signals indicate how site content may be used: for search indexing, for AI training, or as AI model inputs. This is on top of Cloudflare’s “block AI bots” feature, which is turned on by default (you can disable it).
The intention seems reasonable—Cloudflare wants to help site owners declare how their content can be used. But for most site owners, this default behavior doesn’t add much value. In fact, it primarily benefits Cloudflare by reducing bandwidth usage when AI crawlers are blocked.
Unless your site is actually seeing bandwidth issues or resource drain from bots, you probably want your content to be crawlable and indexable—by search engines and by legitimate AI crawlers alike.
The Problem: Cloudflare’s Auto-Generated Robots.txt File
If your site doesn’t already have a robots.txt file, Cloudflare will generate one automatically that looks something like this:
# As a condition of accessing this website, you agree to abide by the following
# content signals:
# (a) If a content-signal = yes, you may collect content for the corresponding
# use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.
# The content signals and their meanings are:
# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search does not
# include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of content for
# generative AI search answers).
# ai-train: training or fine-tuning AI models.
# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
There’s nothing overtly blocking in that file—no “Disallow” directives—but after reviewing multiple cases, I’ve noticed that sites with this Cloudflare-generated file often fail to get indexed for weeks.
It’s possible that Google’s crawler treats the legal-style wording as a restrictive or unclear signal. While there’s no definitive proof yet, the correlation between this auto-generated file and indexing delays is strong enough to pay attention to.
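Because Cloudflare injects this file at the edge, you won’t find it anywhere in your site’s own files; the only reliable check is to fetch robots.txt from the live domain. Here’s a minimal Python sketch, with example.com as a placeholder for your own domain; the string match simply looks for the “content signals” phrase from the banner above:

# Fetch the live robots.txt and look for Cloudflare's content-signals banner.
# example.com is a placeholder; substitute your own domain.
import urllib.request

url = "https://example.com/robots.txt"
req = urllib.request.Request(url, headers={"User-Agent": "robots-check/1.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
    body = resp.read().decode("utf-8", errors="replace")

if "content signals" in body.lower():
    print("Cloudflare's auto-generated content-signals file appears to be live.")
else:
    print("No content-signals banner found; this robots.txt looks like your own.")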
The Setting You Need to Check
To see if this setting is affecting your site, log into your Cloudflare dashboard and navigate to:
Security → Bots → AI Scrapers and Crawlers.
Look for the robots.txt setting. By default, it’s set to “Content signals policy”—that’s what causes Cloudflare to generate this robots.txt file automatically.
To stop Cloudflare from creating it, change the setting to “Disable robots.txt configuration.” Then manually upload your own simple robots.txt file like this:
User-agent: *
Disallow:
This tells all crawlers that they’re allowed to access and index your site—nothing confusing, no legal jargon, no AI references.
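Once your own file is live, it’s worth a quick sanity check that crawlers will actually parse it as fully permissive. Here’s a minimal sketch using Python’s standard-library robots.txt parser, again with example.com standing in for your domain:

# Parse the live robots.txt and confirm major crawlers are allowed everywhere.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the live file

# With "User-agent: *" and an empty Disallow, every bot should report True.
for bot in ("Googlebot", "Bingbot", "GPTBot"):
    print(bot, "allowed:", rp.can_fetch(bot, "https://example.com/"))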
Why It Matters
New sites are already at a disadvantage when it comes to being indexed quickly. Add in any ambiguity from your robots.txt file, and you may be slowing things down further.
Google tends to err on the side of caution with new domains. If there’s any uncertainty about what’s allowed, it may delay crawling entirely. After removing Cloudflare’s auto-generated file and uploading a clean one, I’ve seen some sites begin to appear in Google within days.
When to Consider Blocking AI Crawlers
There are valid reasons to restrict AI crawlers—if your content is private, proprietary, or resource-heavy to serve. But for most sites, blocking AI bots isn’t necessary.
In fact, allowing your content to be seen by AI systems like ChatGPT or Perplexity can increase your visibility in AI-assisted search and conversational contexts. Unless those crawlers are significantly eating into your bandwidth or slowing your site, there’s little downside to allowing them.
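If you do decide to restrict them, writing your own rules is clearer than relying on an injected policy file. As a sketch, a robots.txt that turns away a few well-known AI training crawlers while leaving search bots alone might look like this (GPTBot, CCBot, and Google-Extended are the user-agent tokens those operators publish; tailor the list to your situation):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:

Note that Google-Extended only governs AI training and grounding; it doesn’t affect normal Google Search crawling or indexing.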
To be clear, I’m not saying Cloudflare is intentionally blocking Google or other search engines. It could just be an odd coincidence. But I’ve now seen enough examples to treat this setting as worth checking—especially if your new site isn’t showing up in Google after several weeks.
I’ll be watching a batch of sites where I’ve disabled Cloudflare’s auto-generated robots.txt files to see if their indexing improves. If they start appearing in Google’s index soon, that’ll tell us a lot.
For now, the takeaway is simple: if you’re using Cloudflare and not getting indexed, check this setting. Sometimes, what seems like an SEO problem turns out to be a small technical detail hiding in your CDN.
