Is ChatGPT Blocked From Your Website? How to Audit AI Crawler Access
You can do everything right for AI search: rewrite the pages, add the specs, earn the citations. And still be completely invisible in ChatGPT for one stupid reason: a line in a file you've never opened is telling the crawler to go away.
Most manufacturers have never looked. A real share of them are blocking AI bots right now and have no idea, because the block isn't something they chose. It came from a CDN setting, an outdated robots.txt file, or a site built in a way that bots can't read.
So before you spend another dollar on content, spend an afternoon on access. Auditing AI crawler access is the cheapest, fastest win in AI search, and it's the one nobody checks. Here's how to find the blocks on your own site and fix them, usually in a single line.
You're probably blocked, and you'd have no way of knowing
Here's the uncomfortable part. This is a config problem, not a content problem. Which means your excellent website can be invisible to ChatGPT while looking perfect to you.
It's more common than anyone admits. Roughly 27% of B2B sites are accidentally blocking major AI crawlers via CDN-level rules, often without anyone realizing it. A quarter of the top 1,000 websites now block GPTBot outright. And Cloudflare alone reported more than 2.5 million sites had opted to block AI crawling through its managed settings as of last August, a number that has only grown since.
Read that again. Millions of sites are blocking AI bots through a setting that somebody clicked once, or that was switched on by default.
Now think about your own setup. Who wrote your robots.txt? Be honest. It was probably a developer or an agency, years ago, and nobody has opened it since. Your site sits behind a CDN with a security tab full of toggles your marketing team has never seen. Manufacturers get hit by this more than most because their sites are older, their IT teams are security-first, and half of them were rebuilt or migrated by a vendor who set defaults that nobody reviewed. The block might already be there. You just haven't looked.
The mistake that gets it backward: training bots vs search bots
Before you touch anything, learn one distinction, because getting it wrong does the exact opposite of what you intend.
AI companies don't run one bot. They run several, and they do different jobs. OpenAI runs three that you can control independently:
- GPTBot crawls content for training. Block it, and you opt out of training. That's all it does.
- OAI-SearchBot is the one that puts you in ChatGPT's search answers. Block this, and you disappear from ChatGPT search. This is the one you want crawling.
- ChatGPT-User fetches a page when someone asks ChatGPT to go look at something specific.
Most people hear "block AI" and reach for the kill switch on everything. That's the trap. Block GPTBot if you have an opinion about training, fine. But block OAI-SearchBot, and you have just removed yourself from the exact answers you were trying to show up in. You did the work and locked the door on the way out.
Google works the same way. Googlebot powers Search, including AI Overviews. Google-Extended only controls whether Google uses your content to train Gemini, and it has zero effect on your Search or AI Overviews visibility. So you can block Google-Extended and lose nothing in search. Block Googlebot and you fall off the map entirely.
The posture that's become the consensus is simple: block the training crawlers if you want to, but always allow the search and retrieval bots, because those are the ones that send buyers back to your site. The same logic applies to ClaudeBot and PerplexityBot. If it can send you traffic, let it in.
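If it helps to see that posture as a rule rather than a slogan, here is a minimal Python sketch. The purpose labels come straight from the descriptions above. The `recommended_rule` function, its `allow_training` flag, and the "review" fallback are hypothetical names made up for illustration, not anything the crawler vendors publish.

```python
# Purposes as described in this article. Check each vendor's own
# documentation before relying on them; crawler names change.
BOT_PURPOSE = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user_fetch",
    "Googlebot": "search",
    "Google-Extended": "training",
}


def recommended_rule(user_agent: str, allow_training: bool = False) -> str:
    """Return 'allow', 'disallow', or 'review' for a crawler name.

    Search and user-triggered fetchers are always allowed. Training
    crawlers follow the site owner's choice. Anything not in the table
    is flagged for a human to look at instead of being guessed.
    """
    purpose = BOT_PURPOSE.get(user_agent)
    if purpose is None:
        return "review"
    if purpose == "training":
        return "allow" if allow_training else "disallow"
    return "allow"


if __name__ == "__main__":
    for name in ["OAI-SearchBot", "GPTBot", "Google-Extended", "SomeNewBot"]:
        print(name, "->", recommended_rule(name))
```

The point of the "review" fallback is that an unknown bot gets a decision from a person, not a reflexive block.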
robots.txt isn't the only gate (this is where most audits quit too early)
Here's what trips up even the people who do check. They open robots.txt, see it's clean, declare victory, and walk away. Two other things can still slam the door.
First, your CDN. Cloudflare and tools like it have a bot-management layer that sits above your robots.txt and can block AI crawlers no matter what your file says. If someone flipped on "block AI bots" in a dashboard, your perfect robots.txt is irrelevant. The CDN stops the crawler before it ever reaches the file.
Second, and almost nobody knows this one: rendering. Roughly 69% of AI crawlers can't execute JavaScript. If your site builds its pages in the browser with JavaScript, which plenty of modern manufacturer sites do after a flashy rebuild, the bot shows up, gets a blank shell with no real content, and leaves. You didn't block it anywhere. You just served it nothing.
A green light in one place means nothing if another is red. You have to check all three: the file, the CDN, and what the bot actually sees when it lands.
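The second and third gates can be roughed out in code once you have a response in hand. This is a sketch under assumptions: `check_response` and its verdict strings are invented for illustration, and treating 401, 403, and 429 as "blocked" is a judgment call, not a standard. One caveat: a test request that merely claims a bot's user-agent may be treated differently from the real crawler, so read the result as a hint, not proof.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text from HTML, skipping anything inside script or style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(raw_html: str) -> str:
    """Return the text a non-JavaScript client could read, whitespace-collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def check_response(status: int, raw_html: str, phrase: str) -> str:
    """Classify one fetched page against the CDN and rendering gates."""
    if status in (401, 403, 429):  # assumed "blocked" statuses
        return "blocked_before_content"
    if status != 200:
        return "unexpected_status"
    if phrase.lower() in visible_text(raw_html).lower():
        return "content_in_html"
    return "content_missing_from_html"
```

Feed it the status code and raw HTML from a fetch of a key product page, plus a sentence you can see on that page in a browser. A "content_missing_from_html" verdict means the sentence only exists after JavaScript runs.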
How to audit AI crawler access in an afternoon
None of this needs a developer to diagnose. Here's the whole audit, start to finish.
1. Read your robots.txt. Go to yoursite.com/robots.txt and actually read it. Look for Disallow lines under GPTBot, OAI-SearchBot, ChatGPT-User, Google-Extended, ClaudeBot, PerplexityBot, CCBot, or Bytespider. Also check for a blanket Disallow: / under User-agent: *, which applies to every bot that doesn't have its own entry. If the search and retrieval bots are disallowed, there's your problem in black and white.
2. Check your CDN. Log in to Cloudflare, or whatever sits in front of your site, and find the bot or AI settings. Look for a "block AI scrapers" or "block AI bots" toggle that's switched on. This is the quiet one that overrides everything else.
3. Test what the bot sees. Open a key product page, right-click, and view page source. Search the raw HTML for a sentence you can see on the page. There? Good. If the source is mostly empty and the content is loaded by script, AI crawlers get that empty version.
4. Check your logs. Ask whoever has server access to pull every request from AI bot user-agents over the last month. Are GPTBot and OAI-SearchBot actually visiting? If they have never shown up, something is keeping them from arriving.
5. Run the live test. Ask ChatGPT and Perplexity about your company and one specific product. Can they describe it accurately? Can they cite your page? If they are guessing, or pulling from a directory instead of you, you have your answer.
Write down what you find against those five checks. That's your access audit. It genuinely fits in an afternoon, and most teams find at least one thing they didn't expect.
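For the log check in step 4, whoever has server access could start from something like this Python sketch. It assumes plain-text access logs where the user-agent string appears somewhere on each line. The `count_ai_bot_hits` and `never_seen` helpers are hypothetical names, and the sample lines are made up for illustration. Google-Extended is left out on purpose: as far as we know it is only a robots.txt token and does not appear as its own user-agent in requests.

```python
from collections import Counter

# Crawler names from the audit list above that show up in user-agent strings.
AI_BOT_TOKENS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
    "Bytespider",
]


def count_ai_bot_hits(log_lines):
    """Count log lines mentioning each crawler token (case-insensitive)."""
    counts = Counter({token: 0 for token in AI_BOT_TOKENS})
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
    return dict(counts)


def never_seen(counts):
    """Return the crawlers with zero hits, in the order listed above."""
    return [token for token in AI_BOT_TOKENS if counts.get(token, 0) == 0]


# Made-up sample lines in a common access-log shape, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /products/pump HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.9 - - [01/Jan/2025:00:05:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    hits = count_ai_bot_hits(SAMPLE_LOG)
    print(hits)
    print("Never seen:", never_seen(hits))
```

In practice you would pass the lines of a real log file instead of `SAMPLE_LOG`. A crawler in the "never seen" list is not proof of a block, but it tells you which of the other checks to look at first.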
Found a block? Good. Here's how to fix it and set a real policy
If you found a block, that's the good news. The fix is usually one line, and you just found pipeline that was sitting on the floor.
Decide your posture on purpose, not by accident. For most manufacturers, the sane default looks like this:
# Let the bots that send traffic in
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
# Your call on training crawlers
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
Allow the search and retrieval bots. Make a deliberate call on the training ones based on how you feel about feeding model training. Block sensitive paths like account or checkout directories, not your whole public site. Then rerun the live test a few weeks later and watch your name start appearing where it didn't before.
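Before you publish a new robots.txt, it's worth checking that it says what you think it says. Here is a small sketch using Python's standard-library `urllib.robotparser` against the policy above. The `access_report` helper and the example.com URL are placeholders for illustration. Keep in mind this only reads the file: it tells you nothing about the CDN or rendering gates, and a given crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The policy from this article, pasted as-is.
POLICY = """\
# Let the bots that send traffic in
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
# Your call on training crawlers
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
"""


def access_report(robots_txt, agents, url="https://www.example.com/products/"):
    """Map each crawler name to True (may fetch url) or False (disallowed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    bots = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "GPTBot", "CCBot"]
    for bot, allowed in access_report(POLICY, bots).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Swap `POLICY` for the text of your live file and the same report shows you, bot by bot, who your robots.txt is turning away.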
Two honest things before you go. robots.txt is a voluntary protocol: the bots agree to respect it, but nothing forces them to, so treat it as direction, not a deadbolt. And access is the floor, not the ceiling. Getting the crawler in the door makes you eligible for a citation. The content still has to be worth quoting once it's there, and that's a separate job.
But that job is pointless until the door is open. With 71% of B2B buyers now using AI tools to research vendors, and a buying group of six to ten people each running their own searches, an accidental block isn't a technical footnote. It's deals forming without you in the room. Most manufacturers are pouring money into the content and ignoring the door. Check the door first.
If you want us to run this across your site and tell you exactly where you're blocked and what it's costing you, book an exploratory call, and we'll audit your coverage across the full buying committee.