Blocking AI Scraper Bots
Chris Coyier

September 19, 2023
I think it was a month or two ago when OpenAI published docs on how you can tell their “web crawler” (scraper) to not scrape your site, following a similar syntax to Google’s:
    User-agent: GPTBot
    Disallow: /

The instant I saw it I put it in my site's robots.txt file. I wanted to see how it felt and see if I felt any different over time.
Nope, not so far.
I don’t like the idea of a big machine scraping all my data to make it smarter, with no credit or links to where it got that data.
I’m not against the idea of language models, nor do I have any particular opinion on how you should feel about it. I just don’t want my labor and thoughts to be a part of a multi-billion-dollar company’s business model, and in fact, I think they should have asked me to begin with.
If a huge company sent a robot to your door to ask for a lock of your hair, would you give it to them? If they asked for one square inch of your land, would you sign it over? If they asked you to run on a treadmill for one minute a day for them, would you hop to it? What if they didn’t ask?
UPDATE: A more comprehensive setup:
    User-agent: CCBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Omgilibot
    Disallow: /

    User-agent: FacebookBot
    Disallow: /
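If you want to sanity-check a block list like the one above before deploying it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `blocked_bots` and the `https://example.com/` URL are placeholders I made up for illustration, and this only shows how Python's parser reads the rules. It says nothing about whether a given crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, written one directive per line.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /
"""

AI_BOTS = [
    "CCBot",
    "ChatGPT-User",
    "GPTBot",
    "Google-Extended",
    "Omgilibot",
    "FacebookBot",
]


def blocked_bots(robots_txt, bots, url="https://example.com/"):
    """Return {bot: True if the rules disallow `url` for that bot}.

    `blocked_bots` and the example.com URL are hypothetical, for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    for bot, blocked in blocked_bots(ROBOTS_TXT, AI_BOTS).items():
        print(f"{bot}: {'blocked' if blocked else 'allowed'}")
```

Passing a user agent that is not in the list (say, `"Googlebot"`) is a quick way to confirm the rules are not accidentally blocking crawlers you still want.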