
How to Stop OpenAI's GPTBot from Crawling Your Website

OpenAI, the company behind ChatGPT, has released GPTBot, a new web crawler, along with instructions for blocking it. Despite recent claims that ChatGPT's intelligence is waning, it remains one of the most powerful AI systems ever created. OpenAI continues to improve its large language models (LLMs), notably GPT-3.5 and GPT-4.

How GPTBot Works

GPTBot crawls the web for information that can improve the accuracy, capability, and safety of AI technology. According to reports, it filters out sources that sit behind a paywall, sources that violate OpenAI's policies, and sources known to gather personally identifiable information.
  • Data gathered by GPTBot has the potential to give AI models a significant boost.
  • Granting it access to your website adds to this data pool, which benefits the AI ecosystem as a whole.
  • OpenAI lets web administrators decide whether or not to allow GPTBot access to their websites.

How to Stop OpenAI's GPTBot from Crawling Your Website

Website owners can edit their robots.txt file to prevent GPTBot from accessing their site. To block GPTBot from the entire website, add the following:
  • User-agent: GPTBot
    Disallow: /

Those who want to grant only partial access can instead specify which directories GPTBot may crawl. To do so, add the following to the robots.txt file:
  • User-agent: GPTBot
    Allow: /directory-1/
    Disallow: /directory-2/

As for GPTBot's technical operation, all of its requests to websites come from IP address ranges listed on OpenAI's website. This gives web administrators more transparency about where the traffic to their sites originates. The choice you make determines GPTBot's access to your site and can affect both your site's security and OpenAI's ability to improve its AI.
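Before deploying robots.txt rules like the ones in this section, it can help to check how a standards-following parser reads them. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name gptbot_may_fetch and the sample page paths are illustrative assumptions, and the directory names are the article's placeholders. It shows only how Python's parser interprets the rules, not how GPTBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Full block, as described in the article.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Partial access; the directory names are the article's placeholders.
PARTIAL_ACCESS = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_may_fetch(robots_txt: str, path: str, user_agent: str = "GPTBot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch path.

    Hypothetical helper for illustration; it parses the text locally and
    makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for label, rules in (("block all", BLOCK_ALL), ("partial", PARTIAL_ACCESS)):
        for path in ("/directory-1/page.html", "/directory-2/page.html"):
            print(label, path, gptbot_may_fetch(rules, path))
```

Because the rules name only the GPTBot user agent, other crawlers are unaffected by them in this parser.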

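Since GPTBot's requests are said to come from IP address ranges listed on OpenAI's website, an administrator could compare a visitor's address against that list. The sketch below uses Python's standard-library ipaddress module. The ranges in it are reserved documentation blocks used purely as placeholders, not OpenAI's actual ranges, and the function name is a hypothetical choice for illustration.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation blocks), NOT OpenAI's real ranges.
# Replace them with the ranges published on OpenAI's website.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def in_published_ranges(client_ip: str, cidr_ranges: list[str]) -> bool:
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    # Sample addresses chosen only to exercise the placeholder ranges.
    for ip in ("192.0.2.44", "203.0.113.9"):
        print(ip, in_published_ranges(ip, PLACEHOLDER_RANGES))
```

A check like this complements the user-agent string, which any client can set to whatever it likes.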
Legal and Moral Issues

OpenAI's announcement has sparked a discussion about the morality and legality of using scraped web data to train proprietary AI systems. GPTBot identifies itself so that web administrators can block it via robots.txt, but some argue that, unlike search engine crawlers that drive traffic to a site, there is no benefit to allowing it. The reproduction of copyrighted content without credit is a serious concern.

By Monica Green

I am specialised in latest tech and tech discoveries.
