Following your link above, https://openai.com/gptbot

> ChatGPT-User is not used for crawling the web in an automatic fashion. Because these actions are initiated by a user, robots.txt rules may not apply.

So this isn't AI training, or any other large-batch scraping, but rather inference-time retrieval-augmented generation (RAG), with the "retrieval" happening over the web?
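For anyone wondering how the two agents differ from a site operator's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration. It only shows how a client that honors robots.txt would evaluate the rules per user-agent token; it says nothing about what OpenAI's fetchers actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the crawler token only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/some/page"  # placeholder URL

# Rules are matched per user-agent token, so a GPTBot rule does not
# automatically cover a differently named agent such as ChatGPT-User.
gptbot_allowed = parser.can_fetch("GPTBot", url)
chatgpt_user_allowed = parser.can_fetch("ChatGPT-User", url)

print("GPTBot allowed:", gptbot_allowed)
print("ChatGPT-User allowed:", chatgpt_user_allowed)
```

Even with an explicit `ChatGPT-User` rule added, robots.txt is advisory, and the quoted docs say the rules "may not apply" to user-initiated fetches.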

reply
Likely, at least for some. I've caught various chatbots/CLI harnesses more than once inspecting a GitHub repo file by file (often multiple times, because of context rot).

But the sheer volume makes it unlikely that's the only reason. It's not as if everybody constantly has questions about the same tiny website.
