Agents don't buy stuff they see in an ad
reply
So why serve them at all?
reply
If your website itself advertises a product or service you sell, you would still want LLMs to see and fetch it. If you are a news site, blog, or any other website that doesn’t exist to sell something, you are only harmed by AI agents.
reply
In those situations you wouldn't have ads on the human version of the site either, surely?
reply
Sure, if it’s paywalled. Web hosting isn’t free.
reply
Modern agents already do this via content negotiation: they attempt to retrieve the markdown version of a given site.

https://www.sanity.io/learn/course/markdown-routes-with-next...
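
For anyone unfamiliar with how that negotiation works on the server side, here is a minimal sketch, assuming the agent sends an `Accept` header that lists `text/markdown`. The function names `parse_accept` and `choose_representation` are made up for illustration and are not taken from the linked course or any framework.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a media type without an explicit q-value defaults to 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type.lower(), q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_representation(accept_header: str) -> str:
    """Return "markdown" when text/markdown is rated at least as high as HTML."""
    prefs = dict(parse_accept(accept_header))
    md_q = prefs.get("text/markdown", 0.0)
    # Fall back to wildcards when text/html is not listed explicitly
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    return "markdown" if md_q > 0 and md_q >= html_q else "html"
```

A request with `Accept: text/markdown, text/html;q=0.8` would get the markdown route here, while a typical browser header that never mentions `text/markdown` falls through to HTML.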

reply
But that isn't that different from requesting the llms.txt version. Why not just make it so the useful content you want the LLM to focus on is easily retrievable from the same HTML the user's browser gets?

The sanity.io page writes:

> serving agents a bunch of HTML might just bloat their context window.

That's only true if you assume the agent can't extract the useful text before it goes into the model as tokens. Your browser's reader mode uses heuristics to identify the actual content in a large HTML response and strips away the rest.

To me this is a far better approach than worrying about llms.txt files or checking HTTP headers to see whether markdown is preferred. That effort could just as easily go toward ensuring the useful content on your site carries the appropriate markup for an agent or any other tool to extract it. It would also require less work for the publisher of the content to implement.
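
As a rough illustration of what extraction from well-marked-up HTML could look like, here is a sketch using only Python's standard-library `html.parser`. It assumes the page wraps its content in `<main>` or `<article>`; the tag sets and the names `ContentExtractor` and `extract_content` are my own choices for the example. This is much cruder than the heuristics a real reader mode uses.

```python
from html.parser import HTMLParser

CONTENT_TAGS = {"article", "main"}  # assumed wrappers for the real content
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "form"}
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "div",
              "blockquote", "pre"}


class ContentExtractor(HTMLParser):
    """Collect text inside <main>/<article>, ignoring navigation and scripts."""

    def __init__(self):
        super().__init__()
        self.content_depth = 0  # > 0 while inside a content wrapper
        self.skip_depth = 0     # > 0 while inside a tag we want to drop
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTENT_TAGS:
            self.content_depth += 1
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTENT_TAGS and self.content_depth:
            self.content_depth -= 1
        elif tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")  # keep block elements on separate lines

    def handle_data(self, data):
        if self.content_depth and not self.skip_depth:
            self.chunks.append(data)


def extract_content(html: str) -> str:
    """Return the content text, one block element per line."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).split("\n"))
    return "\n".join(line for line in lines if line)
```

The point is that the stripping happens before anything is tokenized, so the navigation, scripts, and footer never reach the model's context window.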

reply