TL;DR
`llms.txt` is not currently widely adopted or officially recognized by major LLM providers or AI search engines. While `llms.txt` is a proposed standard and some sites have implemented it, Google and other major players have stated that they are not using it. If it becomes a widely adopted standard, we’ll look to add it to our plugins.
What is `llms.txt`?
- It’s a proposed standard (similar to `robots.txt` for traditional search engine crawlers) designed to help LLMs and AI agents understand and process website content more effectively.
- The idea is to provide a clean, structured, and LLM-friendly version of a website’s important content, often in Markdown format, devoid of ads, navigation, and other extraneous HTML elements.
- It’s intended to give LLMs a “curated map” of high-value content, such as API documentation, return policies, or key articles, to improve the accuracy and relevance of AI-generated responses.
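For reference, here is a minimal sketch of what such a file looks like under the proposal (an H1 title, a blockquote summary, then sections of Markdown links); the site name and URLs below are placeholders. Per the proposal, the file lives at the site root, e.g. https://example.com/llms.txt:

```markdown
# Example Co

> Example Co sells widgets and maintains public API documentation.

## Docs

- [API reference](https://example.com/docs/api.md): endpoints, auth, and rate limits
- [Return policy](https://example.com/returns.md): how refunds and exchanges work

## Optional

- [Blog](https://example.com/blog.md): product announcements and changelogs
```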
Why it’s NOT a “thing” (yet):
- Lack of Official Adoption: Major LLM providers like Google (for Gemini/Bard), OpenAI (for GPTBot), and Meta (for LLaMA) have explicitly stated that they do not currently use or check for `llms.txt`. They primarily rely on existing web standards like `robots.txt` and sitemaps, along with their advanced crawling and understanding capabilities.
- Redundancy: As Google’s John Mueller has pointed out, if AI bots already download full web pages and structured data, why would they need a separate file? They can already extract the necessary information.
- Potential for Abuse: There’s a concern that `llms.txt` could be abused to show AI bots one version of content while users see another, leading to cloaking issues.
- User Experience Concerns: If LLMs were to cite `llms.txt` files directly, users clicking on those citations might land on bare text files without proper formatting or navigation, leading to a poor user experience.
- Limited Observed Benefit: Community feedback from early adopters of `llms.txt` has shown little to no activity from major AI crawlers accessing the file. While a few sites have reported minor increases in “LLM traffic,” this is often attributed to the overall growth of AI usage rather than the specific influence of `llms.txt`.
What LLMs and AI Engines do care about (and what you should focus on):
Instead of `llms.txt`, focus your efforts on these established and proven strategies for optimizing content for AI (TL;DR: be awesome at SEO):
- Structured Data: This is paramount. Use Schema.org markup (typically as JSON-LD) to explicitly define the entities, relationships, and meaning of your content. This gives LLMs clear, machine-readable information about your pages. (This is why SEO for AI has focused here; see the sketch after this list.)
- High-Quality, Well-Structured Content:
  - Clarity and Conciseness: Write clearly and avoid jargon.
  - Logical Headings and Subheadings: Use `<h1>`, `<h2>`, etc., to create a clear hierarchy.
  - Semantic HTML: Use appropriate HTML tags (`<article>`, `<section>`, `<ul>`, `<ol>`, `<p>`, etc.) to convey meaning; see the sketch after this list.
- E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness): While this isn’t a technical signal you can mark up, creating content that demonstrates these qualities is crucial for LLMs to consider your information reliable and trustworthy.
- `robots.txt` and Sitemaps: Continue to use these for traditional crawl management and for indicating what content is available on your site; see the sketch after this list.
- Technical SEO Best Practices: Fast loading times, mobile-friendliness, secure (HTTPS) sites, and a clean code base all contribute to better crawlability and understanding by any bot, including AI ones.
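To make the structured data point concrete, here is a minimal JSON-LD sketch for an article page. The headline, names, and dates are placeholders, and the properties you’d actually use depend on the content type (the full vocabulary is documented at schema.org):

```html
<!-- Placed in the page's <head> or <body>; describes the page as an Article entity -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Widgets Work",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2025-01-15",
  "publisher": { "@type": "Organization", "name": "Example Co" }
}
</script>
```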
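Likewise, a minimal sketch of the heading hierarchy and semantic tags described above, with placeholder content:

```html
<!-- One <h1> per page, nested <h2> sections, and semantic containers instead of generic <div>s -->
<article>
  <h1>How Widgets Work</h1>
  <section>
    <h2>Overview</h2>
    <p>Widgets convert raw input into finished output.</p>
  </section>
  <section>
    <h2>Setup Steps</h2>
    <ol>
      <li>Install the widget.</li>
      <li>Configure the settings.</li>
    </ol>
  </section>
</article>
```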
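And crawl management for AI bots happens in the same `robots.txt` you already maintain. A minimal sketch, assuming you want to block OpenAI’s documented GPTBot crawler from one directory while allowing everything else (the path and domain are placeholders):

```text
# Block OpenAI's GPTBot from a private section
User-agent: GPTBot
Disallow: /private/

# Allow all other crawlers everywhere (empty Disallow = no restriction)
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```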
In summary: While `llms.txt` is an interesting proposal, it has not gained traction with the major AI players. Investing time and resources into it at this point would be a misallocation. Focus on creating high-quality, semantically rich content using established structured data formats and good web development practices.