llms.txt strategy for tagging docs
Your documentation has at least two audiences now: humans with browsers, and LLMs retrieving context for their users’ questions. The second audience is growing fast. Every ChatGPT user who asks “how do I set up GA4 ecommerce tracking” is consuming somebody’s documentation, one way or another — either through a web search, an MCP call, or a retrieval step inside the model.
This page is about structuring docs so the LLM-audience path actually works. Valid as of April 2026.
What llms.txt is
The llms.txt proposal is a simple convention: publish a single Markdown file at /llms.txt on your site root, listing the key pages LLMs should read to understand the site. The format is a Markdown H1 (site title), a short description, and a structured list of links grouped by section.
There are two common flavours:
| File | Purpose | Typical size |
|---|---|---|
| llms.txt | Navigational index, pointing to the pages worth reading | 5-50 KB |
| llms-full.txt | Full content of every page concatenated | 500 KB - 5 MB |
The distinction matters because context windows are still finite. A model with a 200K token context (≈ 800 KB of text) can ingest most llms-full.txt files comfortably. A model with a 32K context can’t; it wants llms.txt and then a focused fetch of specific pages.
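The arithmetic above is easy to make explicit. A rough sketch, using the same ~4-characters-per-token heuristic as the rest of this page (the ratio varies by language and content):

```javascript
// Back-of-envelope check: will a given llms-full.txt fit a model's context?
// 4 chars/token is a rough heuristic for English prose, not an exact rate.
const CHARS_PER_TOKEN = 4;

function estimateTokens(bytes) {
  return Math.ceil(bytes / CHARS_PER_TOKEN);
}

function fitsContext(fileBytes, contextTokens) {
  return estimateTokens(fileBytes) <= contextTokens;
}

// An 800 KB llms-full.txt is ~200K tokens: fine for a 200K-context model,
// far too big for a 32K one.
console.log(fitsContext(800_000, 200_000)); // true
console.log(fitsContext(800_000, 32_000)); // false
```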
How TaggingDocs generates them
This site is built on Astro Starlight with the starlight-llms-txt plugin, configured in astro.config.mjs. At build time the plugin:
- Walks every content collection entry.
- Emits /llms.txt — a tree of headings and page links.
- Emits /llms-full.txt — all pages concatenated in reading order, with frontmatter-derived titles as H1s.
- Adds each file to the site’s build output so they’re served at https://taggingdocs.com/llms.txt and /llms-full.txt.
You don’t need to maintain either file by hand. What you do need to maintain is what goes into it.
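The transformation itself is simple enough to sketch without the plugin. This is an illustrative reimplementation, not the plugin’s actual code, and the page data below is made up:

```javascript
// Sketch of what an llms.txt generator does: group pages by section and
// emit a Markdown index (H1 title, blockquote description, H2 per section).
function buildLlmsTxt(site, description, pages) {
  const sections = new Map();
  for (const page of pages) {
    if (!sections.has(page.section)) sections.set(page.section, []);
    sections.get(page.section).push(`- [${page.title}](${page.url})`);
  }
  const lines = [`# ${site}`, "", `> ${description}`, ""];
  for (const [section, links] of sections) {
    lines.push(`## ${section}`, ...links, "");
  }
  return lines.join("\n");
}

// Illustrative page data, not the real site structure.
const txt = buildLlmsTxt("TaggingDocs", "Tagging and analytics documentation", [
  { section: "Foundations", title: "Glossary", url: "/foundations/glossary/" },
  { section: "Recipes", title: "Cookiebot with GTM", url: "/recipes/cookiebot-gtm/" },
]);
```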
What to prioritise
The default starlight-llms-txt output includes every page. That’s reasonable for a small site. For a site the size of TaggingDocs (330+ pages as of this writing), priority matters because downstream models have finite attention even when context fits.
The hierarchy that works, highest-value first:
Foundations and glossary
If an LLM reads nothing else on your site, the glossary should be in its context. It’s the cheapest way to establish shared vocabulary. Every term defined on /foundations/glossary/ is a term a model can use correctly instead of paraphrasing.
Conceptual explainers
Pages that explain how something works (the dataLayer lifecycle, how GTM loads, what Consent Mode does) are disproportionately valuable. Models lean on these to reason about novel situations. One good page on dataLayer behaviour answers a hundred derived questions.
Reference tables and spec pages
The GA4 event reference. The consent-type matrix. The GTM API resource shapes. These are lookup tables — the model finds the row it needs and uses it. Highly structured reference content is the best fit for LLM consumption.
Worked examples and recipes
Recipes (“here’s how to set up Cookiebot with GTM”) are what models cite when a user asks “how do I do X.” The recipe’s specificity is what makes the answer useful. Include the full worked example — abbreviated recipes are the opposite of useful to a model trying to apply them.
Lower-priority: opinion and meta-content
This page. The about page. The roadmap. These are fine to include but they don’t help the model answer a tagging question. If context is tight, they should be trimmed first.
Writing for LLM readability
The patterns that help human readers also help models, but models lean harder on a few specific things.
Explicit definitions over implicit ones
A paragraph that starts “The dataLayer merge model is…” is easier for a model to retrieve and cite than one that describes the same concept without naming it. Models pattern-match on “term: definition” structures.
Tables for enumerations
A bullet list with 10 items and a table with 10 rows carry the same information to a human. To a model, the table is better — the row/column structure makes each cell addressable.
Code blocks with complete examples
Copy-pasteable code is what models copy-paste. A code snippet that assumes context the reader has is a snippet the model will fill in from its training data, which is a gamble. Every code example should run as-is or explicitly flag its placeholders.
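As an example of the difference: a dataLayer push that bootstraps its own dataLayer and labels its one placeholder leaves nothing for the model to invent. A minimal sketch (the globalThis line just lets it run outside a browser; the values are illustrative):

```javascript
// Self-contained snippet: declares its own dataLayer and flags the
// single placeholder explicitly, so it runs as-is.
const w = globalThis; // stands in for the browser `window` when run in Node
w.dataLayer = w.dataLayer || [];
w.dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "ORDER_ID_HERE", // PLACEHOLDER: replace with your order ID
    currency: "EUR",
    value: 49.99, // illustrative value
    items: [{ item_id: "SKU-123", item_name: "Example product", quantity: 1 }],
  },
});
```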
Numeric specifics
“Often about 5% of traffic” is better than “some traffic.” Models cite numbers when you give them numbers. Fuzzy language invites fuzzy answers.
Freshness signalling
This is the bit most sites do badly. An LLM can’t tell how old a page is without a signal. It doesn’t matter what year is in your footer; what matters is machine-readable information about content staleness.
TaggingDocs uses two complementary signals:
Frontmatter lastUpdated
Every page has a lastUpdated: YYYY-MM-DD frontmatter field. The plugin surfaces this in the rendered page and in the generated llms-full.txt. A model reading the concatenated file can see when each page was last touched.
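The same field supports a simple build-time staleness audit. A sketch under stated assumptions: the field name follows the frontmatter described above, but the 18-month threshold and the page data are illustrative, not a site policy:

```javascript
// Flag pages whose lastUpdated frontmatter date is older than a cutoff.
function stalePages(pages, now, maxAgeMonths = 18) {
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - maxAgeMonths);
  return pages
    .filter((p) => new Date(p.lastUpdated) < cutoff)
    .map((p) => p.url);
}

// Illustrative data: one fresh page, one long-untouched page.
const flagged = stalePages(
  [
    { url: "/ga4/event-reference/", lastUpdated: "2026-03-01" },
    { url: "/gtm/legacy-containers/", lastUpdated: "2023-06-15" },
  ],
  new Date("2026-04-01")
);
// flagged → ["/gtm/legacy-containers/"]
```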
Body-line validity statement
More important for time-sensitive content: an inline sentence near the top of the page, like:
Valid as of April 2026, for GTM Web containers and the GA4 event reference at that date.
Body-line statements beat frontmatter for two reasons:
- They survive retrieval. When a model pulls a snippet of the page (not the whole document), the validity line is usually in the snippet.
- They’re in the model’s active context. Frontmatter sometimes gets stripped by scrapers; inline prose does not.
Pages on this site that touch APIs, product names, or vendor behaviour carry a validity line. Pages that are essentially timeless (conceptual explainers) carry only lastUpdated.
Context-window strategy
If you publish llms-full.txt, know what size you’re targeting.
| Context size | Tokens | Bytes at ~4 chars/token | What fits |
|---|---|---|---|
| Small | 32 K | ~130 KB | Navigational llms.txt + 10-20 key pages |
| Medium | 128 K | ~500 KB | Full site for a focused topic section |
| Large | 200 K | ~800 KB | Whole TaggingDocs site (roughly) |
| XL | 1 M | ~4 MB | Whole site plus external references |
For most sites, the llms-full.txt fits entirely in modern large-context models. For sites that grow past 1-2 MB, you want either multiple topical llms-full.txt files (e.g. /llms-gtm.txt, /llms-ga4.txt) or a navigational llms.txt that sends the model to per-section full files.
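The split decision can be automated from page sizes at build time. A sketch of the bookkeeping, with hypothetical topic names and sizes (not the real site’s numbers):

```javascript
// Group pages into per-topic full files and report which topical files
// still exceed the per-file byte budget and need a further split.
function splitByTopic(pages, maxBytesPerFile) {
  const byTopic = new Map();
  for (const p of pages) {
    if (!byTopic.has(p.topic)) byTopic.set(p.topic, { bytes: 0, pages: [] });
    const bucket = byTopic.get(p.topic);
    bucket.pages.push(p.url);
    bucket.bytes += p.bytes;
  }
  const oversize = [];
  for (const [topic, bucket] of byTopic) {
    if (bucket.bytes > maxBytesPerFile) oversize.push(topic);
  }
  return { byTopic, oversize };
}

// Illustrative numbers: a 500 KB budget (~128K-token context).
const { oversize } = splitByTopic(
  [
    { topic: "gtm", url: "/gtm/setup/", bytes: 600_000 },
    { topic: "ga4", url: "/ga4/events/", bytes: 300_000 },
  ],
  500_000
);
// oversize → ["gtm"]
```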
What to exclude
Not every page helps. Things to leave out of llms-full.txt if your build system lets you:
- Auto-generated index pages. Table-of-contents-style pages that only link elsewhere. Duplicate the links without the content.
- Pages that exist purely for SEO. If a page was written to rank and isn’t useful to a working engineer, it’s not useful to a model either.
- Deprecated / archive pages. If a page is kept for historical reference but the content is no longer correct, a prominent header saying so is not enough — models sometimes cite the body anyway. Consider excluding these from the generated feeds.
- Deeply interactive content. If a page is mostly a live form, calculator, or chart, the static Markdown version won’t capture its value. Better to link to it than dump its shell into the feed.
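If your build exposes page metadata, the rules above reduce to a predicate. The flag names here (deprecated, seoOnly, interactive, indexOnly) are hypothetical frontmatter fields, not options of any plugin:

```javascript
// One way to express the exclusion rules as a filter over page metadata.
function includeInFullFeed(page) {
  if (page.deprecated) return false; // stale content gets cited anyway
  if (page.seoOnly) return false; // not useful to a working engineer
  if (page.interactive) return false; // static dump won't capture its value
  if (page.indexOnly) return false; // duplicates links without content
  return true;
}
```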
Measuring whether it works
The honest answer: you can’t really measure LLM consumption directly. The signals you can measure:
- Referrer traffic from chatgpt.com, claude.ai, etc. When a model surfaces a TaggingDocs link to its user and the user clicks, you get a referrer. This undercounts dramatically (most model answers don’t generate clicks) but trends are informative.
- Increased “I asked ChatGPT about GTM and it told me to read TaggingDocs” mentions. Anecdotal, but real — track these when they come up.
- MCP server usage. If you run an MCP server (like mcp.taggingdocs.com), the usage counters give a clean signal of LLM-driven consumption. A user searching the docs via the MCP tool is unambiguously LLM traffic.
- Qualitative audit. Periodically ask a few frontier models a question your site should answer (“how do I debug a GTM tag that fires in Preview but not production”) and see whether the answer cites, paraphrases, or contradicts your content.
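The referrer signal is easy to tally from access logs. A minimal sketch; the hostname list is illustrative and will need maintaining as products come and go:

```javascript
// Classify a request's Referer header as LLM-driven or not.
const LLM_REFERRER_HOSTS = new Set([
  "chatgpt.com",
  "chat.openai.com",
  "claude.ai",
  "www.perplexity.ai", // illustrative; extend from your own logs
]);

function isLlmReferrer(referrerUrl) {
  try {
    return LLM_REFERRER_HOSTS.has(new URL(referrerUrl).hostname);
  } catch {
    return false; // missing or malformed Referer header
  }
}
```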