
# llms.txt strategy for tagging docs

Your documentation has at least two audiences now: humans with browsers, and LLMs retrieving context for their users’ questions. The second audience is growing fast. Every ChatGPT user who asks “how do I set up GA4 ecommerce tracking” is consuming somebody’s documentation, one way or another — either through a web search, an MCP call, or a retrieval step inside the model.

This page is about structuring docs so the LLM-audience path actually works. Valid as of April 2026.

The llms.txt proposal is a simple convention: publish a single Markdown file at /llms.txt on your site root, listing the key pages LLMs should read to understand the site. The format is a Markdown H1 (site title), a short description, and a structured list of links grouped by section.
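
A minimal sketch of what such a file looks like. The overall shape (H1, blockquote description, H2 sections with link lists) follows the llms.txt proposal; apart from the glossary page mentioned later, the section names and URLs here are illustrative, not this site's actual index:

```markdown
# TaggingDocs

> Practical documentation for web tagging: GTM, GA4, and consent tooling.

## Foundations

- [Glossary](https://taggingdocs.com/foundations/glossary/): shared vocabulary for tagging concepts
- [The dataLayer lifecycle](https://taggingdocs.com/foundations/datalayer/): how pushes merge and when

## Recipes

- [Cookiebot with GTM](https://taggingdocs.com/recipes/cookiebot-gtm/): full worked consent setup
```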

There are two common flavours:

| File | Purpose | Typical size |
| --- | --- | --- |
| `llms.txt` | Navigational index, pointing to the pages worth reading | 5-50 KB |
| `llms-full.txt` | Full content of every page concatenated | 500 KB - 5 MB |

The distinction matters because context windows are still finite. A model with a 200K token context (≈ 800 KB of text) can ingest most llms-full.txt files comfortably. A model with a 32K context can’t; it wants llms.txt and then a focused fetch of specific pages.

This site is built on Astro Starlight with the starlight-llms-txt plugin, configured in astro.config.mjs. At build time the plugin:

  1. Walks every content collection entry.
  2. Emits /llms.txt — a tree of headings and page links.
  3. Emits /llms-full.txt — all pages concatenated in reading order, with frontmatter-derived titles as H1s.
  4. Adds each file to the site’s build output so they’re served at https://taggingdocs.com/llms.txt and /llms-full.txt.
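
A sketch of the wiring in astro.config.mjs. The package names are real; the plugin is shown with default options here because anything beyond that would be an assumption — check the starlight-llms-txt README for the options it actually accepts:

```javascript
// astro.config.mjs — minimal sketch, plugin left on default options.
import { defineConfig } from 'astro/config';
import starlight from '@astrojs/starlight';
import starlightLlmsTxt from 'starlight-llms-txt';

export default defineConfig({
  site: 'https://taggingdocs.com',
  integrations: [
    starlight({
      title: 'TaggingDocs',
      // Starlight plugins run at build time, after content is loaded.
      plugins: [starlightLlmsTxt()],
    }),
  ],
});
```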

You don’t need to maintain either file by hand. What you do need to maintain is what goes into it.

The default starlight-llms-txt output includes every page. That’s reasonable for a small site. For a site the size of TaggingDocs (330+ pages as of this writing), priority matters because downstream models have finite attention even when context fits.

The hierarchy that works, highest-value first:

**The glossary.** If an LLM reads nothing else on your site, the glossary should be in its context. It’s the cheapest way to establish shared vocabulary. Every term defined on /foundations/glossary/ is a term a model can use correctly instead of paraphrasing.

**Conceptual explainers.** Pages that explain how something works (the dataLayer lifecycle, how GTM loads, what Consent Mode does) are disproportionately valuable. Models lean on these to reason about novel situations. One good page on dataLayer behaviour answers a hundred derived questions.

**Reference tables.** The GA4 event reference. The consent-type matrix. The GTM API resource shapes. These are lookup tables — the model finds the row it needs and uses it. Highly structured reference content is the best fit for LLM consumption.

**Recipes.** Worked walkthroughs (“here’s how to set up Cookiebot with GTM”) are what models cite when a user asks “how do I do X.” The recipe’s specificity is what makes the answer useful. Include the full worked example — abbreviated recipes are the opposite of useful to a model trying to apply them.

**Meta pages.** This page. The about page. The roadmap. These are fine to include but they don’t help the model answer a tagging question. If context is tight, they should be trimmed first.

The patterns that help human readers also help models, but models lean harder on a few specific things.

**Name your concepts.** A paragraph that starts “The dataLayer merge model is…” is easier for a model to retrieve and cite than one that describes the same concept without naming it. Models pattern-match on “term: definition” structures.

**Prefer tables to lists.** A bullet list with 10 items and a table with 10 rows carry the same information to a human. To a model, the table is better — the row/column structure makes each cell addressable.

**Make code self-contained.** Copy-pasteable code is what models copy-paste. A code snippet that assumes context the reader has is a snippet the model will fill in from its training data, which is a gamble. Every code example should run as-is or explicitly flag its placeholders.
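
For example, a GA4 ecommerce push with every placeholder flagged rather than implied. The event shape follows GA4’s standard `purchase` event; the IDs and values are placeholders, and `globalThis` is used so the same snippet also runs outside a browser (in the browser, `globalThis` is `window`, which is what GTM reads):

```javascript
// Self-contained sketch of a GA4 purchase push.
const dataLayer = globalThis.dataLayer ?? (globalThis.dataLayer = []);

dataLayer.push({
  event: 'purchase',
  ecommerce: {
    transaction_id: 'T_12345', // PLACEHOLDER: your order ID
    value: 79.98,              // PLACEHOLDER: order total, as a number
    currency: 'EUR',           // PLACEHOLDER: ISO 4217 currency code
    items: [
      // PLACEHOLDER: one entry per purchased item
      { item_id: 'SKU_001', item_name: 'Example product', price: 79.98, quantity: 1 },
    ],
  },
});

console.log(dataLayer.length); // 1
```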

**Give concrete numbers.** “Often about 5% of traffic” is better than “some traffic.” Models cite numbers when you give them numbers. Fuzzy language invites fuzzy answers.

This is the bit most sites do badly. An LLM can’t tell how old a page is without a signal. It doesn’t matter what year is in your footer; what matters is machine-readable information about content staleness.

TaggingDocs uses two complementary signals:

Every page has a lastUpdated: YYYY-MM-DD frontmatter field. The plugin surfaces this in the rendered page and in the generated llms-full.txt. A model reading the concatenated file can see when each page was last touched.
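
In a content file that looks like this (the `lastUpdated` field is the one described above; the title and date are illustrative):

```markdown
---
title: The dataLayer merge model
lastUpdated: 2026-03-14
---

The dataLayer merge model is…
```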

More important for time-sensitive content: an inline sentence near the top of the page, like:

> Valid as of April 2026, for GTM Web containers and the GA4 event reference at that date.

Inline body statements beat frontmatter for two reasons:

  1. They survive retrieval. When a model pulls a snippet of the page (not the whole document), the validity line is usually in the snippet.
  2. They’re in the model’s active context. Frontmatter sometimes gets stripped by scrapers; inline prose does not.

Pages on this site that touch APIs, product names, or vendor behaviour carry a validity line. Pages that are essentially timeless (conceptual explainers) carry only lastUpdated.

If you publish llms-full.txt, know what size you’re targeting.

| Context size | Tokens | Bytes at ~4 chars/token | What fits |
| --- | --- | --- | --- |
| Small | 32 K | ~130 KB | Navigational `llms.txt` + 10-20 key pages |
| Medium | 128 K | ~500 KB | Full site for a focused topic section |
| Large | 200 K | ~800 KB | Whole TaggingDocs site (roughly) |
| XL | 1 M | ~4 MB | Whole site plus external references |

For most sites, the llms-full.txt fits entirely in modern large-context models. For sites that grow past 1-2 MB, you want either multiple topical llms-full.txt files (e.g. /llms-gtm.txt, /llms-ga4.txt) or a navigational llms.txt that sends the model to per-section full files.
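
The sizing table can be turned into a quick build-time check. A sketch, using the same ~4-bytes-per-token heuristic as the table — these are approximations, not tokenizer counts, and the tier names are this article’s, not any model vendor’s:

```javascript
// Rough sizing check for a generated llms-full.txt.
// ~4 bytes per token is a heuristic, not a tokenizer count.
const BYTES_PER_TOKEN = 4;

function contextTier(fileBytes) {
  const tokens = fileBytes / BYTES_PER_TOKEN;
  if (tokens <= 32_000) return 'small (32K)';
  if (tokens <= 128_000) return 'medium (128K)';
  if (tokens <= 200_000) return 'large (200K)';
  if (tokens <= 1_000_000) return 'XL (1M)';
  return 'too big: split into per-section files';
}

console.log(contextTier(500_000)); // "medium (128K)"
```

Wire it to the actual file size (e.g. `fs.statSync('dist/llms-full.txt').size`) in a post-build step and fail the build when the file outgrows your target tier.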

Not every page helps. Things to leave out of llms-full.txt if your build system lets you:

  • Auto-generated index pages. Table-of-contents-style pages that only link elsewhere duplicate the navigation without adding any content of their own.
  • Pages that exist purely for SEO. If a page was written to rank and isn’t useful to a working engineer, it’s not useful to a model either.
  • Deprecated / archive pages. If a page is kept for historical reference but the content is no longer correct, a prominent header saying so is not enough — models sometimes cite the body anyway. Consider excluding these from the generated feeds.
  • Deeply interactive content. If a page is mostly a live form, calculator, or chart, the static Markdown version won’t capture its value. Better to link to it than dump its shell into the feed.
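
If your build step exposes the page list, all four exclusions can collapse into a single predicate. Everything in this sketch is an assumption: the `frontmatter` shape and the `llmsExclude` flag are invented conventions for illustration, not a starlight-llms-txt API:

```javascript
// Hypothetical predicate for filtering pages out of llms-full.txt.
// The frontmatter flags are invented conventions, not plugin options.
function includeInLlmsFull(page) {
  const fm = page.frontmatter ?? {};
  if (fm.llmsExclude) return false;          // explicit opt-out (SEO-only, interactive pages)
  if (fm.deprecated) return false;           // archive pages models might cite anyway
  if (fm.template === 'index') return false; // auto-generated link-only index pages
  return true;
}

const pages = [
  { frontmatter: { title: 'dataLayer lifecycle' } },
  { frontmatter: { title: 'GTM v1 API', deprecated: true } },
  { frontmatter: { title: 'All recipes', template: 'index' } },
];

console.log(pages.filter(includeInLlmsFull).length); // 1
```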

Does any of this work? The honest answer: you can’t really measure LLM consumption directly. The signals you can measure:

  • Referrer traffic from chat.openai.com, claude.ai, etc. When a model surfaces a TaggingDocs link to its user and the user clicks, you get a referrer. This undercounts dramatically (most model answers don’t generate clicks) but trends are informative.
  • Increased “I asked ChatGPT about GTM and it told me to read TaggingDocs” mentions. Anecdotal, but real — track these when they come up.
  • MCP server usage. If you run an MCP server (like mcp.taggingdocs.com), the usage counters give a clean signal of LLM-driven consumption. Users searching docs via the MCP tool is unambiguously LLM traffic.
  • Qualitative audit. Periodically ask a few frontier models a question your site should answer (“how do I debug a GTM tag that fires in Preview but not production”) and see whether the answer cites, paraphrases, or contradicts your content.
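
The first signal can be automated in your analytics pipeline. A sketch of a referrer classifier — the hostname list is illustrative, not exhaustive, and will need maintenance as products rename themselves (chat.openai.com already redirects to chatgpt.com):

```javascript
// Hypothetical helper: classify a Referer header as LLM-driven traffic.
const LLM_REFERRER_HOSTS = new Set([
  'chat.openai.com',
  'chatgpt.com',
  'claude.ai',
  'gemini.google.com',
  'www.perplexity.ai',
]);

function isLlmReferrer(referrerUrl) {
  try {
    return LLM_REFERRER_HOSTS.has(new URL(referrerUrl).hostname);
  } catch {
    return false; // missing or malformed Referer header
  }
}

console.log(isLlmReferrer('https://chatgpt.com/'));    // true
console.log(isLlmReferrer('https://www.google.com/')); // false
```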