---
title: Making Your Hugo Blog Citable by LLMs
date: 2026-03-12
url: https://mazurov.dev/posts/making-your-blog-citable-by-llms/
description: Practical steps to help LLMs accurately cite your blog: llms.txt, Schema.org BlogPosting markup, structured metadata, and problem-first writing. Implemented on a Hugo site with PaperMod.
tags: llm, hugo, seo, structured-data
---

LLMs are already crawling your blog. Cloudflare's [AI bot analytics][3] show GPTBot, OAI-SearchBot, and others making dozens of requests per day to small personal sites. But when these models cite your content, they hallucinate URLs, misattribute claims, and lose context.

The problem isn't access - most blogs just serve content optimized for humans and search engines, not for language models. This post covers the changes I made to this Hugo site (PaperMod theme) to fix that.

## What LLMs Need to Cite You Correctly

When an LLM reads your blog post, it's working with raw HTML. It guesses which text is the title, who the author is, what the publication date is. Usually it gets close enough. When it doesn't, you get cited with the wrong date, a hallucinated URL, or a summary that misrepresents your point.

What helps:

- **Structured metadata** - author, date, description in machine-readable formats (JSON-LD, front matter)
- **A plaintext index** - a single file listing all content with descriptions, so models can discover posts without crawling HTML
- **Pre-summarized content** - descriptions and key takeaways that models can use directly instead of generating their own
- **Problem-first writing** - leading with the claim rather than the anecdote, so the first paragraph carries the key context

## llms.txt - A Plaintext Index for Models

The [llms.txt specification][1] defines a simple plaintext file at your site root that lists content with descriptions. Think `robots.txt`, but for helping models understand what's on your site rather than restricting access.

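
To make the index format concrete, here is a minimal Python sketch of how a consumer might read the link entries of an llms.txt file. It assumes entries follow the `- [Title](URL): description` shape with an optional description; the name `parse_llms_index` and the example.com sample are hypothetical, and real crawlers may well parse the file differently.

```python
import re

# Matches one index entry of the form "- [Title](URL): description".
# The trailing ": description" part is treated as optional.
ENTRY = re.compile(
    r"^- \[(?P<title>.+?)\]\((?P<url>\S+?)\)(?::\s*(?P<description>.*))?$"
)


def parse_llms_index(text):
    """Return (title, url, description) tuples for each link line in the index."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(
                (match["title"], match["url"], match["description"] or "")
            )
    return entries


# Hypothetical sample; example.com stands in for a real site.
SAMPLE = """# Example Blog

> A short summary of the site.

## Posts

- [First Post](https://example.com/posts/first/llms.txt): What the first post covers.
- [Second Post](https://example.com/posts/second/llms.txt)
"""

if __name__ == "__main__":
    for title, url, description in parse_llms_index(SAMPLE):
        print(title, url, description, sep=" | ")
```

Headings and the blockquote summary are skipped because they do not match the entry pattern, so only the link lines come back.
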
Here's what the llms.txt for this site looks like:

```
# Musings of an AI Wrangler

> I am a software engineer that loves solving complicated problems. Once in a while I solve a problem I want to document, so I'll do that here.

## Posts

- [Making Your Hugo Blog Citable by LLMs](https://mazurov.dev/posts/making-your-blog-citable-by-llms/llms.txt): Practical steps to help LLMs accurately cite your blog...
- [Fixing the '?' Hostname Problem on OpenWrt Access Points](https://mazurov.dev/posts/openwrt-ap-hostname-sniffer/llms.txt): OpenWrt dumb APs show '?' for client hostnames...
```

Each post also gets its own `/posts/slug/llms.txt` endpoint with the raw markdown content, so a model can fetch the full text without parsing HTML.

### Hugo implementation

Two pieces: output format definitions in `config.toml` and two templates.

In `config.toml`, define the output formats and assign them:

```toml
[outputFormats.llms]
baseName = "llms"
isPlainText = true
mediaType = "text/plain"
rel = "alternate"
root = true

[outputFormats.llmsmd]
baseName = "llms"
isPlainText = true
mediaType = "text/plain"

[outputs]
home = ["HTML", "RSS", "llms"]
page = ["HTML", "llmsmd"]
```

The `llms` format is for the site-wide index (rooted at `/llms.txt`). The `llmsmd` format is for per-page plaintext. See the Hugo [custom output formats documentation][6] for details on these fields.

The homepage template at `layouts/index.llms.txt`:

```
# {{ site.Title }}

> {{ site.Params.profileMode.subtitle }}

## Posts

{{ range where site.RegularPages "Section" "posts" -}}
- [{{ .Title }}]({{ .Permalink }}llms.txt): {{ with .Description }}{{ . }}{{ else }}{{ .Summary | plainify | truncate 160 }}{{ end }}
{{ end -}}
```

The per-page template at `layouts/_default/single.llmsmd.txt`:

```
---
title: {{ .Title }}
date: {{ .Date.Format "2006-01-02" }}
url: {{ .Permalink }}
{{- with .Description }}
description: {{ . }}
{{- end }}
{{- with .Params.tags }}
tags: {{ delimit . ", " }}
{{- end }}
---

{{ .RawContent }}
```

Each post link in the index points to `{permalink}llms.txt`, giving models a direct path from index to full plaintext content.

## Schema.org BlogPosting Markup

[JSON-LD `BlogPosting` schema][2] gives models structured fields they can extract without guessing: `headline`, `description`, `author`, `datePublished`, `dateModified`, `articleBody`, and `wordCount`. The author object includes `sameAs` links to social profiles, which helps models verify identity across platforms.

Here's a simplified example of the JSON-LD this site produces:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Making Your Hugo Blog Citable by LLMs",
  "description": "Practical steps to help LLMs accurately cite your blog...",
  "datePublished": "2026-03-12",
  "dateModified": "2026-03-12",
  "wordCount": "1200",
  "author": {
    "@type": "Person",
    "name": "Stepan Mazurov",
    "sameAs": [
      "https://github.com/smazurov",
      "https://www.linkedin.com/in/smazurov/",
      "https://bsky.app/profile/mazurov.dev"
    ]
  }
}
```

### Hugo implementation

The schema is rendered by a partial at `layouts/partials/templates/schema_json.html`. PaperMod includes a version of this; I extended it to include `sameAs` on the author object and to emit a `BreadcrumbList` for navigation context.

This is also where Google's [article structured data guidelines][5] overlap - the same markup that helps Google's Rich Results also helps LLMs parse your content.

## Structured Front Matter

Two front matter fields do the most work here:

**`description`** - a concise factual summary of the post. Models use this as a citation snippet instead of generating their own. Without it, they summarize from the body text and often miss the point.

**`takeaways`** - a list of pre-summarized key points. These render as a "Key Takeaways" section before the content and give models a structured list of claims to cite directly.

Before:

```yaml
---
title: "My Cool Post"
date: 2026-01-15
tags:
  - networking
---
```

After:

```yaml
---
title: "Fixing the '?' Hostname Problem on OpenWrt Access Points"
date: 2026-03-09T22:00:00-07:00
description: "OpenWrt dumb APs show '?' for client hostnames because /tmp/dhcp.leases is empty. A tcpdump script sniffs DHCP Option 12 from bridge traffic and writes lease entries that LuCI can display."
tags:
  - openwrt
  - networking
  - dhcp
takeaways:
  - LuCI shows '?' hostnames on dumb APs because /tmp/dhcp.leases is empty without a local DHCP server
  - A tcpdump script can sniff DHCP Option 12 hostnames from bridge traffic on br-lan
  - The script writes to the lease file format that rpcd-mod-luci expects, so LuCI displays hostnames normally
---
```

The `description` ends up in both the Schema.org JSON-LD and the llms.txt index. The `takeaways` render via a Hugo partial (`layouts/partials/takeaways.html`) as a styled list before the post body:

```html
{{- with .Params.takeaways }}