Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025
Scope and Intent
This guide standardizes HTML and semantic-web markup patterns so Large Language Models (LLMs) can ingest, contextualize, and reason over web content with minimal hallucination and maximal recall. It is P1 in the AIO Standards & Protocols category and is required reading for all engineers, technical writers, SEO strategists, and content architects integrating AIO principles.
Terminology
• LLM-Friendly — Markup that is syntactically valid, semantically explicit, and disambiguated enough for an LLM to parse deterministically.
• Canonical Node — The single, authoritative representation of an entity in a document (e.g., Article, Product).
• Semantic Neighborhood — The immediate graph of properties and linked entities surrounding a canonical node.
• Dual Embedding — Publishing schema.org data in both JSON-LD and visible HTML to serve crawlers without penalizing accessibility.
• Fence — Any code block (``` etc.) that prevents the LLM from interpreting contents as prose.
Core Principles
- Determinism over Density — Prefer fewer, unambiguous triples to verbose, overlapping vocabularies.
- Canonical First — Each page must expose one, and only one, mainEntity.
- Symmetric Visibility — Ensure that information displayed to users is represented in structured data, and vice versa.
- Explicit Units & Context — Always qualify numbers, times, currencies, and geographies.
- Immutable Identifiers — Use stable @id IRIs or full URLs to prevent entity drift in model embeddings.
Baseline HTML Structure
<!DOCTYPE html>
<html lang="en" data-aiotype="primary">
  <head>
    <meta charset="utf-8" />
    <title>LLM-Friendly Markup Explained</title>
    <link rel="canonical" href="https://example.com/llm-friendly-markup" />
    <script type="application/ld+json" id="aio-schema">
      { /* JSON-LD injected in later section */ }
    </script>
  </head>
  <body vocab="https://schema.org/" typeof="Article" resource="#article">
    <header>
      <h1 property="headline">LLM-Friendly Markup Explained</h1>
      <p property="author" typeof="Person">
        <span property="name">Ada Lovelace</span>
      </p>
    </header>
    <article property="articleBody">
      <!-- visible content goes here -->
    </article>
  </body>
</html>
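The baseline above can be spot-checked mechanically. The following is a minimal Python sketch using only the standard library; it is not part of the AIO tooling. BaselineChecker and check_baseline are hypothetical names, and the three checks (a lang attribute on the root element, a canonical link, at least one JSON-LD script block) are assumptions read off the sample page rather than an official rule set.

```python
from html.parser import HTMLParser


class BaselineChecker(HTMLParser):
    """Hypothetical helper: records a few facts about the page skeleton."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.canonical = None
        self.jsonld_blocks = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here, because
        # HTMLParser.handle_startendtag delegates to handle_starttag.
        attr = dict(attrs)
        if tag == "html":
            self.lang = attr.get("lang")
        elif tag == "link" and attr.get("rel") == "canonical":
            self.canonical = attr.get("href")
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self.jsonld_blocks += 1


def check_baseline(html: str) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    checker = BaselineChecker()
    checker.feed(html)
    problems = []
    if not checker.lang:
        problems.append("missing lang attribute on <html>")
    if not checker.canonical:
        problems.append("missing canonical link")
    if checker.jsonld_blocks == 0:
        problems.append("missing JSON-LD script block")
    return problems


if __name__ == "__main__":
    page = (
        '<!DOCTYPE html><html lang="en"><head>'
        '<link rel="canonical" href="https://example.com/llm-friendly-markup" />'
        '<script type="application/ld+json">{}</script>'
        "</head><body></body></html>"
    )
    print(check_baseline(page))
```

A check like this only confirms that the skeleton is present; it does not validate the RDFa attributes or the JSON-LD payload itself.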
JSON-LD Canonical Pattern
{
  "@context": [
    "https://schema.org",
    { "aio": "https://aio.fabledsky.com/vocab#" }
  ],
  "@id": "https://example.com/llm-friendly-markup#article",
  "@type": "Article",
  "mainEntityOfPage": { "@id": "https://example.com/llm-friendly-markup" },
  "headline": "LLM-Friendly Markup Explained",
  "author": {
    "@type": "Person",
    "name": "Ada Lovelace",
    "aio:contributorRole": "Technical Writer"
  },
  "datePublished": "2025-04-16",
  "dateModified": "2025-04-16",
  "keywords": [ "LLM", "Semantic HTML", "schema.org", "AIO Standards" ],
  "aio:priority": "P1"
}
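As a rough illustration, the pattern above could be checked before publishing with a few lines of Python. This is a sketch under stated assumptions, not the behavior of the AIO linter: check_article and is_iso8601 are hypothetical helpers, and the rules they enforce (an array @context that starts with schema.org, an absolute http(s) @id, parseable ISO-8601 dates) are inferred from the sample document.

```python
from datetime import datetime
from urllib.parse import urlparse


def is_iso8601(value) -> bool:
    """Accept the ISO-8601 subset that datetime.fromisoformat can parse."""
    if not isinstance(value, str):
        return False
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def check_article(doc: dict) -> list[str]:
    """Return a list of problems found in a JSON-LD Article node."""
    problems = []

    # Assumption: @context is an array with schema.org listed first.
    context = doc.get("@context")
    if not (isinstance(context, list) and context
            and context[0] == "https://schema.org"):
        problems.append("@context should be an array starting with https://schema.org")

    # A stable identifier should be an absolute http(s) IRI.
    parsed = urlparse(str(doc.get("@id", "")))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("@id should be an absolute http(s) IRI")

    for key in ("datePublished", "dateModified"):
        if not is_iso8601(doc.get(key)):
            problems.append(f"{key} is not an ISO-8601 date")
    return problems


if __name__ == "__main__":
    article = {
        "@context": ["https://schema.org", {"aio": "https://aio.fabledsky.com/vocab#"}],
        "@id": "https://example.com/llm-friendly-markup#article",
        "@type": "Article",
        "datePublished": "2025-04-16",
        "dateModified": "2025-04-16",
    }
    print(check_article(article))
```

Returning a list of problems rather than raising on the first failure lets a build step report every issue in one pass.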
Correct vs. Incorrect Patterns
Correct: Use @context as an array so the custom vocabulary (aio) does not shadow schema.org.
Incorrect: { "@context": "https://schema.org aio:https://aio.fabledsky.com/vocab#" }. A space-delimited context string is still well-formed JSON, but it is not a valid JSON-LD context IRI, so processors cannot resolve it and LLMs may tokenize it incorrectly.
Correct: Expose the author once, identically, in visible HTML and JSON-LD.
Incorrect: Hide a different author in structured data for SEO manipulation.
Correct: Declare the language on every root <html> element (lang="en").
Incorrect: Omit the language; an LLM may default to the wrong locale and mis-stem tokens.
Semantic Neighborhood Design
- Identify the Canonical Node (Article, Product, Dataset).
- Map first-degree properties: headline, name, description, image, url.
- Attach second-degree entities (Person, Organization) with explicit @ids.
- Validate using both the W3C RDF validator and the AIO linter (aio lint markup).
Microdata + RDFa Hybrid Example
<section vocab="https://schema.org/" typeof="Product" resource="#prod42">
  <h2 property="name">Photon Stabilizer 3000</h2>
  <img property="image" src="/img/photon.jpg" alt="Photon Stabilizer 3000" />
  <p>
    <span property="description">
      A compact, LLM-optimized photon stabilizer for quantum workloads.
    </span>
  </p>
  <data property="sku" value="PS-3000-LLM">PS-3000-LLM</data>
  <span property="offers" typeof="Offer">
    $<span property="price" content="1999.00">1,999</span>
    <meta property="priceCurrency" content="USD" />
  </span>
</section>
This hybrid approach gives crawlers inline, attribute-level annotations (the example above uses the RDFa Lite attributes vocab, typeof, and property) while preserving JSON-LD as the canonical graph.
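One way to keep the inline attributes and the JSON-LD graph in agreement is to extract both and compare them. The sketch below is a simplified illustration, not a full RDFa processor: PropertyExtractor, flatten, and mismatches are hypothetical helpers, the Product JSON-LD is made up for the example, and nested entities are flattened by property name, which would collide if two entities shared a property.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "img", "link", "br", "input"}


class PropertyExtractor(HTMLParser):
    """Collects property -> value pairs from RDFa Lite style markup."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._capturing = None  # property whose text is being collected
        self._depth = 0
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self._capturing is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        prop = attr.get("property")
        if prop is None or "typeof" in attr:
            return  # untyped element, or a nested entity rather than a literal
        if "content" in attr:
            self.values[prop] = attr["content"]  # machine value wins over text
        elif tag == "img":
            self.values[prop] = attr.get("src", "")
        elif tag not in VOID_TAGS:
            self._capturing, self._depth, self._text = prop, 1, []

    def handle_endtag(self, tag):
        if self._capturing is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace so indentation does not affect comparison.
            self.values[self._capturing] = " ".join("".join(self._text).split())
            self._capturing = None

    def handle_data(self, data):
        if self._capturing is not None:
            self._text.append(data)


def flatten(node: dict) -> dict:
    """Flatten nested JSON-LD nodes into one dict, skipping @-keywords."""
    flat = {}
    for key, value in node.items():
        if key.startswith("@"):
            continue
        if isinstance(value, dict):
            flat.update(flatten(value))
        else:
            flat[key] = value
    return flat


def mismatches(html: str, jsonld: dict) -> dict:
    """Map each visible property to (html_value, jsonld_value) when they differ."""
    extractor = PropertyExtractor()
    extractor.feed(html)
    flat = flatten(jsonld)
    return {k: (v, flat.get(k)) for k, v in extractor.values.items()
            if flat.get(k) != v}


if __name__ == "__main__":
    snippet = (
        '<section vocab="https://schema.org/" typeof="Product">'
        '<h2 property="name">Photon Stabilizer 3000</h2>'
        '<span property="offers" typeof="Offer">'
        '$<span property="price" content="1999.00">1,999</span>'
        '<meta property="priceCurrency" content="USD" /></span></section>'
    )
    graph = {
        "@type": "Product",
        "name": "Photon Stabilizer 3000",
        "offers": {"@type": "Offer", "price": "1999.00", "priceCurrency": "USD"},
    }
    print(mismatches(snippet, graph))
```

An empty result means every property visible in the HTML has the same value in the JSON-LD; the reverse direction (JSON-LD properties missing from the page) would need a second pass.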
Disambiguation Strategies
• Units: Always annotate with <meta property="unitText" content="USD"> or a full QuantitativeValue.
• Time: ISO-8601 only (2025-04-16T05:01:00Z).
• Person names: first-class Person entity rather than a string whenever a biography link exists.
• Acronyms: Provide rdfs:label expansions in @context when domain-specific.
Fenced Code Blocks
LLMs tend to treat fenced content as literal, immutable tokens. Use fences for examples, not for critical semantic cues.
Bad:
<article> ```html <!-- nested fence confusing LLM --> ...
Good:
Separate narrative and fenced blocks, and never nest fences of the same delimiter depth.
Accessibility Alignment
Because most modern LLMs co-train on accessibility corpora, meeting WCAG 2.2 AA guidelines increases parse fidelity:
• Always pair aria-label with visual cues.
• Use <figure>/<figcaption> for images; map figcaption to schema:image.caption.
• Provide language alternatives linked via inLanguage.
Testing Workflow
- Run npm run build && aio lint markup ./dist.
- Inspect the generated triples in Turtle: aio export ./dist --to turtle.
- Load the graph into a local SPARQL endpoint and verify the canonical node count (SELECT (COUNT(?s) AS ?count) WHERE { ?s a schema:Article }). It must equal 1.
- Pass the output through an open-source LLM (e.g., Llama-3-instruct) and ask factual recall questions. Expect ≤1% hallucination.
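The canonical node count from the third step can be approximated before a SPARQL endpoint is available. The following Python sketch walks a parsed JSON-LD document and counts nodes by @type. It is a stand-in for the query, not a replacement: count_nodes_of_type is a hypothetical helper, it compares compact type names literally instead of expanding them against @context, and it would count the same entity twice if the entity were embedded in two places.

```python
import json


def count_nodes_of_type(node, type_name: str) -> int:
    """Recursively count JSON-LD nodes whose @type includes type_name."""
    if isinstance(node, list):
        return sum(count_nodes_of_type(item, type_name) for item in node)
    if not isinstance(node, dict):
        return 0  # strings, numbers, and None are not nodes
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    own = 1 if type_name in types else 0
    # Descend into every value so nested entities (author, offers) are seen.
    return own + sum(count_nodes_of_type(v, type_name) for v in node.values())


if __name__ == "__main__":
    document = json.loads("""
    {
      "@context": ["https://schema.org", {"aio": "https://aio.fabledsky.com/vocab#"}],
      "@id": "https://example.com/llm-friendly-markup#article",
      "@type": "Article",
      "author": {"@type": "Person", "name": "Ada Lovelace"}
    }
    """)
    article_count = count_nodes_of_type(document, "Article")
    if article_count != 1:
        raise SystemExit(f"expected exactly 1 Article node, found {article_count}")
    print("canonical node count OK")
```

Failing the build on any count other than 1 mirrors the requirement stated in the workflow.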
Versioning and Change Management
• Increment aio:schemaVersion in @context on breaking changes.
• Keep an immutable archive in /schema/versions/{semver}/.
• Notify the AIO Schema mailing list 14 days before deprecation.
Security and Integrity
To prevent prompt injection via embedded JSON-LD:
• Sanitize user-generated properties server-side.
• Validate IRIs against an allow-list to block malicious @id references (e.g., javascript: URIs).
• The CSP header should include script-src 'self' so that any script injected alongside inlined JSON-LD cannot execute (JSON-LD blocks themselves are data and are not executed by the browser).
Reference Implementations
• AIO Demo Portal: https://demo.aio.fabledsky.com
• Fabled Sky GitHub Template: https://github.com/FabledSky/aio-markup-starter
Additional Resources
• WHATWG HTML Living Standard
• schema.org Full Hierarchy (CSV dump updated nightly)
• “Designing Data-Intensive Applications”, Chapter 12, for data contracts
Adhering to these patterns ensures that your content becomes a first-class citizen in LLM knowledge graphs, reduces semantic drift, and aligns fully with the AIO vision of deterministic, machine-interpretable web experiences.