Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025
### Scope and Intent
This guide standardizes HTML and semantic-web markup patterns so Large Language Models (LLMs) can ingest, contextualize, and reason over web content with minimal hallucination and maximal recall. It is P1 in the AIO Standards & Protocols category and is required reading for all engineers, technical writers, SEO strategists, and content architects integrating AIO principles.
### Terminology
• LLM-Friendly — Markup that is syntactically valid, semantically explicit, and disambiguated enough for an LLM to parse deterministically.
• Canonical Node — The single, authoritative representation of an entity in a document (e.g., Article, Product).
• Semantic Neighborhood — The immediate graph of properties and linked entities surrounding a canonical node.
• Dual Embedding — Publishing schema.org data in both JSON-LD and visible HTML to serve crawlers without penalizing accessibility.
• Fence — Any code block (`<pre>`, ``` etc.) that prevents the LLM from interpreting contents as prose.
### Core Principles
1. Determinism over Density — Prefer fewer, unambiguous triples to verbose, overlapping vocabularies.
2. Canonical First — Each page must expose one—and only one—`mainEntity`.
3. Symmetric Visibility — Ensure that information displayed to users is represented in structured data, and vice-versa.
4. Explicit Units & Context — Always qualify numbers, times, currencies, and geographies.
5. Immutable Identifiers — Use stable `@id` IRIs or full URLs to prevent entity drift in model embeddings.
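Principle 3 (Symmetric Visibility) can be spot-checked mechanically. The following is a minimal sketch, assuming a page with inline JSON-LD blocks; `PageScanner` and `missing_from_visible` are hypothetical helper names, not part of any AIO tooling, and the check only covers `headline` and `author.name`.

```python
import json
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Separates visible text from JSON-LD script bodies."""

    def __init__(self):
        super().__init__()
        self.visible = []   # text a reader would see
        self.jsonld = []    # raw bodies of JSON-LD script blocks
        self._mode = "visible"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "hidden"
            if is_jsonld:
                self.jsonld.append("")
        elif tag in ("style", "title"):
            self._mode = "hidden"

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title"):
            self._mode = "visible"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode == "visible":
            self.visible.append(data)


def missing_from_visible(html):
    """Return structured-data strings that never appear in the visible text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    text = " ".join(" ".join(scanner.visible).split())
    missing = []
    for block in scanner.jsonld:
        if not block.strip():
            continue
        node = json.loads(block)
        claimed = [node.get("headline")]
        author = node.get("author")
        if isinstance(author, dict):
            claimed.append(author.get("name"))
        missing += [v for v in claimed if isinstance(v, str) and v not in text]
    return missing
```

An empty result means every checked value is also shown to users; any returned string is a candidate mismatch worth reviewing by hand.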
### Baseline HTML Structure
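```html
<html lang="en">
<head>
  <title>LLM-Friendly Markup Explained</title>
  <script type="application/ld+json">{ /* JSON-LD injected in later section */ }</script>
</head>
<body>
  <h1>LLM-Friendly Markup Explained</h1>
  <address>Ada Lovelace</address>
</body>
</html>
```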
### JSON-LD Canonical Pattern
```json-ld
{
  "@context": ["https://schema.org", { "aio": "https://aio.fabledsky.com/vocab#" }],
  "@id": "https://example.com/llm-friendly-markup#article",
  "@type": "Article",
  "mainEntityOfPage": { "@id": "https://example.com/llm-friendly-markup" },
  "headline": "LLM-Friendly Markup Explained",
  "author": {
    "@type": "Person",
    "name": "Ada Lovelace",
    "aio:contributorRole": "Technical Writer"
  },
  "datePublished": "2025-04-16",
  "dateModified": "2025-04-16",
  "keywords": [
    "LLM",
    "Semantic HTML",
    "schema.org",
    "AIO Standards"
  ],
  "aio:priority": "P1"
}
```
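One way to get a node like this into the page is to serialize it server-side. The sketch below is an illustration only; `render_jsonld_script` is a hypothetical helper name, and escaping `<` is an assumption about how a build step might guard against a value that contains `</script>`.

```python
import json


def render_jsonld_script(node):
    """Serialize a JSON-LD node into a <script type="application/ld+json"> element."""
    payload = json.dumps(node, indent=2, ensure_ascii=False)
    # Escape "<" so a string value containing "</script>" cannot end the element early.
    # "\u003c" is a standard JSON escape, so parsers still read it back as "<".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@id": "https://example.com/llm-friendly-markup#article",
    "@type": "Article",
    "headline": "LLM-Friendly Markup Explained",
}
print(render_jsonld_script(article))
```

Because the escape is applied to the serialized text rather than to the data, the JSON-LD that consumers parse is unchanged.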
### Correct vs. Incorrect Patterns
Correct: Use `@context` as an array so custom vocab (aio) does not shadow schema.org.
Incorrect:
```json
{ "@context": "https://schema.org aio:https://aio.fabledsky.com/vocab#" }
```
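A space-delimited context string is valid JSON but invalid JSON-LD: the whole value is read as a single, malformed context IRI, so the `aio` prefix is never defined and LLMs may tokenize it incorrectly.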
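A quick pre-flight check can catch this mistake before publishing. This is a minimal sketch, not a JSON-LD processor; `context_problems` is a hypothetical name, and it only inspects the shape of the `@context` value.

```python
def context_problems(context):
    """Return human-readable problems found in a JSON-LD @context value."""
    problems = []
    entries = context if isinstance(context, list) else [context]
    for entry in entries:
        if isinstance(entry, str):
            # A context reference must be a single IRI, so whitespace is a red flag.
            if any(ch.isspace() for ch in entry):
                problems.append(f"context IRI contains whitespace: {entry!r}")
        elif not isinstance(entry, dict):
            problems.append(f"unsupported context entry: {entry!r}")
    return problems


good = ["https://schema.org", {"aio": "https://aio.fabledsky.com/vocab#"}]
bad = "https://schema.org aio:https://aio.fabledsky.com/vocab#"
print(context_problems(good))
print(context_problems(bad))
```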
Correct: Expose author once in visible HTML and JSON-LD.
Incorrect: Hide a different author in structured data for SEO manipulation.
Correct: Declare language on every root `<html>` element (`lang="en"`).
Incorrect: Omit language; LLM may default to incorrect locale and mis-stem tokens.
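The language rule is easy to verify automatically. A minimal sketch using only the standard library follows; `root_lang` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            self.lang = dict(attrs).get("lang")


def root_lang(html):
    """Return the declared root language, or None when it is missing or empty."""
    finder = LangFinder()
    finder.feed(html)
    finder.close()
    return finder.lang or None
```

A build step could fail whenever `root_lang(page)` returns `None`.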
### Semantic Neighborhood Design
1. Identify the Canonical Node (Article, Product, Dataset).
2. Map first-degree properties: `headline`, `name`, `description`, `image`, `url`.
3. Attach second-degree entities (Person, Organization) with explicit `@id`s.
4. Validate using both the W3C RDF validator and the AIO linter (`aio lint markup`).
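Step 3 can be checked before running any validator. The sketch below walks a JSON-LD document and reports typed entities that lack an `@id`; `entities_missing_id` is a hypothetical helper, and the path notation it returns is an ad hoc convention chosen for this example.

```python
def entities_missing_id(node, path="$"):
    """Return paths of typed entities that have an @type but no @id."""
    missing = []
    if isinstance(node, dict):
        if "@type" in node and "@id" not in node:
            missing.append(path)
        for key, value in node.items():
            if key != "@context":  # context definitions are not entities
                missing += entities_missing_id(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            missing += entities_missing_id(item, f"{path}[{index}]")
    return missing


article = {
    "@context": "https://schema.org",
    "@id": "https://example.com/llm-friendly-markup#article",
    "@type": "Article",
    "author": {"@type": "Person", "name": "Ada Lovelace"},
}
print(entities_missing_id(article))
```

In this sample the `author` node is reported because it carries a type but no stable identifier.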
### Microdata + RDFa Hybrid Example
```html
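<div vocab="https://schema.org/" typeof="Product" itemscope itemtype="https://schema.org/Product">
  <h2 property="name" itemprop="name">Photon Stabilizer 3000</h2>
  <p property="description" itemprop="description">A compact, LLM-optimized photon stabilizer for quantum workloads.</p>
  <span property="sku" itemprop="sku">PS-3000-LLM</span>
  <span property="offers" typeof="Offer" itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <span property="price" itemprop="price">$1,999</span>
  </span>
</div>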
```
This hybrid approach grants crawlers micro-parseable attributes while preserving JSON-LD as the canonical graph.
### Disambiguation Strategies
• Units: Always annotate values with explicit unit markup or a full `QuantitativeValue`.
• Time: ISO-8601 only (`2025-04-16T05:01:00Z`).
• Person names: Use a first-class `Person` entity rather than a string whenever a biography link exists.
• Acronyms: Provide `rdfs:label` expansions in `@context` when domain-specific.
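The unit and time rules above can be enforced at the point where values are generated. This is a minimal sketch; `iso_utc` and `quantitative_value` are hypothetical helpers, and the use of `unitCode` with a UN/CEFACT code (`KGM` for kilogram) is one possible convention rather than something this guide mandates.

```python
from datetime import datetime, timezone


def iso_utc(moment):
    """Format a timezone-aware datetime as ISO-8601 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the timezone would be ambiguous")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def quantitative_value(value, unit_code, unit_text):
    """Build a schema.org QuantitativeValue with both a coded and a readable unit."""
    return {
        "@type": "QuantitativeValue",
        "value": value,
        "unitCode": unit_code,
        "unitText": unit_text,
    }


print(iso_utc(datetime(2025, 4, 16, 5, 1, tzinfo=timezone.utc)))
print(quantitative_value(1.2, "KGM", "kg"))
```

Rejecting naive datetimes keeps a local wall-clock time from being published as if it were UTC.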
### Fenced Code Blocks
LLMs treat fenced content as immutable tokens. Use for examples, not for critical semantic cues.
Bad:
```html
```html
...
```
Good:
Separate narrative and fenced blocks, and never nest fences of the same delimiter depth.
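Nested fences of the same depth can be detected with a simple line scan. The following sketch is a heuristic, not a full Markdown parser; `nested_fence_lines` is a hypothetical name, and it flags any line that looks like a fence opener (it carries an info string) while a fence of the same character and equal or shorter length is still open.

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks or tildes.
FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})(.*)$")


def nested_fence_lines(markdown):
    """Return 1-based line numbers where a fence opener appears inside an open fence."""
    flagged = []
    open_marker = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_marker is None:
            open_marker = marker  # first fence line opens a block
        elif marker[0] == open_marker[0] and len(marker) >= len(open_marker):
            if info:
                flagged.append(number)  # opener-like line inside an open block
            else:
                open_marker = None      # a bare fence closes the block
    return flagged
```

A longer outer fence (four backticks around a three-backtick block) passes the check, which matches the usual way to show fenced code inside fenced code.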
### Accessibility Alignment
Because most modern LLMs co-train on accessibility corpora, meeting WCAG 2.2 AA guidelines increases parse fidelity:
• Always pair `aria-label` with visual cues.
• Use `<figure>`/`<figcaption>` for images; map `figcaption` to `schema:image.caption`.
• Provide language alternatives linked via `inLanguage`.
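The image rule lends itself to a small audit. This is a minimal sketch under the assumption that every content image should sit inside a captioned figure; `FigureAudit` is a hypothetical class name, and decorative images would need an exemption that is not modeled here.

```python
from html.parser import HTMLParser


class FigureAudit(HTMLParser):
    """Counts images outside <figure> and figures that lack a <figcaption>."""

    def __init__(self):
        super().__init__()
        self.loose_images = 0   # <img> found outside any <figure>
        self.uncaptioned = 0    # <figure> closed without a <figcaption>
        self._captions = []     # one flag per currently open <figure>

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._captions.append(False)
        elif tag == "figcaption" and self._captions:
            self._captions[-1] = True
        elif tag == "img" and not self._captions:
            self.loose_images += 1

    def handle_endtag(self, tag):
        if tag == "figure" and self._captions:
            if not self._captions.pop():
                self.uncaptioned += 1


def audit_figures(html):
    """Return (loose_images, uncaptioned_figures) for an HTML string."""
    audit = FigureAudit()
    audit.feed(html)
    audit.close()
    return audit.loose_images, audit.uncaptioned
```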
### Testing Workflow
1. Run `npm run build && aio lint markup ./dist`.
2. Inspect generated triples in Turtle: `aio export ./dist --to turtle`.
3. Load graph into a local SPARQL endpoint; verify canonical node count (`SELECT (COUNT(?s) AS ?count) WHERE { ?s a schema:Article }`). Must equal 1.
4. Pass output through open-source LLM (e.g., Llama-3-instruct) and ask factual recall questions. Expect ≤1% hallucination.
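Before loading a graph into a SPARQL endpoint, the canonical node count from step 3 can be approximated directly on the JSON-LD source. This sketch is a rough pre-check, not a replacement for the query; `count_type` is a hypothetical helper and it does not expand prefixes or resolve remote contexts.

```python
def count_type(node, type_name):
    """Count nodes in a JSON-LD structure whose @type includes type_name."""
    count = 0
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if type_name in types:
            count += 1
        for key, value in node.items():
            if key != "@context":
                count += count_type(value, type_name)
    elif isinstance(node, list):
        count += sum(count_type(item, type_name) for item in node)
    return count


page = {
    "@context": "https://schema.org",
    "@type": "Article",
    "author": {"@type": "Person", "name": "Ada Lovelace"},
}
assert count_type(page, "Article") == 1  # the page exposes exactly one canonical node
```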
### Versioning and Change Management
• Increment `aio:schemaVersion` in `@context` on breaking changes.
• Keep an immutable archive in `/schema/versions/{semver}/`.
• Notify the AIO Schema mailing list 14 days before deprecation.
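The archive path and the notice window are simple enough to compute in a release script. The sketch below is illustrative; `archive_path` and `notice_deadline` are hypothetical helpers, and the version pattern accepts only plain `MAJOR.MINOR.PATCH` strings without pre-release tags.

```python
import re
from datetime import date, timedelta

# Plain MAJOR.MINOR.PATCH without leading zeros.
SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def archive_path(version):
    """Return the immutable archive directory for a schema version."""
    if not SEMVER.match(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return f"/schema/versions/{version}/"


def notice_deadline(deprecation_day):
    """Return the latest day on which the deprecation notice can be sent."""
    return deprecation_day - timedelta(days=14)


print(archive_path("2.0.0"))
print(notice_deadline(date(2025, 5, 1)))
```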
### Security and Integrity
To prevent prompt injection via embedded JSON-LD:
• Sanitize user-generated properties server-side.
• Validate IRIs against an allow-list to block malicious `@id` references (e.g., `javascript:` URIs).
• Send a CSP header that includes `script-src 'self'`. JSON-LD blocks are data and are never executed, but the policy stops injected markup from running as inline script.
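The IRI allow-list can be sketched as a small predicate. The scheme and host sets below are assumptions chosen for illustration, and `is_allowed_iri` is a hypothetical name; a real deployment would load its own list.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"example.com", "aio.fabledsky.com"}  # hypothetical allow-list


def is_allowed_iri(iri):
    """Return True only for IRIs whose scheme and host are both allow-listed."""
    try:
        parts = urlsplit(iri)
    except ValueError:
        return False  # malformed input, for example a broken IPv6 literal
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


print(is_allowed_iri("https://example.com/llm-friendly-markup#article"))
print(is_allowed_iri("javascript:alert(1)"))
```

Checking the parsed scheme and host, rather than matching on string prefixes, avoids being fooled by values such as `https://example.com.evil.test/`.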
### Reference Implementations
• AIO Demo Portal: https://demo.aio.fabledsky.com
• Fabled Sky GitHub Template: https://github.com/FabledSky/aio-markup-starter
### Additional Resources
• W3C HTML Living Standard
• schema.org Full Hierarchy (CSV dump updated nightly)
• “Designing Data-Intensive Applications” — Chapter 12 for data contracts
Adhering to these patterns ensures that your content becomes a first-class citizen in LLM knowledge graphs, reduces semantic drift, and aligns fully with the AIO vision of deterministic, machine-interpretable web experiences.