Fabled Sky Research

AIO Standards & Frameworks

LLM-Friendly Markup Guide

Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025

### Scope and Intent

This guide standardizes HTML and semantic-web markup patterns so Large Language Models (LLMs) can ingest, contextualize, and reason over web content with minimal hallucination and maximal recall. It is P1 in the AIO Standards & Protocols category and is required reading for all engineers, technical writers, SEO strategists, and content architects integrating AIO principles.

### Terminology

• LLM-Friendly — Markup that is syntactically valid, semantically explicit, and disambiguated enough for an LLM to parse deterministically.
• Canonical Node — The single, authoritative representation of an entity in a document (e.g., Article, Product).
• Semantic Neighborhood — The immediate graph of properties and linked entities surrounding a canonical node.
• Dual Embedding — Publishing schema.org data in both JSON-LD and visible HTML to serve crawlers without penalizing accessibility.
• Fence — Any code block (`<pre>`, ``` etc.) that prevents the LLM from interpreting its contents as prose.
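
As an illustration of Dual Embedding, the following Python sketch renders one source record as both a JSON-LD block and visible HTML, so the two views cannot drift apart. The function name `dual_embed` and the sample field layout are assumptions made for this example; they are not part of the AIO tooling.

```python
import html
import json


def dual_embed(article: dict) -> str:
    """Render one source dict as a JSON-LD block plus matching visible HTML."""
    json_ld = json.dumps(article, indent=2)
    # Escape "</" so a string value cannot close the <script> element early;
    # "\/" is a legal JSON escape for "/", so the data is unchanged.
    json_ld = json_ld.replace("</", "<\\/")

    # Visible copies are HTML-escaped; the JSON-LD copy keeps the raw values.
    headline = html.escape(article["headline"])
    author = html.escape(article["author"]["name"])

    return (
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        f"<article>\n  <h1>{headline}</h1>\n  <p>{author}</p>\n</article>"
    )


if __name__ == "__main__":
    sample = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "LLM-Friendly Markup Explained",
        "author": {"@type": "Person", "name": "Ada Lovelace"},
    }
    print(dual_embed(sample))
```

Because both outputs are derived from the same dict, the headline and author shown to readers always match the structured data.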

### Core Principles  
1. Determinism over Density — Prefer fewer, unambiguous triples to verbose, overlapping vocabularies.  
2. Canonical First — Each page must expose one—and only one—`mainEntity`.  
3. Symmetric Visibility — Ensure that information displayed to users is represented in structured data, and vice-versa.  
4. Explicit Units & Context — Always qualify numbers, times, currencies, and geographies.  
5. Immutable Identifiers — Use stable `@id` IRIs or full URLs to prevent entity drift in model embeddings.
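
Principles 2, 4, and 5 lend themselves to automated checks. The sketch below is a minimal Python illustration, assuming the page's JSON-LD has already been parsed into a list of dicts and that the canonical node is the one carrying `mainEntityOfPage`. `check_page` is a hypothetical helper, not part of the `aio` linter.

```python
from datetime import date
from urllib.parse import urlparse


def check_page(nodes: list[dict]) -> list[str]:
    """Return human-readable problems found in a page's JSON-LD nodes."""
    problems = []

    # Principle 2 (Canonical First): exactly one canonical node per page.
    # Assumption: the canonical node is the one declaring mainEntityOfPage.
    canonical = [n for n in nodes if "mainEntityOfPage" in n]
    if len(canonical) != 1:
        problems.append(f"expected 1 canonical node, found {len(canonical)}")

    for node in nodes:
        # Principle 5 (Immutable Identifiers): @id must be an absolute http(s) IRI.
        node_id = node.get("@id", "")
        parsed = urlparse(node_id)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"unstable @id: {node_id!r}")

        # Principle 4 (Explicit Units & Context): the first 10 characters of
        # each date must parse as an ISO-8601 calendar date.
        for key in ("datePublished", "dateModified"):
            value = node.get(key)
            if value is None:
                continue
            try:
                date.fromisoformat(str(value)[:10])
            except ValueError:
                problems.append(f"{key} is not ISO-8601: {value!r}")

    return problems
```

An empty list means the page passed these three checks; it says nothing about the remaining principles, which need human review or a fuller linter.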

### Baseline HTML Structure  
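```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>LLM-Friendly Markup Explained</title>
    <script type="application/ld+json">
      { /* JSON-LD injected in later section */ }
    </script>
  </head>
  <body>
    <article>
      <h1>LLM-Friendly Markup Explained</h1>
      <p>Ada Lovelace</p>
    </article>
  </body>
</html>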
```

### JSON-LD Canonical Pattern
```json-ld
{
  "@context": ["https://schema.org", { "aio": "https://aio.fabledsky.com/vocab#" }],
  "@id": "https://example.com/llm-friendly-markup#article",
  "@type": "Article",
  "mainEntityOfPage": { "@id": "https://example.com/llm-friendly-markup" },
  "headline": "LLM-Friendly Markup Explained",
  "author": {
    "@type": "Person",
    "name": "Ada Lovelace",
    "aio:contributorRole": "Technical Writer"
  },
  "datePublished": "2025-04-16",
  "dateModified": "2025-04-16",
  "keywords": ["LLM", "Semantic HTML", "schema.org", "AIO Standards"],
  "aio:priority": "P1"
}
```

### Correct vs. Incorrect Patterns
Correct: Use `@context` as an array so the custom vocabulary (`aio`) does not shadow schema.org.

Incorrect:
```json
{ "@context": "https://schema.org aio:https://aio.fabledsky.com/vocab#" }
```
A space-delimited context string parses as JSON but is not a valid JSON-LD context, and LLMs may tokenize it incorrectly.

Correct: Expose the same author once in visible HTML and once in JSON-LD.

Incorrect: Hide a different author in structured data for SEO manipulation.

Correct: Declare the language on every root `<html>` element (`lang="en"`).

Incorrect: Omit the language; the LLM may default to an incorrect locale and mis-stem tokens.

### Semantic Neighborhood Design
1. Identify the Canonical Node (Article, Product, Dataset).
2. Map first-degree properties: `headline`, `name`, `description`, `image`, `url`.
3. Attach second-degree entities (Person, Organization) with explicit `@id`s.
4. Validate using both the W3C RDF validator and the AIO linter (`aio lint markup`).

### Microdata + RDFa Hybrid Example
```html

Photon Stabilizer 3000

Photon Stabilizer 3000

A compact, LLM-optimized photon stabilizer for quantum workloads.

PS-3000-LLM $1,999
```

This hybrid approach grants crawlers micro-parseable attributes while preserving JSON-LD as the canonical graph.

### Disambiguation Strategies
• Units: Always annotate with explicit unit markup or a full `QuantitativeValue`.
• Time: ISO-8601 only (`2025-04-16T05:01:00Z`).
• Person names: Use a first-class `Person` entity rather than a string whenever a biography link exists.
• Acronyms: Provide `rdfs:label` expansions in `@context` when domain-specific.

### Fenced Code Blocks
LLMs treat fenced content as immutable tokens. Use fences for examples, not for critical semantic cues.

Bad:
````markdown
```html
```html ... ```
````

Good: Separate narrative and fenced blocks, and never nest fences of the same delimiter depth.

### Accessibility Alignment
Because most modern LLMs co-train on accessibility corpora, meeting WCAG 2.2 AA guidelines increases parse fidelity:
• Always pair `aria-label` with visual cues.
• Use `<figure>`/`<figcaption>` for images; map `figcaption` to `schema:image.caption`.
• Provide language alternatives linked via `inLanguage`.

### Testing Workflow
1. Run `npm run build && aio lint markup ./dist`.
2. Inspect the generated triples in Turtle: `aio export ./dist --to turtle`.
3. Load the graph into a local SPARQL endpoint and verify the canonical node count (`SELECT (COUNT(?s) AS ?count) WHERE { ?s a schema:Article }`). The count must equal 1.
4. Pass the output through an open-source LLM (e.g., Llama-3-instruct) and ask factual recall questions. Expect ≤1% hallucination.

### Versioning and Change Management
• Increment `aio:schemaVersion` in `@context` on breaking changes.
• Keep an immutable archive in `/schema/versions/{semver}/`.
• Notify the AIO Schema mailing list 14 days before deprecation.

### Security and Integrity
To prevent prompt injection via embedded JSON-LD:
• Sanitize user-generated properties server-side.
• Validate IRIs against an allow-list to block malicious `@id` references (e.g., `javascript:` URIs).
• Include `script-src 'self'` in the CSP header so that markup injected alongside inline JSON-LD cannot execute as script.

### Reference Implementations
• AIO Demo Portal: https://demo.aio.fabledsky.com
• Fabled Sky GitHub Template: https://github.com/FabledSky/aio-markup-starter

### Additional Resources
• WHATWG HTML Living Standard
• schema.org Full Hierarchy (CSV dump updated nightly)
• “Designing Data-Intensive Applications”, Chapter 12, for data contracts

Adhering to these patterns helps your content become a first-class citizen in LLM knowledge graphs, reduces semantic drift, and aligns fully with the AIO vision of deterministic, machine-interpretable web experiences.
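
The IRI allow-list check described under Security and Integrity could be sketched as follows in Python. The allowed schemes and hosts, and the helper names `is_allowed_iri` and `find_disallowed_ids`, are hypothetical placeholders; a real deployment would supply its own list.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; replace with the schemes and hosts you trust.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"example.com", "aio.fabledsky.com"}


def is_allowed_iri(iri: str) -> bool:
    """True only for IRIs whose scheme and host are both on the allow-list."""
    parsed = urlparse(iri.strip())
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS


def find_disallowed_ids(node) -> list[str]:
    """Recursively collect every @id string that fails the allow-list."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@id" and isinstance(value, str):
                if not is_allowed_iri(value):
                    bad.append(value)
            else:
                # Descend into nested entities such as author or image.
                bad.extend(find_disallowed_ids(value))
    elif isinstance(node, list):
        for item in node:
            bad.extend(find_disallowed_ids(item))
    return bad
```

Running `find_disallowed_ids` on parsed JSON-LD before publishing returns the offending `@id` values, so a `javascript:` URI or an unknown host can be rejected server-side.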