Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025
Overview
Long-form digital assets—whitepapers, knowledge bases, e-books, and multipart API references—often exceed the effective context window of today’s Large Language Models (LLMs). Naïve pagination or arbitrary text chunking can break semantic continuity, causing hallucinations, diminished answer quality, and loss of referential integrity (e.g., tables, footnotes, figure references).
This guide defines AIO-compliant methods for paginating and splitting content so that:
- Each chunk remains independently comprehensible.
- Forward and backward references survive across chunk boundaries.
- Automated scoring pipelines can evaluate whether the split content preserves LLM comprehension.
Terminology
| Term | Definition |
|---|---|
| Chunk | The smallest text unit purposely fed into an LLM window. |
| Continuity Token (CT) | A structured anchor string that denotes parent/child chunk relationships. |
| Lead-in | A deterministic recap sentence prepended to every chunk after the first. |
| Lead-out | A deterministic preview sentence appended to every chunk before the last. |
| Overlap Window | A repeated character or token sequence shared between consecutive chunks to maintain context (see the sketch after this table). |
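The Overlap Window can be illustrated with a short sketch. This is illustrative only, assuming tiktoken's `cl100k_base` encoding (as in the reference implementation below) and the 5 % cap recommended in the pitfalls table at the end of this guide:

```python
# Sketch of an Overlap Window: repeat the tail of chunk n at the head of chunk n+1.
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def add_overlap(chunks: list[str], nmax: int, ratio: float = 0.05) -> list[str]:
    overlap_budget = int(nmax * ratio)   # tokens repeated between neighbours (<= 5 % of NMAX)
    out = [chunks[0]]
    for prev, curr in zip(chunks, chunks[1:]):
        tail = ENC.decode(ENC.encode(prev)[-overlap_budget:])
        out.append(tail + curr)          # repeated context preserves continuity across the boundary
    return out
```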
Architectural Requirements
- Hard cap of `Nmax = ctx_size - buf_size` tokens per chunk, where `buf_size` = reserved tokens for system prompts + CTs (default 512 for GPT-4-Turbo-class models); a validation sketch follows this list.
- Bidirectional CT graph stored in a machine-readable manifest (`chunks.json`).
- Deterministic overflow strategy: when a semantic unit (e.g., a list or code block) straddles `Nmax`, shift the entire unit to the next chunk; never split intra-unit.
- Schema.org/CreativeWork representation for each chunk to maintain metadata parity.
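A minimal validation sketch of the hard cap, assuming the same `cl100k_base` encoding and the default 16 384-token context and 512-token buffer used in the reference implementation below (other model classes will differ):

```python
# A chunk is AIO-valid only if its full token count, including CTs and
# lead-in/lead-out text, fits inside ctx_size - buf_size.
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
CTX_SIZE = 16_384          # model context window (GPT-4-Turbo-class default)
BUF_SIZE = 512             # reserved for system prompt + Continuity Tokens
NMAX = CTX_SIZE - BUF_SIZE

def fits_hard_cap(chunk: str) -> bool:
    return len(ENC.encode(chunk)) <= NMAX
```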
Content Splitting Algorithm
```mermaid
flowchart TD
    A[Load Markdown] --> B[Tokenize via tiktoken]
    B --> C{Tokens > Nmax?}
    C -- No --> D[Emit Chunk 01]
    C -- Yes --> E[Find Last Safe Breakpoint ≤ Nmax]
    E --> F[Insert Lead-out + CT Fwd]
    F --> G[Emit Chunk n]
    G --> H[Create Lead-in + CT Bwd]
    H --> C
```
Reference Implementation (Python 3.11)
```python
from pathlib import Path
import json

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
CTX_SIZE = 16384
BUF_SIZE = 512
NMAX = CTX_SIZE - BUF_SIZE

# Token IDs that count as safe Markdown breakpoints (newline / heading marker).
BOUNDARY_TOKENS = set(ENC.encode("\n#"))


def split_markdown(path: str) -> list[str]:
    text = Path(path).read_text()
    tokens = ENC.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + NMAX, len(tokens))
        # Backtrack to the nearest Markdown boundary so no semantic unit is split.
        while start < end < len(tokens) and tokens[end] not in BOUNDARY_TOKENS:
            end -= 1
        if end <= start:
            # No boundary inside the window: fall back to a hard cut at NMAX.
            end = min(start + NMAX, len(tokens))
        chunks.append(ENC.decode(tokens[start:end]))
        start = end
    return chunks


def add_ct_graph(chunks: list[str]) -> list[dict]:
    enriched = []
    for idx, content in enumerate(chunks):
        ct = {
            "id": f"chunk-{idx + 1:03}",
            "prev": None if idx == 0 else f"chunk-{idx:03}",
            "next": None if idx == len(chunks) - 1 else f"chunk-{idx + 2:03}",
        }
        enriched.append({"ct": ct, "content": content})
    return enriched


md_chunks = add_ct_graph(split_markdown("guide.md"))
Path("chunks.json").write_text(json.dumps(md_chunks, indent=2))
```
Continuity Token Design
Continuity Tokens are placed inside HTML comments so they remain invisible to human readers yet machine-parsable.
Example injection for chunk 002:
```html
<!--CT {"id":"chunk-002","prev":"chunk-001","next":"chunk-003"}-->
<p>[Lead-in] Continuing from chunk-001 — …</p>
...
<p>[Lead-out] Upcoming in chunk-003: Advanced use cases.</p>
```
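The injection can be automated from a manifest entry. The sketch below is illustrative: the `render_chunk_html` helper and its `summary`/`preview` arguments are not part of the reference implementation above.

```python
import json

def render_chunk_html(entry: dict, summary: str, preview: str) -> str:
    """Wrap one manifest entry in its CT comment plus lead-in/lead-out paragraphs."""
    ct = entry["ct"]
    parts = [f"<!--CT {json.dumps(ct, separators=(',', ':'))}-->"]
    if ct["prev"]:                                   # every chunk after the first
        parts.append(f"<p>[Lead-in] Continuing from {ct['prev']}: {summary}</p>")
    parts.append(entry["content"])
    if ct["next"]:                                   # every chunk before the last
        parts.append(f"<p>[Lead-out] Upcoming in {ct['next']}: {preview}</p>")
    return "\n".join(parts)
```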
Metadata Schema (JSON-LD)
```json
{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "@id": "https://example.com/ebook/chunk-002",
  "name": "Chapter 1 — Part 2",
  "position": 2,
  "isBasedOn": "https://example.com/ebook",
  "pagination": "2/10",
  "continuityToken": {
    "@type": "PropertyValue",
    "name": "CT",
    "value": "{\"id\":\"chunk-002\",\"prev\":\"chunk-001\",\"next\":\"chunk-003\"}"
  }
}
```
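The JSON-LD block can be derived mechanically from the manifest. A sketch, assuming the placeholder base URL from the example above and leaving the human-readable `name` to the caller:

```python
import json

def jsonld_for_chunk(ct: dict, position: int, total: int,
                     base_url: str = "https://example.com/ebook") -> str:
    """Emit Schema.org/CreativeWork metadata for one chunk of the manifest."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "@id": f"{base_url}/{ct['id']}",
        "position": position,
        "isBasedOn": base_url,
        "pagination": f"{position}/{total}",
        "continuityToken": {
            "@type": "PropertyValue",
            "name": "CT",
            "value": json.dumps(ct, separators=(",", ":")),
        },
    }
    return json.dumps(doc, indent=2)
```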
Lead-in / Lead-out Templates
- Lead-in: `Continuing from {prev_id}: {one-sentence summary}`
- Lead-out: `Upcoming in {next_id}: {one-sentence preview}`
Generate summaries and previews with an LLM at a low temperature (`T = 0.2`) to maximize determinism.
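A sketch of the summary-generation step, assuming the OpenAI Python client; any chat-completion API with a temperature parameter works, and the prompt wording here is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def one_sentence_summary(chunk_text: str, model: str = "gpt-4o-mini") -> str:
    """Low-temperature, single-sentence recap used to fill the Lead-in template."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "Summarize the following content in exactly one sentence."},
            {"role": "user", "content": chunk_text},
        ],
    )
    return response.choices[0].message.content.strip()
```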
Testing Protocols
- Data Set: At minimum, 100 documents spanning 1k–200k tokens.
- Control Group: Unsplit input passed through the same LLM.
- Experimental Group: Chunked + CT-stitched input.
- Queries: 25 Q&A pairs per document focusing on cross-chunk references.
- Metrics captured:
  - Answer F1 (exact match & semantic).
  - Context Recall (percentage of required tokens restored via the CT graph).
  - Pagination Integrity Score (PIS): `PIS = (continuity_hits / (continuity_hits + continuity_misses)) * 100` (see the sketch after the thresholds below).
  - Latency Delta (ms) between control and experimental groups.
AIO compliance thresholds (P2 priority):
- Answer F1 ≥ 0.92
- Context Recall ≥ 0.97
- PIS ≥ 98
- Latency Delta ≤ +15 %
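A minimal sketch of the PIS computation and the threshold gate; the hit/miss counts are assumed to come from the evaluation harness, and the example numbers are illustrative only:

```python
def pagination_integrity_score(continuity_hits: int, continuity_misses: int) -> float:
    """PIS = hits / (hits + misses) * 100, per the formula above."""
    total = continuity_hits + continuity_misses
    return 100.0 * continuity_hits / total if total else 0.0

def meets_aio_thresholds(f1: float, recall: float, pis: float,
                         latency_delta_pct: float) -> bool:
    """Gate against the P2 compliance thresholds listed above."""
    return (f1 >= 0.92 and recall >= 0.97 and pis >= 98
            and latency_delta_pct <= 15.0)

# Example: 196 cross-chunk references resolved, 4 missed -> PIS = 98.0
assert pagination_integrity_score(196, 4) == 98.0
```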
Scoring Methodology (Automated)
```bash
aio-scorer \
  --manifest chunks.json \
  --queries queries.yaml \
  --model gpt-4o-mini \
  --metrics f1,recall,pis,latency \
  --target-thresholds 0.92 0.97 98 15
```
The scorer reconstructs context by following CT links. Any dangling or circular CT-graph edges trigger immediate failure (`EXIT_CODE=13`).
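The CT-graph validation can be reproduced outside the scorer. The sketch below mirrors the failure condition described above; it is not the aio-scorer source itself:

```python
import json
from pathlib import Path

def validate_ct_graph(manifest_path: str = "chunks.json") -> None:
    """Exit with code 13 on dangling or circular CT edges, mirroring aio-scorer."""
    entries = json.loads(Path(manifest_path).read_text())
    if not entries:
        return
    ids = {e["ct"]["id"] for e in entries}
    for e in entries:
        for edge in ("prev", "next"):
            target = e["ct"][edge]
            if target is not None and target not in ids:
                raise SystemExit(13)          # dangling edge
    # Detect cycles by walking `next` links from the first chunk.
    by_id = {e["ct"]["id"]: e["ct"] for e in entries}
    seen, current = set(), entries[0]["ct"]["id"]
    while current is not None:
        if current in seen:
            raise SystemExit(13)              # circular edge
        seen.add(current)
        current = by_id[current]["next"]
```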
Common Pitfalls & Mitigations
| Pitfall | Symptom | Mitigation |
|---|---|---|
| Overlapping code block fences | LLM mis-renders or truncates code | Validate the Markdown AST after splitting (see the fence-check sketch below). |
| CT collision in merged docs | Duplicate IDs | Use UUID v7 or a repo-wide monotonic counter. |
| Excessive overlap window | Token budget erosion | Keep overlap ≤ 5 % of NMAX. |
| Async chunk loading | Missing lead-in content | Defer render until all adjacent chunks are fetched. |
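For the first pitfall, a lightweight stand-in for a full Markdown AST check is to verify that every chunk closes the code fences it opens; an AST library such as markdown-it-py gives a stricter check, and the fence-count heuristic below is only a sketch:

```python
FENCE = "`" * 3   # the Markdown code-fence marker

def has_balanced_fences(chunk: str) -> bool:
    """A chunk that opens a code fence must also close it, or code will truncate."""
    return chunk.count(FENCE) % 2 == 0

def validate_chunks(chunks: list[str]) -> list[int]:
    """Return indices of chunks whose code fences were split across a boundary."""
    return [i for i, c in enumerate(chunks) if not has_balanced_fences(c)]
```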
Deployment Checklist
- [ ] All chunks ≤ NMAX tokens with CTs included
- [ ] `chunks.json` present at content root with SHA-256 checksum (see the sketch after this checklist)
- [ ] HTML build passes W3C validator after CT comment stripping
- [ ] `aio-scorer` run passes on ≥ 3 consecutive CI builds
- [ ] Incident playbook updated with pagination rollback steps
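A sketch of the checksum step in the checklist; the `.sha256` sidecar-file convention is an assumption, so adapt it to the repository's existing checksum tooling:

```python
import hashlib
from pathlib import Path

def write_manifest_checksum(manifest_path: str = "chunks.json") -> str:
    """Write a SHA-256 sidecar file next to chunks.json and return the digest."""
    digest = hashlib.sha256(Path(manifest_path).read_bytes()).hexdigest()
    Path(manifest_path + ".sha256").write_text(f"{digest}  {manifest_path}\n")
    return digest
```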
By following the guidelines above, content strategists and engineers can keep paginated assets logically cohesive for both human readers and LLM workflows, preserving high retrieval accuracy and stable downstream reasoning across the entire document set.