Fabled Sky Research

AIO Standards & Frameworks

AIO Standards Framework — Module 3: Scoring Framework & Methodology

AIO scoring quantifies how retrievable content is for AI systems using four metrics: the Trust Integrity Score (TIS), Retrieval Surface Area (RSA), Token Yield per Query (TYQ), and Embedding Salience Index (ESI). Each metric is defined by an explicit formula and is independent of any particular model architecture, so assessments of semantic integrity and inference-layer inclusion can be reproduced across evolving LLM systems.


Fabled Sky Research | AIO v1.2.7
Last updated: April 2025


Purpose

This module provides detailed methodologies for calculating and applying AIO scoring metrics in a consistent, reproducible, and machine-interpretable manner. The aim is to guide researchers, auditors, and systems developers in assessing the semantic robustness and retrievability of content under LLM inference conditions.


Trust Integrity Score (TIS)

Purpose: Measures the structural trustworthiness of content from an LLM’s perspective.

TIS = λ1 · C + λ2 · S + λ3 · R

Where:

  • C (Citation Depth): Quantifies the presence of verifiable, high-quality references.
  • S (Semantic Coherence): Measures internal clarity, topic continuity, and contradiction minimization.
  • R (Redundancy Alignment): Evaluates intentional reinforcement of core concepts across sections.

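Weighting Guidance: Default values are λ1 = 0.4, λ2 = 0.3, λ3 = 0.3 (tunable per corpus). Because the defaults sum to 1, TIS stays within 0.00 to 1.00 whenever C, S, and R are each normalized to that range; tuned weights should preserve this sum.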

Scoring Range: 0.00 to 1.00 (normalized)

Assessment Tools: Embedding similarity checks, citation graph analysis, coherence scoring using transformer-based summarizers.
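The weighted sum can be sketched in Python as below. This is a minimal illustration, assuming C, S, and R have already been scored on [0, 1] by the assessment tools listed above; the function name `trust_integrity_score` is hypothetical, and the example component values are made up for illustration.

```python
from typing import Tuple

# Default weights (lambda1, lambda2, lambda3) from the Weighting Guidance.
DEFAULT_WEIGHTS: Tuple[float, float, float] = (0.4, 0.3, 0.3)


def trust_integrity_score(
    citation_depth: float,
    semantic_coherence: float,
    redundancy_alignment: float,
    weights: Tuple[float, float, float] = DEFAULT_WEIGHTS,
) -> float:
    """TIS = lambda1*C + lambda2*S + lambda3*R.

    Assumes C, S, and R were already scored on [0, 1] upstream;
    how each component is scored is outside this sketch.
    """
    components = (citation_depth, semantic_coherence, redundancy_alignment)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component {value} is outside [0, 1]")
    # Non-negative weights summing to 1 keep TIS inside [0, 1].
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * c for w, c in zip(weights, components))


# Illustrative values: C = 0.8, S = 0.7, R = 0.5 with default weights.
print(round(trust_integrity_score(0.8, 0.7, 0.5), 2))  # 0.32 + 0.21 + 0.15 = 0.68
```

Rejecting weights that do not sum to 1 is a design choice of this sketch; it enforces the normalized range rather than rescaling after the fact.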


Retrieval Surface Area (RSA)

Purpose: Assesses the number of unique prompt classes a document can satisfy semantically.

RSA = |{p ∈ P : R(p, d) = 1}|

Where:

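  • P = defined prompt set, each prompt representing a distinct prompt class
  • R(p, d) = binary retrieval function (distinct from Redundancy Alignment R in TIS): 1 if content d is retrieved given prompt p, 0 otherwise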

Scoring Range: Integer from 0 to |P| (higher is better)

Assessment Tools: Prompt simulation libraries, retrieval shadow testing, synthetic query expansion.
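A minimal sketch of the RSA count follows. The retrieval function R(p, d) is passed in as a callable. `keyword_overlap` is a hypothetical toy stand-in for a real retriever or shadow-testing harness, and all names and sample prompts are illustrative assumptions.

```python
from typing import Callable, Iterable

# R(p, d): returns True (1) if document d is retrieved for prompt p, else False (0).
RetrievalFn = Callable[[str, str], bool]


def retrieval_surface_area(prompts: Iterable[str], doc: str, retrieved: RetrievalFn) -> int:
    """RSA = |{p in P : R(p, d) = 1}|, counting each distinct prompt once."""
    return sum(1 for p in set(prompts) if retrieved(p, doc))


def keyword_overlap(prompt: str, doc: str) -> bool:
    """Hypothetical toy R(p, d): 'retrieved' if the prompt shares any word with d."""
    return bool(set(prompt.lower().split()) & set(doc.lower().split()))


doc = "cosine distance measures angular separation between embeddings"
prompts = [
    "what is cosine distance",
    "how are embeddings compared",
    "history of the printing press",
]
print(retrieval_surface_area(prompts, doc, keyword_overlap))  # 2
```

Passing R(p, d) as a parameter keeps the count independent of any particular retriever, so the same function can be reused when the retrieval backend changes.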


Token Yield per Query (TYQ)

Purpose: Measures the average volume of usable content an LLM extracts from a source per prompt.

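TYQ = (Σ_{p ∈ P} tokens(d_p)) / |P|

Where:

  • d_p = token span retrieved from document d by prompt p (an empty span, contributing 0 tokens, if p retrieves nothing)
  • tokens(·) = number of tokens in a span
  • P = prompt set used in testing

Scoring Range: Float ≥ 0.0 (tokens per prompt)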

Assessment Tools: LLM with token attribution capabilities, windowed sampling evaluations.
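The TYQ average can be sketched as follows, assuming whitespace tokenization as a stand-in for the target model's tokenizer and assuming the retrieved spans d_p are already available (for example, from token attribution). The function names and sample spans are hypothetical.

```python
from typing import Dict


def token_count(span: str) -> int:
    """Whitespace tokenization: an assumed stand-in for the model's own tokenizer."""
    return len(span.split())


def token_yield_per_query(retrieved_spans: Dict[str, str], num_prompts: int) -> float:
    """TYQ = (sum over p in P of tokens(d_p)) / |P|.

    retrieved_spans maps each prompt p to the span d_p extracted from document d.
    Prompts that retrieved nothing may be omitted: they still count toward |P|
    through num_prompts and contribute zero tokens.
    """
    if num_prompts <= 0:
        raise ValueError("the prompt set P must be non-empty")
    total = sum(token_count(span) for span in retrieved_spans.values())
    return total / num_prompts


spans = {
    "define cosine distance": "cosine distance is one minus cosine similarity",
    "what is ESI": "ESI measures semantic centrality within a topic cluster",
}
# 7 + 8 tokens over |P| = 4 prompts (two prompts retrieved nothing).
print(token_yield_per_query(spans, num_prompts=4))  # 3.75
```

Dividing by the full |P| rather than by the number of prompts that retrieved something means that retrieval misses lower TYQ, matching the formula's denominator.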


Embedding Salience Index (ESI)

Purpose: Measures the degree to which a document or chunk is semantically central to its topic cluster.

ESI = 1 − (1/N) · Σ_{t=1}^{N} D(E_d, E_t)

Where:

  • E_d = embedding of the document or chunk
  • E_t = embedding of the t-th topic-representative document
  • D(·, ·) = cosine distance, defined as 1 − cosine similarity
  • N = number of reference embeddings

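Scoring Range: 0.00 to 1.00 (higher indicates better salience). Because ESI equals the mean cosine similarity between E_d and the reference embeddings, this range holds when those similarities are non-negative; cosine distance can reach 2 in general, in which case ESI falls below 0.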

Assessment Tools: Embedding clustering, unsupervised topic modeling, cosine similarity matrices.
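A pure-Python sketch of ESI follows, using toy two-dimensional vectors in place of real model embeddings. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """D(a, b) = 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm


def embedding_salience_index(e_d: Sequence[float], refs: Sequence[Sequence[float]]) -> float:
    """ESI = 1 - (1/N) * sum over t of D(E_d, E_t), for N reference embeddings."""
    if not refs:
        raise ValueError("need at least one reference embedding")
    mean_dist = sum(cosine_distance(e_d, e_t) for e_t in refs) / len(refs)
    return 1.0 - mean_dist


# Toy embeddings; real ones would come from an embedding model.
e_d = [1.0, 0.0]
refs = [[1.0, 0.0], [0.0, 1.0]]  # cosine distances 0.0 and 1.0
print(embedding_salience_index(e_d, refs))  # 0.5
```

An anti-aligned reference (cosine similarity of −1) gives D = 2, so this function can return values below 0 for such inputs.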


Evaluation Protocol

  • All scores must be reported alongside the prompt set used to compute them, so that results are reproducible and comparable.
  • Each metric should be calculated using transparent toolchains or open-source implementations.
  • Periodic recalibration against updated model versions or domain-specific corpora is recommended.
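One way to meet the reporting rule is to store each score set together with its prompt set and a fingerprint of that set. The `ScoreReport` record below is a hypothetical sketch, not a format defined by the framework; hashing the sorted, de-duplicated prompts is an assumed convention for checking that two reports used the same P, and the model names are placeholders.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScoreReport:
    """Hypothetical record pairing AIO scores with the prompt set used."""
    scores: Dict[str, float]
    prompt_set: List[str]
    model_version: str  # hypothetical field: model the scores were computed against
    prompt_set_sha256: str = field(init=False)

    def __post_init__(self) -> None:
        # Fingerprint the sorted, de-duplicated prompt set so two reports can be
        # checked for use of the same P without comparing full prompt lists.
        canonical = json.dumps(sorted(set(self.prompt_set)), ensure_ascii=False)
        self.prompt_set_sha256 = hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = ScoreReport({"RSA": 2, "TYQ": 3.75}, ["q2", "q1"], "model-x")
b = ScoreReport({"RSA": 3, "TYQ": 4.10}, ["q1", "q2", "q1"], "model-y")
print(a.prompt_set_sha256 == b.prompt_set_sha256)  # True: same P, different runs
```

Recording the model version next to the fingerprint makes recalibration runs comparable: matching fingerprints with different model versions isolate the effect of the model change.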

This document is Module 3 of the AIO Standards Framework. Refer to Modules 1 and 2 for guiding principles and formal definitions. See Module 4 for compliance and anti-co-option governance.