AIO Standards Framework — Module 3: Scoring Framework & Methodology – AIO Standards & Frameworks

Fabled Sky Research | AIO v1.2.7
Last updated: April 2025

Purpose

This module provides detailed methodologies for calculating and applying AIO scoring metrics in a consistent, reproducible, and machine-interpretable manner. The aim is to guide researchers, auditors, and systems developers in assessing the semantic robustness and retrievability of content under LLM inference conditions.

Trust Integrity Score (TIS)

Purpose: Measures the structural trustworthiness of content from an LLM’s perspective.

TIS = λ1 · C + λ2 · S + λ3 · R

Where:

C (Citation Depth): Quantifies the presence of verifiable, high-quality references.
S (Semantic Coherence): Measures internal clarity, topic continuity, and contradiction minimization.
R (Redundancy Alignment): Evaluates intentional reinforcement of core concepts across sections.

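Weighting Guidance: Default values are λ1 = 0.4, λ2 = 0.3, λ3 = 0.3 (tunable per corpus). Tuned weights should remain non-negative and sum to 1.

Scoring Range: 0.00 to 1.00 (normalized). This range holds when C, S, and R are each normalized to 0.00 to 1.00 and the weights sum to 1.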

Assessment Tools: Embedding similarity checks, citation graph analysis, coherence scoring using transformer-based summarizers.
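
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `trust_integrity_score` is hypothetical, and the component values C, S, and R are assumed to have already been produced (and normalized to 0.00 to 1.00) by whatever assessment tools are in use.

```python
def trust_integrity_score(c, s, r, weights=(0.4, 0.3, 0.3)):
    """Return TIS = l1*C + l2*S + l3*R for already-normalized components.

    Assumption: c, s, r are floats in [0, 1]; how they are measured is
    outside the scope of this sketch.
    """
    components = (c, s, r)
    if any(not 0.0 <= x <= 1.0 for x in components):
        raise ValueError("C, S and R must each be normalized to [0, 1]")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * x for w, x in zip(weights, components))


# Example with the default weights: 0.4*0.8 + 0.3*0.6 + 0.3*0.5
score = trust_integrity_score(c=0.8, s=0.6, r=0.5)
print(round(score, 2))  # 0.65
```

Checking that the weights sum to 1 keeps the result inside the stated scoring range when the defaults are tuned for a specific corpus.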

Retrieval Surface Area (RSA)

Purpose: Assesses the number of unique prompt classes a document can satisfy semantically.

RSA = |{p ∈ P : R(p, d) = 1}|

Where:

P = defined prompt set
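R(p, d) = binary retrieval function; 1 if content d is retrieved given prompt p, 0 otherwise (distinct from the Redundancy Alignment term R used in TIS)

Scoring Range: Integer from 0 to |P| (higher is better). Because the upper bound depends on the size of the prompt set, RSA values are directly comparable only when computed against the same P.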

Assessment Tools: Prompt simulation libraries, retrieval shadow testing, synthetic query expansion.
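
As a rough sketch, RSA is a count over the prompt set of the prompts for which the retrieval function returns 1. In the Python below, `retrieval_surface_area`, `toy_is_retrieved`, and `TOY_CORPUS` are all hypothetical names; the word-overlap retriever is only a stand-in for a real retrieval system, which the module does not specify.

```python
from typing import Callable, Iterable

# Hypothetical one-document corpus used only for this illustration.
TOY_CORPUS = {"doc-1": "cosine distance between embeddings"}


def toy_is_retrieved(prompt: str, document_id: str) -> bool:
    """Stand-in for R(p, d): 'retrieved' if prompt and document share a word."""
    doc_words = set(TOY_CORPUS[document_id].lower().split())
    return bool(doc_words & set(prompt.lower().split()))


def retrieval_surface_area(
    prompts: Iterable[str],
    document_id: str,
    is_retrieved: Callable[[str, str], bool],
) -> int:
    """Count the unique prompts p in P with R(p, d) = 1."""
    unique_prompts = set(prompts)  # P is a set, so duplicates count once
    return sum(1 for p in unique_prompts if is_retrieved(p, document_id))


prompts = ["what is cosine distance", "compare embeddings", "token pricing"]
print(retrieval_surface_area(prompts, "doc-1", toy_is_retrieved))  # 2
```

Passing the retrieval function in as a parameter keeps the count itself independent of the retriever under test.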

Token Yield per Query (TYQ)

Purpose: Measures the volume of usable content an LLM extracts from a source in response to input.

TYQ = (Σ_{p ∈ P} tokens(d_p)) / |P|

Where:

d_p = token span retrieved from document d by prompt p
tokens(d_p) = number of tokens in d_p (0 if nothing is retrieved)
P = prompt set used in testing

Scoring Range: Float ≥ 0.0

Assessment Tools: LLM with token attribution capabilities, windowed sampling evaluations.
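
The averaging step can be illustrated as follows. This is a sketch under stated assumptions: the function name `token_yield_per_query` is hypothetical, the retrieved spans are assumed to be available as token lists keyed by prompt, and a prompt that retrieves nothing is assumed to contribute an empty span.

```python
from typing import Mapping, Sequence


def token_yield_per_query(retrieved_spans: Mapping[str, Sequence[str]]) -> float:
    """Average number of retrieved tokens per prompt.

    retrieved_spans maps each prompt p in P to its token span d_p.
    Prompts that retrieved nothing should map to an empty sequence so
    that they still count toward |P|.
    """
    if not retrieved_spans:
        raise ValueError("prompt set must not be empty")
    total_tokens = sum(len(span) for span in retrieved_spans.values())
    return total_tokens / len(retrieved_spans)


# Toy data: whitespace tokens stand in for real model tokens.
spans = {
    "prompt-1": "retrieval surface area counts prompts".split(),  # 5 tokens
    "prompt-2": [],                                               # nothing retrieved
    "prompt-3": "cosine distance".split(),                        # 2 tokens
    "prompt-4": "token yield".split(),                            # 2 tokens
}
print(token_yield_per_query(spans))  # 9 / 4 = 2.25
```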

Embedding Salience Index (ESI)

Purpose: Measures the degree to which a document or chunk is semantically central to its topic cluster.

ESI = 1 - (Σ_{i=1..N} D(E_d, E_t_i)) / N

Where:

E_d = embedding of the document or chunk
E_t_i = embedding of the i-th topic-representative document (i = 1, ..., N)
D(·, ·) = cosine distance, i.e. 1 minus cosine similarity
N = number of reference embeddings

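Scoring Range: 0.00 to 1.00 (higher indicates better salience). Because D is cosine distance, ESI equals the mean cosine similarity between E_d and the reference embeddings, so values below 0.00 are possible when similarities are negative.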

Assessment Tools: Embedding clustering, unsupervised topic modeling, cosine similarity matrices.
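
A minimal pure-Python sketch of the ESI calculation is given below. The function names `cosine_distance` and `embedding_salience_index` are hypothetical, and the embeddings are assumed to be plain lists of floats produced by some embedding model that the module leaves unspecified.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance D(u, v) = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def embedding_salience_index(
    doc_embedding: Sequence[float],
    reference_embeddings: Sequence[Sequence[float]],
) -> float:
    """ESI = 1 - mean cosine distance from E_d to the N reference embeddings."""
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    distances = [cosine_distance(doc_embedding, e_t) for e_t in reference_embeddings]
    return 1.0 - sum(distances) / len(distances)


# Toy 2-D example: one identical reference and one orthogonal reference.
e_d = [1.0, 0.0]
references = [[1.0, 0.0], [0.0, 1.0]]
print(embedding_salience_index(e_d, references))  # 0.5
```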

Evaluation Protocol

For consistency, all scores must be reported alongside the prompt set used to produce them.
Each metric should be calculated using transparent toolchains or open-source implementations.
Periodic recalibration against updated model versions or domain-specific corpora is recommended.
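
One way to follow these reporting requirements is to bundle the scores, the prompt set, and the toolchain details into a single record. The sketch below is an assumption about how such a record might look; the class `ScoreReport`, its field names, and the example values are all hypothetical and are not defined by this module.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ScoreReport:
    """Hypothetical record that keeps scores together with their context."""
    document_id: str
    scores: dict      # e.g. {"TIS": 0.65, "RSA": 2}
    prompt_set: list  # the exact prompts used to compute the scores
    toolchain: dict   # e.g. model and library identifiers, for transparency

    def prompt_set_digest(self) -> str:
        """Order-independent SHA-256 fingerprint of the prompt set."""
        canonical = json.dumps(sorted(self.prompt_set), ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        payload = asdict(self)
        payload["prompt_set_sha256"] = self.prompt_set_digest()
        return json.dumps(payload, indent=2, sort_keys=True)


report = ScoreReport(
    document_id="doc-1",
    scores={"TIS": 0.65, "RSA": 2},
    prompt_set=["what is cosine distance", "compare embeddings", "token pricing"],
    toolchain={"embedding_model": "example-model-v1"},  # placeholder value
)
print(report.to_json())
```

The digest gives a quick check that two reports were computed against the same prompt set before their scores are compared.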

This document is Module 3 of the AIO Standards Framework. Refer to Modules 1 and 2 for guiding principles and formal definitions. See Module 4 for compliance and anti-co-option governance.
