Fabled Sky Research | AIO v1.2.7
Last updated: April 2025
Purpose
This module provides detailed methodologies for calculating and applying AIO scoring metrics in a consistent, reproducible, and machine-interpretable manner. The aim is to guide researchers, auditors, and systems developers in assessing the semantic robustness and retrievability of content under LLM inference conditions.
Trust Integrity Score (TIS)
Purpose: Measures the structural trustworthiness of content from an LLM’s perspective.
TIS = λ1 · C + λ2 · S + λ3 · R
Where:
- C (Citation Depth): Quantifies the presence of verifiable, high-quality references.
- S (Semantic Coherence): Measures internal clarity, topic continuity, and contradiction minimization.
- R (Redundancy Alignment): Evaluates intentional reinforcement of core concepts across sections.
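Weighting Guidance: Default values are λ1 = 0.4, λ2 = 0.3, λ3 = 0.3 (tunable per corpus). The weights should sum to 1 so that TIS stays within its normalized range.
Scoring Range: 0.00 to 1.00 (normalized), with C, S, and R each scaled to [0, 1].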
Assessment Tools: Embedding similarity checks, citation graph analysis, coherence scoring using transformer-based summarizers.
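As an illustration of the weighted sum above, here is a minimal Python sketch. The function name trust_integrity_score is hypothetical, and the checks assume that C, S, and R have already been scaled to [0, 1] and that the weights sum to 1; how each component is measured is left to the assessment tools listed above.

```python
def trust_integrity_score(c, s, r, weights=(0.4, 0.3, 0.3)):
    """Compute TIS = l1*C + l2*S + l3*R.

    Assumption: c, s, r are component scores already scaled to [0, 1].
    The default weights are the module's default lambda values.
    """
    for name, value in (("C", c), ("S", s), ("R", r)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    l1, l2, l3 = weights
    # Assumption: weights sum to 1, which keeps TIS inside [0, 1].
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return l1 * c + l2 * s + l3 * r


# Example: C = 0.8, S = 0.6, R = 0.5 with default weights.
example_tis = trust_integrity_score(0.8, 0.6, 0.5)
print(round(example_tis, 2))
```

Passing a different weights tuple corresponds to the per-corpus tuning mentioned in the weighting guidance.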
Retrieval Surface Area (RSA)
Purpose: Assesses the number of unique prompt classes a document can satisfy semantically.
RSA = |{p ∈ P : R(p, d) = 1}|
Where:
- P = defined prompt set
- R(p, d) = binary retrieval function (not the Redundancy Alignment term R used in TIS); 1 if content d is retrieved given prompt p, 0 otherwise
Scoring Range: Integer ≥ 0 (higher is better)
Assessment Tools: Prompt simulation libraries, retrieval shadow testing, synthetic query expansion.
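The count defined above can be sketched in Python as follows. Both function names are hypothetical, and keyword_retriever is only a toy stand-in for R(p, d); in practice it would be replaced by whatever retrieval system is under test.

```python
def keyword_retriever(prompt, document):
    """Toy stand-in for R(p, d): 1 if every prompt word occurs in the document.

    Assumption: a real evaluation would call the retrieval system under test.
    """
    doc_words = set(document.lower().split())
    return int(all(word in doc_words for word in prompt.lower().split()))


def retrieval_surface_area(prompts, document, retrieve=keyword_retriever):
    """Compute RSA = |{p in P : R(p, d) = 1}| over the unique prompts in P."""
    unique_prompts = set(prompts)  # duplicates must not inflate the count
    return sum(1 for p in unique_prompts if retrieve(p, document) == 1)


document = "Trust scoring measures retrieval of content"
prompts = ["trust scoring", "retrieval content", "embedding salience", "trust scoring"]
print(retrieval_surface_area(prompts, document))  # prints 2
```

Deduplicating the prompt set first keeps the result a count of distinct prompts, in line with the set notation in the formula.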
Token Yield per Query (TYQ)
Purpose: Measures the volume of usable content an LLM extracts from a source in response to input.
TYQ = (Σ_{p ∈ P} tokens(d_p)) / |P|
Where:
- d_p = token span retrieved from document d by prompt p; tokens(d_p) is the number of tokens in that span (0 if nothing is retrieved)
- P = prompt set used in testing
Scoring Range: Float ≥ 0.0
Assessment Tools: LLM with token attribution capabilities, windowed sampling evaluations.
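A minimal Python sketch of the average above, assuming the retrieved spans have already been collected per prompt. The function name token_yield_per_query and the dictionary layout are hypothetical; prompts that retrieve nothing are kept with an empty span so that they still count toward |P|.

```python
def token_yield_per_query(retrieved_spans):
    """Compute TYQ as total retrieved tokens divided by the number of prompts.

    Assumption: retrieved_spans maps each prompt in P to the list of tokens
    retrieved from the document for that prompt (empty list if none).
    """
    if not retrieved_spans:
        raise ValueError("prompt set P must not be empty")
    total_tokens = sum(len(tokens) for tokens in retrieved_spans.values())
    return total_tokens / len(retrieved_spans)


spans = {
    "what is TIS": ["trust", "integrity", "score"],
    "unrelated prompt": [],
    "define RSA": ["retrieval", "surface", "area"],
}
print(token_yield_per_query(spans))  # prints 2.0
```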
Embedding Salience Index (ESI)
Purpose: Measures the degree to which a document or chunk is semantically central to its topic cluster.
ESI = 1 - (Σ_{t=1..N} D(E_d, E_t)) / N
Where:
- E_d = embedding of document or chunk
- E_t = embeddings of topic-representative documents
- D(·) = cosine distance
- N = number of reference embeddings
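Scoring Range: 0.00 to 1.00 (higher indicates better salience), provided every cosine distance lies in [0, 1]. Cosine distance can reach 2 for opposing vectors, in which case ESI falls below 0.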
Assessment Tools: Embedding clustering, unsupervised topic modeling, cosine similarity matrices.
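The formula above can be sketched in plain Python as shown below. The function names are hypothetical, embeddings are assumed to be equal-length lists of floats, and cosine distance is taken as 1 minus cosine similarity.

```python
import math


def cosine_distance(u, v):
    """Cosine distance D(u, v) = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def embedding_salience_index(doc_embedding, topic_embeddings):
    """Compute ESI = 1 - mean cosine distance to the N reference embeddings."""
    if not topic_embeddings:
        raise ValueError("at least one reference embedding is required")
    distances = [cosine_distance(doc_embedding, e_t) for e_t in topic_embeddings]
    return 1.0 - sum(distances) / len(distances)


# One identical reference (distance 0) and one orthogonal reference (distance 1).
print(embedding_salience_index([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Under this definition ESI equals the mean cosine similarity between the document and the reference embeddings, so a chunk close to most topic-representative documents scores near 1.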
Evaluation Protocol
- Report every score alongside the prompt set used to produce it, so that results can be compared consistently across evaluations.
- Each metric should be calculated using transparent toolchains or open-source implementations.
- Recalibrate periodically against updated model versions or domain-specific corpora.
This document is Module 3 of the AIO Standards Framework. Refer to Modules 1 and 2 for guiding principles and formal definitions. See Module 4 for compliance and anti-co-option governance.