Fabled Sky Research

AIO Standards & Frameworks

Trust Integrity Score (TIS)

Fabled Sky Research | AIO v1.2.7
Last updated: April 2025


Definition

The Trust Integrity Score is a composite metric that rates how structurally reliable a content artifact appears to a large language model (LLM). It answers a single question: How confident can an LLM be that this document is factually grounded, internally consistent, and deliberately reinforced rather than accidentally repetitive?

Mathematical formulation

TIS = λ1 * C + λ2 * S + λ3 * R

  • C — Citation Depth
    Scalar in [0, 1]. Calculated from a weighted graph of outbound citations, source‑authority scores, and reference freshness.
  • S — Semantic Coherence
    Scalar in [0, 1]. Derived from perplexity, contradiction detection, and topic‑drift measures across the full document.
  • R — Redundancy Alignment
    Scalar in [0, 1]. Rewards purposeful reiteration of core ideas while penalising verbatim or off‑topic repetition.

Default weights

λ1 = 0.40 | λ2 = 0.30 | λ3 = 0.30

Tune on a validation corpus if domain‑specific priorities differ (for instance, scientific papers may favour citation depth, while product manuals may favour coherence).
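
A minimal sketch of the combination step, in Python (the function name and the weight-validation check are illustrative, not part of the standard):

  def trust_integrity_score(c, s, r, weights=(0.40, 0.30, 0.30)):
      """Combine the three sub-scores into a TIS value in [0, 1].

      c, s, r -- Citation Depth, Semantic Coherence, Redundancy Alignment,
                 each already normalised to [0, 1].
      weights -- (λ1, λ2, λ3); kept summing to 1.0 so TIS stays normalised.
      """
      l1, l2, l3 = weights
      if abs(l1 + l2 + l3 - 1.0) > 1e-9:
          raise ValueError("λ weights must sum to 1.0")
      return round(l1 * c + l2 * s + l3 * r, 2)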

Score range

Normalised to 0.00–1.00. A score above 0.80 is typically considered “high‑trust” in benchmarking studies.

Computation pipeline

  1. Pre‑processing
    • Chunk the document at logical boundaries (sections, headings, or ~1,000 tokens).
    • Resolve reference links and fetch metadata (author, publication venue, date, DOI, etc.).
  2. Citation Depth (C)
    • Build a citation graph.
    • Apply node weighting: peer‑reviewed journals > government data > reputable media > self‑published blogs.
    • Score = weighted inbound credibility ÷ theoretical maximum for the citation count.
  3. Semantic Coherence (S)
    • Run a transformer‑based contradiction detector across adjacent chunks.
    • Calculate topic drift using embedding similarity between the introduction and each chunk (a sketch follows this list).
    • Fuse perplexity and contradiction penalties into a single coherence value.
  4. Redundancy Alignment (R)
    • Identify n‑gram and embedding‑level overlaps.
    • Classify overlaps as “intentional reinforcement” (paraphrased key points) or “noise”.
    • R = aligned‑reinforcement tokens ÷ total redundant tokens.
  5. Linear combination
    • Multiply each sub‑score by its λ weight and sum.
    • Round to two decimals for reporting.
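
As a minimal sketch of the topic-drift measure in step 3 (the model choice mirrors the toolchain below; the helper name is illustrative, and e5 query/passage prefixes are omitted for brevity):

  from sentence_transformers import SentenceTransformer
  from sentence_transformers.util import cos_sim

  # Any sentence-embedding model with an encode() method works the same way.
  model = SentenceTransformer("intfloat/e5-large-v2")

  def topic_drift(intro, chunks):
      """Mean cosine similarity between the introduction and each chunk.

      Values near 1.0 mean the document stays on topic; a low mean feeds
      the drift penalty that is fused into S.
      """
      intro_emb = model.encode(intro, normalize_embeddings=True)
      chunk_embs = model.encode(chunks, normalize_embeddings=True)
      return float(cos_sim(intro_emb, chunk_embs).mean())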

Recommended toolchain

  • Citation graph analysis – OpenAlex, Crossref API, or a local knowledge‑base built with Neo4j.
  • Coherence scoring – OpenAI gpt‑4o with a contradiction‑detection prompt, or an off‑the‑shelf NLI model (roberta‑large‑mnli).
  • Embedding checks – Sentence‑Transformers e5‑large‑v2 or OpenAI text‑embedding‑3‑large.
  • Redundancy classifier – Simple cosine‑similarity threshold plus a ruleset that distinguishes paraphrase from exact duplication (a sketch follows this list).
  • Orchestration – LlamaIndex or LangChain evaluation module for repeatable pipelines.
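
A minimal sketch of that threshold-plus-ruleset classifier (the cut-off values are assumptions and should be tuned on labelled overlap pairs from your own corpus):

  DUPLICATE_FLOOR = 0.97    # near-verbatim repetition (assumed cut-off)
  PARAPHRASE_FLOOR = 0.80   # same idea, new surface form (assumed cut-off)

  def classify_overlap(similarity):
      """Label the embedding similarity between two overlapping chunks.

      'reinforcement' counts toward R's numerator; 'noise' (verbatim or
      off-topic repetition) counts only toward the denominator.
      """
      if PARAPHRASE_FLOOR <= similarity < DUPLICATE_FLOOR:
          return "reinforcement"   # purposeful paraphrase of a core idea
      return "noise"               # verbatim duplication or off-topic echo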

Interpreting scores

TIS band      Practical meaning                          Typical action
0.90 – 1.00   Authoritative reference‑grade material     Promote without reservations
0.75 – 0.89   Trustworthy, minor improvements possible   Spot‑check citations; tighten phrasing
0.50 – 0.74   Acceptable but uneven                      Add sources, remove contradictions
< 0.50        Low‑trust                                  Rewrite or discard for critical applications
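
A small helper (illustrative name) makes the banding reproducible in reports:

  def tis_band(score):
      """Map a rounded TIS value to the bands in the table above."""
      if score >= 0.90:
          return "authoritative"
      if score >= 0.75:
          return "trustworthy"
      if score >= 0.50:
          return "acceptable"
      return "low-trust"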

Best practices

  • Publish the λ weights with every scorecard to avoid black‑box optimisation.
  • Keep citations fresh – outdated links degrade C rapidly.
  • Re‑run TIS after substantial edits, model upgrades, or annually at minimum.
  • Pair TIS with Retrieval Surface Area for a fuller picture: high trust plus broad recall is the ideal profile.

Common pitfalls

  • Over‑optimising R can lead to circular writing. Ensure real information gain accompanies reinforcement.
  • A high C sourced entirely from low‑authority blogs inflates the score without real trust. Authority weighting must be transparent.
  • Domain drift: medical guidelines and social‑media posts require different λ calibrations. Never apply global weights blindly.

Worked example

A 3,200‑token research brief cites eight peer‑reviewed articles and two news reports from reputable, neutral outlets.

Given:
C = 0.82
S = 0.88
R = 0.75
λ1 = 0.40
λ2 = 0.30
λ3 = 0.30
TIS = 0.40 * 0.82 + 0.30 * 0.88 + 0.30 * 0.75
= 0.328 + 0.264 + 0.225
= 0.817 → 0.82 (rounded)
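
Using the trust_integrity_score sketch from the Mathematical formulation section, trust_integrity_score(0.82, 0.88, 0.75) likewise returns 0.82.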

Interpretation: High‑trust. Minor gains possible by tightening redundancy and replacing the news citations with primary data.