Fabled Sky Research | AIO v1.2.7
Last updated: April 2025
Definition
The Trust Integrity Score is a composite metric that rates how structurally reliable a content artifact appears to a large‑language model. It answers a single question: How confident can an LLM be that this document is factually grounded, internally consistent, and deliberately reinforced rather than accidentally repetitive?
Mathematical formulation
TIS = λ1 * C + λ2 * S + λ3 * R
- C (Citation Depth): Scalar in [0, 1]. Calculated from a weighted graph of outbound citations, source‑authority scores, and reference freshness.
- S (Semantic Coherence): Scalar in [0, 1]. Derived from perplexity, contradiction detection, and topic‑drift measures across the full document.
- R (Redundancy Alignment): Scalar in [0, 1]. Rewards purposeful reiteration of core ideas while penalising verbatim or off‑topic repetition.
Default weights
λ1=0.40 | λ2=0.30 | λ3=0.30
Tune on a validation corpus if domain‑specific priorities differ (for instance, scientific papers may favour citation depth, while product manuals may favour coherence).
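The weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `trust_integrity_score` is hypothetical, and the check that the weights sum to 1 is an assumption inferred from the default weights and the 0.00–1.00 score range.

```python
def trust_integrity_score(c, s, r, weights=(0.40, 0.30, 0.30)):
    """Return TIS = λ1*C + λ2*S + λ3*R, rounded to two decimals."""
    # Each sub-score is defined as a scalar in [0, 1].
    for name, value in (("C", c), ("S", s), ("R", r)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Assumption: weights sum to 1 so the result stays in [0, 1].
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    l1, l2, l3 = weights
    return round(l1 * c + l2 * s + l3 * r, 2)


# Default weights with illustrative sub-scores.
print(trust_integrity_score(0.82, 0.88, 0.75))  # 0.82
```

Passing a different `weights` tuple is one way to express a domain‑specific calibration while keeping the weights visible to whoever reads the scorecard.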
Score range
Normalised to 0.00–1.00. A score above 0.80 is typically considered “high‑trust” in benchmarking studies.
Computation pipeline
- Pre‑processing
- Chunk the document at logical boundaries (sections, headings, or ~1 000 tokens).
- Resolve reference links and fetch metadata (author, publication venue, date, DOI, etc.).
- Citation Depth (C)
- Build a citation graph.
- Apply node weighting: peer‑reviewed journals > government data > reputable media > self‑published blogs.
- Score = weighted credibility of the cited sources ÷ theoretical maximum for the citation count.
- Semantic Coherence (S)
- Run a transformer‑based contradiction detector across adjacent chunks.
- Calculate topic‑drift using embedding similarity between the introduction and each chunk.
- Fuse perplexity, contradiction, and topic‑drift penalties into a single coherence value.
- Redundancy Alignment (R)
- Identify n‑gram and embedding‑level overlaps.
- Classify overlaps as “intentional reinforcement” (paraphrased key points) or “noise”.
- R = aligned‑reinforcement tokens ÷ total redundant tokens.
- Linear combination
- Multiply each sub‑score by its λ weight and sum.
- Round to two decimals for reporting.
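The two ratio‑based sub‑scores in the pipeline above can be sketched as follows. This is a simplified illustration under stated assumptions: the numeric authority weights are hypothetical (the pipeline gives only their ordering), reference freshness and graph structure are left out, and the value returned when a document has no redundant tokens at all is an arbitrary placeholder. The Semantic Coherence fusion is not sketched because its formula is not specified.

```python
# Hypothetical authority weights. Only the ordering is given:
# peer-reviewed > government data > reputable media > self-published blogs.
AUTHORITY_WEIGHTS = {
    "peer_reviewed": 1.0,
    "government": 0.8,
    "media": 0.5,
    "blog": 0.2,
}


def citation_depth(source_types):
    """Weighted credibility divided by the maximum for this citation count."""
    if not source_types:
        return 0.0  # assumption: no citations means no citation depth
    achieved = sum(AUTHORITY_WEIGHTS[t] for t in source_types)
    maximum = max(AUTHORITY_WEIGHTS.values()) * len(source_types)
    return achieved / maximum


def redundancy_alignment(reinforcement_tokens, noise_tokens, empty_value=0.0):
    """Aligned-reinforcement tokens divided by total redundant tokens."""
    total = reinforcement_tokens + noise_tokens
    if total == 0:
        return empty_value  # assumption: placeholder when nothing is redundant
    return reinforcement_tokens / total


# Toy inputs: three peer-reviewed sources and one blog; 300 reinforcing
# tokens against 100 noisy ones.
c = citation_depth(["peer_reviewed"] * 3 + ["blog"])
r = redundancy_alignment(reinforcement_tokens=300, noise_tokens=100)
print(round(c, 2), round(r, 2))
```

Both functions return values in [0, 1], which is the range each sub‑score is defined on.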
Recommended toolchain
- Citation graph analysis – OpenAlex, Crossref API, or a local knowledge‑base built with Neo4j.
- Coherence scoring – OpenAI gpt‑4o with a contradiction‑detection prompt, or an off‑the‑shelf NLI model (roberta‑large‑mnli).
- Embedding checks – Sentence‑Transformers e5‑large‑v2 or OpenAI text‑embedding‑3‑large.
- Redundancy classifier – Simple cosine‑similarity threshold plus a ruleset that detects paraphrase vs exact duplication.
- Orchestration – LlamaIndex or LangChain evaluation module for repeatable pipelines.
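The redundancy classifier in the list above (a cosine‑similarity threshold plus a paraphrase‑vs‑duplicate rule) could look roughly like this. It is a sketch only: the function names, the 0.85 threshold, and the whitespace/case normalisation rule are assumptions, and the embeddings are expected to come from whichever embedding model is in use. The vectors in the usage lines are toy values, not real model output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_overlap(text_a, text_b, emb_a, emb_b, threshold=0.85):
    """Label a pair of passages as exact duplicate, paraphrase, or distinct."""
    # Rule 1 (assumed): identical text after case and whitespace
    # normalisation counts as exact duplication.
    if " ".join(text_a.lower().split()) == " ".join(text_b.lower().split()):
        return "exact_duplicate"
    # Rule 2: high embedding similarity without identical wording is
    # treated as a paraphrase. The threshold is a hypothetical default.
    if cosine(emb_a, emb_b) >= threshold:
        return "paraphrase"
    return "distinct"


print(classify_overlap("Trust matters.", "trust  matters.", [1.0, 0.0], [1.0, 0.0]))
print(classify_overlap("Trust matters.", "Trust is important.", [1.0, 0.0], [0.9, 0.1]))
print(classify_overlap("Trust matters.", "Install the driver.", [1.0, 0.0], [0.0, 1.0]))
```

Whether a "paraphrase" counts as intentional reinforcement or noise still needs a separate rule, for example checking that the passage restates a key point rather than drifting off topic.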
Interpreting scores
TIS band | Practical meaning | Typical action |
---|---|---|
0.90 – 1.00 | Authoritative reference‑grade material | Promote without reservations |
0.75 – 0.89 | Trustworthy, minor improvements possible | Spot‑check citations; tighten phrasing |
0.50 – 0.74 | Acceptable but uneven | Add sources, remove contradictions |
< 0.50 | Low‑trust | Rewrite or discard for critical applications |
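
The bands in the table can be applied programmatically. The sketch below is illustrative: the function name `tis_band` is hypothetical, and rounding to two decimals before the lookup is an assumption made so that reported scores fall cleanly into one band.

```python
def tis_band(score):
    """Map a TIS value to the practical meaning given in the band table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"TIS must be in [0, 1], got {score}")
    score = round(score, 2)  # assumption: bands apply to the reported value
    if score >= 0.90:
        return "Authoritative reference-grade material"
    if score >= 0.75:
        return "Trustworthy, minor improvements possible"
    if score >= 0.50:
        return "Acceptable but uneven"
    return "Low-trust"


print(tis_band(0.82))  # Trustworthy, minor improvements possible
```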
Best practices
- Publish the λ weights with every scorecard to avoid black‑box optimisation.
- Keep citation freshness current – outdated links degrade C rapidly.
- Re‑run TIS after substantial edits or model upgrades, and at least annually.
- Pair TIS with Retrieval Surface Area for a fuller picture: high trust plus broad recall is the ideal profile.
Common pitfalls
- Over‑optimising R can lead to circular writing. Ensure real information gain accompanies reinforcement.
- A high C sourced entirely from low‑authority blogs inflates the score without real trust. Authority weighting must be transparent.
- Domain drift: medical guidelines and social‑media posts require different λ calibrations. Never apply global weights blindly.
Worked example
A 3,200‑token research brief cites eight peer‑reviewed articles and two news posts rated high for validity and neutrality.
Given:
C = 0.82
S = 0.88
R = 0.75
λ1 = 0.40
λ2 = 0.30
λ3 = 0.30
TIS = 0.40 * 0.82 + 0.30 * 0.88 + 0.30 * 0.75
= 0.328 + 0.264 + 0.225
= 0.817 → 0.82 (rounded)
Interpretation: High‑trust. Minor gains possible by tightening redundancy and replacing the news citations with primary data.
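To see how sensitive this result is to the choice of λ weights, the same sub‑scores can be recomputed under alternative calibrations. The sketch below is illustrative only: the `citation_heavy` and `coherence_heavy` profiles are hypothetical and are not calibrated on any corpus.

```python
SUB_SCORES = {"C": 0.82, "S": 0.88, "R": 0.75}

WEIGHT_PROFILES = {
    "default": (0.40, 0.30, 0.30),
    "citation_heavy": (0.60, 0.20, 0.20),   # hypothetical, e.g. scientific papers
    "coherence_heavy": (0.20, 0.60, 0.20),  # hypothetical, e.g. product manuals
}


def score(sub_scores, weights):
    """Linear combination of C, S, and R, rounded to two decimals."""
    l1, l2, l3 = weights
    total = l1 * sub_scores["C"] + l2 * sub_scores["S"] + l3 * sub_scores["R"]
    return round(total, 2)


results = {name: score(SUB_SCORES, w) for name, w in WEIGHT_PROFILES.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

Reporting the profile name and weights next to each score keeps such comparisons reproducible.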