Retrieval Surface Area (RSA)

Definition

RSA measures breadth of eligibility: how many distinct prompt classes can successfully retrieve a document. A high RSA means the content is semantically versatile and discoverable across varied user intents; a low RSA means it answers only a narrow set of questions.

Mathematical formulation

RSA = | { p ∈ P : R(p, d) = 1 } |

P — predefined prompt set (your evaluation battery)

R(p, d) — binary retrieval function

1 → document d is retrieved for prompt p (top‑k, similarity threshold, or ranking rule met)
0 → document d is not retrieved for prompt p

Score range

Integer from 0 to |P| (higher = broader surface area)

Computation Pipeline

Prompt‑set design
- Build a taxonomy of intents (definition, how‑to, comparison, critique, etc.).
- Generate or hand‑craft representative prompts for each intent.
- Typical size: 500–2 000 prompts for reliable Monte‑Carlo estimates.
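
One way to see why a battery of this size is suggested: if the prompts are treated as a random sample of user intents (an assumption, not something the metric guarantees), the share of successful prompts behaves like a sample proportion, and its margin of error shrinks with the square root of the number of prompts. The sketch below uses the normal approximation; the function name `margin_of_error` is illustrative only.

```python
import math


def margin_of_error(success_share: float, n_prompts: int, z: float = 1.96) -> float:
    """Approximate margin of error for a success share (normal approximation).

    z = 1.96 corresponds to a ~95% confidence level.
    """
    if n_prompts <= 0:
        raise ValueError("n_prompts must be positive")
    if not 0.0 <= success_share <= 1.0:
        raise ValueError("success_share must lie in [0, 1]")
    return z * math.sqrt(success_share * (1.0 - success_share) / n_prompts)


# Worst case (share = 0.5) for three battery sizes.
for n in (100, 500, 2000):
    print(n, round(margin_of_error(0.5, n), 3))
# 100 0.098
# 500 0.044
# 2000 0.022
```

Under these assumptions, a 100-prompt battery leaves roughly ±10 points of uncertainty, while 500 to 2 000 prompts bring it down to about ±2 to ±4 points.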
Retrieval function R(p, d)
- Choose a retrieval method (vector similarity, BM25, hybrid).
- Define a success rule:
  Top‑k (document appears within top‑k results) — or —
  Threshold (similarity ≥ τ).
- Run the retrieval engine for every prompt.
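
The two success rules can be sketched as small predicates over whatever the retrieval engine returns. This is a minimal illustration, not a reference implementation: the function names, the document IDs, and the default values `k=10` and `tau=0.75` are all assumptions chosen for the example.

```python
from typing import Sequence


def retrieved_top_k(ranked_doc_ids: Sequence[str], doc_id: str, k: int = 10) -> int:
    """R(p, d) under a top-k rule: 1 if doc_id is among the first k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    return int(doc_id in ranked_doc_ids[:k])


def retrieved_threshold(similarity: float, tau: float = 0.75) -> int:
    """R(p, d) under a threshold rule: 1 if similarity >= tau."""
    return int(similarity >= tau)


# Hypothetical ranked result list for one prompt, best match first.
ranked = ["doc-7", "doc-3", "doc-1"]
print(retrieved_top_k(ranked, "doc-1", k=2))   # 0
print(retrieved_top_k(ranked, "doc-1", k=3))   # 1
print(retrieved_threshold(0.81, tau=0.75))     # 1
```

Whichever rule is chosen, it should be fixed before the run and reported alongside the score, since the same document can pass under one rule and fail under the other.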
Surface‑area count

RSA = count_successes(P)

The final integer is simply the number of prompts for which R = 1.

Normalisation (optional)

RSA_norm = RSA / |P|     # value in 0.00–1.00

Use this if you want a proportional metric instead of a raw count.
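
As a rough sketch, the raw count and the normalised value can be computed in one pass. Here `is_retrieved` is a hypothetical callable standing in for the retrieval engine plus the chosen success rule; the toy keyword check below only exists to make the example runnable.

```python
from typing import Callable, Iterable, Tuple


def compute_rsa(prompts: Iterable[str],
                is_retrieved: Callable[[str], int]) -> Tuple[int, float]:
    """Return (RSA, RSA_norm) for one document over a prompt set.

    is_retrieved(p) must return 1 if the document is retrieved for p, else 0.
    """
    prompt_list = list(prompts)
    if not prompt_list:
        raise ValueError("prompt set P must not be empty")
    raw = sum(is_retrieved(p) for p in prompt_list)
    return raw, raw / len(prompt_list)


# Toy stand-in for a real retrieval call: succeeds when the prompt mentions "rsa".
def toy_rule(prompt: str) -> int:
    return int("rsa" in prompt.lower())


prompts = [
    "What is RSA?",
    "How do I compute RSA?",
    "Compare BM25 and vector search",
]
raw, norm = compute_rsa(prompts, toy_rule)
print(raw, round(norm, 2))  # 2 0.67
```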

Interpreting RSA Values

RSA (raw)	RSA_norm	Practical meaning	Typical action
> 800	> 0.80	Exceptional breadth	Prioritise for indexing; surface in more answer contexts
500–800	0.50–0.80	Broad coverage	Good; refine for niche queries
200–500	0.20–0.50	Moderate	Expand content to cover additional intents
< 200	< 0.20	Narrow	Add sections/examples; improve metadata

(Assumes |P| ≈ 1 000 prompts; adjust thresholds proportionally.)
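
Scaling the thresholds proportionally amounts to classifying on the normalised value rather than the raw count. The helper below is a sketch of that idea; the name `rsa_band` is hypothetical, and because the table's ranges share their endpoints, it assumes a value exactly on a boundary goes to the band listed at "0.50–0.80" or "0.20–0.50" (so 0.80 counts as broad, 0.50 as broad, 0.20 as moderate).

```python
def rsa_band(raw: int, n_prompts: int) -> str:
    """Map a raw RSA count to the interpretation bands, for any |P|."""
    if n_prompts <= 0:
        raise ValueError("n_prompts must be positive")
    if not 0 <= raw <= n_prompts:
        raise ValueError("raw RSA must lie between 0 and n_prompts")
    norm = raw / n_prompts
    if norm > 0.80:
        return "Exceptional breadth"
    if norm >= 0.50:   # boundary handling is an assumption, see lead-in
        return "Broad coverage"
    if norm >= 0.20:
        return "Moderate"
    return "Narrow"


print(rsa_band(850, 1000))  # Exceptional breadth
print(rsa_band(150, 500))   # Moderate (0.30 of a 500-prompt battery)
print(rsa_band(90, 500))    # Narrow
```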

Recommended Toolchain

Task	Suggested Tools
Prompt generation	LlamaIndex `QuestionGenerator`, GPT‑4o bulk prompt scripts
Retrieval engine	FAISS / Elasticsearch hybrid; or OpenAI embeddings with a separate reranking step
Shadow testing	LangChain `BenchmarkRetrievalEvaluator`
Synthetic expansion	GPT‑4o “rewrite prompt in N variant intents” loop

Best‑Practice Guidelines

- Publish the prompt taxonomy. RSA only has meaning when others can inspect or replicate P.
- Include domain‑specific prompts. Generic Q&A often over‑estimates surface area.
- Refresh annually. New user intents emerge; obsolete prompts skew the metric.
- Combine with TIS. High RSA + high Trust Integrity Score = ideal retrieval profile.
- Guard against prompt stuffing. Inflating |P| with near‑duplicates makes RSA easier to boost without real breadth.
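
One simple guard against prompt stuffing is to filter near-duplicate prompts before scoring. The sketch below uses token-level Jaccard similarity with a greedy keep-first pass; the function names and the 0.8 cut-off are assumptions for illustration, and an embedding-based comparison may be a better fit for paraphrases that share few words.

```python
import re
from typing import FrozenSet, List, Sequence


def _tokens(prompt: str) -> FrozenSet[str]:
    """Lowercased alphanumeric tokens of a prompt."""
    return frozenset(re.findall(r"[a-z0-9]+", prompt.lower()))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(prompts: Sequence[str],
                         max_similarity: float = 0.8) -> List[str]:
    """Keep a prompt only if it is below max_similarity to every kept prompt.

    Pairwise comparison, so cost grows quadratically with the number of prompts.
    """
    kept: List[str] = []
    kept_tokens: List[FrozenSet[str]] = []
    for prompt in prompts:
        tokens = _tokens(prompt)
        if all(jaccard(tokens, other) < max_similarity for other in kept_tokens):
            kept.append(prompt)
            kept_tokens.append(tokens)
    return kept


battery = [
    "What is retrieval surface area?",
    "what is Retrieval Surface Area",
    "How do I compute RSA for a PDF?",
]
print(len(drop_near_duplicates(battery)))  # 2
```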

Common Pitfalls

- Over‑broad success rule (e.g., top‑50 instead of top‑5) inflates RSA artificially.
- Ignoring multilingual prompts; if your audience is global, RSA should test other languages.
- Assuming bigger is always better; a highly specialised document can be useful even at low RSA, so context matters.
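
To check how sensitive a score is to the success rule, RSA can be recomputed at several cut-offs from the same run. The sketch below assumes the engine's output has been reduced to the document's 1-based rank per prompt, with `None` where it was not returned at all; the ranks shown are made-up illustration data, not results.

```python
from typing import Dict, Iterable, Optional, Sequence


def rsa_by_k(ranks: Sequence[Optional[int]], ks: Iterable[int]) -> Dict[int, int]:
    """Raw RSA under a top-k rule for each k in ks.

    ranks holds, per prompt, the 1-based rank of the document or None if absent.
    """
    return {
        k: sum(1 for r in ranks if r is not None and r <= k)
        for k in ks
    }


# Hypothetical ranks for six prompts.
ranks = [1, 4, 12, None, 37, 7]
print(rsa_by_k(ranks, ks=(5, 10, 50)))  # {5: 2, 10: 3, 50: 5}
```

A large jump between a strict and a loose cut-off, as in this toy data, suggests the document is often present but rarely ranked highly.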

Worked Example

Setup:
|P| = 1,000 prompts
Success rule: document must appear in top‑10 results

Results:
Retrieval successes = 643
RSA (raw) = 643
RSA_norm = 643 / 1,000 ≈ 0.64

Interpretation – The document is retrieved for about 64 % of the tested prompts, which places it in the "Broad coverage" band: strong, but with room to expand into the remaining niches.

Fabled Sky Research

