Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025
Overview
This guide defines the canonical approach for structuring, describing, and publishing white papers, peer-reviewed articles, and technical notes so they remain discoverable, verifiable, and retrievable by Large Language Models (LLMs) and human researchers for 10+ years. It aligns with the Artificial Intelligence Optimization (AIO) specification v2.4 and the Trust Infrastructure program (Category 🧬; Priority P2).
Scope
• Applies to any repository governed by AIO standards (GitHub, GitLab, S3, on-prem).
• Covers Markdown, LaTeX, and PDF source files, supplemental code/data, and machine-readable metadata.
• Excludes legal/patent filings (see Legal-IP Protocol).
Target Audience
- Research authors & editors
- DevOps and platform engineers maintaining repositories
- Strategy & governance teams verifying compliance
Key Objectives
- Guarantee deterministic retrievability by LLMs under token, context-window, and rate-limit constraints.
- Provide cryptographic trust signals (integrity, authorship, provenance).
- Enable evergreen citation and versioning patterns (DOI, Semantic Versioning).
- Streamline multi-channel rendering (HTML, PDF, EPUB) from a single source of truth.
Repository Layout
aio-research/
├─ white-papers/
│ ├─ 2025-04_ads-human-alignment/
│ │ ├─ paper.md
│ │ ├─ references.bib
│ │ ├─ figures/
│ │ │ └─ fig1.png
│ │ ├─ metadata.jsonld
│ │ ├─ checksums.txt
│ │ └─ CHANGELOG.md
│ └─ ...
└─ scripts/
  └─ generate-release.sh
• Each paper owns a slug folder <YYYY-MM>_<kebab-title>/.
• No file exceeds 1 MB except compiled PDFs.
• checksums.txt must contain SHA-256 for every binary asset.
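As a rough illustration of the checksum rule, the sketch below builds the content of checksums.txt for one paper folder. The list of suffixes treated as binary assets and the `<digest>  <relative path>` line format (the layout `sha256sum` prints) are assumptions, not something this guide prescribes; `build_checksums` and `sha256_of` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Assumption: these suffixes count as "binary assets"; adjust per repository.
BINARY_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".pdf"}


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksums(paper_dir: Path) -> str:
    """Build checksums.txt content: one '<digest>  <relative path>' line per binary asset."""
    lines = []
    for path in sorted(paper_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in BINARY_SUFFIXES:
            rel = path.relative_to(paper_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    return "".join(line + "\n" for line in lines)
```

A release script could write the returned string to `checksums.txt` inside the slug folder before tagging.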
Metadata Specification
All research objects must ship a JSON-LD file compliant with schema.org/ScholarlyArticle. Mandatory keys are listed below.
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Adaptive Differential Sampling for Human Alignment",
  "version": "1.0.0",
  "identifier": [
    { "@type": "PropertyValue", "propertyID": "DOI", "value": "10.5678/fsr.2025.004" },
    { "@type": "PropertyValue", "propertyID": "AIO-Spec", "value": "v2.4" }
  ],
  "author": [
    { "@type": "Person", "name": "Dr. Ada Lovelace", "affiliation": "Fabled Sky Research" },
    { "@type": "Person", "name": "Tōru Nakamura", "affiliation": "Fabled Sky Research" }
  ],
  "datePublished": "2025-04-12",
  "inLanguage": "en",
  "url": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.md",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "keywords": ["AIO", "Human Alignment", "Differential Sampling"],
  "encoding": {
    "@type": "MediaObject",
    "fileFormat": "application/pdf",
    "contentUrl": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.pdf",
    "sha256": "58e9153b..."
  }
}
Add optional keys citation, funding, isBasedOn for reproducibility.
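A lightweight pre-commit check can catch missing keys before the full validator runs. The sketch below is not the `aio-cli` validator; it assumes the mandatory keys are exactly the top-level keys of the example above and that a DOI entry is expected under `identifier`. `check_metadata` is a hypothetical helper name.

```python
import json

# Assumption: the mandatory keys are the top-level keys of the example metadata.
REQUIRED_KEYS = (
    "@context", "@type", "name", "version", "identifier", "author",
    "datePublished", "inLanguage", "url", "license", "keywords", "encoding",
)


def check_metadata(raw: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    doc = json.loads(raw)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    if doc.get("@type") != "ScholarlyArticle":
        problems.append("@type must be ScholarlyArticle")
    identifiers = doc.get("identifier", [])
    # Assumption: at least one identifier entry carries propertyID "DOI".
    if not any(isinstance(i, dict) and i.get("propertyID") == "DOI" for i in identifiers):
        problems.append("no DOI identifier found")
    return problems
```

Calling `check_metadata(Path("metadata.jsonld").read_text())` (with `pathlib.Path` imported) would return an empty list for the example file.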
Content Markup Guidelines
- Author in Markdown using semantic heading depth (# Title, ## Abstract, etc.).
- Chunk logically: every subsection ≤ 300 words to facilitate LLM chunking.
- Use embedded <figure> tags or Markdown image syntax with alt-text.
- Provide both inline equations ($E = mc^2$) and block LaTeX fenced with $$.
- Reserve ## Appendix for large tables; split into separate .csv files where feasible.
- Cross-reference with GitHub-Flavored Markdown links; avoid relative paths that escape the paper directory.
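The 300-word limit per subsection is easy to check mechanically. The sketch below splits a Markdown document at ATX headings and reports sections whose body exceeds the limit. It is a simplification: words are whitespace-separated tokens, and fenced code blocks are skipped rather than counted. `oversized_sections` is a hypothetical helper name.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")
FENCE = "`" * 3  # Markdown code-fence marker


def oversized_sections(markdown: str, limit: int = 300) -> list[tuple[str, int]]:
    """Return (heading, word_count) for each section whose body exceeds `limit` words."""
    sections = []  # [heading, word_count] pairs in document order
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence line, count nothing
            continue
        if in_fence:
            continue  # code is not prose; skip it
        match = HEADING.match(line)
        if match:
            sections.append([match.group(1).strip(), 0])
        elif sections:
            sections[-1][1] += len(line.split())
    return [(title, count) for title, count in sections if count > limit]
```

Text that appears before the first heading is ignored, since it belongs to no subsection.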
Versioning and DOI Integration
• Follow Semantic Versioning (MAJOR.MINOR.PATCH).
• Mint a new DOI at every MAJOR change; track MINOR and PATCH changes in the metadata version field.
• Keep historic versions in an archive/ subfolder; never rewrite git history for published tags.
git tag -a v1.1.0 -m "Minor clarity improvements"
doi update 10.5678/fsr.2025.004 --version 1.1.0
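The DOI rule reduces to a comparison of MAJOR components. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings with an optional leading `v` and no pre-release or build suffixes; `parse_semver` and `needs_new_doi` are hypothetical helper names.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def needs_new_doi(previous: str, current: str) -> bool:
    """True when the MAJOR component increased, so a new DOI should be minted."""
    return parse_semver(current)[0] > parse_semver(previous)[0]
```

For the tag shown above, `needs_new_doi("1.0.0", "1.1.0")` is false, so the existing DOI is kept and only the metadata version changes.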
Persistent Identifier Policy
Identifier precedence:
- DOI
- ARK or Handle
- Git commit SHA (fallback)
Include at least one globally resolvable ID in the paper header’s front-matter.
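The precedence order can be applied to the `identifier` array of the metadata file. In the sketch below, only the `DOI` label comes from this guide; the `propertyID` labels `ARK`, `Handle`, and `GitSHA` are hypothetical, as is the helper name `preferred_identifier`.

```python
from typing import Optional

# Lower rank wins. ARK and Handle share a rank, matching the precedence list.
# Assumption: "ARK", "Handle", and "GitSHA" are illustrative propertyID labels.
RANK = {"DOI": 0, "ARK": 1, "Handle": 1, "GitSHA": 2}


def preferred_identifier(identifiers: list[dict]) -> Optional[dict]:
    """Return the highest-precedence identifier entry, or None if none is recognized."""
    known = [item for item in identifiers if item.get("propertyID") in RANK]
    return min(known, key=lambda item: RANK[item["propertyID"]], default=None)
```

When two entries share a rank, the one listed first in the array is returned.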
Accessibility and Retrieval for LLMs
- Expose raw Markdown (paper.md) via HTTPS; avoid JS-render-only content.
- Provide a .txt extraction for PDFs > 3 MB (paper.plain.txt).
- Host a manifest.json at repository root enumerating all papers and their canonical URLs.
- Constrain sentences to ≤ 40 tokens; LLMs tokenize more predictably on shorter sentences.
- Place a robots.txt allowlist containing the lines User-agent: * and Allow: /white-papers/ to facilitate crawling.
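Whether a given robots.txt actually permits crawling of a paper URL can be checked with the Python standard library. The sketch below parses the two-line allowlist from the list above in memory, without fetching anything; `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nAllow: /white-papers/\n"


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

An allow-only file blocks nothing by itself, so the check is most useful in CI to detect a later `Disallow` rule that would hide `/white-papers/` from crawlers.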
Reference Implementation
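An automated GitHub Action (.github/workflows/release.yml) is recommended:
name: Publish White Paper
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate Metadata
        run: aio-cli validate metadata white-papers/**/metadata.jsonld
      - name: Render PDF
        run: |
          # Install the pinned pandoc 3.2 release package (asset name assumed from the upstream release page).
          # PDF output also needs a LaTeX engine (e.g. pdflatex) on the runner.
          curl -fsSLO https://github.com/jgm/pandoc/releases/download/3.2/pandoc-3.2-1-amd64.deb
          sudo dpkg -i pandoc-3.2-1-amd64.deb
          for md in white-papers/**/paper.md; do
            pandoc "$md" -o "${md%.md}.pdf" --citeproc
          done
      - name: Upload to AIO CDN
        env:
          AIO_TOKEN: ${{ secrets.AIO_TOKEN }}
        run: aio-cli upload --dir white-papers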
Security & Trust Considerations
• Sign every release tag using GPG; advertise key fingerprint in SECURITY.md.
• Store SHA-256 checksums for binaries; verify during CI.
• Enforce CODEOWNERS review, backed by a branch protection rule requiring at least two maintainer approvals, for PRs affecting white-papers/.
• Include SPDX license headers in all source files.
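The CI verification step for checksums could look roughly like the sketch below. It assumes each line of checksums.txt has the form `<digest>  <relative path>` (the layout `sha256sum` prints); the guide itself does not fix a line format. `verify_checksums` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def verify_checksums(paper_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose SHA-256 differs from checksums.txt."""
    failures = []
    listing = (paper_dir / "checksums.txt").read_text(encoding="utf-8")
    for line in listing.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel = line.split(maxsplit=1)
        target = paper_dir / rel
        if not target.is_file():
            failures.append(rel)
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel)
    return failures
```

A CI job would fail the build whenever the returned list is non-empty.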
Compliance Checklist
- [ ] Directory follows <YYYY-MM>_<slug>/ pattern
- [ ] metadata.jsonld passes aio-cli validate
- [ ] Checksums verified in CI
- [ ] DOI minted and recorded
- [ ] PDF ≤ 10 MB; plain-text fallback provided
- [ ] GPG-signed release tag
- [ ] Robots allowlist present
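The first checklist item can be automated with a regular expression. The sketch below assumes a four-digit year, a two-digit month from 01 to 12, and a lowercase kebab-case title; the exact character set allowed in the slug is an assumption. `is_valid_slug` is a hypothetical helper name.

```python
import re

# <YYYY-MM>_<kebab-title>: e.g. 2025-04_ads-human-alignment
SLUG = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])_[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(folder_name: str) -> bool:
    """Check a paper folder name against the <YYYY-MM>_<kebab-title> pattern."""
    return SLUG.fullmatch(folder_name.rstrip("/")) is not None
```

Running this over every directory name under `white-papers/` gives a quick pass or fail for the layout rule.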
Glossary
AIO: Artificial Intelligence Optimization, a framework for making digital assets maximally machine-addressable.
LLM: Large Language Model.
DOI: Digital Object Identifier.
JSON-LD: JSON for Linking Data.
GPG: GNU Privacy Guard, used for cryptographic signing.
Further Reading
- AIO Specification v2.4 (https://aio.fabledsky.com/spec/v2.4)
- ANSI/NISO Z39.96-2021 Journal Article Tag Suite (JATS)
- schema.org for Research Communications (2025 Draft)
- OpenAIRE Guidelines for Data Archives v4.0
Following these practices ensures that Fabled Sky Research white papers remain credible, easily cited, and consistently retrievable by both current and next-generation AI systems, safeguarding the longevity and impact of your work within the AIO ecosystem.