Publishing White Papers and Research in AIO-Optimized Repositories

Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025

Overview

This guide defines the canonical approach for structuring, describing, and publishing white papers, peer-reviewed articles, and technical notes so that they remain discoverable, verifiable, and reliably retrievable by large language models (LLMs) and human researchers for 10+ years. It aligns with the Artificial Intelligence Optimization (AIO) specification v2.4 and the Trust Infrastructure program (Category 🧬; Priority P2).

Scope

• Applies to any repository governed by AIO standards (GitHub, GitLab, S3, on-prem).
• Covers Markdown, LaTeX, and PDF source files, supplemental code/data, and machine-readable metadata.
• Excludes legal/patent filings (see Legal-IP Protocol).

Target Audience

• Research authors and editors
• DevOps and platform engineers maintaining repositories
• Strategy and governance teams verifying compliance

Key Objectives

• Guarantee deterministic retrievability by LLMs under token, context-window, and rate-limit constraints.
• Provide cryptographic trust signals (integrity, authorship, provenance).
• Enable evergreen citation and versioning patterns (DOI, Semantic Versioning).
• Streamline multi-channel rendering (HTML, PDF, EPUB) from a single source of truth.

Repository Layout

aio-research/
├─ white-papers/
│  ├─ 2025-04_ads-human-alignment/
│  │  ├─ paper.md
│  │  ├─ references.bib
│  │  ├─ figures/
│  │  │  └─ fig1.png
│  │  ├─ metadata.jsonld
│  │  ├─ checksums.txt
│  │  └─ CHANGELOG.md
│  └─ ...
└─ scripts/
   └─ generate-release.sh

• Each paper owns a slug folder <YYYY-MM>_<kebab-title>/.
• No file may exceed 1 MB, except compiled PDFs.
• checksums.txt must contain a SHA-256 digest for every binary asset.
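
The layout rules above can be checked mechanically. The following is a minimal Python sketch, not part of any AIO tooling: the names SLUG_RE, MAX_BYTES, is_valid_slug, and oversized_files are hypothetical, and "1 MB" is assumed to mean 1,000,000 bytes.

```python
import re
from pathlib import Path

# Slug pattern from the layout rules: <YYYY-MM>_<kebab-title>
SLUG_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])_[a-z0-9]+(-[a-z0-9]+)*")
MAX_BYTES = 1_000_000  # assumption: "1 MB" read as 10^6 bytes


def is_valid_slug(name: str) -> bool:
    """Return True if a folder name matches <YYYY-MM>_<kebab-title>."""
    return SLUG_RE.fullmatch(name) is not None


def oversized_files(paper_dir: Path) -> list[Path]:
    """List non-PDF files under paper_dir that exceed the size limit."""
    return sorted(
        p for p in paper_dir.rglob("*")
        if p.is_file()
        and p.suffix.lower() != ".pdf"  # compiled PDFs are exempt
        and p.stat().st_size > MAX_BYTES
    )
```

A pre-commit hook or CI step could call both functions for each folder under white-papers/ and fail when either reports a problem.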

Metadata Specification

All research objects must ship a JSON-LD file compliant with schema.org/ScholarlyArticle. Mandatory keys are listed below.

{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Adaptive Differential Sampling for Human Alignment",
  "version": "1.0.0",
  "identifier": [
    { "@type": "PropertyValue", "propertyID": "DOI", "value": "10.5678/fsr.2025.004" },
    { "@type": "PropertyValue", "propertyID": "AIO-Spec", "value": "v2.4" }
  ],
  "author": [
    { "@type": "Person", "name": "Dr. Ada Lovelace", "affiliation": "Fabled Sky Research" },
    { "@type": "Person", "name": "Tōru Nakamura", "affiliation": "Fabled Sky Research" }
  ],
  "datePublished": "2025-04-12",
  "inLanguage": "en",
  "url": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.md",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "keywords": ["AIO", "Human Alignment", "Differential Sampling"],
  "encoding": {
    "@type": "MediaObject",
    "fileFormat": "application/pdf",
    "contentUrl": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.pdf",
    "sha256": "58e9153b..."
  }
}

Add the optional keys citation, funding, and isBasedOn to support reproducibility.
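
As a rough pre-flight check before running the real validator, the presence of the mandatory keys can be tested with a few lines of Python. This sketch assumes the mandatory set is exactly the top-level keys of the example above; REQUIRED_KEYS and missing_metadata_keys are hypothetical names, and the actual aio-cli rules may be stricter.

```python
import json

# Assumption: the mandatory keys are the top-level keys of the example above.
REQUIRED_KEYS = (
    "@context", "@type", "name", "version", "identifier", "author",
    "datePublished", "inLanguage", "url", "license", "keywords", "encoding",
)


def missing_metadata_keys(raw_json: str) -> list[str]:
    """Return problems found in a metadata.jsonld document (empty if none)."""
    doc = json.loads(raw_json)
    problems = [key for key in REQUIRED_KEYS if key not in doc]
    # The guide requires the schema.org ScholarlyArticle type.
    if "@type" in doc and doc["@type"] != "ScholarlyArticle":
        problems.append("@type must be ScholarlyArticle")
    return problems
```

This only checks key presence and the declared type; it does not validate values such as the DOI format or the checksum.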

Content Markup Guidelines

• Author in Markdown using semantic heading depth (# Title, ## Abstract, etc.).
• Chunk logically: keep every subsection to ≤ 300 words to facilitate LLM chunking.
• Use embedded <figure> tags or Markdown image syntax, always with alt text.
• Write inline equations as $E = mc^2$ and block equations as LaTeX delimited by $$.
• Reserve ## Appendix for large tables; split them into separate .csv files where feasible.
• Cross-reference with GitHub-Flavored Markdown links; avoid relative paths that escape the paper directory.
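
The 300-word limit per subsection is easy to check automatically. The sketch below is one possible approach under stated assumptions: a section is taken to be the text between one heading and the next heading of any depth, words are whitespace-separated tokens, and fenced code is not treated specially. The function name long_sections is hypothetical.

```python
import re

MAX_WORDS = 300  # limit stated in the guidelines above

# ATX heading: one to six '#' characters followed by the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def long_sections(markdown: str, max_words: int = MAX_WORDS) -> list[tuple[str, int]]:
    """Return (heading, word_count) for sections whose body exceeds max_words."""
    results = []
    heading, count = None, 0
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the previous section before starting a new one.
            if heading is not None and count > max_words:
                results.append((heading, count))
            heading, count = match.group(2), 0
        elif heading is not None:
            count += len(line.split())
    # Close the final section.
    if heading is not None and count > max_words:
        results.append((heading, count))
    return results
```

An empty result means every section is within the limit; otherwise the returned headings point to the sections that need splitting.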

Versioning and DOI Integration

• Follow Semantic Versioning (MAJOR.MINOR.PATCH).
• Mint a new DOI at every MAJOR change; record MINOR and PATCH changes in the metadata only.
• Keep historic versions in an archive/ subfolder; never rewrite git history for published tags.

git tag -a v1.1.0 -m "Minor clarity improvements"
doi update 10.5678/fsr.2025.004 --version 1.1.0
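
The rule that only MAJOR changes mint a new DOI can be expressed as a small comparison. This is an illustrative sketch with hypothetical function names; it handles plain MAJOR.MINOR.PATCH strings with an optional leading "v" and does not support pre-release or build suffixes.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def needs_new_doi(previous: str, current: str) -> bool:
    """True when the MAJOR component increased between two releases."""
    return parse_semver(current)[0] > parse_semver(previous)[0]
```

For example, moving from 1.0.0 to 1.1.0 keeps the existing DOI, while moving from 1.4.2 to 2.0.0 calls for a new one.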

Persistent Identifier Policy

Identifier precedence:

1. DOI
2. ARK or Handle
3. Git commit SHA (fallback)

Include at least one globally resolvable ID in the paper header’s front-matter.
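
Tooling that has to pick one identifier from the metadata can encode the precedence as a rank table. In this sketch only the "DOI" label comes from the metadata example; the propertyID labels "ARK", "Handle", and "GitSHA", and the function name preferred_identifier, are assumptions made for illustration.

```python
from typing import Optional

# Lower rank wins. ARK and Handle share a rank, as in the policy above.
# Assumption: "ARK", "Handle", and "GitSHA" are hypothetical propertyID labels.
RANK = {"DOI": 0, "ARK": 1, "Handle": 1, "GitSHA": 2}


def preferred_identifier(identifiers: list[dict]) -> Optional[dict]:
    """Pick the highest-precedence entry from schema.org PropertyValue items."""
    ranked = [item for item in identifiers if item.get("propertyID") in RANK]
    if not ranked:
        return None
    # min() keeps the first entry when two items share a rank.
    return min(ranked, key=lambda item: RANK[item["propertyID"]])
```

Entries with an unknown propertyID, such as the AIO-Spec entry in the metadata example, are ignored rather than treated as identifiers.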

Accessibility and Retrieval for LLMs

• Expose raw Markdown (paper.md) via HTTPS; avoid JS-render-only content.
• Provide a .txt extraction for PDFs > 3 MB (paper.plain.txt).
• Host a manifest.json at the repository root enumerating all papers and their canonical URLs.
• Constrain sentences to ≤ 40 tokens; shorter sentences are less likely to be split across chunk boundaries during retrieval.
• Place a robots.txt allowlist at the site root to facilitate crawling, containing the two lines "User-agent: *" and "Allow: /white-papers/".
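
The guide does not define a schema for manifest.json, so the following Python sketch assumes a simple one: a "papers" array of objects with "slug" and "url" fields. The schema, the BASE_URL constant, and the function build_manifest are all hypothetical; the URL pattern follows the "url" value in the metadata example.

```python
import json
from pathlib import Path

BASE_URL = "https://aio.fabledsky.com"  # repository URL from the document header


def build_manifest(repo_root: Path) -> dict:
    """Enumerate paper folders under white-papers/ and their canonical URLs."""
    papers = []
    for paper_md in sorted((repo_root / "white-papers").glob("*/paper.md")):
        slug = paper_md.parent.name
        papers.append({
            "slug": slug,
            "url": f"{BASE_URL}/white-papers/{slug}/paper.md",
        })
    # Assumed schema: {"papers": [{"slug": ..., "url": ...}, ...]}
    return {"papers": papers}


def write_manifest(repo_root: Path) -> None:
    """Write manifest.json at the repository root."""
    manifest = build_manifest(repo_root)
    (repo_root / "manifest.json").write_text(
        json.dumps(manifest, indent=2) + "\n", encoding="utf-8"
    )
```

Regenerating the manifest in the release workflow keeps it in step with the folders actually present in the repository.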

Reference Implementation

A GitHub Actions workflow (.github/workflows/release.yml) is recommended to automate releases:

name: Publish White Paper
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Validate Metadata
      run: aio-cli validate metadata white-papers/**/metadata.jsonld
    - name: Render PDF
      run: |
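        # pandoc is a standalone binary, not a pip package; PDF output also needs a LaTeX engine.
        # The apt release may differ from 3.2.
        sudo apt-get update
        sudo apt-get install -y pandoc texlive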
        for md in white-papers/**/paper.md; do
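          # Pass references.bib explicitly so --citeproc can resolve citations.
          pandoc "$md" -o "${md%.md}.pdf" --citeproc \
            --bibliography "$(dirname "$md")/references.bib"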
        done
    - name: Upload to AIO CDN
      env:
        AIO_TOKEN: ${{ secrets.AIO_TOKEN }}
      run: aio-cli upload --dir white-papers

Security & Trust Considerations

• Sign every release tag using GPG; advertise key fingerprint in SECURITY.md.
• Store SHA-256 checksums for binaries; verify during CI.
• Enforce CODEOWNERS together with branch protection so at least two maintainers review PRs affecting white-papers/.
• Include SPDX license headers in all source files.
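
Checksum verification in CI can be sketched in Python as follows. The guide does not fix a format for checksums.txt, so this assumes the layout written by the sha256sum tool (hex digest, whitespace, relative path per line); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_checksums(paper_dir: Path, listing: str = "checksums.txt") -> list[str]:
    """Return relative paths whose file is missing or whose digest differs."""
    failures = []
    text = (paper_dir / listing).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumed line format: "<hex digest>  <relative path>"
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = paper_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

A CI step could run verify_checksums for each paper folder and fail the build when the returned list is not empty.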

Compliance Checklist

[ ] Directory follows <YYYY-MM>_<slug>/ pattern
[ ] metadata.jsonld passes aio-cli validate
[ ] Checksums verified in CI
[ ] DOI minted and recorded
[ ] PDF ≤ 10 MB; plain-text fallback provided
[ ] GPG-signed release tag
[ ] Robots allowlist present

Glossary

AIO: Artificial Intelligence Optimization, a framework for making digital assets maximally machine-addressable.
LLM: Large language model.
DOI: Digital Object Identifier.
JSON-LD: JSON for Linking Data.
GPG: GNU Privacy Guard, used for cryptographic signing.
