Fabled Sky Research

AIO Standards & Frameworks

Publishing White Papers and Research in AIO-Optimized Repositories

Document Type: Implementation Guide
Section: Docs
Repository: https://aio.fabledsky.com
Maintainer: Fabled Sky Research
Last updated: April 2025

Overview

This guide defines the canonical approach for structuring, describing, and publishing white papers, peer-reviewed articles, and technical notes so that they remain discoverable, verifiable, and retrievable by large language models (LLMs) and human researchers for 10+ years. It aligns with the Artificial Intelligence Optimization (AIO) specification v2.4 and the Trust Infrastructure program (Category 🧬; Priority P2).

Scope

• Applies to any repository governed by AIO standards (GitHub, GitLab, S3, on-prem).
• Covers Markdown, LaTeX, and PDF source files, supplemental code/data, and machine-readable metadata.
• Excludes legal/patent filings (see Legal-IP Protocol).

Target Audience

  1. Research authors & editors
  2. DevOps and platform engineers maintaining repositories
  3. Strategy & governance teams verifying compliance

Key Objectives

  1. Guarantee deterministic retrievability by LLMs under token, context-window, and rate-limit constraints.
  2. Provide cryptographic trust signals (integrity, authorship, provenance).
  3. Enable evergreen citation and versioning patterns (DOI, Semantic Versioning).
  4. Streamline multi-channel rendering (HTML, PDF, EPUB) from a single source of truth.

Repository Layout

aio-research/
├─ white-papers/
│  ├─ 2025-04_ads-human-alignment/
│  │  ├─ paper.md
│  │  ├─ references.bib
│  │  ├─ figures/
│  │  │  └─ fig1.png
│  │  ├─ metadata.jsonld
│  │  ├─ checksums.txt
│  │  └─ CHANGELOG.md
│  └─ ...
└─ scripts/
   └─ generate-release.sh

• Each paper owns a slug folder <YYYY-MM>_<kebab-title>/.
• No file exceeds 1 MB except compiled PDFs.
• checksums.txt must contain a SHA-256 digest for every binary asset.
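
The layout rules above can be checked mechanically. The following is a minimal sketch rather than part of any AIO tooling: the regular expression is one assumed reading of <YYYY-MM>_<kebab-title>, "1 MB" is taken to mean 10^6 bytes, and the names is_valid_slug and oversized_files are hypothetical.

```python
import re
from pathlib import Path

# Assumed reading of <YYYY-MM>_<kebab-title>: four-digit year, two-digit month,
# an underscore, then lowercase alphanumeric words joined by single hyphens.
SLUG_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])_[a-z0-9]+(-[a-z0-9]+)*")
MAX_BYTES = 1_000_000  # "1 MB" interpreted as 10^6 bytes (an assumption)


def is_valid_slug(name: str) -> bool:
    """Return True if a folder name matches the assumed slug convention."""
    return SLUG_RE.fullmatch(name) is not None


def oversized_files(paper_dir: Path, limit: int = MAX_BYTES) -> list[Path]:
    """List files above the size limit, ignoring compiled PDFs."""
    return sorted(
        path
        for path in paper_dir.rglob("*")
        if path.is_file()
        and path.suffix.lower() != ".pdf"
        and path.stat().st_size > limit
    )
```

A pre-commit hook or CI step could call both functions for every folder under white-papers/ and fail when either reports a problem.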

Metadata Specification

All research objects must ship a JSON-LD file compliant with schema.org/ScholarlyArticle. Mandatory keys are listed below.

{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Adaptive Differential Sampling for Human Alignment",
  "version": "1.0.0",
  "identifier": [
    { "@type": "PropertyValue", "propertyID": "DOI", "value": "10.5678/fsr.2025.004" },
    { "@type": "PropertyValue", "propertyID": "AIO-Spec", "value": "v2.4" }
  ],
  "author": [
    { "@type": "Person", "name": "Dr. Ada Lovelace", "affiliation": "Fabled Sky Research" },
    { "@type": "Person", "name": "Tōru Nakamura", "affiliation": "Fabled Sky Research" }
  ],
  "datePublished": "2025-04-12",
  "inLanguage": "en",
  "url": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.md",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "keywords": ["AIO", "Human Alignment", "Differential Sampling"],
  "encoding": {
    "@type": "MediaObject",
    "fileFormat": "application/pdf",
    "contentUrl": "https://aio.fabledsky.com/white-papers/2025-04_ads-human-alignment/paper.pdf",
    "sha256": "58e9153b..."
  }
}

Add the optional keys citation, funding, and isBasedOn to support reproducibility.
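
A quick local check for missing keys can catch problems before the metadata reaches CI. This sketch assumes that every top-level key in the example above is mandatory, which the guide implies but does not spell out. The names REQUIRED_KEYS, missing_keys, and check_metadata are hypothetical, and the check is not a substitute for aio-cli validate.

```python
import json

# Assumption: every top-level key shown in the example is mandatory.
REQUIRED_KEYS = (
    "@context", "@type", "name", "version", "identifier", "author",
    "datePublished", "inLanguage", "url", "license", "keywords", "encoding",
)


def missing_keys(metadata: dict) -> list[str]:
    """Return required keys that are absent, in the order listed above."""
    return [key for key in REQUIRED_KEYS if key not in metadata]


def check_metadata(text: str) -> list[str]:
    """Parse a JSON-LD document and report problems as readable strings."""
    metadata = json.loads(text)
    problems = [f"missing key: {key}" for key in missing_keys(metadata)]
    if metadata.get("@type") != "ScholarlyArticle":
        problems.append("@type should be ScholarlyArticle")
    return problems
```

An empty list means the file passed this shallow check; it says nothing about whether the values themselves are well-formed.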

Content Markup Guidelines

  1. Author in Markdown using semantic heading depth (# Title, ## Abstract, etc.).
  2. Chunk logically: keep every subsection to ≤ 300 words so that it can be retrieved as a single, self-contained chunk.
  3. Use embedded <figure> tags or Markdown image syntax with alt-text.
  4. Provide both inline equations ($E = mc^2$) and block LaTeX fenced with $$.
  5. Reserve ## Appendix for large tables; split into separate .csv files where feasible.
  6. Cross-reference with GitHub-Flavored Markdown links; avoid relative paths that escape the paper directory.
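
The 300-word limit in guideline 2 is easy to lint. The sketch below is one possible approach under simplifying assumptions: it recognises only ATX headings (# to ######), skips text inside ``` fences, counts whitespace-separated words, and merges sections that share a title. The function names are hypothetical.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
MAX_WORDS = 300  # limit taken from guideline 2


def section_word_counts(markdown: str) -> dict[str, int]:
    """Map each heading to the number of words directly beneath it."""
    counts: dict[str, int] = {}
    current = None
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue  # code is not counted as prose
        match = HEADING_RE.match(line)
        if match:
            current = match.group(2)
            counts.setdefault(current, 0)
        elif current is not None:
            counts[current] += len(line.split())
    return counts


def overlong_sections(markdown: str, limit: int = MAX_WORDS) -> list[str]:
    """Return headings whose body exceeds the word limit."""
    return [t for t, n in section_word_counts(markdown).items() if n > limit]
```

Words are only a rough proxy for tokens, so a section that passes this check may still need splitting for a particular tokenizer.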

Versioning and DOI Integration

• Follow Semantic Versioning (MAJOR.MINOR.PATCH).
• Mint a new DOI at every MAJOR change; track MINOR and PATCH changes in the version field of the metadata.
• Keep historic versions in archive/ subfolder; never rewrite git history for published tags.

git tag -s v1.1.0 -m "Minor clarity improvements"
doi update 10.5678/fsr.2025.004 --version 1.1.0
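
The rule that only a MAJOR change triggers a new DOI can be expressed as a small helper. This is an illustrative sketch: it handles plain MAJOR.MINOR.PATCH strings with an optional leading "v" and ignores pre-release and build suffixes. The names parse_version and needs_new_doi are hypothetical and do not call any DOI service.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH'; an optional leading 'v' is ignored."""
    major, minor, patch = version.removeprefix("v").split(".")
    return int(major), int(minor), int(patch)


def needs_new_doi(previous: str, new: str) -> bool:
    """Return True when the release bumps MAJOR and so needs a fresh DOI."""
    old_v, new_v = parse_version(previous), parse_version(new)
    if new_v <= old_v:
        # Published versions must move forward; refuse to guess otherwise.
        raise ValueError(f"{new} does not come after {previous}")
    return new_v[0] > old_v[0]
```

For the tag shown above, needs_new_doi("1.0.0", "v1.1.0") evaluates to False, so the existing DOI stays in place and only the metadata version changes.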

Persistent Identifier Policy

Identifier precedence:

  1. DOI
  2. ARK or Handle
  3. Git commit SHA (fallback)

Include at least one globally resolvable identifier in the paper's front matter.
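
Tooling that has to pick one identifier to display or cite can apply the precedence directly. The sketch below assumes identifiers are stored in the PropertyValue shape used by metadata.jsonld. The propertyID label "git-sha" is a hypothetical name for the commit fallback, and ties between ARK and Handle are resolved by list order.

```python
from typing import Optional

# Lower rank wins. ARK and Handle share rank 1, as in the list above.
# "git-sha" is an assumed propertyID for the commit-hash fallback.
RANK = {"DOI": 0, "ARK": 1, "Handle": 1, "git-sha": 2}


def preferred_identifier(identifiers: list[dict]) -> Optional[dict]:
    """Return the highest-precedence identifier, or None if none is known."""
    known = [item for item in identifiers if item.get("propertyID") in RANK]
    if not known:
        return None
    # min() keeps the first entry among equal ranks, so list order breaks ties.
    return min(known, key=lambda item: RANK[item["propertyID"]])
```

Entries with other propertyID values, such as the AIO-Spec marker, are ignored rather than treated as errors.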

Accessibility and Retrieval for LLMs

  1. Expose raw Markdown (paper.md) via HTTPS; avoid JS-render-only content.
  2. Provide a .txt extraction for PDFs > 3 MB (paper.plain.txt).
  3. Host a manifest.json at repository root enumerating all papers and their canonical URLs.
  4. Constrain sentences to ≤ 40 tokens; short sentences are less likely to be split across chunk boundaries.
  5. Serve a robots.txt allowlist containing the two lines "User-agent: *" and "Allow: /white-papers/" to permit crawling.
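
The manifest.json from item 3 can be generated from the per-paper metadata instead of being maintained by hand. The guide does not define the manifest schema, so the structure below (a "papers" array with slug, name, version, and url) is an assumption, as are the function names.

```python
import json
from pathlib import Path


def build_manifest(repo_root: Path) -> dict:
    """Collect one entry per paper from white-papers/*/metadata.jsonld."""
    papers = []
    for meta_path in sorted(repo_root.glob("white-papers/*/metadata.jsonld")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        papers.append({
            "slug": meta_path.parent.name,
            "name": meta.get("name"),
            "version": meta.get("version"),
            "url": meta.get("url"),  # canonical URL as declared by the paper
        })
    return {"papers": papers}  # assumed schema, not defined by the guide


def write_manifest(repo_root: Path) -> Path:
    """Write manifest.json at the repository root and return its path."""
    target = repo_root / "manifest.json"
    text = json.dumps(build_manifest(repo_root), indent=2, ensure_ascii=False)
    target.write_text(text + "\n", encoding="utf-8")
    return target
```

Sorting the metadata paths keeps the output stable between runs, which avoids noisy diffs when the manifest is committed.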

Reference Implementation

A GitHub Actions workflow (.github/workflows/release.yml) that automates the release is recommended:

name: Publish White Paper
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Validate Metadata
      run: aio-cli validate metadata white-papers/**/metadata.jsonld
    - name: Render PDF
      run: |
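        # pandoc is a standalone binary, not a pip package; PDF output also needs a LaTeX engine.
        # The distro package may not be version 3.2; install a pinned release if that matters.
        sudo apt-get update
        sudo apt-get install -y pandoc texlive-latex-recommended texlive-fonts-recommended lmodern
        for md in white-papers/**/paper.md; do
          # --resource-path lets figures/ and references.bib resolve relative to the paper folder.
          pandoc "$md" -o "${md%.md}.pdf" --citeproc --resource-path "$(dirname "$md")"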
        done
    - name: Upload to AIO CDN
      env:
        AIO_TOKEN: ${{ secrets.AIO_TOKEN }}
      run: aio-cli upload --dir white-papers

Security & Trust Considerations

• Sign every release tag using GPG; advertise the key fingerprint in SECURITY.md.
• Store SHA-256 checksums for binaries; verify them during CI.
• Use CODEOWNERS together with branch protection rules so that at least two maintainers review PRs affecting white-papers/.
• Include SPDX license headers in all source files.
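
The CI checksum verification could look like the following sketch. It assumes checksums.txt uses the same line format that the sha256sum tool emits (a hex digest, whitespace, then a path relative to the paper folder); the guide does not fix the format, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(paper_dir: Path) -> list[str]:
    """Return listed paths that are missing or whose digest does not match."""
    failures = []
    listing = (paper_dir / "checksums.txt").read_text(encoding="utf-8")
    for line in listing.splitlines():
        if not line.strip():
            continue
        expected, name = line.strip().split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = paper_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

A CI step can fail the build whenever the returned list is non-empty. Note that this only checks files that are listed; detecting binaries that are absent from checksums.txt would need a separate pass.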

Compliance Checklist

  • [ ] Directory follows <YYYY-MM>_<slug>/ pattern
  • [ ] metadata.jsonld passes aio-cli validate
  • [ ] Checksums verified in CI
  • [ ] DOI minted and recorded
  • [ ] PDF ≤ 10 MB; plain-text fallback provided
  • [ ] GPG-signed release tag
  • [ ] Robots allowlist present

Glossary

AIO: Artificial Intelligence Optimization, a framework for making digital assets maximally machine-addressable.
LLM: Large Language Model.
DOI: Digital Object Identifier.
JSON-LD: JSON for Linking Data.
GPG: GNU Privacy Guard, used here for cryptographic signing.

Further Reading

  1. AIO Specification v2.4 (https://aio.fabledsky.com/spec/v2.4)
  2. ANSI/NISO Z39.96-2021, JATS: Journal Article Tag Suite
  3. schema.org for Research Communications (2025 Draft)
  4. OpenAIRE Guidelines for Data Archives v4.0

Following these practices ensures that Fabled Sky Research white papers remain credible, easily cited, and consistently retrievable by both current and next-generation AI systems, safeguarding the longevity and impact of your work within the AIO ecosystem.