a11oy — Governed Execution Fabric

The TypeScript substrate that connects live enterprise signals to human-confirmed decisions with cryptographic proof at every transition.
9 packages · 17 test files · 248 doctrine assertions · DSSE-signed receipts · Lean-verified invariants.

DOI: 10.5281/zenodo.20434276 · License: BSL-1.1 · OpenSSF Scorecard: 7.0 · Tests: 17 files, 248 assertions · CI: 6 workflows

What a11oy is

a11oy (Alloy) is the governed agentic execution fabric of SZL Holdings. It ships the TypeScript packages that the szl-holdings/platform monorepo consumes for policy enforcement, signal measurement, knowledge-graph traversal, and cryptographic proof-chain integrity across all SZL domain verticals.

Every action in the platform must pass through the policy engine before execution. No action proceeds without a DSSE-signed receipt. The Λ-invariant — a conjunctive Λ-axis gate across 13 doctrine axes — constrains every policy evaluation: recommendations below the configured threshold escalate; they do not silently proceed.
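
A minimal sketch of a conjunctive gate of the kind described above. The real policy API is not shown in this document, so `AxisScores`, `GateOutcome`, `lambdaGate`, the axis names, and the 0.8 threshold are all hypothetical. The only behaviour taken from the text is that an action proceeds when every axis clears the threshold and escalates otherwise.

```typescript
// Hypothetical types: the real a11oy policy API is not shown in this document.
type AxisScores = { [axis: string]: number };

type GateOutcome =
  | { kind: "proceed" }
  | { kind: "escalate"; failingAxes: string[] };

// Conjunctive gate: the action proceeds only if EVERY axis meets the threshold.
// A single low (or NaN) axis is enough to escalate.
function lambdaGate(scores: AxisScores, threshold: number): GateOutcome {
  const failingAxes = Object.keys(scores).filter(
    (axis) => !(scores[axis] >= threshold)
  );
  return failingAxes.length === 0
    ? { kind: "proceed" }
    : { kind: "escalate", failingAxes };
}

// Thirteen placeholder axes named axis1..axis13, all scoring 0.9.
const scores: AxisScores = {};
for (let i = 1; i <= 13; i++) {
  scores["axis" + i] = 0.9;
}

const allPass = lambdaGate(scores, 0.8);
const oneLow = lambdaGate({ ...scores, axis7: 0.5 }, 0.8);
console.log(allPass.kind, oneLow.kind);
```

Returning the failing axes with the escalation is a design choice of this sketch: it gives a human reviewer the reason for the escalation, not just a bare rejection.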

9 Packages — each with a single responsibility

sequence-pipeline
Ordered input sequencer that enforces deterministic ingestion order and propagates W3C TraceContext headers into the execution fabric from the first token.
W3C TraceContext · ordered ingestion
sparse-attention-kit
Λ-aware attention sparsification filter. Routes inputs through configurable sparsity masks while preserving the Λ-axis monotonicity bound across attended tokens.
sparsity mask · Λ-monotonicity
perception-loop
Signal ingestion loop that reads live enterprise signals and scores them via the PRISM correlation model before handing off to measurement for baseline drift detection.
PRISM correlation · signal ingestion
knowledge
Knowledge-graph traversal and domain ontology queries. Retrieves relevant context for policy explanations. Key types: KnowledgeGraph, OntologyQuery, DomainNode.
KnowledgeGraph · OntologyQuery · DomainNode
a11oy-knowledge
Alignment schema layer on top of the core knowledge package. Encodes the 12 alignment innovations from Constitutional AI, Sleeper Agents, and Alignment Faking as typed schema assertions. 26 of 27 tests pass; one pre-existing TH2 proof-sketch failure is documented.
alignment schema · 26 / 27 tests
measurement
Signal scoring, PRISM correlation, and baseline drift detection. Produces SignalScore and DriftReport consumed by the policy engine for threshold evaluation. Key types: SignalScore, PRISMFrame, DriftReport.
SignalScore · PRISMFrame · DriftReport
policy
Covenant Policy Engine — evaluates all agent actions against governance rules before execution. Creates ApprovalGate structs when human review is required. No action bypasses this check. Key types: CovenantPolicy, ApprovalGate, PolicyDecision.
CovenantPolicy · ApprovalGate · PolicyDecision
qec-integrity
Quantum-error-correction lineage verification backed by szl-holdings/lutar-lean (CSS-QEC). Verifies the cryptographic lineage of the proof chain. 24 of 24 tests pass under a custom node:assert/strict runner. Key types: QECLineage, IntegrityProof, CSSVector.
QECLineage · IntegrityProof · 24 / 24 pass
receipt-substrate
DSSE envelope emitter. Wraps every governed action output in a Dead Simple Signing Envelope (DSSE) whose payload is in SLSA Provenance v1 format. Outputs are independently verifiable via cosign verify-blob or raw jq.
DSSE · SLSA Provenance v1 · cosign
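
For orientation, here is a hedged sketch of what a DSSE envelope and its signing input look like. The envelope field names and the Pre-Authentication Encoding (PAE) follow the public DSSE specification, and the statement type URIs follow in-toto and SLSA v1. The subject name, the digest placeholder, the empty predicate, and the helper names `utf8Length` and `pae` are illustrative assumptions, not the receipt-substrate API.

```typescript
// Envelope shape from the public DSSE spec (not a11oy-specific).
interface DsseEnvelope {
  payloadType: string;
  payload: string; // base64 of the serialized statement
  signatures: { keyid: string; sig: string }[];
}

// UTF-8 byte length, computed without Node- or browser-specific APIs.
function utf8Length(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) n += 1;
    else if (c < 0x800) n += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      n += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else n += 3;
  }
  return n;
}

// PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
// The signer signs these bytes, not the raw payload.
function pae(payloadType: string, body: string): string {
  return (
    "DSSEv1 " + utf8Length(payloadType) + " " + payloadType +
    " " + utf8Length(body) + " " + body
  );
}

// Illustrative in-toto statement; subject and predicate are placeholders.
const statement = {
  _type: "https://in-toto.io/Statement/v1",
  subject: [{ name: "example-action", digest: { sha256: "<hex digest>" } }],
  predicateType: "https://slsa.dev/provenance/v1",
  predicate: {}, // a real receipt would carry the policy decision and scores
};

const toSign = pae("application/vnd.in-toto+json", JSON.stringify(statement));
console.log(toSign.slice(0, 40));
```

Because the payload type is bound into the signed bytes, a verifier cannot be tricked into interpreting a signed payload under a different type.
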
Repository layout

The standalone alignment packages live in packages/ and web/packages/. The deployment surface is szl-holdings/platform (76 packages); a11oy supplies the core governance kernel. The web/ directory contains the React SPA that cannot run standalone — it depends on workspace packages from the platform monorepo.

Architecture — 5 Lanes · 9 Packages

Packages are organized into five functional lanes: ingress (input sequencing, sparsification, perception), knowledge (graph traversal, alignment schema), measurement (signal scoring), policy + QEC (approval gate enforcement, proof-chain verification), and receipt emission. Data flows left-to-right; every lane outputs a typed artifact consumed by the next.

Figure: a11oy package architecture (9 packages across 5 lanes leading to DSSE receipt output)

Component DAG — dependency graph

Component dependency DAG — generated from actual package import graph. Source: github.com/szl-holdings/a11oy · Doctrine v6
Platform integration

The four packages/ packages — policy, measurement, knowledge, qec-integrity — are the primary contracts consumed by the platform. The five additional packages in web/packages/ and at the repo root handle ingress, sparsification, ledger, connection, and alignment schema duties.

Lean 4 proof anchoring

QEC-integrity lineage is formally verified in szl-holdings/lutar-lean (Lean 4 + Mathlib v4.13.0). The termination proof (Lutar.AgentLoop.terminates) and Λ-monotonicity proof (Lutar.AgentLoop.preserves_lambda) are in the v18.0 milestone, DOI 10.5281/zenodo.20434276.

How It Works — Receipt by Receipt

Every agent action traverses a seven-step chain. Each step produces a typed artifact that gates the next step. The chain cannot be short-circuited: the DSSE wrap in step 7 fires only after QEC integrity passes in step 5 and the receipt is emitted in step 6. The entire chain is captured in a receipt timeline.
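
The gating behaviour can be sketched as a strictly ordered runner in which the first failing step halts everything after it. This is an illustration only: `Step`, `StepResult`, `runChain`, and the stand-in step bodies are hypothetical, and the step names are shorthand labels rather than the real package entry points.

```typescript
// Hypothetical step runner: each step either yields an artifact or halts the chain.
type StepResult =
  | { ok: true; artifact: string }
  | { ok: false; reason: string };

type Step = { name: string; run: (input: string) => StepResult };

interface ChainOutcome {
  released: boolean;
  completed: string[]; // names of steps that produced an artifact
  haltedAt?: string;
}

// Runs steps strictly in order; a later step never runs if an earlier one fails.
function runChain(steps: Step[], input: string): ChainOutcome {
  const completed: string[] = [];
  let artifact = input;
  for (const step of steps) {
    const result = step.run(artifact);
    if (!result.ok) {
      return { released: false, completed, haltedAt: step.name };
    }
    artifact = result.artifact;
    completed.push(step.name);
  }
  return { released: true, completed };
}

// Stand-in steps: `pass` appends its name, `fail` always halts.
const pass = (name: string): Step => ({
  name,
  run: (input) => ({ ok: true, artifact: input + ">" + name }),
});
const fail = (name: string, reason: string): Step => ({
  name,
  run: () => ({ ok: false, reason }),
});

const names = [
  "sequence-pipeline", "policy", "measurement", "knowledge",
  "qec-integrity", "receipt-substrate", "dsse-wrap",
];
const healthy = runChain(names.map(pass), "signal");
const broken = runChain(
  names.map((n) => (n === "qec-integrity" ? fail(n, "lineage mismatch") : pass(n))),
  "signal"
);
console.log(healthy.released, broken.haltedAt);
```

In the second run the receipt and DSSE steps never execute, which is the property the paragraph above describes.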

Receipt chain timeline — horizontal swim-lane per step, timestamps in ms. Source: github.com/szl-holdings/a11oy · Doctrine v6
  1. Input → sequence-pipeline
    Raw enterprise signals enter through sequence-pipeline. The sequencer assigns deterministic ingestion order and injects a W3C traceparent header that propagates through the entire chain. Every subsequent step reads this trace ID; no step drops it.
  2. Policy gate — initial halt-eligibility check
    The policy package evaluates the incoming action against the Covenant Policy ruleset before any further processing. If any rule requires human approval, an ApprovalGate is created and the chain pauses. No soft failures; the gate either passes or blocks.
  3. Measurement — signal scoring + drift detection
    measurement scores the signals against PRISM baselines and emits a DriftReport. The drift score feeds back into the policy evaluation: a drift score above the configured threshold triggers a re-evaluation at the Λ-gate.
  4. Knowledge route — ontology context retrieval
    knowledge and a11oy-knowledge traverse the domain ontology graph to retrieve explanation context for the pending action. This context is attached to the policy decision for audit purposes, not used to override the policy outcome.
  5. QEC integrity — proof-chain lineage verification
    qec-integrity verifies the CSS-QEC cryptographic lineage of the pending action. The verification is anchored to the Lean 4 proofs in szl-holdings/lutar-lean. 24 of 24 lineage tests pass. A failing lineage check halts the chain unconditionally.
  6. Receipt emission — SLSA Provenance v1
    receipt-substrate produces a signed receipt in SLSA Provenance v1 format. The receipt bundles: subjects array, predicate (policy decision + scores), and a DSSE signature. It is independently verifiable via cosign verify-blob or a raw jq pipeline documented in the UDS README.
  7. DSSE wrap — Dead Simple Signing Envelope
    The receipt is wrapped in a DSSE envelope (5-link chain: each link contains subjects[] + predicate + signature). A W3C TraceContext traceparent header is embedded. The envelope is exported as an OTLP span to the UDS mesh observability pipeline. The action is only released to the caller after this final wrap succeeds.
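
The traceparent header mentioned in steps 1 and 7 has a fixed layout in the W3C Trace Context specification: `version-traceid-parentid-flags`, all lowercase hex. The sketch below parses and re-emits that layout for version 00 only. `parseTraceparent` and `childOf` are hypothetical helper names, and nothing here reflects how sequence-pipeline actually implements propagation.

```typescript
interface TraceParent {
  version: string;
  traceId: string;  // 32 hex chars, shared by every step in the chain
  parentId: string; // 16 hex chars, identifies the calling span
  sampled: boolean; // lowest bit of the flags byte
}

// Parses a version-00 traceparent header; returns null for anything invalid.
function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (m === null) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version !== "00") return null; // only version 00 is handled in this sketch
  // The spec forbids all-zero trace and parent ids.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// A downstream step keeps the trace id and substitutes its own span id.
function childOf(parent: TraceParent, newSpanId: string): string {
  return "00-" + parent.traceId + "-" + newSpanId + "-" + (parent.sampled ? "01" : "00");
}

// Example header taken from the W3C Trace Context specification.
const incoming = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
);
console.log(incoming ? childOf(incoming, "b7ad6b7169203331") : "invalid header");
// prints "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
```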

Honest Test Coverage — Real Numbers

Numbers below are grep'd from the repository at the time of this showcase. No rounding. Failures are disclosed, not hidden.

  • 17 test files (find . -name "*.test.ts" | wc -l)
  • 248 a11oy assertions (doctrine test run, 2026-05-28)
  • 517 total assertions (248 a11oy + 269 UDS substrate)
  • 24/24 QEC tests pass (qec_lineage.test.ts, node:assert/strict)
  • 6 CI workflows (ci, codeql, dco, sbom, scorecard, slsa)
  • 7.0 OpenSSF score (securityscorecards.dev, 2026-05-28)
Clarification — what "248 assertions" means

The 248 figure refers to doctrine test assertions executed by the a11oy Ouroboros runner on 2026-05-28. These are not unit tests; they are runtime invariant checks exercised against the live substrate, and the 17 *.test.ts files in this repository cover only a subset of them. The anatomy Space describes the figure as "248 includes doctrine tests + assertions": it is the combined count from the runner, not a count taken from *.test.ts alone. A separate raw count of assertion calls in the test files (grep of expect|assert.) gives 282 across the 17 files in this repo; that is a different measure from the runner total.

Known failures — not hidden
  • 1 pre-existing failure in packages/a11oy-knowledge tests (TH2 proof-sketch mismatch) — 26 of 27 pass.
  • 4 pre-existing Jest failures in __tests__/ compliance suite — 106 of 110 pass.
  • web/ SPA build cannot start standalone; requires parent monorepo workspace packages.
  • SLSA Level 3 (build-chain) is the target, not the current attained level (Level 1).
CI coverage gauge — pass rate per package suite. Source: github.com/szl-holdings/a11oy/actions · Doctrine v6
CI workflow coverage
  • ci.yml — Docs CI: markdown, citation files, license
  • codeql.yml — CodeQL static analysis on every push to main
  • dco.yml — Developer Certificate of Origin enforcement
  • sbom.yml — Software Bill of Materials generation
  • scorecard.yml — OpenSSF Scorecard automated run
  • slsa.yml — SLSA provenance attestation for releases
Test runner details
  • packages/a11oy-knowledge: Vitest — 26 / 27 pass
  • __tests__/: Jest / ts-jest — 106 / 110 pass
  • packages/qec-integrity: custom node:assert/strict — 24 / 24 pass
  • web/packages/a11oy-core (vitest): lid-check — 15 tests
  • web/packages/a11oy-core (custom): 7 files, ~67 total tests
  • web/packages/a11oy-core (KS-18): 3 Kochen-Specker tests
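
As a quick arithmetic check, the pass rates implied by the three suites that report pass counts can be computed directly. The numbers are copied from the list above; the web/packages/a11oy-core entries are left out because the list gives test counts for them but no pass counts. `passRate` is a helper name introduced here.

```typescript
// Pass counts copied from the runner list above; nothing else is assumed.
const suites = [
  { name: "packages/a11oy-knowledge", passed: 26, total: 27 },
  { name: "__tests__/", passed: 106, total: 110 },
  { name: "packages/qec-integrity", passed: 24, total: 24 },
];

// Percentage with one decimal place, e.g. "96.3%".
function passRate(passed: number, total: number): string {
  return ((passed / total) * 100).toFixed(1) + "%";
}

for (const s of suites) {
  console.log(s.name, passRate(s.passed, s.total));
}
// packages/a11oy-knowledge 96.3%
// __tests__/ 96.4%
// packages/qec-integrity 100.0%

const combinedPassed = suites.reduce((sum, s) => sum + s.passed, 0);
const combinedTotal = suites.reduce((sum, s) => sum + s.total, 0);
console.log("combined", combinedPassed + "/" + combinedTotal, passRate(combinedPassed, combinedTotal));
// combined 156/161 96.9%
```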

Λ-axis Live — Receipts Playground

The a11oy-receipts-playground Space demonstrates the Λ-axis gate in action: you can submit a governed prompt and receive back a DSSE-signed receipt with a traceparent header, showing the policy decision, measurement scores, and QEC integrity status inline.

The playground is served from szlholdings-a11oy-receipts-playground.hf.space; if the embedded frame is blank, open that address in a new tab.
What the playground shows
  • Submit any governed text prompt through the policy gate
  • Receive a DSSE-signed receipt: subjects array + predicate + signature
  • Inspect the traceparent W3C header embedded in the envelope
  • View the Λ-axis score (geometric-mean normalization of 13 doctrine axes)
  • See halt-eligible flag: Y or N — no intermediate states
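
To make the geometric-mean score concrete, here is a small sketch. It assumes each axis score is already normalized to the range [0, 1] and that the halt-eligible flag is Y when the combined score falls below a configured threshold. Both are assumptions of this example, as are the function names and the sample thresholds; the playground's actual scoring code is not shown in this document.

```typescript
// Geometric mean of axis scores, computed in log space for numerical stability.
// Assumption: every score lies in [0, 1].
function geometricMean(values: number[]): number {
  if (values.length === 0) throw new Error("no axis scores");
  let logSum = 0;
  for (const v of values) {
    if (!(v >= 0 && v <= 1)) throw new Error("scores must lie in [0, 1]");
    if (v === 0) return 0; // a single zero axis zeroes the whole score
    logSum += Math.log(v);
  }
  return Math.exp(logSum / values.length);
}

// Binary flag with no intermediate state. The direction (Y below threshold)
// is an assumption of this sketch.
function haltEligible(score: number, threshold: number): "Y" | "N" {
  return score < threshold ? "Y" : "N";
}

// Thirteen axes: twelve at 1.0 and one at 0.5.
const axes: number[] = [];
for (let i = 0; i < 12; i++) axes.push(1.0);
axes.push(0.5);

const lambdaScore = geometricMean(axes);
console.log(lambdaScore.toFixed(3), haltEligible(lambdaScore, 0.9));
```

Unlike an arithmetic mean, a geometric mean is pulled to zero by any zero-valued axis, so one fully failed axis cannot be averaged away by strong scores elsewhere.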
a11oy platform Space

The live platform Space at a11oy-platform demonstrates the governed execution fabric with 7 webp context images and the dark-theme UI. It runs the full policy evaluation chain, not a simplified mock.

Competitive Comparison — Governance Matrix

The table below compares a11oy / SZL Holdings against three published AI governance frameworks across eight concrete criteria. Sources are cited directly; ratings are Y / Partial / N based on publicly available documentation as of 2026-05-28. No editorial dismissals.

Governance matrix chart (generated from public documentation). Sources cited in table below · Doctrine v6 · No superlatives
| Criterion | SZL / a11oy | Anthropic RSP | OpenAI Preparedness | Google DeepMind |
| --- | --- | --- | --- | --- |
| Lean-verified invariants | Y | N | N | N |
| Public theorem count | Y (76 theorems) | N | N | N |
| DSSE cryptographic receipts | Y | N | N | N |
| OTel-native observability | Y | N | N | Partial |
| EU AI Act alignment mapped | Y | Partial | Partial | Partial |
| NIST AI RMF mapped | Y | Partial | Partial | Partial |
| Open-source execution fabric | Y (BSL-1.1) | N | N | N |
| SBOM published | Y | N | N | N |

Sources consulted: Anthropic Responsible Scaling Policy (anthropic.com/responsible-scaling-policy); OpenAI Preparedness Framework (openai.com/preparedness); Google DeepMind Frontier Safety Framework (deepmind.google/frontiers/frontier-safety-framework/). Ratings are based on publicly available text in those documents as of 2026-05-28; internal or unpublished practices of those organizations are not assessed.

Matrix narrative

The three frameworks above are published governance approaches from major AI laboratories. Each addresses model capability thresholds, deployment gates, and safety testing requirements. According to the public documents consulted, none publishes a machine-checkable formal proof of its invariants, none emits cryptographically signed execution receipts per action, and none publishes a software bill of materials for the governance system itself. SZL / a11oy differs primarily in the cryptographic auditability layer: every action produces a verifiable DSSE receipt, and the termination and monotonicity invariants are proven in Lean 4 under Mathlib v4.13.0. The tradeoff is scope: a11oy does not define model training-time safety requirements or organizational governance structures; it covers the runtime execution fabric only. The comparison above reflects that scope boundary.

What This Is NOT

Explicit scope boundaries. Doctrine v6 requires stating what a system does not do or claim.

Not an LLM checkpoint
a11oy ships no model weights. It is a governance wrapper for an external base model (currently designed around Opus 4.8-class). The base model's safety properties are not provided or verified by a11oy.
Not a training data pipeline
The receipt substrate records execution receipts; it does not produce training labels, RLHF signals, or preference datasets. Receipt data is audit-only.
Not a policy replacement
The Covenant Policy Engine enforces rules programmatically. It does not replace organizational policy review, legal counsel, or human governance decisions. Human approval gates are a deliberate feature, not a temporary workaround.
Not SLSA Level 3 (yet)
Current SLSA attestation is Level 1 for release artifacts. Level 3 (build-chain full provenance) is the stated target. This gap is disclosed in the SLSA workflow and in the repo README.
Not a standalone SPA
The web/ React SPA in this repo cannot run independently. It depends on workspace:* packages from the szl-holdings/platform monorepo. The buildable surface is the standalone packages only.
Not a proof of alignment
The Lean-verified invariants prove termination and Λ-monotonicity of the agent loop, not alignment in the broad sense. Axiom A15 (SHA-256 collision resistance) is an openly disclosed unproven assumption (cited: NIST FIPS 180-4).
Not production-validated
a11oy v19 is a research substrate. It has not been validated in production deployment at scale. The SERIES_A_DILIGENCE.md in the HF model card states this boundary explicitly.
Not open-source (fully)
The license is BSL-1.1, not a fully permissive open-source license. Review the LICENSE file for commercial use terms before adopting this substrate.

Citations & References

Primary citable artifacts for a11oy and the SZL substrate. All DOIs resolve to Zenodo records.

  [1] Lutar, S. (2026). Formal Verification of Agentic AI Invariants — Ouroboros Thesis v18.0. Zenodo. 206 pp · 76 theorems · 0 errors. doi.org/10.5281/zenodo.20434276
  [2] SZL Holdings. (2026). a11oy — Governed Agentic Execution Fabric [Software]. Zenodo. doi.org/10.5281/zenodo.20434308
  [3] SZL Holdings. (2026). Lutar v14 — Lean 4 Kernel Proofs. Zenodo. doi.org/10.5281/zenodo.20424992
  [4] arXiv preprint (forthcoming, submission pending). Λ-monotonicity and termination proofs for bounded-recursion agent loops. The reference will resolve after submission; until then, the pinned thesis version is [1].
  [5] AGPL fit audit reference. a11oy is licensed under BSL-1.1. The AGPL compatibility assessment and commercial use boundary are documented in LICENSE and the audit notes in the repo root.
  [6] OpenSSF Scorecard. Automated security scorecard for github.com/szl-holdings/a11oy. Score: 7.0 (2026-05-28). securityscorecards.dev (a11oy report).
  [7] de Moura, L., & Ullrich, S. (2021). The Lean 4 Theorem Prover and Programming Language. CADE-28. Mathlib community: Mathlib4 docs. a11oy QEC invariants verified against Mathlib v4.13.0, commit lutar-lean @ 23480eba.
  [8] Bai et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv: 2212.08073. Referenced in a11oy-knowledge alignment schema as innovation 1.
  [9] Hubinger et al. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv: 2401.05566. Referenced in a11oy-knowledge alignment schema (deceptive alignment detection invariant).
  [10] Greenblatt et al. (2024). Alignment Faking in Large Language Models. arXiv: 2412.14093. Referenced in a11oy-knowledge alignment schema (alignment-faking detection invariant).
  [11] Anthropic. (2023). Responsible Scaling Policy. anthropic.com/responsible-scaling-policy. Referenced in competitive comparison matrix (row: EU AI Act / NIST RMF mapping).
  [12] NIST. (2015). FIPS 180-4: Secure Hash Standard. NIST FIPS 180-4 (PDF). Cited for Axiom A15 (SHA-256 collision resistance), which the thesis treats as an openly disclosed, unproven assumption.