Aggregation-Only Contract Reference

The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.

1. Purpose and Scope

  • Defines the canonical behaviour for advisory_raw and vex_raw collections and the linkset hints they may emit.
  • Applies to every ingestion runtime (StellaOps.Concelier.*, StellaOps.Excititor.*), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
  • Complements the high-level architecture described in Concelier and the enforcement rules documented in Authority Architecture.
  • Paired guidance: see the guard-rail checkpoints in AOC Guardrails, the implementation reference in AOC Guard Library, and CLI usage that will land in /docs/modules/cli/guides/ as part of Sprint 19 follow-up.

2. Philosophy and Goals

  • Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
  • Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
  • Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
  • Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.

3. Contract Invariants

| # | Invariant | What it forbids or requires | Enforcement surfaces |
|---|-----------|-----------------------------|----------------------|
| 1 | No derived severity at ingest | Reject top-level keys such as severity, cvss, effective_status, consensus_provider, risk_score. Raw upstream CVSS remains inside content.raw. | Mongo schema validator, AOCWriteGuard, Roslyn analyzer, stella aoc verify. |
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
| 3 | Provenance is mandatory | source.*, upstream.*, and signature metadata must be present; missing provenance triggers ERR_AOC_004. | Schema validator, guard, CLI verifier. |
| 4 | Idempotent upserts | Writes keyed by (vendor, upstream_id, content_hash) either no-op or insert a new revision with supersedes. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
| 5 | Append-only revisions | Updates create a new document with supersedes pointer; no in-place mutation of content. | Mongo schema (supersedes format), guard, data migration scripts. |
| 6 | Linkset only | Ingestion may compute link hints (purls, cpes, IDs) to accelerate joins, but must not transform or infer severity or policy. Observations now persist both canonical linksets (for indexed queries) and raw linksets (preserving upstream order/duplicates) so downstream policy can decide how to normalise. When concelier:features:noMergeEnabled=true, all merge-derived canonicalisation paths must be disabled. | Linkset builders reviewed via fixtures/analyzers; raw-vs-canonical parity covered by observation fixtures; analyzer CONCELIER0002 blocks merge API usage. |
| 7 | Policy-only effective findings | Only Policy Engine identities can write effective_finding_*; ingestion callers receive ERR_AOC_006 if they attempt it. | Authority scopes, Policy Engine guard. |
| 8 | Schema safety | Unknown top-level keys reject with ERR_AOC_007; timestamps use ISO 8601 UTC strings; tenant is required. | Mongo validator, JSON schema tests. |
| 9 | Clock discipline | Collectors stamp fetched_at and received_at monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
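
Invariants 1, 3, and 8 reduce to checks on the top level of a raw document. The following is a minimal Python sketch of such a check, not the actual AOCWriteGuard implementation. The function name check_document, the violation shape, and the allow-list (taken from the advisory_raw fields in section 4.1) are assumptions made for illustration.

```python
from typing import Any

# Keys that invariant 1 forbids at the top level of a raw document.
FORBIDDEN_TOP_LEVEL = {
    "severity", "cvss", "effective_status", "consensus_provider", "risk_score",
}

# Assumed allow-list, taken from the advisory_raw fields in section 4.1.
ALLOWED_TOP_LEVEL = {
    "_id", "tenant", "source", "upstream", "content",
    "identifiers", "linkset", "supersedes",
}


def check_document(doc: dict[str, Any]) -> list[dict[str, str]]:
    """Return AOC violations for one raw document (empty list means clean)."""
    violations: list[dict[str, str]] = []

    # Invariants 1 and 8: forbidden keys and unknown keys at the top level.
    for key in sorted(doc):
        if key in FORBIDDEN_TOP_LEVEL:
            violations.append({"code": "ERR_AOC_001", "path": key})
        elif key not in ALLOWED_TOP_LEVEL:
            violations.append({"code": "ERR_AOC_007", "path": key})

    # Invariant 3: source, upstream, and signature metadata must be present.
    source = doc.get("source")
    upstream = doc.get("upstream")
    if not isinstance(source, dict) or not source:
        violations.append({"code": "ERR_AOC_004", "path": "source"})
    if not isinstance(upstream, dict) or not upstream:
        violations.append({"code": "ERR_AOC_004", "path": "upstream"})
    elif not isinstance(upstream.get("signature"), dict):
        violations.append({"code": "ERR_AOC_004", "path": "upstream.signature"})

    return violations
```

Returning every violation rather than stopping at the first one matches the error payload described in section 5, which carries a violations[] list.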

4. Raw Schemas

4.1 advisory_raw

| Field | Type | Notes |
|-------|------|-------|
| _id | string | advisory_raw:{source}:{upstream_id}:{revision}; deterministic and tenant-scoped. |
| tenant | string | Required; injected by Authority middleware and asserted by schema validator. |
| source.vendor | string | Provider identifier (e.g., redhat, osv, ghsa). |
| source.stream | string | Connector stream name (csaf, osv, etc.). |
| source.api | string | Absolute URI of upstream document; stored for traceability. |
| source.collector_version | string | Semantic version of the collector. |
| upstream.upstream_id | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
| upstream.document_version | string | Upstream issued timestamp or revision string. |
| upstream.fetched_at / received_at | string | ISO 8601 UTC timestamps recorded by the collector. |
| upstream.content_hash | string | sha256: digest of the raw payload used for idempotency. |
| upstream.signature | object | Required structure storing present, format, key_id, sig; even unsigned payloads set present: false. |
| content.format | string | Source format (CSAF, OSV, etc.). |
| content.spec_version | string | Upstream spec version when known. |
| content.raw | object | Full upstream payload, untouched except for transport normalisation. |
| identifiers | object | Upstream identifiers (cve, ghsa, aliases, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
| linkset | object | Join hints (see section 4.3). |
| supersedes | string or null | Points to previous revision of same upstream doc when content hash changes. |
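
To make the field list concrete, here is a hedged Python sketch of one advisory_raw document built from a raw payload. All values are placeholders: the upstream ID EXAMPLE-0001, the example.invalid URL, the tenant name, and the revision token r1 are invented for illustration, and the exact revision format is not specified in this document.

```python
import hashlib
import json


def content_hash(raw_payload: bytes) -> str:
    """sha256: digest of the untouched upstream bytes."""
    return "sha256:" + hashlib.sha256(raw_payload).hexdigest()


# Placeholder upstream payload; a real one would be a full CSAF or OSV document.
raw_payload = b'{"id": "EXAMPLE-0001", "summary": "placeholder advisory"}'

example_doc = {
    "_id": "advisory_raw:osv:EXAMPLE-0001:r1",  # "r1" is a hypothetical revision token
    "tenant": "tenant-a",
    "source": {
        "vendor": "osv",
        "stream": "osv",
        "api": "https://example.invalid/advisories/EXAMPLE-0001",
        "collector_version": "1.0.0",
    },
    "upstream": {
        "upstream_id": "EXAMPLE-0001",
        "document_version": "2025-01-01T00:00:00Z",
        "fetched_at": "2025-01-02T10:00:00Z",
        "received_at": "2025-01-02T10:00:01Z",
        "content_hash": content_hash(raw_payload),
        # Unsigned payloads still carry the structure with present set to false.
        "signature": {"present": False},
    },
    # spec_version is omitted here because it is only recorded when known.
    "content": {"format": "OSV", "raw": json.loads(raw_payload)},
    "identifiers": {"aliases": ["EXAMPLE-0001"]},
    "linkset": {"purls": [], "cpes": [], "aliases": [], "references": []},
    "supersedes": None,  # first revision, so nothing to point back to
}
```

Note that the hash is taken over the raw bytes before json.loads runs, so parsing or re-serialising the payload cannot change the idempotency key.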

4.2 vex_raw

| Field | Type | Notes |
|-------|------|-------|
| _id | string | vex_raw:{source}:{upstream_id}:{revision}. |
| tenant | string | Required; matches advisory collection requirements. |
| source.* | object | Same shape and requirements as advisory_raw. |
| upstream.* | object | Includes document_version, timestamps, content_hash, and signature. |
| content.format | string | Typically CycloneDX-VEX or CSAF-VEX. |
| content.raw | object | Entire upstream VEX payload. |
| identifiers.statements | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
| linkset | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
| supersedes | string or null | Same convention as advisory documents. |

4.3 Linkset Fields

  • purls: fully qualified Package URLs extracted from raw ranges or product nodes.
  • cpes: Common Platform Enumerations when upstream docs provide them.
  • aliases: Any alternate advisory identifiers present in the payload.
  • references: Array of { type, url } pairs pointing back to vendor advisories, patches, or exploits.
  • reconciled_from: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.

Canonicalisation rules:

  • Package URLs are rendered in canonical form without qualifiers/subpaths (pkg:type/namespace/name@version).
  • CPE values are normalised to the 2.3 binding (cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*).
  • Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.
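
The split of responsibilities above (connectors canonicalise, ingestion only trims) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the PURL helper only drops qualifiers and subpaths and is not a full Package URL canonicaliser, and CPE 2.3 normalisation is not covered.

```python
def strip_purl_qualifiers(purl: str) -> str:
    """Connector-side step: drop '?qualifiers' and '#subpath' from a PURL."""
    base = purl.strip()
    for separator in ("#", "?"):
        base = base.split(separator, 1)[0]
    return base


def trim_preserving_order(values: list[str]) -> list[str]:
    """Ingestion-side step: trim whitespace, keep order and duplicates."""
    return [value.strip() for value in values]
```

For example, strip_purl_qualifiers("pkg:npm/%40scope/name@1.2.3?arch=x64#lib") yields "pkg:npm/%40scope/name@1.2.3", while trim_preserving_order deliberately leaves repeated entries in place so downstream policy can still see them.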

4.4 advisory_observations

advisory_observations is an immutable projection of the validated raw document used by Link‑Not‑Merge overlays. Fields mirror the JSON contract surfaced by StellaOps.Concelier.Models.Observations.AdvisoryObservation.

| Field | Type | Notes |
|-------|------|-------|
| _id | string | Deterministic observation id: {tenant}:{source.vendor}:{upstreamId}:{revision}. |
| tenant | string | Lower-case tenant identifier. |
| source.vendor / source.stream | string | Connector identity (e.g., vendor/redhat, ecosystem/osv). |
| source.api | string | Absolute URI the connector fetched from. |
| source.collectorVersion | string | Optional semantic version of the connector build. |
| upstream.upstream_id | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
| upstream.document_version | string | Upstream revision/version string. |
| upstream.fetchedAt / upstream.receivedAt | datetime | UTC timestamps recorded by the connector. |
| upstream.contentHash | string | sha256: digest used for idempotency. |
| upstream.signature | object | {present, format?, keyId?, signature?} describing upstream signature material. |
| content.format / content.specVersion | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| content.raw | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| content.metadata | object | Optional connector-specific metadata (batch ids, hints). |
| linkset.aliases | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
| linkset.purls | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
| linkset.cpes | array | Connector-supplied CPE URIs (trimmed, order preserved). |
| linkset.references | array | { type, url } pairs (trimmed; ingestion preserves order). |
| createdAt | datetime | Timestamp when Concelier persisted the observation. |
| attributes | object | Optional provenance attributes keyed by connector. |

5. Error Model

| Code | Description | HTTP status | Surfaces |
|------|-------------|-------------|----------|
| ERR_AOC_001 | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
| ERR_AOC_002 | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
| ERR_AOC_003 | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. |
| ERR_AOC_004 | Missing provenance metadata (source, upstream, signature). | 422 | Schema validator, ingestion endpoints. |
| ERR_AOC_005 | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
| ERR_AOC_006 | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
| ERR_AOC_007 | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. |

Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance. The shared guard library (StellaOps.Aoc.AocError) emits consistent payloads (code, message, violations[]) for HTTP APIs, CLI tooling, and verifiers.
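
As an illustration of that mapping, the Python sketch below builds a payload with the three documented fields and derives an HTTP status, an exit code, and a log line from it. It approximates the idea and is not the StellaOps.Aoc.AocError API. The exit-code convention (0 clean, 1 on any violation) and the event name aoc.violation are hypothetical.

```python
import json

# HTTP status per error code, copied from the table in this section.
HTTP_STATUS = {
    "ERR_AOC_001": 400,
    "ERR_AOC_002": 400,
    "ERR_AOC_003": 409,
    "ERR_AOC_004": 422,
    "ERR_AOC_005": 422,
    "ERR_AOC_006": 403,
    "ERR_AOC_007": 400,
}


def error_payload(code: str, message: str, violations: list[dict]) -> dict:
    """Build a payload with the documented fields: code, message, violations[]."""
    if code not in HTTP_STATUS:
        raise ValueError(f"unknown AOC error code: {code}")
    return {"code": code, "message": message, "violations": list(violations)}


def exit_code(violations: list[dict]) -> int:
    """Hypothetical CLI convention: 0 when clean, 1 when any violation exists."""
    return 1 if violations else 0


def log_event(payload: dict) -> str:
    """Serialise a payload as one structured log line (event name is assumed)."""
    event = {
        "event": "aoc.violation",
        "http_status": HTTP_STATUS[payload["code"]],
        **payload,
    }
    return json.dumps(event, sort_keys=True)
```

Keeping the status table in one place means the HTTP layer, the CLI, and the log formatter cannot drift apart on what a given code means.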

6. API and Tooling Interfaces

  • Concelier ingestion (StellaOps.Concelier.WebService)
    • POST /ingest/advisory: accepts upstream payload metadata; server-side guard constructs and persists raw document.
    • GET /advisories/raw/{id} and filterable list endpoints expose raw documents for debugging and offline analysis.
    • POST /aoc/verify: runs guard checks over recent documents and returns summary totals plus first violations.
  • Excititor ingestion (StellaOps.Excititor.WebService) mirrors the same surface for VEX documents.
  • CLI workflows (stella aoc verify, stella sources ingest --dry-run) surface pre-flight verification; documentation will live in /docs/modules/cli/guides/ alongside Sprint 19 CLI updates.
  • Authority scopes: new advisory:ingest, advisory:read, vex:ingest, and vex:read scopes enforce least privilege; see Authority Architecture for scope grammar.
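
The least-privilege rule behind these scopes can be sketched as a single write-authorisation check. The Python below is an assumption-laden illustration, not Authority's implementation: authorize_write and its return shape are hypothetical, and is_policy_engine stands in for however Authority actually identifies Policy Engine identities.

```python
from typing import Iterable, Tuple

# Scope required to write each raw collection (scope names from the list above).
WRITE_SCOPES = {"advisory_raw": "advisory:ingest", "vex_raw": "vex:ingest"}


def authorize_write(
    collection: str,
    scopes: Iterable[str],
    tenant: str,
    is_policy_engine: bool = False,  # hypothetical identity flag
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a write to the given collection."""
    if not tenant:
        return False, "missing tenant claim"

    # Only Policy Engine identities may write effective_finding_* documents.
    if collection.startswith("effective_finding_"):
        if is_policy_engine:
            return True, "ok"
        return False, "ERR_AOC_006"

    required = WRITE_SCOPES.get(collection)
    if required is None:
        return False, f"unknown collection: {collection}"
    if required not in set(scopes):
        return False, f"missing scope: {required}"
    return True, "ok"
```

An ingestion token holding only advisory:ingest can therefore write advisory_raw, is refused on vex_raw, and receives ERR_AOC_006 if it targets an effective_finding_* collection.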

7. Idempotency and Supersedes Rules

  1. Compute content_hash before any transformation; use it with (source.vendor, upstream.upstream_id) to detect duplicates.
  2. If a document with the same hash already exists, skip the write and log a no-op.
  3. When a new hash arrives for an existing upstream document, insert a new record and set supersedes to the previous _id.
  4. Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
  5. Expose idempotency counters via metrics (ingestion_write_total{result=ok|noop}) to catch regressions early.
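
The rules above can be sketched with an in-memory store. This Python example is illustrative only: RawStore is a hypothetical stand-in for the advisory_raw collection, and the integer revision counter in the _id is an assumption, since the revision format is not defined here.

```python
import hashlib
from typing import Optional


class RawStore:
    """In-memory stand-in for the advisory_raw collection."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}                     # _id -> document
        self.by_hash: dict[tuple[str, str, str], str] = {}  # idempotency key -> _id
        self.latest: dict[tuple[str, str], str] = {}        # (vendor, upstream_id) -> newest _id
        self.counters = {"ok": 0, "noop": 0}                # mirrors ingestion_write_total

    def upsert(self, vendor: str, upstream_id: str, raw: bytes) -> tuple[str, str]:
        # Rule 1: hash the untouched payload before any transformation.
        digest = "sha256:" + hashlib.sha256(raw).hexdigest()
        key = (vendor, upstream_id, digest)

        # Rule 2: the same hash is already stored, so skip the write.
        if key in self.by_hash:
            self.counters["noop"] += 1
            return self.by_hash[key], "noop"

        # Rule 3: a new hash inserts a new revision that points at the previous one.
        previous: Optional[str] = self.latest.get((vendor, upstream_id))
        revision = 1 + sum(1 for k in self.by_hash if k[:2] == (vendor, upstream_id))
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.docs[doc_id] = {
            "_id": doc_id,
            "source": {"vendor": vendor},
            "upstream": {"upstream_id": upstream_id, "content_hash": digest},
            "supersedes": previous,
        }
        self.by_hash[key] = doc_id
        self.latest[(vendor, upstream_id)] = doc_id
        self.counters["ok"] += 1
        return doc_id, "ok"
```

Because a new revision can only point at a document that already exists, the supersedes chain in this sketch is acyclic by construction, which is the property rule 4 asks for.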

8. Migration Playbook

  1. Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
  2. Snapshot existing collections to _backup_* for rollback safety.
  3. Strip forbidden fields from historical documents into a temporary advisory_view_legacy used only during transition.
  4. Enable Mongo JSON schema validators for advisory_raw and vex_raw.
  5. Run collectors in --dry-run to confirm only allowed keys appear; fix violations before lifting the freeze.
  6. Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
  7. Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
  8. Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.

9. Observability and Diagnostics

  • Metrics: ingestion_write_total{result=ok|reject}, aoc_violation_total{code}, ingestion_signature_verified_total{result}, ingestion_latency_seconds, advisory_revision_count.
  • Traces: spans ingest.fetch, ingest.transform, ingest.write, and aoc.guard with correlation IDs shared across workers.
  • Logs: structured entries must include tenant, source.vendor, upstream.upstream_id, content_hash, and violation_code when applicable.
  • Dashboards: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
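
As a small illustration of the logging requirement, the Python sketch below assembles one structured entry with the required fields. The function name and the flat dotted key names are assumptions; only the set of required fields comes from the list above.

```python
import json
from typing import Optional


def ingestion_log_entry(doc: dict, violation_code: Optional[str] = None) -> str:
    """Build one structured log line carrying the fields required above."""
    entry = {
        "tenant": doc["tenant"],
        "source.vendor": doc["source"]["vendor"],
        "upstream.upstream_id": doc["upstream"]["upstream_id"],
        "content_hash": doc["upstream"]["content_hash"],
    }
    # violation_code is only included when a guard check actually failed.
    if violation_code is not None:
        entry["violation_code"] = violation_code
    return json.dumps(entry, sort_keys=True)
```

The function reads only identifiers and hashes from the document, never content.raw, which keeps upstream payloads out of the log stream.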

10. Security and Tenancy Checklist

  • Enforce Authority scopes (advisory:ingest, vex:ingest, advisory:read, vex:read) and require tenant claims on every request.
  • Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
  • Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
  • Validate that Policy Engine remains the only identity with permission to write effective_finding_* documents.
  • Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
  • Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.

11. Compliance Checklist

  • [ ] Deterministic guard enabled in Concelier and Excititor repositories.
  • [ ] Mongo validators deployed for advisory_raw and vex_raw.
  • [ ] Authority scopes and tenant enforcement verified via integration tests.
  • [ ] CLI and CI pipelines run stella aoc verify against seeded snapshots.
  • [ ] Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
  • [ ] Offline kit instructions updated to bundle validators and verifier tooling.
  • [ ] Security review recorded covering ingestion, tenancy, and rollback procedures.

Last updated: 2025-10-27 (Sprint 19).