component_architecture_authority.md — Stella Ops Authority (2025Q4)

Consolidates identity and tenancy requirements documented across the AOC, Policy, and Platform guides, along with the dedicated Authority implementation plan.

Scope. Implementation‑ready architecture for Stella Ops Authority: the on‑prem OIDC/OAuth2 service that issues short‑lived, sender‑constrained operational tokens (OpToks) to first‑party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for who is calling inside a Stella Ops installation. (Entitlement is proven separately by PoE from the cloud Licensing Service; Authority does not issue PoE.)


0) Mission & boundaries

Mission. Provide fast, local, verifiable authentication for Stella Ops microservices and tools by minting very short‑lived OAuth2/OIDC tokens that are sender‑constrained (DPoP or mTLS‑bound). Support RBAC scopes, multi‑tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).

Boundaries.

  • Authority does not validate entitlements/licensing. That’s enforced by Signer using PoE with the cloud Licensing Service.
  • Authority tokens are operational only (2–5 min TTL) and must not be embedded in long‑lived artifacts or stored in SBOMs.
  • Token validation is stateless (JWT verified locally against JWKS); introspection is optional, for services that prefer online checks.

1) Protocols & cryptography

  • OIDC Discovery: /.well-known/openid-configuration

  • OAuth2 grant types:

    • Client Credentials (service↔service, with mTLS or private_key_jwt)
    • Device Code (CLI login on headless agents; optional)
    • Authorization Code + PKCE (browser login for UI; optional)
  • Sender constraint options (choose per caller or per audience):

    • DPoP (Demonstration of Proof‑of‑Possession): proof JWT on each HTTP request, bound to the access token via cnf.jkt.
    • OAuth 2.0 mTLS (certificate‑bound tokens): token bound to client certificate thumbprint via cnf.x5t#S256.
  • Signing algorithms: EdDSA (Ed25519) preferred; fallback ES256 (P‑256). Rotation is supported via kid in JWKS.

  • Token format: JWT access tokens (compact), optionally opaque reference tokens for services that insist on introspection.

  • Clock skew tolerance: ±60 s; issue nbf, iat, exp accordingly.


2) Token model

  • Incident mode tokens require the obs:incident scope and a human-supplied incident_reason, and remain valid only while auth_time stays within a five-minute freshness window. Resource servers enforce the same window and persist incident.reason, incident.auth_time, and the fresh-auth verdict in authority.resource.authorize events. Authority exposes /authority/audit/incident so auditors can review recent activations.
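
The freshness rule can be sketched as a small predicate. This is a minimal illustration, not Authority's implementation: it assumes incident_reason and auth_time arrive as top-level claims with auth_time in Unix seconds, and the function name is hypothetical.

```python
INCIDENT_SCOPE = "obs:incident"
FRESH_AUTH_WINDOW_SECONDS = 5 * 60  # five-minute freshness window from the text


def incident_mode_allowed(claims: dict, now: int) -> bool:
    """Return True only if the token may be used in incident mode at `now`.

    Assumption: claims carry `scope` (space-delimited), `incident_reason`,
    and `auth_time` (Unix seconds) at the top level.
    """
    if INCIDENT_SCOPE not in claims.get("scope", "").split():
        return False
    if not str(claims.get("incident_reason", "")).strip():
        return False  # a human-supplied reason is mandatory
    auth_time = claims.get("auth_time")
    if not isinstance(auth_time, int):
        return False
    age = now - auth_time
    return 0 <= age <= FRESH_AUTH_WINDOW_SECONDS
```

Resource servers would run the same check at request time, since the text says they enforce the same window.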

2.1 Access token (OpTok) — short‑lived (120–300 s)

Registered claims

iss   = https://authority.<domain>
sub   = <client_id or user_id>
aud   = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava>
exp   = <unix ts>  (<= 300 s from iat)
iat   = <unix ts>
nbf   = iat - 30
jti   = <uuid>
scope = "scanner.scan scanner.export signer.sign ..."
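
As a rough sketch of how the time-related claims above fit together at issuance, the helper below derives iat, nbf, exp, and jti from a requested TTL. The function name and the choice to reject (rather than clamp) an over-long TTL are assumptions made for illustration.

```python
import time
import uuid
from typing import Optional

MAX_TTL_SECONDS = 300      # exp <= 300 s from iat
NBF_BACKDATE_SECONDS = 30  # nbf = iat - 30


def build_time_claims(ttl_seconds: int, now: Optional[int] = None) -> dict:
    """Build iat/nbf/exp/jti for an OpTok with the given TTL.

    Assumption: a TTL outside (0, 300] is rejected outright.
    """
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl must be within (0, 300] seconds")
    iat = int(time.time()) if now is None else now
    return {
        "iat": iat,
        "nbf": iat - NBF_BACKDATE_SECONDS,  # tolerate slightly slow verifier clocks
        "exp": iat + ttl_seconds,
        "jti": str(uuid.uuid4()),
    }
```

For example, a 180-second token issued at iat = 1760668620 gets nbf = 1760668590 and exp = 1760668800.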

Sender‑constraint (cnf)

  • DPoP:

    "cnf": { "jkt": "<base64url(SHA-256(JWK))>" }
    
  • mTLS:

    "cnf": { "x5t#S256": "<base64url(SHA-256(client_cert_der))>" }
    

Install/tenant context (custom claims)

tid          = <tenant id>               // multi-tenant
inst         = <installation id>        // unique installation
roles        = [ "svc.scanner", "svc.signer", "ui.admin", ... ]
plan?        = <plan name>              // optional hint for UIs; not used for enforcement

Note: Do not copy PoE claims into OpTok; OpTok ≠ entitlement. Only Signer checks PoE.
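
The two cnf values in 2.1 can be computed with the standard library alone. The sketch below follows the JWK thumbprint rules of RFC 7638 (required members only, sorted keys, no whitespace) for cnf.jkt and hashes the DER certificate for cnf.x5t#S256. Function names are hypothetical.

```python
import base64
import hashlib
import json

# Required members per key type: RFC 7638 (EC, RSA) and RFC 8037 (OKP).
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
    "RSA": ("e", "kty", "n"),
}


def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """Value for cnf.jkt: SHA-256 over the canonical required members."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def cert_thumbprint(cert_der: bytes) -> str:
    """Value for cnf.x5t#S256: SHA-256 over the DER-encoded certificate."""
    return b64url(hashlib.sha256(cert_der).digest())
```

Because only the required members are hashed, optional fields such as kid or use do not change the thumbprint.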

2.2 Refresh tokens (optional)

  • Default disabled. If enabled (for UI interactive logins), pair with DPoP‑bound refresh tokens or mTLS client sessions; short TTL (≤ 8 h), rotating on use (replay‑safe).

2.3 ID tokens (optional)

  • Issued for UI/browser OIDC flows (Authorization Code + PKCE); not used for service auth.

3) Endpoints & flows

3.1 OIDC discovery & keys

  • GET /.well-known/openid-configuration → endpoints, algs, jwks_uri

  • GET /jwks → JSON Web Key Set (rotating, at least 2 active keys during transition)

    KMS-backed keys. When the signing provider is kms, Authority fetches only the public coordinates (Qx, Qy) and version identifiers from the backing KMS. Private scalars never leave the provider; JWKS entries are produced by re-exporting the public material via the kms.version metadata attached to each key. Retired keys keep the same kms.version metadata so audits can trace which cloud KMS version produced a token.

3.2 Token issuance

  • POST /token

Legacy aliases under /oauth/token are deprecated as of 1 November 2025 and now emit Deprecation/Sunset/Warning headers. See docs/api/authority-legacy-auth-endpoints.md for timelines and migration guidance.

  • Client Credentials (service→service):

    • mTLS: mutual TLS + client_id → bound token (cnf.x5t#S256)
      • security.senderConstraints.mtls.enforceForAudiences forces the mTLS path when requested aud/resource values intersect high-value audiences (defaults include signer). Authority rejects clients attempting to use DPoP/basic secrets for these audiences.
      • Stored certificateBindings are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. certificate_binding_subject_mismatch).
    • private_key_jwt: JWT‑based client auth + DPoP header (preferred for tools and CLI)
  • Device Code (CLI): POST /oauth/device/code + POST /oauth/token poll

  • Authorization Code + PKCE (UI): standard
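
The audience-driven mTLS enforcement described for Client Credentials can be sketched as follows. This is an illustrative reading of security.senderConstraints.mtls.enforceForAudiences, not Authority's code; the function name and the error text are hypothetical.

```python
def select_sender_constraint(requested_audiences, client_constraint,
                             enforce_mtls_for=("signer",)):
    """Return the sender constraint to bind, or raise if it is not allowed.

    Assumption: `client_constraint` is "mtls" or "dpop", mirroring the
    senderConstraint field in the client configuration.
    """
    forced = sorted(set(requested_audiences) & set(enforce_mtls_for))
    if forced and client_constraint != "mtls":
        # High-value audiences only accept certificate-bound tokens.
        raise PermissionError(
            "audiences %s require an mTLS-bound token" % ", ".join(forced)
        )
    return client_constraint
```

A DPoP client asking for aud=scanner passes through unchanged, while the same client asking for aud=signer is rejected.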

DPoP handshake (example)

  1. Client prepares JWK (ephemeral keypair).

  2. Client sends DPoP proof header with fields:

    htm=POST
    htu=https://authority.../token
    iat=<now>
    jti=<uuid>
    

    signed with the DPoP private key; header carries JWK.

  3. Authority validates proof; issues access token with cnf.jkt=<thumbprint(JWK)>.

  4. Client uses the same DPoP key to sign every subsequent API request to services (Signer, Scanner, …).
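
Steps 1 and 2 can be sketched on the client side as follows. The Python standard library has no Ed25519 or ES256 signer, so the signing operation is passed in as a callable. That callable, the function name, and the default alg are assumptions for illustration. The typ value dpop+jwt and the rule that htu excludes query and fragment come from RFC 9449.

```python
import base64
import json
import time
import uuid
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_dpop_proof(method: str, url: str, public_jwk: dict,
                     sign: Callable[[bytes], bytes],
                     alg: str = "ES256", now: Optional[int] = None) -> str:
    """Assemble a compact DPoP proof JWT.

    `sign` is a hypothetical callable that signs the JWS signing input
    with the private half of `public_jwk`.
    """
    parts = urlsplit(url)
    htu = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    header = {"typ": "dpop+jwt", "alg": alg, "jwk": public_jwk}
    payload = {
        "htm": method.upper(),
        "htu": htu,
        "iat": int(time.time()) if now is None else now,
        "jti": str(uuid.uuid4()),  # unique per proof, feeds the replay cache
    }
    signing_input = ".".join(
        b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, payload)
    )
    return signing_input + "." + b64url(sign(signing_input.encode("ascii")))
```

The returned string is what the client would place in the DPoP request header; a fresh proof is built for every request.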

mTLS flow

  • Mutual TLS at the connection; Authority extracts client cert, validates chain; token carries cnf.x5t#S256.

3.3 Introspection & revocation (optional)

  • POST /introspect → { active, sub, scope, aud, exp, cnf, ... }
  • POST /revoke → revokes refresh tokens or opaque access tokens.

Requests targeting the legacy /oauth/{introspect|revoke} paths receive deprecation headers and are scheduled for removal after 1 May 2026.

  • Replay prevention: maintain a DPoP jti cache (TTL ≤ 10 min) to reject duplicate proofs. Services may additionally issue DPoP nonces; Signer requires a nonce for high‑value operations.
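
A jti replay cache reduces to "remember each jti until its TTL lapses and reject repeats". The in-memory sketch below illustrates the idea; the class name is hypothetical, and a shared deployment would more plausibly use the Redis cache listed under storage (for instance an atomic set-if-absent with expiry).

```python
import time


class JtiReplayCache:
    """In-memory stand-in for the DPoP jti replay cache (illustrative only)."""

    def __init__(self, ttl_seconds: float = 600.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # TTL <= 10 min per the text
        self._clock = clock      # injectable for tests
        self._seen = {}          # jti -> expiry timestamp

    def register(self, jti: str) -> bool:
        """Record `jti`; return False if it was already seen within the TTL."""
        now = self._clock()
        # Drop entries whose TTL has lapsed so the map does not grow unbounded.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if jti in self._seen:
            return False
        self._seen[jti] = now + self._ttl
        return True
```

A validator would call register(jti) after the proof signature checks out and reject the request when it returns False.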

3.4 UserInfo (optional for UI)

  • GET /userinfo (ID token context).

3.5 Vuln Explorer workflow safeguards

  • Anti-forgery flow — Vuln Explorer’s mutation verbs call

    • POST /vuln/workflow/anti-forgery/issue
    • POST /vuln/workflow/anti-forgery/verify

    Callers must hold vuln:operate scopes. Issued tokens embed the actor, tenant, whitelisted actions, ABAC selectors (environment/owner/business tier), and optional context key/value pairs. Tokens are EdDSA/ES256 signed via the primary Authority signing key and default to a 10‑minute TTL (cap: 30 minutes). Verification enforces nonce reuse prevention, tenant match, and action membership before forwarding the request to Vuln Explorer.

  • Attachment access — Evidence bundles and attachments reference a ledger hash. Vuln Explorer obtains a scoped download token through:

    • POST /vuln/attachments/tokens/issue
    • POST /vuln/attachments/tokens/verify

    These tokens bind the ledger event hash, attachment identifier, optional finding/content metadata, and the actor. They default to a 30‑minute TTL (cap: 4 hours) and require vuln:investigate.

  • Audit trail — Both flows emit vuln.workflow.csrf.* and vuln.attachment.token.* audit records with tenant, actor, ledger hash, nonce, and filtered context metadata so Offline Kit operators can reconcile actions against ledger entries.

  • Configuration

    authority:
      vulnerabilityExplorer:
        workflow:
          antiForgery:
            enabled: true
            audience: "stellaops:vuln-workflow"
            defaultLifetime: "00:10:00"
            maxLifetime: "00:30:00"
            maxContextEntries: 16
            maxContextValueLength: 256
        attachments:
          enabled: true
          defaultLifetime: "00:30:00"
          maxLifetime: "04:00:00"
          payloadType: "application/vnd.stellaops.vuln-attachment-token+json"
          maxMetadataEntries: 16
          maxMetadataValueLength: 512
    

    Air-gapped bundles include the signing key material and policy snapshots required to validate these tokens offline.
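
The defaultLifetime/maxLifetime pairs in this configuration imply a simple rule for choosing a token lifetime. The sketch below parses the hh:mm:ss strings and applies the cap. Whether Authority clamps or rejects a request above the cap is not stated in the text; clamping is an assumption here, and the function names are hypothetical.

```python
from datetime import timedelta
from typing import Optional


def parse_timespan(value: str) -> timedelta:
    """Parse an "hh:mm:ss" string as used in the YAML above."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def effective_lifetime(requested: Optional[timedelta],
                       default: timedelta, maximum: timedelta) -> timedelta:
    """Use the default when nothing is requested; never exceed the cap."""
    if requested is None:
        return default
    if requested <= timedelta(0):
        raise ValueError("lifetime must be positive")
    return min(requested, maximum)  # assumption: clamp rather than reject
```

With the anti-forgery settings, a caller that asks for one hour would receive a 30-minute token under this reading.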


4) Audiences, scopes & RBAC

4.1 Audiences

  • signer — only the Signer service should accept tokens with aud=signer.
  • attestor, scanner, concelier, excititor, ui, zastava similarly.

Services must verify aud and sender constraint (DPoP/mTLS) per their policy.

4.2 Core scopes

| Scope | Service | Operation |
|---|---|---|
| signer.sign | Signer | Request DSSE signing |
| attestor.write | Attestor | Submit Rekor entries |
| scanner.scan | Scanner.WebService | Submit scan jobs |
| scanner.export | Scanner.WebService | Export SBOMs |
| scanner.read | Scanner.WebService | Read catalog/SBOMs |
| vex.read / vex.admin | Excititor | Query/operate |
| concelier.read / concelier.export | Concelier | Query/exports |
| ui.read / ui.admin | UI | View/admin |
| zastava.emit / zastava.enforce | Scanner/Zastava | Runtime events / admission |

Roles → scopes mapping is configured centrally (Authority policy) and pushed during token issuance.
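
A minimal sketch of role-to-scope resolution at issuance follows. The mapping contents are hypothetical examples (the real map lives in Authority policy), and rejecting the whole request with invalid_scope when any requested scope is not covered is an assumption.

```python
# Hypothetical role -> scope map; real values come from Authority policy.
ROLE_SCOPES = {
    "svc.scanner": {"scanner.scan", "scanner.export", "scanner.read"},
    "svc.signer": {"signer.sign"},
    "ui.admin": {"ui.read", "ui.admin"},
}


def grant_scopes(roles, requested):
    """Return the space-delimited scope claim, or raise on over-asking."""
    allowed = set().union(*(ROLE_SCOPES.get(role, set()) for role in roles))
    denied = sorted(set(requested) - allowed)
    if denied:
        # Assumption: the token endpoint answers invalid_scope in this case.
        raise ValueError("invalid_scope: " + " ".join(denied))
    return " ".join(sorted(requested))
```

For instance, a client holding only svc.scanner can obtain scanner.scan but not signer.sign.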


5) Storage & state

  • Configuration DB (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any).

  • Cache (Redis):

    • DPoP jti replay cache (short TTL)
    • Nonce store (per resource server, if they demand nonce)
    • Device code pollers, rate limiting buckets
  • JWKS: key material in HSM/KMS or encrypted at rest; JWKS served from memory.


6) Key management & rotation

  • Maintain at least 2 signing keys active during rotation; tokens carry kid.
  • Prefer Ed25519 for compact tokens; maintain ES256 fallback for FIPS contexts.
  • Rotation cadence: 30–90 days; emergency rotation supported.
  • Publish new JWKS before issuing tokens with the new kid to avoid cold‑start validation misses.
  • Keep retired keys published in JWKS for at least the maximum token TTL + 5 minutes, so tokens signed just before rotation still validate.
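
The retention rule in the last bullet can be expressed as a filter over key records. The record shape (kid plus a retired_at timestamp) and the function name are assumptions made for this sketch.

```python
MAX_TOKEN_TTL_SECONDS = 300     # longest OpTok lifetime
RETIREMENT_GRACE_SECONDS = 300  # the extra 5 minutes


def kids_to_publish(keys, now):
    """Return kids that should still appear in JWKS at time `now`.

    Assumption: each key is a dict with `kid` and `retired_at`
    (Unix seconds, or None while the key is still active).
    """
    horizon = MAX_TOKEN_TTL_SECONDS + RETIREMENT_GRACE_SECONDS
    return [
        key["kid"]
        for key in keys
        if key["retired_at"] is None or now - key["retired_at"] < horizon
    ]
```

A key retired at t = 1000 stays published until t = 1600, by which time every token it signed has expired.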

7) HA & performance

  • Stateless issuance (except device codes/refresh) → scale horizontally behind a load‑balancer.

  • DB only for client metadata and optional flows; token checks are JWT‑local; introspection endpoints hit cache/DB minimally.

  • Targets:

    • Token issuance P95 ≤ 20 ms under warm cache.
    • DPoP proof validation ≤ 1 ms extra per request at resource servers (Signer/Scanner).
    • 99.9% uptime; HPA on CPU/latency.

8) Security posture

  • Strict TLS (1.3 preferred); HSTS; modern cipher suites.
  • mTLS enabled where required (Signer/Attestor paths).
  • Replay protection: DPoP jti cache, nonce support for Signer (add DPoP-Nonce header on 401; clients re‑sign).
  • Rate limits per client & per IP; exponential backoff on failures.
  • Secrets: clients use private_key_jwt or mTLS; never basic secrets over the wire.
  • CSP/CSRF hardening on UI flows; SameSite=Lax cookies; PKCE enforced.
  • Logs redact Authorization and DPoP proofs; store sub, aud, scopes, inst, tid, cnf thumbprints, not full keys.
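
The logging rule in the last bullet can be sketched as two small helpers: one masks the sensitive request headers, the other keeps only the claim fields the text allows in logs. Names and the mask string are hypothetical.

```python
SENSITIVE_HEADERS = {"authorization", "dpop"}
LOGGABLE_CLAIMS = ("sub", "aud", "scope", "inst", "tid", "cnf")


def redact_headers(headers: dict) -> dict:
    """Mask Authorization and DPoP header values before logging."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def loggable_claims(claims: dict) -> dict:
    """Keep only identifiers and cnf thumbprints, never full keys or tokens."""
    return {name: claims[name] for name in LOGGABLE_CLAIMS if name in claims}
```

Header matching is case-insensitive because HTTP header names are.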

9) Multi‑tenancy & installations

  • Tenant (tid) and Installation (inst) registries define which audiences/scopes a client can request.
  • Cross‑tenant isolation enforced at issuance (disallow rogue aud), and resource servers must check that tid matches their configured tenant.

10) Admin & operations APIs

All under /admin (mTLS + authority.admin scope).

POST /admin/clients                 # create/update client (confidential/public)
POST /admin/audiences               # register audience resource URIs
POST /admin/roles                   # define role→scope mappings
POST /admin/tenants                 # create tenant/install entries
POST /admin/keys/rotate             # rotate signing key (zero-downtime)
GET  /admin/metrics                 # Prometheus exposition (token issue rates, errors)
GET  /admin/healthz|readyz          # health/readiness

Declared client audiences flow through to the issued JWT aud claim and the token request’s resource indicators. Authority relies on this metadata to enforce DPoP nonce challenges for signer, attestor, and other high-value services without requiring clients to repeat the audience parameter on every request.


11) Integration hard lines (what resource servers must enforce)

Every Stella Ops service that consumes Authority tokens must:

  1. Verify JWT signature (kid in JWKS), iss, aud, exp, nbf.

  2. Enforce sender‑constraint:

    • DPoP: validate DPoP proof (htu, htm, iat, jti) and match cnf.jkt; cache jti for replay defense; honor nonce challenges.
    • mTLS: match presented client cert thumbprint to token cnf.x5t#S256.
  3. Check scopes; optionally map to internal roles.

  4. Check tenant (tid) and installation (inst) as appropriate.

  5. For Signer only: require both OpTok and PoE in the request (enforced by Signer, not Authority).
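
Steps 1, 3, and 4 can be sketched as one claim check on an already signature-verified payload. Signature verification and the DPoP/mTLS proof check (step 2) are deliberately out of scope here. The ±60 s skew comes from section 1; the exception type, function name, and parameter names are hypothetical.

```python
CLOCK_SKEW_SECONDS = 60  # tolerance taken from section 1


class TokenRejected(Exception):
    """Raised when a claim check fails; the message names the failed check."""


def check_claims(claims, *, issuer, audience, tenant, required_scope, now,
                 skew=CLOCK_SKEW_SECONDS):
    """Validate claims of a token whose signature was already verified."""
    if claims.get("iss") != issuer:
        raise TokenRejected("iss mismatch")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise TokenRejected("aud mismatch")
    if now >= claims.get("exp", 0) + skew:
        raise TokenRejected("expired")
    if now < claims.get("nbf", 0) - skew:
        raise TokenRejected("not yet valid")
    if not isinstance(claims.get("cnf"), dict) or not claims["cnf"]:
        raise TokenRejected("missing cnf (plain bearer tokens are not accepted)")
    if required_scope not in claims.get("scope", "").split():
        raise TokenRejected("insufficient scope")
    if claims.get("tid") != tenant:
        raise TokenRejected("tenant mismatch")
```

The function returns None on success, so callers simply let TokenRejected propagate to their 401/403 handling.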


12) Error surfaces & UX

  • Token endpoint errors follow OAuth2 (invalid_client, invalid_grant, invalid_scope, unauthorized_client).
  • Resource servers use RFC 6750 style (WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…").
  • For DPoP nonce challenges, clients retry exactly once with the server‑supplied nonce.
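
The single-retry rule can be sketched on the client side as follows. The transport and proof builder are passed in as callables so the example stays self-contained. Both callables and the function name are hypothetical; the DPoP-Nonce response header is the one named in the security posture section.

```python
def call_with_nonce_retry(send, make_proof):
    """Send a request; on a 401 nonce challenge, retry exactly once.

    Assumptions: `send(proof)` returns (status, headers) and
    `make_proof(nonce)` builds a fresh DPoP proof, embedding `nonce`
    when it is not None.
    """
    status, headers = send(make_proof(None))
    nonce = next(
        (value for name, value in headers.items() if name.lower() == "dpop-nonce"),
        None,
    )
    if status == 401 and nonce:
        # One retry only; a second challenge is surfaced to the caller.
        status, headers = send(make_proof(nonce))
    return status, headers
```

Capping the retry at one attempt keeps a misbehaving server from driving the client into a loop.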

13) Observability & audit

  • Metrics:

    • authority.tokens_issued_total{grant,aud}
    • authority.dpop_validations_total{result}
    • authority.mtls_bindings_total{result}
    • authority.jwks_rotations_total
    • authority.errors_total{type}
  • Audit log (immutable sink): token issuance (sub, aud, scopes, tid, inst, cnf thumbprint, jti), revocations, admin changes.

  • Plugin telemetry: password-capable plug-ins (Standard, LDAP) emit authority.plugin.<name>.password_verification events via IAuthEventSink, inheriting correlation/client/tenant/network metadata from AuthorityCredentialAuditContext. Each event includes plugin.failed_attempts, plugin.lockout_until, plugin.retry_after_seconds, plugin.failure_code, and any plug-in specific signals so SOC tooling can trace lockouts and rate-limit responses even in air-gapped deployments. Offline Kits ship the plug-in binaries plus the curated manifests (etc/authority.plugins/*.yaml) so these audit flows exist out of the box.

  • Tracing: token flows, DB reads, JWKS cache.


14) Configuration (YAML)

authority:
  issuer: "https://authority.internal"
  signing:
    enabled: true
    activeKeyId: "authority-signing-2025"
    keyPath: "../certificates/authority-signing-2025.pem"
    algorithm: "ES256"
    keySource: "file"
  security:
    rateLimiting:
      token:
        enabled: true
        permitLimit: 30
        window: "00:01:00"
        queueLimit: 0
      authorize:
        enabled: true
        permitLimit: 60
        window: "00:01:00"
        queueLimit: 10
      internal:
        enabled: false
        permitLimit: 5
        window: "00:01:00"
        queueLimit: 0
    senderConstraints:
      dpop:
        enabled: true
        allowedAlgorithms: [ "ES256", "ES384" ]
        proofLifetime: "00:02:00"
        allowedClockSkew: "00:00:30"
        replayWindow: "00:05:00"
        nonce:
          enabled: true
          ttl: "00:10:00"
          maxIssuancePerMinute: 120
          store: "redis"
          redisConnectionString: "redis://authority-redis:6379?ssl=false"
          requiredAudiences:
            - "signer"
            - "attestor"
      mtls:
        enabled: true
        requireChainValidation: true
        rotationGrace: "00:15:00"
        enforceForAudiences:
          - "signer"
        allowedSanTypes:
          - "dns"
          - "uri"
        allowedCertificateAuthorities:
          - "/etc/ssl/mtls/clients-ca.pem"
  clients:
    - clientId: scanner-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "scanner" ]
      auth: { type: "private_key_jwt", jwkFile: "/secrets/scanner-web.jwk" }
      senderConstraint: "dpop"
      scopes: [ "scanner.scan", "scanner.export", "scanner.read" ]
    - clientId: signer
      grantTypes: [ "client_credentials" ]
      audiences: [ "signer" ]
      auth: { type: "mtls" }
      senderConstraint: "mtls"
      scopes: [ "signer.sign" ]
    - clientId: notify-web-dev
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify.dev" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.viewer", "notify.operator", "notify.admin" ]
    - clientId: notify-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.viewer", "notify.operator" ]

15) Testing matrix

  • JWT validation: wrong aud, expired exp, skewed nbf, stale kid.
  • DPoP: invalid htu/htm, replayed jti, stale iat, wrong jkt, nonce dance.
  • mTLS: wrong client cert, wrong CA, thumbprint mismatch.
  • RBAC: scope enforcement per audience; over‑privileged client denied.
  • Rotation: JWKS rotation while load‑testing; zero‑downtime verification.
  • HA: kill one Authority instance; verify issuance continues; JWKS served by peers.
  • Performance: 1k token issuances/sec on 2 cores with Redis enabled for jti caching.

16) Threat model & mitigations (summary)

| Threat | Vector | Mitigation |
|---|---|---|
| Token theft | Copy of JWT | Short TTL, sender‑constraint (DPoP/mTLS); replay blocked by jti cache and nonces |
| Replay across hosts | Reuse DPoP proof | Enforce htu/htm, iat freshness, jti uniqueness; services may require nonce |
| Impersonation | Fake client | mTLS or private_key_jwt with pinned JWK; client registration & rotation |
| Key compromise | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL |
| Cross‑tenant abuse | Scope elevation | Enforce aud, tid, inst at issuance and resource servers |
| Downgrade to bearer | Strip DPoP | Resource servers require DPoP/mTLS based on aud; reject bearer without cnf |

17) Deployment & HA

  • Stateless microservice, containerized; run ≥ 2 replicas behind LB.
  • DB: HA Postgres (or MySQL) for clients/roles; Redis for device codes, DPoP nonces/jtis.
  • Secrets: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS.
  • Backups: DB daily; Redis not critical (ephemeral).
  • Disaster recovery: export/import of client registry; JWKS rehydrate from KMS.
  • Compliance: TLS audit; penetration testing for OIDC flows.

18) Implementation notes

  • Reference stack: .NET 10 + OpenIddict 6 (or IdentityServer if licensed) with custom DPoP validator and mTLS binding middleware.
  • Keep the DPoP/JTI cache pluggable; allow Redis/Memcached.
  • Provide client SDKs for C# and Go: DPoP key mgmt, proof generation, nonce handling, token refresh helper.

19) Quick reference — wire examples

Access token (payload excerpt)

{
  "iss": "https://authority.internal",
  "sub": "scanner-web",
  "aud": "signer",
  "exp": 1760668800,
  "iat": 1760668620,
  "nbf": 1760668590,
  "jti": "9d9c3f01-6e1a-49f1-8f77-9b7e6f7e3c50",
  "scope": "signer.sign",
  "tid": "tenant-01",
  "inst": "install-7A2B",
  "cnf": { "jkt": "KcVb2V...base64url..." }
}

DPoP proof claims (proof JWT payload, for POST /sign/dsse)

{
  "htu": "https://signer.internal/sign/dsse",
  "htm": "POST",
  "iat": 1760668620,
  "jti": "4b1c9b3c-8a95-4c58-8a92-9c6cfb4a6a0b"
}

Signer validates that the SHA-256 thumbprint of the JWK carried in the proof's JOSE header matches cnf.jkt in the token.


20) Rollout plan

  1. MVP: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per‑audience scopes.
  2. Add: mTLS‑bound tokens for Signer/Attestor; device code for CLI; optional introspection.
  3. Hardening: DPoP nonce support; full audit pipeline; HA tuning.
  4. UX: Tenant/installation admin UI; role→scope editors; client bootstrap wizards.