feat(server): pluggable request-auth framework (management + runtime) by abhinav-galileo · Pull Request #204 · agentcontrol/agent-control

abhinav-galileo · 2026-04-28T20:25:44Z

Summary

Pluggable request-auth framework that handles both auth flows the system needs:

Management. Online check on every request. The default authorizer authenticates the credential and authorizes the operation; in production this is HttpUpstreamAuthProvider forwarding to a configurable upstream service.
Runtime. Two-phase exchange-then-verify. A target-bearing call presents a long-lived credential plus (target_type, target_id) to a token exchange endpoint; the server mints a short-lived HS256 JWT bound to that target. Subsequent runtime calls verify the JWT locally — no upstream round-trip on the hot path.

Both flows route through the same primitives (Operation vocabulary on endpoints, Principal returned, RequestAuthorizer Protocol installed); a per-operation registry lets a deployment point management ops at one provider and runtime ops at another.

Migrates the /control-bindings endpoint family onto the framework and ships the runtime token exchange endpoint. The runtime resolution path itself (/evaluation etc.) is wired in a follow-up — its provider override (LocalJwtVerifyProvider) is already in place when the runtime secret is configured.

Module layout

server/src/agent_control_server/auth_framework/
  __init__.py                   # public API
  core.py                       # Operation, Principal, RequestAuthorizer, require_operation, registry
  config.py                     # configure_auth_from_env (env-driven setup, both flows)
  runtime_token.py              # HS256 mint / verify helpers
  providers/
    __init__.py
    header.py                   # HeaderAuthProvider + DEFAULT_OPERATION_ACCESS
    http_upstream.py            # HttpUpstreamAuthProvider (forward + parse grant)
    local_jwt.py                # LocalJwtVerifyProvider (hot-path JWT verify)

server/src/agent_control_server/endpoints/
  auth.py                       # POST /api/v1/auth/runtime-token-exchange

auth.py (legacy local credential check) is unchanged; HeaderAuthProvider re-uses _validate_api_key from it. Non-binding routes still go through the legacy router-level gate; their migration happens in follow-up PRs.

Operation vocabulary

class Operation(StrEnum):
    # Wired on endpoints in this PR.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_TOKEN_EXCHANGE = "runtime.token_exchange"

    # Reserved; not yet wired on endpoints.
    CONTROLS_READ = "controls.read"
    CONTROLS_CREATE = "controls.create"
    CONTROLS_UPDATE = "controls.update"
    CONTROLS_DELETE = "controls.delete"
    RUNTIME_USE = "runtime.use"

Per-operation authorizer registry

set_authorizer(authorizer, operation=...) overrides the default for one operation. Without operation=, it becomes the default for every operation that does not have a specific binding. Used to route management ops through one provider and Operation.RUNTIME_USE through LocalJwtVerifyProvider:

set_authorizer(HttpUpstreamAuthProvider(...))                 # default
set_authorizer(LocalJwtVerifyProvider(secret=...),             # override
               operation=Operation.RUNTIME_USE)

require_operation(op) consults the per-operation override first, then falls back to the default. The OSS path (no override installed) routes everything to HeaderAuthProvider, so the no-auth flow (api_key_enabled=False) is preserved end-to-end.

Providers (three ship in-tree)

HeaderAuthProvider — local-credential path, single namespace.

Maps each Operation to one of three access levels (PUBLIC, AUTHENTICATED, ADMIN); single source of truth in DEFAULT_OPERATION_ACCESS.
Reuses the existing local API-key + session-cookie credential check from auth.py, so behavior matches the previous require_admin_key path verbatim.
The no-auth flow (api_key_enabled=False) is preserved: every operation succeeds with a non-admin Principal. Pinned by a regression test.
Always returns DEFAULT_NAMESPACE_KEY. The namespace header lookup branch is preserved but inert until non-binding write endpoints are threaded.
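As a rough sketch of the access-level decision described above: the level assigned to each operation below is a hypothetical example, not the contents of the real DEFAULT_OPERATION_ACCESS, and `is_allowed` is an illustrative helper rather than the provider's actual method.

```python
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"
    AUTHENTICATED = "authenticated"
    ADMIN = "admin"


# Hypothetical assignments, for illustration only.
DEFAULT_OPERATION_ACCESS = {
    "control_bindings.read": AccessLevel.AUTHENTICATED,
    "control_bindings.write": AccessLevel.ADMIN,
}


def is_allowed(operation: str, *, api_key_enabled: bool,
               authenticated: bool, is_admin: bool) -> bool:
    """Decide access for one operation under the three-level model."""
    # Unknown operations raise (KeyError here) instead of defaulting open.
    level = DEFAULT_OPERATION_ACCESS[operation]
    # No-auth mode: every known operation succeeds.
    if not api_key_enabled or level is AccessLevel.PUBLIC:
        return True
    if level is AccessLevel.AUTHENTICATED:
        return authenticated
    return authenticated and is_admin
```

Keeping the mapping in one dict is what makes the "every Operation member has a default access mapping" regression guard cheap to write.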

HttpUpstreamAuthProvider — generic upstream-delegating provider.

Forwards caller credentials (X-API-Key, Authorization, Cookie) on a POST to a configurable URL with {operation, context?}.
Optional service-to-service token header for upstream→authorization-service trust.
Parses the upstream response into a Principal: namespace_key, is_admin, caller_id, plus optional grant fields (target_type, target_id, scopes, expires_at) so the runtime token exchange can mint from the same response.
Maps 200 → Principal; 401/403/404 → matching error; 5xx, network errors, and malformed payloads fail closed (503/502).
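The status mapping in the last bullet can be written as a small pure function. This is a sketch only: `caller_status` is a hypothetical helper, and using `None` to represent a network error or timeout is an assumption made for illustration.

```python
from typing import Optional


def caller_status(upstream_status: Optional[int], payload_valid: bool = True) -> int:
    """Map an upstream outcome to the status the caller sees.

    `upstream_status` is None when the request never completed
    (network error or timeout); that encoding is an assumption.
    """
    if upstream_status is None:
        return 503  # fail closed on network errors
    if upstream_status == 200:
        # A 200 with a malformed grant is the upstream's fault: 502.
        return 200 if payload_valid else 502
    if upstream_status in (401, 403, 404):
        return upstream_status  # passed through as the matching error
    return 503  # 5xx and any other unexpected status fail closed
```

Under this mapping an upstream 429 or 400 is indistinguishable from a 5xx to the caller.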

LocalJwtVerifyProvider — hot-path runtime verifier.

Reads a Bearer token from Authorization, verifies signature against the runtime secret, checks domain == "runtime", the issuer, expiry, and that the token's scope covers the requested Operation.
Returns a Principal with the bound (namespace_key, target_type, target_id) so runtime endpoints inherit the namespace and target binding without re-deriving them.
When the dependency surfaces target_type / target_id via context_builder, the provider also enforces that they match the token's binding — runtime endpoints get the request-target check for free.

Runtime token shape

HS256, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), issuer agent-control/server. Claims:

Claim	Purpose
`domain`	Pinned to `runtime`; tokens minted here MUST NOT be accepted on management endpoints.
`namespace_key`	The namespace the token authorizes within. Required for mint and verify; preserved end-to-end so a token minted for org A cannot be used to resolve controls in the default namespace.
`actor_id`	Caller identity surfaced from the upstream grant.
`scopes`	Granted runtime capabilities (e.g., `["runtime.use"]`). The exchange endpoint refuses to mint when the upstream's explicit grant omits `runtime.use`.
`target_type` / `target_id`	Bind the token to one target.
`iat` / `exp`	Bounded lifetime. The local TTL is capped by the upstream grant's `expires_at` so the local token can never outlive its grant.
`jti`	Random identifier; reserved for future revocation.
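To make the claim set concrete, here is a standard-library sketch of HS256 minting and verification with the TTL capped by the grant expiry. It is not the code in runtime_token.py: the function names, the use of epoch seconds for `grant_expires_at`, and carrying the issuer in the standard `iss` claim are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

ISSUER = "agent-control/server"


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def mint(secret: bytes, claims: dict, ttl_seconds: int,
         grant_expires_at: Optional[int] = None) -> str:
    """Mint an HS256 token; `exp` never exceeds the upstream grant's expiry."""
    now = int(time.time())
    exp = now + ttl_seconds
    if grant_expires_at is not None:
        if grant_expires_at <= now:
            raise ValueError("upstream grant already expired")
        exp = min(exp, grant_expires_at)
    payload = {
        **claims,
        "domain": "runtime",
        "iss": ISSUER,  # assumption: issuer travels in the standard `iss` claim
        "iat": now,
        "exp": exp,
        "jti": secrets.token_hex(16),
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify(secret: bytes, token: str) -> dict:
    """Return the claims of a valid runtime token, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature = parts
    expected = _sign(secret, f"{header_b64}.{payload_b64}")
    # Compare as bytes in constant time, before trusting any token content.
    if not hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("domain") != "runtime" or payload.get("iss") != ISSUER:
        raise ValueError("wrong domain or issuer")
    if not payload.get("namespace_key"):
        raise ValueError("missing namespace_key")
    if payload.get("exp", 0) <= time.time():
        raise ValueError("expired")
    return payload
```

A production implementation would more likely rely on a maintained JWT library; the hand-rolled version here only shows which checks the claims table implies.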

Runtime token exchange endpoint

POST /api/v1/auth/runtime-token-exchange
{ "target_type": "...", "target_id": "..." }

Authenticated and authorized via Operation.RUNTIME_TOKEN_EXCHANGE through the default authorizer (typically HttpUpstreamAuthProvider in production). The authorizer's context_builder forwards the requested target to the upstream so it can authorize against the right resource.
Refuses with 503 when AGENT_CONTROL_RUNTIME_TOKEN_SECRET is not configured.
Mints a local token from Principal.scopes / Principal.grant_expires_at, capped by the configured TTL (default 300s).
When the provider's Principal carries a target binding, the endpoint verifies it matches the requested target before minting.

Response: { token, expires_at, target_type, target_id, scopes }.
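The pre-mint checks can be sketched as a single function. The sketch follows the stricter rule adopted later in this PR's review fixes (runtime.use must be present in the Principal's scopes for every provider). `Principal` here is a stand-in dataclass with only the fields the check needs, and `precheck_exchange` with its reason strings is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Principal:
    # Stand-in with only the fields this check reads.
    scopes: Tuple[str, ...] = ()
    target_type: Optional[str] = None
    target_id: Optional[str] = None


def precheck_exchange(principal: Principal, target_type: str, target_id: str,
                      secret_configured: bool) -> Optional[str]:
    """Return None when minting may proceed, else a short refusal reason."""
    if not secret_configured:
        return "runtime secret not configured"
    if "runtime.use" not in principal.scopes:
        return "grant omits runtime.use"
    bound = (principal.target_type, principal.target_id)
    # A Principal without a binding imposes no target constraint.
    if bound != (None, None) and bound != (target_type, target_id):
        return "target mismatch"
    return None
```

Comparing the binding as a pair rather than field by field means a half-populated binding can never match a full request target.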

Migrated endpoints

All seven /api/v1/control-bindings* endpoints now use Depends(require_operation(...)):

Method	Path	Operation
PUT	`/control-bindings`	`control_bindings.write`
GET	`/control-bindings`	`control_bindings.read`
GET	`/control-bindings/{binding_id}`	`control_bindings.read`
PATCH	`/control-bindings/{binding_id}`	`control_bindings.write`
DELETE	`/control-bindings/{binding_id}`	`control_bindings.write`
PUT	`/control-bindings/by-key`	`control_bindings.write`
POST	`/control-bindings/by-key:delete`	`control_bindings.write`

New: POST /api/v1/auth/runtime-token-exchange (operation runtime.token_exchange).

Env vars

Var	Default	Purpose
`AGENT_CONTROL_AUTH_MODE`	`header`	Default authorizer: `header` or `http_upstream`.
`AGENT_CONTROL_AUTH_UPSTREAM_URL`	—	Required when mode is `http_upstream`.
`AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS`	`5.0`	Per-request timeout.
`AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN`	—	Optional upstream service token.
`AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER`	`X-Agent-Control-Service-Token`	Header name for the service token.
`AGENT_CONTROL_RUNTIME_TOKEN_SECRET`	—	Required to enable runtime auth + the exchange endpoint.
`AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS`	`300`	Local token TTL ceiling (capped further by the upstream grant).

Out of scope (follow-ups)

Migrate /controls CRUD onto require_operation using the reserved CONTROLS_* operations.
Wire Operation.RUNTIME_USE on the runtime resolution path (/evaluation, etc.) and the SDK side of the runtime exchange. The provider override is already in place when the runtime secret is configured. With the merged-resolver contract on /evaluation from #203 (feat(server): namespace scoping and control bindings), the JWT-verified target binding now narrows the effective set the resolver returns; the verifier's match check is load-bearing for correctness, not just for authorization.
Migrate /agents/initAgent onto require_operation. The HttpUpstreamAuthProvider's context_builder should forward the request's target_type / target_id (added in #203) to the upstream so the upstream can authorize against the requested resource.
Thread namespace resolution through the rest of the API so the namespace header lookup in HeaderAuthProvider can be turned on safely.
Drop auth.py's require_admin_key once every management endpoint is migrated.

Stacking

Stacked on PR #203 (abhi/data-model-v1); rebased onto its current head 8f806a3 so the merged effective-controls contract (target bindings unioned with direct + policy controls, namespace_key threaded through every join) is the base this PR builds on. GET /control-bindings/effective is gone in #203, so the migration of that route went away with it; the seven surviving binding endpoints are migrated as before. Will rebase onto main once #203 merges.

Test plan

51 framework + endpoint tests:
- Default coverage: every Operation member has a default access mapping (regression guard).
- HeaderAuthProvider: PUBLIC bypass, AUTHENTICATED + ADMIN paths route to the legacy validator with the right require_admin flag, no-auth mode passes admin operations, namespace-header lookup currently inert, unknown operation raises.
- HttpUpstreamAuthProvider: 200 happy path with realistic JSON wire shapes (ISO datetime + JSON array scopes round-trip), service token forwarding, 401/403/404 mapping, 5xx fail-closed, network-error fail-closed, strict-grant rejection on wrong-typed is_admin / malformed scopes / bad expires_at / non-string target fields, partial target grant (target_type only or target_id only) rejected, naive expires_at rejected (no tz info → fail-closed 502 at the parser instead of TypeError later in the mint path).
- require_operation factory: routes through the installed authorizer, per-operation overrides take precedence, clearing an override falls back to the default, get_authorizer raises when nothing is set.
- Lifecycle: reconfiguring without the runtime secret drops the previous LocalJwtVerifyProvider override; teardown clears every authorizer.
- Runtime token mint / verify: round-trip, wrong-secret rejection, expiry rejection, TTL capped by upstream grant, management-domain token refused on runtime verify, missing-namespace rejection, already-expired upstream grant raises UpstreamGrantExpiredError instead of minting a token with an exp in the past (also covers the boundary case where expires_at == issued_at).
- LocalJwtVerifyProvider: target-bound Principal, namespace carried from token, missing token → 401, wrong scope → 403, non-Bearer header → 401, target-context match enforcement (mismatch on type or id → 403).
- Exchange endpoint: 503 without secret, mint when configured, target mismatch rejected (400), missing target rejected (422), grant-without-runtime-use rejected (no privilege escalation), target context forwarded to authorizer, non-default namespace propagates into the token, full exchange-then-verify round trip, already-expired upstream grant surfaces as 502 (distinct from the 503 misconfigured-server path) so the public status reflects which side the operator should investigate.
Full server suite: 672 passed (was 621 on the head of #203, feat(server): namespace scoping and control bindings; +51 from new tests).
make lint clean.
make typecheck clean.
make sdk-ts-generate-check clean.
TS SDK regenerated alongside the new endpoint (auth-runtime-token-exchange, request/response models).

codecov · 2026-04-28T20:29:35Z

Codecov Report

❌ Patch coverage is 89.45055% with 48 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../src/agent_control_server/auth_framework/config.py	73.80%	22 Missing ⚠️
...ent_control_server/auth_framework/runtime_token.py	85.33%	11 Missing ⚠️
server/src/agent_control_server/endpoints/auth.py	87.75%	6 Missing ⚠️
...l_server/auth_framework/providers/http_upstream.py	96.62%	3 Missing ⚠️
...ntrol_server/auth_framework/providers/local_jwt.py	91.66%	3 Missing ⚠️
...agent_control_server/endpoints/control_bindings.py	82.35%	3 Missing ⚠️

namrataghadi-galileo · 2026-04-29T21:51:18Z

+        )
+
+    actor_id = principal.caller_id or "anonymous"
+    if principal.scopes:


Reject explicit empty upstream scopes

When the HTTP upstream returns an explicit empty scopes array, _UpstreamGrant becomes principal.scopes == (), so this falsey check falls into the local default and mints a token with runtime.use. That is the privilege escalation the comment is trying to avoid when an upstream grant omits runtime.use; the exchange needs to distinguish an unscoped local provider from an explicit upstream grant with no scopes before defaulting.
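The pitfall can be reproduced in isolation. The sketch below assumes an unscoped provider is represented by `scopes=None` and an explicit upstream grant by a tuple; both helper names are hypothetical.

```python
from typing import Optional, Tuple

DEFAULT_SCOPES: Tuple[str, ...] = ("runtime.use",)


def scopes_falsey_check(scopes: Optional[Tuple[str, ...]]) -> Tuple[str, ...]:
    # Buggy shape: () and None are both falsey, so an explicit
    # empty grant silently receives the default scope.
    return scopes if scopes else DEFAULT_SCOPES


def scopes_explicit_check(scopes: Optional[Tuple[str, ...]]) -> Tuple[str, ...]:
    # Only a missing grant (None) gets the default; an explicit
    # empty grant stays empty and the mint can be refused.
    return DEFAULT_SCOPES if scopes is None else scopes
```

With the falsey check, `()` comes back as `("runtime.use",)`; with the explicit check it stays `()`.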

namrataghadi-galileo · 2026-04-29T21:51:57Z

    dependencies=[Depends(require_api_key)],
 )
 app.include_router(
+    # The auth framework on each endpoint owns authentication and


Mounting these framework-protected routers without any FastAPI Security dependency removes the APIKeyHeader requirement from generated OpenAPI; require_operation only accepts Request, so make openapi-spec emits no security entry for /api/v1/control-bindings and the auth exchange route below. API docs and downstream generators will treat these protected operations as unauthenticated even though runtime still requires credentials.

namrataghadi-galileo · 2026-04-29T21:52:46Z

    return (this._agents ??= new Agents(this._options));
  }

+  private _auth?: Auth;


The new Auth group is only added to the generated AgentControlSDK, but the package export uses AgentControlClient from src/index.ts, and that wrapper still exposes only the existing groups. Consumers importing agent-control cannot call runtimeTokenExchange through the public client even though this generated getter exists; add the matching wrapper getter/type export.

Endpoints declare a generic Operation; an installed RequestAuthorizer decides whether the request is allowed and returns the resolved Principal (namespace + admin flag + caller id).
Two providers ship in-tree:
- HeaderAuthProvider: OSS / single-namespace default. Maps each Operation to one of three access levels (PUBLIC / AUTHENTICATED / ADMIN) and reuses the legacy local credential check; behavior matches the previous require_admin_key path verbatim. V1 ignores the X-Namespace-Key header and always returns the default namespace because non-binding write endpoints still hardcode it; the branch is preserved for a follow-up that lifts the lock.
- HttpUpstreamAuthProvider: forwards caller credentials to a configurable upstream URL. Maps 401/403/404 directly; fail-closed (503) on 5xx and network errors; rejects malformed principals (502).
Control-binding endpoints now declare CONTROL_BINDINGS_READ / CONTROL_BINDINGS_WRITE via require_operation(...) and read the resolved namespace from the returned Principal. The router is mounted without the legacy router-level gate so the framework owns authentication and authorization end-to-end.
Reserved Operation members for controls.* and runtime.use are defined but not yet wired; their migrations land in follow-up PRs.

Rename so the framework's vocabulary is factual:
- OssAccessLevel -> AccessLevel
- OSS_OPERATION_ACCESS -> DEFAULT_OPERATION_ACCESS
- Comments / docstrings: replace "OSS / single-namespace" framing with factual descriptions of the local-credential path.
Drop the unjustified MANAGEMENT_ prefix on environment variables; this PR only configures one auth flow:
- AGENT_CONTROL_MANAGEMENT_AUTH_MODE -> AGENT_CONTROL_AUTH_MODE
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_URL -> AGENT_CONTROL_AUTH_UPSTREAM_URL
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_TIMEOUT_SECONDS -> AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER
Add a regression test for the no-auth flow: when api_key_enabled is False, even admin operations succeed with a non-admin Principal, matching the pre-framework local-auth behavior.

Completes the framework's auth coverage. Management and runtime are genuinely different protocols, and they now route through different authorizers via the per-operation registry:
- Per-operation override on the registry. set_authorizer(authorizer, operation=...) overrides the default for one operation; calls without operation= become the default for everything else. Used to point Operation.RUNTIME_USE at LocalJwtVerifyProvider while leaving the default authorizer (header or http_upstream) for management.
- Runtime token mint/verify. HS256 JWT, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), short TTL capped by the upstream grant's expiry. domain="runtime" claim pins the token to the runtime path. Issuer is agent-control/server.
- LocalJwtVerifyProvider verifies the Bearer token, checks the scope covers the requested Operation, and returns a Principal with the bound (target_type, target_id) so endpoints can match the request target.
- POST /api/v1/auth/runtime-token-exchange. Authenticates via the default authorizer (typically HttpUpstreamAuthProvider in production, which forwards the credential to the configured upstream) and mints a local runtime token from the resulting Principal. Refuses with 503 when the runtime secret is not configured.
- Principal grew target_type, target_id, scopes, grant_expires_at fields so providers can surface the upstream grant's binding and the exchange endpoint can mint a token from it. HttpUpstreamAuthProvider parses the matching optional fields from the upstream JSON response.
- Configuration: AGENT_CONTROL_AUTH_* configures the default authorizer; AGENT_CONTROL_RUNTIME_TOKEN_SECRET (+ optional AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS) enables the runtime override. Without the secret, runtime endpoints fall through to the default authorizer.
Tests: 18 new unit + integration tests covering the registry overrides, token round-trip / wrong-secret / expired / wrong-domain rejection, JWT-verify provider behavior (target binding, missing token, wrong scope, non-Bearer header), and the exchange endpoint (503 without secret, mint when configured, target mismatch, missing target, context forwarded to authorizer, full exchange-then-verify round trip).
The TypeScript SDK regenerates with the new endpoint surface (runtime-token-exchange), committed alongside.

…es/grant
Five hardening changes prompted by review:
- Runtime tokens carry namespace_key. mint_runtime_token now requires it; the JWT payload includes it; verify_runtime_token rejects tokens without it; LocalJwtVerifyProvider returns the token's namespace on the resulting Principal instead of always defaulting. Otherwise a token minted for org A would resolve runtime controls in the default namespace once /evaluation is wired to RUNTIME_USE.
- Exchange endpoint refuses to add runtime.use to a grant that omits it. If the upstream returned an explicit scope set without runtime.use, the credential is not authorized for runtime use on this target; minting one anyway would be privilege escalation. Defaulting to runtime.use is preserved only when the provider returned no scoped grant (e.g., local header path).
- HttpUpstreamAuthProvider parses the upstream response with a strict Pydantic model (strict=True). Wrong-typed is_admin, malformed scopes, bad expires_at, and non-string target fields fail closed with 502 instead of being silently coerced or dropped. Unknown fields are still tolerated so the upstream can evolve.
- LocalJwtVerifyProvider enforces target context match when the dependency surfaces it. Future runtime endpoints can declare a context_builder that extracts target_type/target_id from the request; the provider verifies the token's binding matches and rejects with 403 otherwise.
- Auth provider lifecycle. configure_auth_from_env tracks installed providers; teardown_auth (called from FastAPI lifespan shutdown) closes any aclose-able providers, which releases the HttpUpstreamAuthProvider's owned httpx.AsyncClient.
Tests: nine new cases covering token-namespace round-trip, target context mismatch on type and id, strict grant rejection across each malformed field, the privilege-escalation guard, and a full non-default-namespace round trip through the exchange endpoint.

… on reconfigure
Two follow-up fixes from review:
- HttpUpstreamAuthProvider validates against the raw response bytes via _UpstreamGrant.model_validate_json instead of round-tripping through response.json() and model_validate. Pydantic's JSON parser accepts ISO datetimes and JSON arrays (the actual wire shapes any HTTP service produces) while strict=True still rejects type-coercion bugs like "false" -> True or non-string entries in scopes. Adds a regression test that pins the JSON wire shape: ISO expires_at + array scopes now round-trip correctly.
- configure_auth_from_env clears any prior default and operation overrides before installing fresh ones; teardown_auth clears them too. Without this, removing the runtime token secret between two configure calls left the previous LocalJwtVerifyProvider override installed on Operation.RUNTIME_USE, a silent inconsistency where the config path said runtime should fall through but the registry disagreed. Adds a regression test that exercises the full configure-then-reconfigure path.

A target binding is only meaningful as a (target_type, target_id) pair. The previous schema allowed each field independently, so a malformed grant carrying only target_type would pass type validation and the exchange endpoint's per-field equality check would fall through (the upstream's None never trips the != against the request body), letting the endpoint mint a token bound to whatever target_id the request asked for. Add a model validator on _UpstreamGrant that fails closed when exactly one of the two fields is set; both supplied or both omitted is the only acceptable shape. Pydantic's ValidationError surfaces as 502 like every other malformed-grant case. Tests cover both half-supplied shapes (target_type only and target_id only). Also drop two stale comments referring to upstream-specific implementation choices that bled in earlier — the framework is generic.

Two distinct timing-related fail-closed gaps:
1. Pydantic with strict=True still accepts a naive ISO datetime for the upstream's expires_at because strict only enforces types, not tz. Comparing the resulting naive datetime against datetime.now(UTC) at mint time raises TypeError and surfaces as a 500. Add a field validator on _UpstreamGrant.expires_at that rejects naive datetimes, so a malformed grant fails closed with a 502 alongside the rest of the strict-grant rejections.
2. mint_runtime_token would happily mint when upstream_expires_at <= issued_at, returning a 200 with an exp claim already in the past. Introduce UpstreamGrantExpiredError(RuntimeTokenError) and raise it in that case. The exchange endpoint maps this distinct error class to a 502 (upstream returned bad data) rather than the existing 503 (server misconfigured), so the public status reflects which side the operator should investigate.
Tests:
- _UpstreamGrant rejects naive expires_at -> 502 (parser fail-closed).
- mint_runtime_token raises UpstreamGrantExpiredError when the grant is already past or exactly at issued_at.
- Exchange endpoint surfaces the expired grant as 502 (vs 503 for the misconfigured-server path).
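The TypeError in the first gap is plain standard-library behavior and can be shown without Pydantic. In this sketch, `is_aware` is a hypothetical helper that applies the definition of an aware datetime from the Python documentation; the timestamps are arbitrary examples.

```python
from datetime import datetime, timezone
from typing import Optional


def is_aware(dt: datetime) -> bool:
    """True when `dt` carries usable timezone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


naive = datetime.fromisoformat("2026-04-29T12:00:00")        # no offset
aware = datetime.fromisoformat("2026-04-29T12:00:00+00:00")  # explicit UTC

comparison_error: Optional[TypeError] = None
try:
    # Ordering a naive datetime against an aware one is rejected by Python.
    naive <= datetime.now(timezone.utc)
except TypeError as exc:
    comparison_error = exc
```

Rejecting the naive value at the parsing boundary turns what would be an unhandled exception in the mint path into an ordinary validation failure.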

…g endpoints
The seven /control-bindings endpoints were migrated onto require_operation in #204, but none supplied a context_builder. Upstream authorizers that resolve the target's owning project (e.g., Galileo's check_management_access) need (target_type, target_id) to make a project-level decision; without them the upstream returns 400 and the provider fails closed with 503.
Two builders, four endpoints wired:
- _binding_body_context: reads target_type/target_id from the request body. Wired on PUT "", PUT "/by-key", POST "/by-key:delete".
- _binding_list_context: reads target_type/target_id from query params when the GET list endpoint is target-scoped. Wired on GET "".
The header provider's behavior is unchanged because it ignores context.
Validated end-to-end against the live api PR #6350 + authz PR #145 stack: GET with target filter, PUT with owned target, foreign-target 404, no-auth 401 all behave correctly.
Out of scope (separate follow-up): the binding_id-based endpoints (GET/PATCH/DELETE /{binding_id}) need a 2-phase auth: look up the binding by namespace+id to discover its target, then auth-check with target context. That's a deeper change to the require_operation contract and is tracked separately.

… startup, advertise APIKeyHeader
Five review issues against the auth framework:
1. Empty upstream scopes: the exchange endpoint previously fell back to minting a runtime.use token whenever principal.scopes was falsey, which is the same shape an upstream produces by returning an explicit ``"scopes": []``. The fallback is removed; the endpoint now requires runtime.use to be present in principal.scopes for every provider. HeaderAuthProvider explicitly grants runtime.use only when authorizing Operation.RUNTIME_TOKEN_EXCHANGE, so the local path keeps its V1 behavior while upstream privilege escalation is closed off.
2. Runtime config consolidation: AGENT_CONTROL_RUNTIME_TOKEN_SECRET and the TTL are now parsed once at startup into a frozen RuntimeAuthConfig that the mint side and the LocalJwtVerifyProvider verify side both read. configure_auth_from_env raises at startup on misconfiguration instead of producing a runtime 500 from an invalid TTL or a too-short secret.
3. Runtime token secret strength: HS256 needs >= 32 bytes of secret material; values shorter than that are rejected at startup.
4. RUNTIME_USE fallback warning: when no runtime secret is configured the LocalJwtVerifyProvider override is not installed (V1 behavior unchanged), but the startup log now warns that RUNTIME_USE will fall through to the default authorizer, giving operators a clear signal to either configure the secret or accept the long-lived-credential trust model.
5. OpenAPI security entries: the framework-protected routers (/control-bindings, /auth) are now mounted with the existing non-validating get_api_key_from_header Security extractor as a router-level dependency. require_operation still owns runtime authentication and authorization; the Security dependency exists purely so the generated OpenAPI spec advertises X-API-Key on these routes for downstream SDK generation. Confirmed: server/.generated/openapi.json now lists ``security: [{APIKeyHeader: []}]`` on every framework-protected operation.
The TypeScript wrapper AgentControlClient is also extended with an ``auth`` getter so the runtimeTokenExchange method generated under the Auth group is reachable through the public client.
A new fixture (``runtime_config_enabled``) replaces the previous os.environ patching in test_runtime_token_exchange_endpoint.py so tests exercise the same config singleton production uses; one new test pins the empty-scope rejection.

…ding routes as namespace-wide
Two review issues:
1. ``mint_runtime_token`` now rejects a naive ``upstream_expires_at`` with ``RuntimeTokenError`` instead of letting the comparison against ``datetime.now(UTC)`` raise a raw ``TypeError`` (which surfaces as a 500). The HTTP-upstream parser already rejects timezone-less ``expires_at`` on the wire, but custom authorizers and tests can still call the helper directly; the lower-level API is now self-contained.
2. The three binding-id-based routes (GET/PATCH/DELETE ``/control-bindings/{binding_id}``) are documented as namespace-wide in the OpenAPI summary and docstrings. Per-target authorization is not possible on these routes today because ``require_operation`` is single-pass and the target identifiers are only discoverable after the binding row is loaded. Clients whose authorization model needs per-target permissions are explicitly steered to the natural-key endpoints (``PUT /by-key``, ``POST /by-key:delete``) and the target-filtered list, all of which forward ``(target_type, target_id)`` to the authorizer. Two-phase auth for the by-id routes is tracked as a separate follow-up.
Also: TypeScript SDK regenerated to pick up the new endpoint summaries.

…ten tzinfo guard
Two review issues:
1. Binding endpoints previously used ``principal.namespace_key`` for the row's storage namespace. With HeaderAuthProvider this was always the default namespace, so the V1 contract held; with HttpUpstreamAuthProvider returning an org-scoped namespace, binding writes would land in that namespace while initAgent / GET /agents/{name}/controls / /evaluation still resolved through ``get_namespace_key`` (V1 default), making target-bound controls invisible to runtime resolution. The seven binding endpoints now read storage namespace from ``get_namespace_key`` so writes and reads stay in lockstep until auth-derived namespace resolution lands across every endpoint. The auth chain still runs via ``require_operation`` for authentication and authorization; the resolved Principal is no longer used to pick the storage namespace.
2. The ``mint_runtime_token`` tzinfo guard now also checks ``utcoffset() is None`` so a custom ``tzinfo`` subclass that returns None from ``utcoffset()`` is rejected at the helper boundary instead of raising a raw ``TypeError`` from the comparison below.
TypeScript SDK regenerated to pick up the binding-endpoint docstring updates.

namrataghadi-galileo · 2026-04-29T23:33:43Z

+    return RuntimeAuthConfig(secret=secret, ttl_seconds=_load_runtime_ttl_seconds())
+
+
+def _load_runtime_ttl_seconds() -> int:


P1 — Major
No upper cap on AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS

_load_runtime_ttl_seconds validates > 0 but sets no maximum. A misconfigured AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS=999999999 or a copy-paste accident mints tokens valid for decades; the point of short-lived tokens is defeated. Enforce a sane cap at startup (e.g., 86400 s = 1 day). The upstream_expires_at ceiling in mint_runtime_token only helps when the upstream surfaces an expiry.
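A minimal sketch of the suggested startup cap, under stated assumptions: the function below is not the PR's ``_load_runtime_ttl_seconds``, the ``MAX_RUNTIME_TTL_SECONDS`` and ``DEFAULT_RUNTIME_TTL_SECONDS`` constants are hypothetical, and the default value is invented for illustration. Only the environment variable name and the 86400 s figure come from the comment above.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS"
MAX_RUNTIME_TTL_SECONDS = 86_400   # cap suggested in the review (1 day)
DEFAULT_RUNTIME_TTL_SECONDS = 300  # hypothetical default, not from the PR


def load_runtime_ttl_seconds(env: Optional[Mapping[str, str]] = None) -> int:
    """Parse the TTL from the environment and fail fast on bad values."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return DEFAULT_RUNTIME_TTL_SECONDS
    try:
        ttl = int(raw)
    except ValueError as exc:
        raise ValueError(f"{ENV_VAR} must be an integer, got {raw!r}") from exc
    if ttl <= 0:
        raise ValueError(f"{ENV_VAR} must be > 0, got {ttl}")
    if ttl > MAX_RUNTIME_TTL_SECONDS:
        # Refuse to start rather than silently clamp, so the
        # misconfiguration is visible to the operator.
        raise ValueError(
            f"{ENV_VAR} must be <= {MAX_RUNTIME_TTL_SECONDS}, got {ttl}"
        )
    return ttl
```

Raising at startup rather than clamping is a design choice in this sketch; clamping with a logged warning would be an equally plausible reading of the review comment.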

namrataghadi-galileo · 2026-04-29T23:36:09Z

+                resource="Resource",
+                hint="Verify the resource exists in the requested namespace.",
+            )
+        # Fail closed on 5xx and unexpected statuses.


HttpUpstreamAuthProvider silently maps all non-200/401/403/404 to 503

A 400 (bad request in the auth call), 422, or 429 (upstream rate-limited) from the upstream all become 503 Authorization service returned an unexpected response. Rate-limit errors in particular are completely hidden from the operator. At minimum, 429 should become a distinct error or be surfaced as a hint in the 503 body.
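One way to surface the upstream status without changing the fail-closed behavior might look like the sketch below. It is illustrative only: ``AuthFailure`` and ``map_upstream_status`` are hypothetical names, the pass-through of 401/403/404 is an assumption about the existing handling, and the hint wording is invented. The real provider raises its own error types, which are not shown on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthFailure:
    """Hypothetical error shape: status returned to the client plus a hint."""

    status_code: int
    detail: str
    hint: Optional[str] = None


UNEXPECTED = "Authorization service returned an unexpected response."


def map_upstream_status(upstream_status: int) -> Optional[AuthFailure]:
    """Map an upstream auth response status to a client-facing failure.

    Returns None when the upstream authorized the request (200).
    """
    if upstream_status == 200:
        return None
    if upstream_status in (401, 403, 404):
        # Assumed pass-through of the statuses the provider already handles.
        return AuthFailure(upstream_status, "Upstream denied the request.")
    if upstream_status == 429:
        # Still fail closed with 503, but tell the operator why.
        return AuthFailure(
            503,
            UNEXPECTED,
            hint="Upstream authorization service is rate-limiting (HTTP 429).",
        )
    # Any other status (400, 422, 5xx, ...) fails closed and records the code.
    return AuthFailure(
        503, UNEXPECTED, hint=f"Upstream returned HTTP {upstream_status}."
    )
```

Putting the upstream code in the hint keeps the client-facing status unchanged while making rate limiting distinguishable from a genuine outage.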

abhinav-galileo changed the title ~~feat(server): pluggable request-auth framework + migrate control bindings~~ feat(server): pluggable request-auth framework (management + runtime) Apr 28, 2026

abhinav-galileo marked this pull request as ready for review April 28, 2026 21:46

abhinav-galileo requested review from lan17 and namrataghadi-galileo April 28, 2026 21:46

abhinav-galileo force-pushed the abhi/management-auth-framework branch from b87b27f to 8ecb871 Compare April 29, 2026 18:56

namrataghadi-galileo reviewed Apr 29, 2026

abhinav-galileo force-pushed the abhi/management-auth-framework branch from 70c8229 to e5f9654 Compare April 29, 2026 22:42

abhinav-galileo force-pushed the abhi/management-auth-framework branch from e5f9654 to 84db093 Compare April 29, 2026 23:14

abhinav-galileo added 11 commits April 29, 2026 19:27

abhinav-galileo force-pushed the abhi/management-auth-framework branch from 84db093 to 7698c07 Compare April 29, 2026 23:31

namrataghadi-galileo reviewed Apr 29, 2026

abhinav-galileo wants to merge 11 commits into abhi/data-model-v1 from abhi/management-auth-framework

2 participants
