feat(server): pluggable request-auth framework (management + runtime) #204
abhinav-galileo wants to merge 11 commits into abhi/data-model-v1
b87b27f to 8ecb871
)

actor_id = principal.caller_id or "anonymous"
if principal.scopes:
Reject explicit empty upstream scopes
When the HTTP upstream returns an explicit empty scopes array, _UpstreamGrant becomes principal.scopes == (), so this falsey check falls into the local default and mints a token with runtime.use. That is the privilege escalation the comment is trying to avoid when an upstream grant omits runtime.use; the exchange needs to distinguish an unscoped local provider from an explicit upstream grant with no scopes before defaulting.
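One way to express the distinction this comment asks for is to keep "no scoped grant" and "explicit empty grant" as different values. The sketch below is illustrative only: the Principal here is a stripped-down stand-in, and modelling the unscoped case as None (rather than an empty tuple) is an assumption of the sketch, not the PR's data model. resolve_runtime_scopes and ScopeNotGrantedError are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RUNTIME_USE = "runtime.use"


class ScopeNotGrantedError(Exception):
    """Raised when an explicit grant does not include runtime.use."""


@dataclass(frozen=True)
class Principal:
    caller_id: Optional[str] = None
    # Assumption for this sketch: None means "the provider returned no scoped
    # grant", while a tuple (even an empty one) is an explicit grant.
    scopes: Optional[Tuple[str, ...]] = None


def resolve_runtime_scopes(principal: Principal) -> Tuple[str, ...]:
    """Return the scopes to mint, or raise if runtime.use was not granted."""
    if principal.scopes is None:
        # Unscoped local provider: keep the existing default.
        return (RUNTIME_USE,)
    if RUNTIME_USE not in principal.scopes:
        # Covers both () and grants such as ("controls.read",).
        raise ScopeNotGrantedError("grant does not include runtime.use")
    return principal.scopes
```

With this shape, an upstream response carrying an empty scopes array is rejected instead of silently receiving the local default.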
dependencies=[Depends(require_api_key)],
)
app.include_router(
# The auth framework on each endpoint owns authentication and
Mounting these framework-protected routers without any FastAPI Security dependency removes the APIKeyHeader requirement from generated OpenAPI; require_operation only accepts Request, so make openapi-spec emits no security entry for /api/v1/control-bindings and the auth exchange route below. API docs and downstream generators will treat these protected operations as unauthenticated even though runtime still requires credentials.
return (this._agents ??= new Agents(this._options));
}

private _auth?: Auth;
The new Auth group is only added to the generated AgentControlSDK, but the package export uses AgentControlClient from src/index.ts, and that wrapper still exposes only the existing groups. Consumers importing agent-control cannot call runtimeTokenExchange through the public client even though this generated getter exists; add the matching wrapper getter/type export.
…g endpoints
The seven /control-bindings endpoints were migrated onto require_operation in #204, but none supplied a context_builder. Upstream authorizers that resolve the target's owning project (e.g., Galileo's check_management_access) need (target_type, target_id) to make a project-level decision; without them the upstream returns 400 and the provider fails closed with 503.
Two builders, four endpoints wired:
- _binding_body_context: reads target_type/target_id from the request body. Wired on PUT "", PUT "/by-key", POST "/by-key:delete".
- _binding_list_context: reads target_type/target_id from query params when the GET list endpoint is target-scoped. Wired on GET "".
The header provider's behavior is unchanged because it ignores context.
Validated end-to-end against the live api PR #6350 + authz PR #145 stack: GET with target filter, PUT with owned target, foreign-target 404, no-auth 401 all behave correctly.
Out of scope (separate follow-up): the binding_id-based endpoints (GET/PATCH/DELETE /{binding_id}) need a 2-phase auth: look up the binding by namespace+id to discover its target, then auth-check with target context. That's a deeper change to the require_operation contract and is tracked separately.
70c8229 to e5f9654
e5f9654 to 84db093
Endpoints declare a generic Operation; an installed RequestAuthorizer decides whether the request is allowed and returns the resolved Principal (namespace + admin flag + caller id).
Two providers ship in-tree:
- HeaderAuthProvider: OSS / single-namespace default. Maps each Operation to one of three access levels (PUBLIC / AUTHENTICATED / ADMIN) and reuses the legacy local credential check; behavior matches the previous require_admin_key path verbatim. V1 ignores the X-Namespace-Key header and always returns the default namespace because non-binding write endpoints still hardcode it; the branch is preserved for a follow-up that lifts the lock.
- HttpUpstreamAuthProvider: forwards caller credentials to a configurable upstream URL. Maps 401/403/404 directly; fail-closed (503) on 5xx and network errors; rejects malformed principals (502).
Control-binding endpoints now declare CONTROL_BINDINGS_READ / CONTROL_BINDINGS_WRITE via require_operation(...) and read the resolved namespace from the returned Principal. The router is mounted without the legacy router-level gate so the framework owns authentication and authorization end-to-end.
Reserved Operation members for controls.* and runtime.use are defined but not yet wired; their migrations land in follow-up PRs.
Rename so the framework's vocabulary is factual:
- OssAccessLevel -> AccessLevel
- OSS_OPERATION_ACCESS -> DEFAULT_OPERATION_ACCESS
- Comments / docstrings: replace "OSS / single-namespace" framing with factual descriptions of the local-credential path.
Drop the unjustified MANAGEMENT_ prefix on environment variables; this PR only configures one auth flow:
- AGENT_CONTROL_MANAGEMENT_AUTH_MODE -> AGENT_CONTROL_AUTH_MODE
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_URL -> AGENT_CONTROL_AUTH_UPSTREAM_URL
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_TIMEOUT_SECONDS -> AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER
Add a regression test for the no-auth flow: when api_key_enabled is False, even admin operations succeed with a non-admin Principal, matching the pre-framework local-auth behavior.
Completes the framework's auth coverage. Management and runtime are genuinely different protocols, and they now route through different authorizers via the per-operation registry:
- Per-operation override on the registry. set_authorizer(authorizer, operation=...) overrides the default for one operation; calls without operation= become the default for everything else. Used to point Operation.RUNTIME_USE at LocalJwtVerifyProvider while leaving the default authorizer (header or http_upstream) for management.
- Runtime token mint/verify. HS256 JWT, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), short TTL capped by the upstream grant's expiry. domain="runtime" claim pins the token to the runtime path. Issuer is agent-control/server.
- LocalJwtVerifyProvider verifies the Bearer token, checks the scope covers the requested Operation, and returns a Principal with the bound (target_type, target_id) so endpoints can match the request target.
- POST /api/v1/auth/runtime-token-exchange. Authenticates via the default authorizer (typically HttpUpstreamAuthProvider in production, which forwards the credential to the configured upstream) and mints a local runtime token from the resulting Principal. Refuses with 503 when the runtime secret is not configured.
- Principal grew target_type, target_id, scopes, grant_expires_at fields so providers can surface the upstream grant's binding and the exchange endpoint can mint a token from it. HttpUpstreamAuthProvider parses the matching optional fields from the upstream JSON response.
- Configuration: AGENT_CONTROL_AUTH_* configures the default authorizer; AGENT_CONTROL_RUNTIME_TOKEN_SECRET (+ optional AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS) enables the runtime override. Without the secret, runtime endpoints fall through to the default authorizer.
Tests: 18 new unit + integration tests covering the registry overrides, token round-trip / wrong-secret / expired / wrong-domain rejection, JWT-verify provider behavior (target binding, missing token, wrong scope, non-Bearer header), and the exchange endpoint (503 without secret, mint when configured, target mismatch, missing target, context forwarded to authorizer, full exchange-then-verify round trip).
The TypeScript SDK regenerates with the new endpoint surface (runtime-token-exchange), committed alongside.
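The registry behavior described in this commit (one default, optional per-operation overrides) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PR's implementation: the module-level storage, the two-member Operation enum, and clear_authorizers are hypothetical, and authorizers are treated as opaque objects.

```python
from enum import Enum
from typing import Dict, Optional


class Operation(str, Enum):
    # Only two members are shown; the real vocabulary is larger.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    RUNTIME_USE = "runtime.use"


_default_authorizer: Optional[object] = None
_overrides: Dict[Operation, object] = {}


def set_authorizer(authorizer: object, operation: Optional[Operation] = None) -> None:
    """Install a default authorizer, or an override for a single operation."""
    global _default_authorizer
    if operation is None:
        _default_authorizer = authorizer
    else:
        _overrides[operation] = authorizer


def get_authorizer(operation: Operation) -> object:
    """Consult the per-operation override first, then fall back to the default."""
    authorizer = _overrides.get(operation, _default_authorizer)
    if authorizer is None:
        raise RuntimeError(f"no authorizer configured for {operation.value}")
    return authorizer


def clear_authorizers() -> None:
    """Drop the default and every override (hypothetical teardown helper)."""
    global _default_authorizer
    _default_authorizer = None
    _overrides.clear()
```

Clearing both the default and the overrides in one place is what keeps a reconfigure from leaving a stale runtime override behind.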
…es/grant
Five hardening changes prompted by review:
- Runtime tokens carry namespace_key. mint_runtime_token now requires it; the JWT payload includes it; verify_runtime_token rejects tokens without it; LocalJwtVerifyProvider returns the token's namespace on the resulting Principal instead of always defaulting. Otherwise a token minted for org A would resolve runtime controls in the default namespace once /evaluation is wired to RUNTIME_USE.
- Exchange endpoint refuses to add runtime.use to a grant that omits it. If the upstream returned an explicit scope set without runtime.use, the credential is not authorized for runtime use on this target; minting one anyway would be privilege escalation. Defaulting to runtime.use is preserved only when the provider returned no scoped grant (e.g., local header path).
- HttpUpstreamAuthProvider parses the upstream response with a strict Pydantic model (strict=True). Wrong-typed is_admin, malformed scopes, bad expires_at, and non-string target fields fail closed with 502 instead of being silently coerced or dropped. Unknown fields are still tolerated so the upstream can evolve.
- LocalJwtVerifyProvider enforces target context match when the dependency surfaces it. Future runtime endpoints can declare a context_builder that extracts target_type/target_id from the request; the provider verifies the token's binding matches and rejects with 403 otherwise.
- Auth provider lifecycle. configure_auth_from_env tracks installed providers; teardown_auth (called from FastAPI lifespan shutdown) closes any aclose-able providers, which releases the HttpUpstreamAuthProvider's owned httpx.AsyncClient.
Tests: nine new cases covering token-namespace round-trip, target context mismatch on type and id, strict grant rejection across each malformed field, the privilege-escalation guard, and a full non-default-namespace round trip through the exchange endpoint.
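To make the token shape concrete, here is a standard-library-only sketch of HS256 mint and verify with the claims this commit names (iss, domain, namespace_key, target binding, scopes, iat/exp). It is an illustration, not the PR's code: the real helpers presumably use a JWT library and also set jti, and the function signatures and integer timestamps below are assumptions of the sketch.

```python
import base64
import hashlib
import hmac
import json

ISSUER = "agent-control/server"


class RuntimeTokenError(Exception):
    """Raised when a runtime token cannot be verified."""


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> str:
    mac = hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256)
    return _b64url(mac.digest())


def mint_runtime_token(secret, *, namespace_key, target_type, target_id,
                       scopes, issued_at, ttl_seconds):
    """Build a compact HS256 JWT bound to one namespace and one target."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": ISSUER, "domain": "runtime", "namespace_key": namespace_key,
        "target_type": target_type, "target_id": target_id,
        "scopes": list(scopes), "iat": issued_at, "exp": issued_at + ttl_seconds,
    }
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, payload)]
    signing_input = ".".join(parts)
    return signing_input + "." + _sign(secret, signing_input)


def verify_runtime_token(secret, token, *, now):
    """Check signature first, then issuer, domain, namespace, and expiry."""
    try:
        header_b64, payload_b64, signature = token.split(".")
    except ValueError:
        raise RuntimeTokenError("malformed token") from None
    expected = _sign(secret, header_b64 + "." + payload_b64)
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        raise RuntimeTokenError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise RuntimeTokenError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("iss") != ISSUER or claims.get("domain") != "runtime":
        raise RuntimeTokenError("wrong issuer or domain")
    if not claims.get("namespace_key"):
        raise RuntimeTokenError("token carries no namespace_key")
    if claims.get("exp", 0) <= now:
        raise RuntimeTokenError("token expired")
    return claims
```

Rejecting a token that lacks namespace_key at verify time is what prevents a token minted for one namespace from resolving controls in the default one.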
… on reconfigure
Two follow-up fixes from review:
- HttpUpstreamAuthProvider validates against the raw response bytes via _UpstreamGrant.model_validate_json instead of round-tripping through response.json() and model_validate. Pydantic's JSON parser accepts ISO datetimes and JSON arrays (the actual wire shapes any HTTP service produces) while strict=True still rejects type-coercion bugs like "false" -> True or non-string entries in scopes. Adds a regression test that pins the JSON wire shape: ISO expires_at + array scopes now round-trip correctly.
- configure_auth_from_env clears any prior default and operation overrides before installing fresh ones; teardown_auth clears them too. Without this, removing the runtime token secret between two configure calls left the previous LocalJwtVerifyProvider override installed on Operation.RUNTIME_USE, a silent inconsistency where the config path said runtime should fall through but the registry disagreed. Adds a regression test that exercises the full configure-then-reconfigure path.
A target binding is only meaningful as a (target_type, target_id) pair. The previous schema allowed each field independently, so a malformed grant carrying only target_type would pass type validation and the exchange endpoint's per-field equality check would fall through (the upstream's None never trips the != against the request body), letting the endpoint mint a token bound to whatever target_id the request asked for. Add a model validator on _UpstreamGrant that fails closed when exactly one of the two fields is set; both supplied or both omitted is the only acceptable shape. Pydantic's ValidationError surfaces as 502 like every other malformed-grant case. Tests cover both half-supplied shapes (target_type only and target_id only). Also drop two stale comments referring to upstream-specific implementation choices that bled in earlier — the framework is generic.
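The both-or-neither rule is small enough to show in isolation. The PR implements it as a Pydantic model validator on _UpstreamGrant; the sketch below swaps in a plain dataclass so it runs without third-party packages, and the class name UpstreamGrantSketch is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpstreamGrantSketch:
    target_type: Optional[str] = None
    target_id: Optional[str] = None

    def __post_init__(self) -> None:
        # A binding is only meaningful as a pair: reject the two
        # half-supplied shapes, accept both-set and both-omitted.
        if (self.target_type is None) != (self.target_id is None):
            raise ValueError("target_type and target_id must be supplied together")
```

In the PR the equivalent failure surfaces as a Pydantic ValidationError and is mapped to 502; here it is a bare ValueError.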
Two distinct timing-related fail-closed gaps:
1. Pydantic with strict=True still accepts a naive ISO datetime for the upstream's expires_at because strict only enforces types, not tz. Comparing the resulting naive datetime against datetime.now(UTC) at mint time raises TypeError and surfaces as a 500. Add a field validator on _UpstreamGrant.expires_at that rejects naive datetimes, so a malformed grant fails closed with a 502 alongside the rest of the strict-grant rejections.
2. mint_runtime_token would happily mint when upstream_expires_at <= issued_at, returning a 200 with an exp claim already in the past. Introduce UpstreamGrantExpiredError(RuntimeTokenError) and raise it in that case. The exchange endpoint maps this distinct error class to a 502 (upstream returned bad data) rather than the existing 503 (server misconfigured), so the public status reflects which side the operator should investigate.
Tests:
- _UpstreamGrant rejects naive expires_at -> 502 (parser fail-closed).
- mint_runtime_token raises UpstreamGrantExpiredError when the grant is already past or exactly at issued_at.
- Exchange endpoint surfaces the expired grant as 502 (vs 503 for the misconfigured-server path).
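The expiry rule in item 2 reduces to a small pure function: take the earlier of the local TTL and the upstream grant's expiry, and refuse when the grant is already at or before issued_at. The sketch below assumes timezone-aware datetimes on both sides; compute_token_expiry is a hypothetical helper name, while the two exception names follow the commit message.

```python
from datetime import datetime, timedelta
from typing import Optional


class RuntimeTokenError(Exception):
    """Base class for runtime token failures."""


class UpstreamGrantExpiredError(RuntimeTokenError):
    """The upstream grant expired at or before the moment of minting."""


def compute_token_expiry(
    issued_at: datetime,
    ttl_seconds: int,
    upstream_expires_at: Optional[datetime] = None,
) -> datetime:
    """Return the exp for a new token, capped by the upstream grant."""
    local_exp = issued_at + timedelta(seconds=ttl_seconds)
    if upstream_expires_at is None:
        return local_exp
    if upstream_expires_at <= issued_at:
        # Minting here would return a token whose exp is already in the past.
        raise UpstreamGrantExpiredError("upstream grant already expired")
    return min(local_exp, upstream_expires_at)
```

Keeping the expired-grant case as its own exception class is what lets the endpoint answer 502 for bad upstream data and 503 for local misconfiguration.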
… startup, advertise APIKeyHeader
Five review issues against the auth framework:
1. Empty upstream scopes: the exchange endpoint previously fell back to
minting a runtime.use token whenever principal.scopes was falsey,
which is the same shape an upstream produces by returning an explicit
``"scopes": []``. The fallback is removed; the endpoint now requires
runtime.use to be present in principal.scopes for every provider.
HeaderAuthProvider explicitly grants runtime.use only when authorizing
Operation.RUNTIME_TOKEN_EXCHANGE, so the local path keeps its V1
behavior while upstream privilege escalation is closed off.
2. Runtime config consolidation: AGENT_CONTROL_RUNTIME_TOKEN_SECRET and
the TTL are now parsed once at startup into a frozen RuntimeAuthConfig
that the mint side and the LocalJwtVerifyProvider verify side both
read. configure_auth_from_env raises at startup on misconfiguration
instead of producing a runtime 500 from an invalid TTL or a too-short
secret.
3. Runtime token secret strength: HS256 needs >= 32 bytes of secret
material; values shorter than that are rejected at startup.
4. RUNTIME_USE fallback warning: when no runtime secret is configured
the LocalJwtVerifyProvider override is not installed (V1 behavior
unchanged), but the startup log now warns that RUNTIME_USE will fall
through to the default authorizer, giving operators a clear signal
to either configure the secret or accept the long-lived-credential
trust model.
5. OpenAPI security entries: the framework-protected routers
(/control-bindings, /auth) are now mounted with the existing
non-validating get_api_key_from_header Security extractor as a
router-level dependency. require_operation still owns runtime
authentication and authorization; the Security dependency exists
purely so the generated OpenAPI spec advertises X-API-Key on these
routes for downstream SDK generation. Confirmed: server/.generated/
openapi.json now lists ``security: [{APIKeyHeader: []}]`` on every
framework-protected operation.
The TypeScript wrapper AgentControlClient is also extended with an
``auth`` getter so the runtimeTokenExchange method generated under the
Auth group is reachable through the public client.
A new fixture (``runtime_config_enabled``) replaces the previous
os.environ patching in test_runtime_token_exchange_endpoint.py so tests
exercise the same config singleton production uses; one new test pins
the empty-scope rejection.
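Items 2 through 4 above amount to parsing the runtime settings once and failing at startup rather than at request time. The following is a rough sketch of that shape: the environment variable names and the 32-byte minimum come from the commit message, while load_runtime_auth_config, the logger name, and the exact error wording are assumptions.

```python
import logging
from dataclasses import dataclass
from typing import Mapping, Optional

logger = logging.getLogger("agent_control.auth")  # hypothetical logger name
MIN_SECRET_BYTES = 32


@dataclass(frozen=True)
class RuntimeAuthConfig:
    secret: str
    ttl_seconds: int


def load_runtime_auth_config(env: Mapping[str, str]) -> Optional[RuntimeAuthConfig]:
    """Parse runtime token settings once; raise on misconfiguration."""
    secret = env.get("AGENT_CONTROL_RUNTIME_TOKEN_SECRET", "")
    if not secret:
        # No override is installed; make the fallback visible to operators.
        logger.warning(
            "runtime token secret not set; RUNTIME_USE falls through to the default authorizer"
        )
        return None
    if len(secret.encode("utf-8")) < MIN_SECRET_BYTES:
        raise ValueError("runtime token secret must be at least 32 bytes for HS256")
    raw_ttl = env.get("AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS", "300")
    try:
        ttl_seconds = int(raw_ttl)
    except ValueError:
        raise ValueError(f"runtime token TTL must be an integer, got {raw_ttl!r}") from None
    if ttl_seconds <= 0:
        raise ValueError("runtime token TTL must be positive")
    return RuntimeAuthConfig(secret=secret, ttl_seconds=ttl_seconds)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps tests from having to patch process state.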
…ding routes as namespace-wide
Two review issues:
1. ``mint_runtime_token`` now rejects a naive ``upstream_expires_at``
with ``RuntimeTokenError`` instead of letting the comparison against
``datetime.now(UTC)`` raise a raw ``TypeError`` (which surfaces as a
500). The HTTP-upstream parser already rejects timezone-less
``expires_at`` on the wire, but custom authorizers and tests can
still call the helper directly; the lower-level API is now
self-contained.
2. The three binding-id-based routes (GET/PATCH/DELETE
``/control-bindings/{binding_id}``) are documented as namespace-wide
in the OpenAPI summary and docstrings. Per-target authorization is
not possible on these routes today because ``require_operation`` is
single-pass and the target identifiers are only discoverable after
the binding row is loaded. Clients whose authorization model needs
per-target permissions are explicitly steered to the natural-key
endpoints (``PUT /by-key``, ``POST /by-key:delete``) and the
target-filtered list, all of which forward
``(target_type, target_id)`` to the authorizer. Two-phase auth for
the by-id routes is tracked as a separate follow-up.
Also: TypeScript SDK regenerated to pick up the new endpoint summaries.
…ten tzinfo guard
Two review issues:
1. Binding endpoints previously used ``principal.namespace_key`` for
the row's storage namespace. With HeaderAuthProvider this was always
the default namespace, so the V1 contract held; with
HttpUpstreamAuthProvider returning an org-scoped namespace, binding
writes would land in that namespace while initAgent / GET
/agents/{name}/controls / /evaluation still resolved through
``get_namespace_key`` (V1 default), making target-bound controls
invisible to runtime resolution. The seven binding endpoints now
read storage namespace from ``get_namespace_key`` so writes and
reads stay in lockstep until auth-derived namespace resolution
lands across every endpoint. The auth chain still runs via
``require_operation`` for authentication and authorization; the
resolved Principal is no longer used to pick the storage namespace.
2. The ``mint_runtime_token`` tzinfo guard now also checks
``utcoffset() is None`` so a custom ``tzinfo`` subclass that returns
None from ``utcoffset()`` is rejected at the helper boundary
instead of raising a raw ``TypeError`` from the comparison below.
TypeScript SDK regenerated to pick up the binding-endpoint docstring
updates.
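The tightened guard in item 2 can be shown on its own. A datetime is only aware when tzinfo is set and utcoffset() returns a value, so both conditions are checked before any comparison with an aware "now". ensure_aware is a hypothetical helper name; the PR performs the check inside mint_runtime_token.

```python
from datetime import datetime


class RuntimeTokenError(Exception):
    """Raised when runtime token inputs are invalid."""


def ensure_aware(value: datetime) -> datetime:
    """Reject datetimes that cannot be compared against an aware timestamp."""
    # tzinfo may be set while utcoffset() still returns None (a custom
    # tzinfo subclass can do this); such values count as naive.
    if value.tzinfo is None or value.utcoffset() is None:
        raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
    return value
```

Checking only tzinfo would let the custom-subclass case through and bring back the raw TypeError the first fix was meant to remove.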
84db093 to 7698c07
return RuntimeAuthConfig(secret=secret, ttl_seconds=_load_runtime_ttl_seconds())


def _load_runtime_ttl_seconds() -> int:
P1 — Major
No upper cap on AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS
_load_runtime_ttl_seconds validates > 0 but sets no maximum. A misconfigured AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS=999999999 or a copy-paste accident mints tokens valid for decades; the point of short-lived tokens is defeated. Enforce a sane cap at startup (e.g., 86400 s = 1 day). The upstream_expires_at ceiling in mint_runtime_token only helps when the upstream surfaces an expiry.
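A possible shape for the suggested cap, as a sketch only: parse_ttl_seconds and MAX_TTL_SECONDS are hypothetical names, and the one-day ceiling is the example value from this comment rather than a decided limit.

```python
MAX_TTL_SECONDS = 86_400  # one day; the example ceiling suggested above


def parse_ttl_seconds(raw: str, *, max_seconds: int = MAX_TTL_SECONDS) -> int:
    """Parse a TTL setting and enforce both a lower and an upper bound."""
    try:
        ttl = int(raw)
    except ValueError:
        raise ValueError(f"TTL must be an integer number of seconds, got {raw!r}") from None
    if not 0 < ttl <= max_seconds:
        raise ValueError(f"TTL must be between 1 and {max_seconds} seconds, got {ttl}")
    return ttl
```

Raising at startup keeps an oversized value from ever reaching the mint path.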
resource="Resource",
hint="Verify the resource exists in the requested namespace.",
)
# Fail closed on 5xx and unexpected statuses.
HttpUpstreamAuthProvider silently maps all non-200/401/403/404 to 503
A 400 (bad request in the auth call), 422, or 429 (upstream rate-limited) from the upstream all become 503 Authorization service returned an unexpected response. Rate-limit errors in particular are completely hidden from the operator. At minimum, 429 should become a distinct error or be surfaced as a hint in the 503 body.
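One option for the hint approach is to keep failing closed with 503 while giving 429 its own message. The sketch below is hypothetical: map_upstream_status and the 401/403/404/429 message strings are illustrative, and only the generic 503 wording is taken from the comment above. It is meant to be called for non-200 responses only.

```python
from typing import Tuple

# Statuses that are passed through to the caller unchanged.
_PASSTHROUGH = {
    401: "Authentication required.",
    403: "Operation not permitted.",
    404: "Resource not found.",
}


def map_upstream_status(status: int) -> Tuple[int, str]:
    """Map a non-200 upstream status to (response status, detail message)."""
    if status in _PASSTHROUGH:
        return status, _PASSTHROUGH[status]
    if status == 429:
        # Still fail closed, but tell the operator what actually happened.
        return 503, "Authorization service is rate limiting requests."
    # 5xx and every other unexpected status fail closed.
    return 503, "Authorization service returned an unexpected response."
```

The same pattern extends to 400 and 422 if those should also carry a distinct hint.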
Summary
Pluggable request-auth framework that handles both auth flows the system needs:
- … HttpUpstreamAuthProvider forwarding to a configurable upstream service.
- … (target_type, target_id) to a token exchange endpoint; the server mints a short-lived HS256 JWT bound to that target. Subsequent runtime calls verify the JWT locally, with no upstream round-trip on the hot path.

Both flows route through the same primitives (Operation vocabulary on endpoints, Principal returned, RequestAuthorizer Protocol installed); a per-operation registry lets a deployment point management ops at one provider and runtime ops at another.

Migrates the /control-bindings endpoint family onto the framework and ships the runtime token exchange endpoint. The runtime resolution path itself (/evaluation etc.) is wired in a follow-up; its provider override (LocalJwtVerifyProvider) is already in place when the runtime secret is configured.

Module layout

auth.py (legacy local credential check) is unchanged; HeaderAuthProvider re-uses _validate_api_key from it. Non-binding routes still go through the legacy router-level gate; their migration happens in follow-up PRs.

Operation vocabulary

Per-operation authorizer registry

set_authorizer(authorizer, operation=...) overrides the default for one operation. Without operation=, it becomes the default for every operation that does not have a specific binding. Used to route management ops through one provider and Operation.RUNTIME_USE through LocalJwtVerifyProvider.

require_operation(op) consults the override first, then falls back to the default. The OSS path (no override installed) routes everything to HeaderAuthProvider; the no-auth flow (api_key_enabled=False) is preserved end-to-end.

Providers (three ship in-tree)

HeaderAuthProvider: local-credential path, single namespace.
- Maps each Operation to one of three access levels (PUBLIC, AUTHENTICATED, ADMIN); single source of truth in DEFAULT_OPERATION_ACCESS.
- … auth.py, so behavior matches the previous require_admin_key path verbatim.
- The no-auth flow (api_key_enabled=False) is preserved: every operation succeeds with a non-admin Principal. Pinned by a regression test.
- … DEFAULT_NAMESPACE_KEY. The namespace header lookup branch is preserved but inert until non-binding write endpoints are threaded.

HttpUpstreamAuthProvider: generic upstream-delegating provider.
- Forwards caller credentials (X-API-Key, Authorization, Cookie) on a POST to a configurable URL with {operation, context?}.
- … Principal: namespace_key, is_admin, caller_id, plus optional grant fields (target_type, target_id, scopes, expires_at) so the runtime token exchange can mint from the same response.
- 200 → Principal; 401/403/404 → matching error; 5xx, network errors, and malformed payloads fail closed (503/502).

LocalJwtVerifyProvider: hot-path runtime verifier.
- … Authorization, verifies signature against the runtime secret, checks domain == "runtime", the issuer, expiry, and that the token's scope covers the requested Operation.
- Returns a Principal with the bound (namespace_key, target_type, target_id) so runtime endpoints inherit the namespace and target binding without re-deriving them.
- … target_type/target_id via context_builder, the provider also enforces that they match the token's binding, so runtime endpoints get the request-target check for free.

Runtime token shape

HS256, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), issuer agent-control/server. Claims:

| Claim | Notes |
| --- | --- |
| domain | … runtime; tokens minted here MUST not be accepted on management endpoints. |
| namespace_key | … |
| actor_id | … |
| scopes | … ["runtime.use"]). The exchange endpoint refuses to mint when the upstream's explicit grant omits runtime.use. |
| target_type / target_id | … |
| iat / exp | … expires_at so the local token can never outlive its grant. |
| jti | … |

Runtime token exchange endpoint

- … Operation.RUNTIME_TOKEN_EXCHANGE through the default authorizer (typically HttpUpstreamAuthProvider in production). The authorizer's context_builder forwards the requested target to the upstream so it can authorize against the right resource.
- Refuses with 503 when AGENT_CONTROL_RUNTIME_TOKEN_SECRET is not configured.
- … Principal.scopes / Principal.grant_expires_at, capped by the configured TTL (default 300s).
- … Principal carries a target binding, the endpoint verifies it matches the requested target before minting.

Response: { token, expires_at, target_type, target_id, scopes }.

Migrated endpoints

All seven /api/v1/control-bindings* endpoints now use Depends(require_operation(...)):

| Method | Path | Operation |
| --- | --- | --- |
| PUT | /control-bindings | control_bindings.write |
| GET | /control-bindings | control_bindings.read |
| GET | /control-bindings/{binding_id} | control_bindings.read |
| PATCH | /control-bindings/{binding_id} | control_bindings.write |
| DELETE | /control-bindings/{binding_id} | control_bindings.write |
| PUT | /control-bindings/by-key | control_bindings.write |
| POST | /control-bindings/by-key:delete | control_bindings.write |

New: POST /api/v1/auth/runtime-token-exchange (operation runtime.token_exchange).

Env vars

| Variable | Default | Notes |
| --- | --- | --- |
| AGENT_CONTROL_AUTH_MODE | header | header or http_upstream. |
| AGENT_CONTROL_AUTH_UPSTREAM_URL | … | … http_upstream. |
| AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS | 5.0 | … |
| AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN | … | … |
| AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER | X-Agent-Control-Service-Token | … |
| AGENT_CONTROL_RUNTIME_TOKEN_SECRET | … | … |
| AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS | 300 | … |

Out of scope (follow-ups)

- … /controls CRUD onto require_operation using the reserved CONTROLS_* operations.
- … Operation.RUNTIME_USE on the runtime resolution path (/evaluation, etc.) and the SDK side of the runtime exchange. The provider override is already in place when the runtime secret is configured. With the merged-resolver contract on /evaluation from #203 (feat(server): namespace scoping and control bindings), the JWT-verified target binding now narrows the effective set the resolver returns; the verifier's match check is load-bearing for correctness, not just for authorization.
- … /agents/initAgent onto require_operation. The HttpUpstreamAuthProvider's context_builder should forward the request's target_type/target_id (added in #203) to the upstream so the upstream can authorize against the requested resource.
- … HeaderAuthProvider can be turned on safely.
- … auth.py's require_admin_key once every management endpoint is migrated.

Stacking

Stacked on PR #203 (abhi/data-model-v1); rebased onto its current head 8f806a3 so the merged effective-controls contract (target bindings unioned with direct + policy controls, namespace_key threaded through every join) is the base this PR builds on. GET /control-bindings/effective is gone in #203, so the migration of that route went away with it; the seven surviving binding endpoints are migrated as before. Will rebase onto main once #203 merges.

Test plan

- … Operation member has a default access mapping (regression guard).
- HeaderAuthProvider: PUBLIC bypass, AUTHENTICATED + ADMIN paths route to the legacy validator with the right require_admin flag, no-auth mode passes admin operations, namespace-header lookup currently inert, unknown operation raises.
- HttpUpstreamAuthProvider: 200 happy path with realistic JSON wire shapes (ISO datetime + JSON array scopes round-trip), service token forwarding, 401/403/404 mapping, 5xx fail-closed, network-error fail-closed, strict-grant rejection on wrong-typed is_admin / malformed scopes / bad expires_at / non-string target fields, partial target grant (target_type only or target_id only) rejected, naive expires_at rejected (no tz info → fail-closed 502 at the parser instead of TypeError later in the mint path).
- require_operation factory: routes through the installed authorizer, per-operation overrides take precedence, clearing an override falls back to the default, get_authorizer raises when nothing is set.
- … LocalJwtVerifyProvider override; teardown clears every authorizer.
- … UpstreamGrantExpiredError instead of minting a token with an exp in the past (also covers the boundary case where expires_at == issued_at).
- LocalJwtVerifyProvider: target-bound Principal, namespace carried from token, missing token → 401, wrong scope → 403, non-Bearer header → 401, target-context match enforcement (mismatch on type or id → 403).
- make lint clean.
- make typecheck clean.
- make sdk-ts-generate-check clean.
- … auth-runtime-token-exchange, request/response models).