feat(server): namespace scoping and control bindings #203
abhinav-galileo merged 34 commits into main
Adds a namespace_key column to agents, controls, policies, and the three association tables. Replaces single-column uniqueness with namespace-scoped composite uniqueness, and converts association-table foreign keys to composite same-namespace foreign keys. Adds a control_bindings table for attaching controls to opaque external targets, with an optional agent_name selector for narrower overrides inside a target. Two binding shapes are supported via partial unique indexes: target-default (agent_name IS NULL) and target-agent. OSS and single-namespace deployments are preserved by the 'default' server default on every namespace_key column. Existing endpoint and service code is unchanged; default-namespace behavior is fully backward compatible.
… indexes
- control_bindings.id: change migration column type from BigInteger to Integer to match the ORM model and the convention of every other id column in the schema.
- control_bindings.agent_name: change column type from Text to String(255) and add a check constraint requiring NULL or the same format/length as agents.name. Bindings may still predate agent registration; callers must normalize before insert.
- Add plain natural-key indexes ix_agents_name and ix_policies_name to preserve name-only lookup performance while service code is still namespace-blind. The new composite primary keys and unique constraints lead with namespace_key, so name-only queries no longer have a leading-column index without these.
- Document that soft deletes on a control do not cascade to bindings (the runtime resolver excludes soft-deleted controls).
- Add tests for the soft-delete survival path and malformed-agent_name rejection.
Adds ControlBindingsService.resolve_effective_controls, which returns the active control set for a target-bearing request. Two binding shapes are considered: target-default (agent_name IS NULL) and target-agent. For each control_id, the most-specific binding wins (target-agent beats target-default); a winning binding with enabled=False excludes the control. Soft-deleted controls are filtered out.
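The most-specific-wins rule described in this commit can be illustrated with a minimal in-memory sketch. The `Binding` dataclass and the function signature below are hypothetical stand-ins, not the service's actual code; only the rule itself (target-agent beats target-default, a disabled winner excludes the control, soft-deleted controls are filtered) comes from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Binding:
    """Hypothetical stand-in for a control_bindings row on one target."""
    control_id: int
    agent_name: Optional[str]  # None means a target-default binding
    enabled: bool


def resolve_effective_controls(
    bindings: Iterable[Binding],
    agent_name: Optional[str],
    deleted_control_ids: frozenset = frozenset(),
) -> List[int]:
    """Return the control IDs that are active for this agent on the target."""
    winners = {}
    for binding in bindings:
        # Ignore target-agent bindings that belong to a different agent.
        if binding.agent_name is not None and binding.agent_name != agent_name:
            continue
        # Soft-deleted controls never reach the effective set.
        if binding.control_id in deleted_control_ids:
            continue
        current = winners.get(binding.control_id)
        # A target-agent binding replaces a target-default binding.
        if current is None or (current.agent_name is None and binding.agent_name is not None):
            winners[binding.control_id] = binding
    # A winning binding with enabled=False excludes the control.
    return sorted(cid for cid, b in winners.items() if b.enabled)
```

For example, a target-default binding that enables control 1 plus a target-agent binding that disables it for agent `bot` leaves control 1 active for every agent except `bot`.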
Adds CRUD endpoints under /control-bindings backed by ControlBindingsService.
- PUT /control-bindings create binding (admin)
- GET /control-bindings list with optional target/agent/control filters
- GET /control-bindings/{id} single-binding detail
- PATCH /control-bindings/{id} toggle enabled (admin)
- DELETE /control-bindings/{id} delete binding (admin)
Adds CONTROL_BINDING_NOT_FOUND and CONTROL_BINDING_CONFLICT error codes,
the matching request/response Pydantic types, and a get_namespace_key
dependency that returns the default namespace and is overridable for
deployment-specific namespace resolution.
Service create/update/delete enforce same-namespace integrity by checking
the parent control belongs to the request's namespace; uniqueness
violations are translated into 409 conflicts.
EvaluationRequest gains optional target_type/target_id fields. When both are supplied the evaluation endpoint resolves the effective control set from control_bindings (no agents row required); otherwise it uses the existing agent-attached path. The two paths do not silently merge. Adds ControlBindingsService.resolve_runtime_controls and a shared parse_runtime_controls helper to avoid duplicating the Control to RuntimeControl conversion across services.
Adds idempotent attach/detach endpoints addressed by the natural key (target_type, target_id, agent_name?, control_id):
- PUT /control-bindings/by-key upsert (creates or updates enabled)
- POST /control-bindings/by-key:delete delete (returns deleted=False if missing)
Useful for callers that want to attach a control without first checking whether a binding already exists. Backed by ControlBindingsService.upsert_by_natural_key and delete_by_natural_key.
Adds optional target_type/target_id parameters to evaluate_controls, check_evaluation, and check_evaluation_with_local. When supplied, both fields are included on the EvaluationRequest sent to the server, which routes the request through the target-bearing resolution path. Both fields must be supplied together; the server enforces this via the EvaluationRequest model validator.
- SDK target-bearing requests now bypass cached agent-attached controls
and call the server unconditionally. The cached controls (from
initAgent) are agent-attachment data; target-bearing requests must
resolve from control_bindings only, which the server enforces, but
the SDK was previously short-circuiting against the cache when no
applicable server controls were present.
- agent_name on control-binding requests is normalized and validated
at the API boundary using the same rules as agents.name. Mixed-case
or whitespace-padded values are accepted and normalized; values that
fail the format/length rules are rejected with 422 instead of leaking
to the database check constraint as 500/conflict.
- ControlBinding.updated_at is refreshed on UPDATE via SQLAlchemy
onupdate. PATCH /control-bindings/{id} and idempotent natural-key
upserts now reflect the updated timestamp on subsequent reads.
Each binding row now attaches one control to one target inside a namespace. Per-agent overrides and exemptions within a target are out of scope at this stage; both the migration and the ControlBinding model docstring document the two forward paths if and when those become a product requirement (re-add agent_name with a partial-index pair, or merge target-bearing resolution with agent_controls).
Net simplifications:
- One unique constraint instead of a partial-index pair on (agent_name IS NULL / IS NOT NULL).
- No agent-name CHECK constraint or normalize_optional_agent_name validator on binding requests.
- Resolver returns the target-level control set directly; no most-specific-wins logic, no winners dict.
- Pydantic request/response models lose the agent_name field; the list endpoint loses the agent_name query parameter.
- SDK target-bearing path is unchanged (it never carried agent_name on bindings).
Schema/code/test/doc all stay aligned. Migration round-trip verified locally; full server suite passes (601 tests), lint clean, typecheck clean.
…Request
The class docstring still described the upsert natural key as (target_type, target_id, agent_name, control_id). Update to match the V1 shape: (target_type, target_id, control_id).
…oints
Adds the auto-generated bindings for the new /control-bindings surface (create, list, get, patch, delete, upsert-by-key, delete-by-key) and refreshes the evaluation models/sdk to include the optional target_type / target_id fields. Also adds the method-name overrides under sdks/typescript/overlays/method-names.overlay.yaml.
lan17 left a comment
Thanks for the work here. The overall direction makes sense, but I found a couple issues that should be fixed before merge.
One P1 could not be placed inline because it points at an unchanged file:
[P1] Control deletion ignores target bindings (server/src/agent_control_server/endpoints/controls.py:921-925)
delete_control only checks policy and agent associations before soft-deleting a control. A control that is actively attached through control_bindings can be deleted with force=false, which disables protection for that target while leaving the binding row behind. The lifecycle/in-use checks should include target bindings, and force=true should have explicit binding semantics.
The rest of the findings are inline.
Adds GET /api/v1/control-bindings/effective which returns the effective control set for a target in the same shape as InitAgentResponse.controls. The Python SDK now fetches this list when a request is target-bearing and runs it through the existing local-vs-server execution split, so controls with execution='sdk' bound to a target run locally instead of being silently dropped on the server side.
Adds the EffectiveTargetControlsResponse model, regenerates the TypeScript client to include the new endpoint, and adds endpoint-level tests covering the bound-controls, disabled-binding, and empty-result cases plus an SDK test that exercises the local-eval path end-to-end.
The previous _evaluate_target_bearing short-circuit is gone; target-bearing requests now go through check_evaluation_with_local with the target-bound controls in place of the cached agent set.
delete_control previously only inspected agent and policy associations, so a control attached via control_bindings could be soft-deleted with force=false. The binding row would remain pointing at a deleted control, silently disabling protection on the target. The lifecycle check now also lists target bindings: force=false rejects deletion with CONTROL_IN_USE listing the binding IDs, and force=true removes the bindings before soft-deleting the control. The detached binding IDs are surfaced in the response under detached_target_bindings.
upsert_by_natural_key previously did SELECT then INSERT, so two concurrent calls for the same (namespace_key, target_type, target_id, control_id) could both miss the existing row, then one would hit the unique-constraint violation and surface as an unhandled IntegrityError (500) even though the endpoint is documented as idempotent. The loser of the race now catches IntegrityError, rolls back its insert, re-reads the winning row, and applies its requested enabled value as an update. Both calls return successfully; the create flag is true only for the caller whose insert actually wrote the row.
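The race handling described above can be reproduced in miniature with the standard-library `sqlite3` module. This is a sketch under stated assumptions, not the service's SQLAlchemy code: the table layout, the `find_binding` helper, and the injectable `find` parameter (used only to simulate a stale SELECT) are all hypothetical.

```python
import sqlite3

# Hypothetical schema; only the natural-key uniqueness matters for the sketch.
SCHEMA = """
CREATE TABLE control_bindings (
    id INTEGER PRIMARY KEY,
    namespace_key TEXT NOT NULL DEFAULT 'default',
    target_type TEXT NOT NULL,
    target_id TEXT NOT NULL,
    control_id INTEGER NOT NULL,
    enabled INTEGER NOT NULL DEFAULT 1,
    UNIQUE (namespace_key, target_type, target_id, control_id)
)
"""
WHERE_KEY = "namespace_key = ? AND target_type = ? AND target_id = ? AND control_id = ?"


def find_binding(conn, key):
    """Look up a binding row by its natural key."""
    return conn.execute(
        f"SELECT id FROM control_bindings WHERE {WHERE_KEY}", key
    ).fetchone()


def upsert_by_natural_key(conn, key, enabled, find=find_binding):
    """Create or update a binding; return True only if this call inserted the row."""
    if find(conn, key) is None:
        try:
            conn.execute(
                "INSERT INTO control_bindings "
                "(namespace_key, target_type, target_id, control_id, enabled) "
                "VALUES (?, ?, ?, ?, ?)",
                (*key, int(enabled)),
            )
            conn.commit()
            return True
        except sqlite3.IntegrityError:
            # Lost the race: another writer inserted the same key first.
            conn.rollback()
    # Existing row (or race loser): apply the requested enabled value.
    conn.execute(
        f"UPDATE control_bindings SET enabled = ? WHERE {WHERE_KEY}",
        (int(enabled), *key),
    )
    conn.commit()
    return False
```

Passing `find=lambda conn, key: None` forces the stale-read branch, which makes the loser path testable without real concurrency.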
list_bindings previously returned every binding in the namespace, which grows linearly with attached targets. Switched to cursor-based pagination matching the existing list_controls_page idiom: cursor + limit query params, results ordered by ID descending (newest first), pagination metadata returned alongside the binding list. ListControlBindingsResponse now carries a PaginationInfo block (limit, total, next_cursor, has_more). Default limit is 20, max 100, mirroring the controls list endpoint. The TypeScript client is regenerated to include the new query params.
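A rough sketch of the pagination contract, operating on a plain list of binding IDs instead of a database query. The function name, the string-typed cursor, and the choice to report `total` as the count of all matching bindings are assumptions made for illustration; the defaults (limit 20, max 100, newest first) follow the commit message.

```python
from typing import Iterable, Optional


def list_bindings_page(binding_ids: Iterable[int],
                       cursor: Optional[str] = None,
                       limit: int = 20) -> dict:
    """Return one page of binding IDs, ordered by ID descending."""
    limit = max(1, min(limit, 100))  # clamp to the documented maximum
    ordered = sorted(binding_ids, reverse=True)
    remaining = ordered
    if cursor is not None:
        # The cursor is assumed to be the ID of the last row already seen.
        last_seen = int(cursor)
        remaining = [i for i in ordered if i < last_seen]
    # Fetch one extra row to learn whether another page exists.
    window = remaining[: limit + 1]
    has_more = len(window) > limit
    page = window[:limit]
    return {
        "bindings": page,
        "pagination": {
            "limit": limit,
            "total": len(ordered),
            "next_cursor": str(page[-1]) if has_more else None,
            "has_more": has_more,
        },
    }
```

A caller keeps requesting pages with `cursor=pagination["next_cursor"]` until `has_more` is false.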
The * marker before target_type/target_id also made the previously positional-or-keyword arguments trace_id, span_id, and event_agent_name keyword-only. External callers passing them positionally would break with TypeError. Move * to immediately before target_type so the new fields are keyword-only while existing arguments retain their positional contract.
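The effect of the `*` placement is easy to see with two toy signatures. The functions below are hypothetical and only mimic the argument names mentioned in the commit message; the real SDK signatures may carry other parameters.

```python
def evaluate_before_fix(step_name, *, trace_id=None, span_id=None,
                        event_agent_name=None, target_type=None, target_id=None):
    """The `*` sits too early: trace_id and friends became keyword-only."""
    return {"trace_id": trace_id, "target_type": target_type}


def evaluate_after_fix(step_name, trace_id=None, span_id=None,
                       event_agent_name=None, *, target_type=None, target_id=None):
    """The `*` sits right before target_type: only the new fields are keyword-only."""
    return {"trace_id": trace_id, "target_type": target_type}


def call_positionally(func):
    """Return True if an existing-style positional call still works."""
    try:
        func("step", "trace-1", "span-1")
        return True
    except TypeError:
        return False
```

With these definitions, `call_positionally(evaluate_before_fix)` fails while `call_positionally(evaluate_after_fix)` succeeds, which is the compatibility break the commit repairs.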
…gle query
Add (namespace_key, control_id) index used by list_bindings(control_id=...) and the cascade path from controls. Collapse resolve_effective_controls into a single JOIN against control_bindings instead of two round trips.
Match the existing String(255) convention used for agent_name and control/policy names. Bounds index key size and prevents pathological long values across all namespace-scoped tables.
The downgrade refuses to run when cross-namespace duplicates exist on agents, policies, or live controls; only the agents path was tested. Add policies and live-controls cases plus a soft-deleted positive case that verifies the deleted_at filter still allows downgrade.
Spell out that integrators can declare any FastAPI-resolvable dependency in their override signature, and include a JWT-claim example so the extension point is concrete instead of abstract.
Two related gaps left by the namespace migration:
- Add idx_controls_namespace_name_active to the duplicate-name conflict set so concurrent name collisions surface as 409 instead of 500. Parametrize the IntegrityError tests across both index names.
- Restore a non-unique partial index ix_controls_name on controls(name) WHERE deleted_at IS NULL, mirroring the natural-key indexes added for agents and policies. Existing service code still does name-only Control lookups; without this index those go to a sequential scan post-migration.
Endpoints used a Depends(get_namespace_key) seam, but only the binding and evaluation endpoints honored it; controls/agents/policies still wrote and queried under the DB default. Overriding the seam in a deployment broke binding creation with CONTROL_NOT_FOUND. Drop the seam for V1: endpoints resolve to DEFAULT_NAMESPACE_KEY directly. Schema and services remain namespace-scoped so a future change can thread a single resolver through every write path together.
f90f8f0 dropped the seam, but a single resolver function on the read side is still useful: every namespace-scoped endpoint funnels through one call site, so a future change can switch every reader to a real per-request resolver in one place. V1 returns DEFAULT_NAMESPACE_KEY unconditionally and documents that overriding is unsupported until controls/agents/policies endpoints are threaded.
Mirrors the agent-bound flow: a per-target LRU cache populated lazily on first evaluation, kept fresh by a daemon thread that refetches each cached entry on a fixed interval. Public API parallels the agent-bound side: refresh_target_controls / refresh_target_controls_async for explicit refresh, invalidate_target_controls_cache for explicit drop. init() takes target_controls_refresh_interval_seconds (default 60s, 0 disables); shutdown stops the loop alongside the policy refresh loop.
A process-wide LRU cache keyed only by (target_type, target_id) cannot distinguish entries written under different SDK sessions. Re-initing against a different server or API key would have served controls from the previous identity until the entry was evicted or overwritten, including races where an in-flight refresh lands after init() proceeds. Tie cache lifetime to the session boundary: init() and shutdown call cache.reset(), which both clears every entry and advances an internal epoch token. Writers capture the epoch before fetching and pass it to put(); writes whose epoch no longer matches are rejected silently. Both write paths (refresh worker and lazy fetch) are covered.
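The epoch-token idea can be sketched as a small LRU cache whose writers must present the epoch they captured before fetching. The class name, method signatures, and default size are assumptions for illustration; the SDK's actual cache may be structured differently.

```python
import threading
from collections import OrderedDict


class TargetControlsCache:
    """LRU cache that rejects writes started before the last reset()."""

    def __init__(self, max_entries=128):
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self._epoch = 0
        self._lock = threading.Lock()

    @property
    def epoch(self):
        """Token a writer captures before it starts fetching."""
        with self._lock:
            return self._epoch

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]

    def put(self, key, controls, epoch):
        """Store controls unless the session changed since `epoch` was captured."""
        with self._lock:
            if epoch != self._epoch:
                return False  # stale write from a previous session: drop it
            self._entries[key] = controls
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)  # evict least recently used
            return True

    def reset(self):
        """Clear every entry and advance the epoch (called on init/shutdown)."""
        with self._lock:
            self._entries.clear()
            self._epoch += 1
```

A refresh worker would read `cache.epoch`, perform its fetch, and then call `put(key, controls, epoch)`; if `reset()` ran in between, the write is dropped instead of repopulating the cache with controls from the previous identity.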
check_evaluation_with_local accepts an arbitrary AgentControlClient. Reusing the global cache for a client whose base_url or api_key does not match the active SDK session would let controls fetched against one server serve evaluations against another. Skip the cache entirely in that case and fetch live; only the init()-managed session populates or reads the shared cache.
Patch coverage on the previous push fell below the repo target. The
gap was concentrated in two places, both genuinely useful to test:
- SDK target-controls polling and refresh APIs:
- invalidate_target_controls_cache (single-key, both args, no args)
- refresh_target_controls_async (empty cache, multi-key fetch,
per-target failure isolation, stale-write rejection after reset)
- refresh_target_controls (sync wrapper from sync and async contexts)
- _start_target_controls_refresh_loop / _stop_target_controls_refresh_loop
(round trip, zero-interval no-op)
- _target_controls_refresh_worker (cache-empty short-circuit,
refresh-known-keys integration, session-unset skip, exception
isolation)
- ControlBindingsService.upsert_by_natural_key IntegrityError branch:
the existing test exercised the SELECT-then-UPDATE fast path; the
new test simulates the race where both transactions miss on SELECT,
one INSERT trips the unique index, and the loser must roll back,
re-fetch the winner, and apply the requested enabled value.
I think there is still a broader contract issue here: both …
initAgent, GET /agents/{name}/controls, and POST /evaluation now resolve
the same de-duplicated effective set: direct attachments, policy-derived
controls, and (when target context is supplied) controls bound to that
target via enabled bindings in the same namespace.
Server
- Add optional target_type/target_id to InitAgentRequest with paired-target
validation. initAgent merges target bindings into its returned controls
and no longer short-circuits to an empty list on agent creation, so a
newly registered agent picks up pre-existing bindings.
- list_agent_controls accepts target_type/target_id; the GET endpoint
enforces the paired-target rule and threads namespace_key through.
- ControlService.list_controls_for_agent / list_runtime_controls_for_agent
/ _list_db_controls_for_agent now require namespace_key and accept
optional target params; every joined table (agent_controls,
agent_policies, policy_controls, ControlBinding, Control) is filtered
on namespace_key.
- /evaluation collapses to one resolver: ControlService.list_runtime_controls_for_agent
with namespace + target. The agent row is required on every request.
- Drop GET /control-bindings/effective and the
ControlBindingsService.resolve_effective_controls /
resolve_runtime_controls helpers; the merged ControlService path is
authoritative.
SDK (Python)
- init() takes target_type/target_id; both must be supplied together.
The values flow into state and ride on the registration call and on
every subsequent /agents/{name}/controls poll.
- Drop the per-target controls cache, refresh worker, refresh API, and
invalidate API; one polling loop and one publish path remain.
- check_evaluation, check_evaluation_with_local, and evaluate_controls
default target_type/target_id from state when omitted.
- _reset_state and shutdown clear target context.
- agents.register_agent / list_agent_controls forward target params.
Drops EffectiveTargetControlsResponse from the model exports and
regenerates the TypeScript SDK against the new spec.
…rget
The session control cache (state.server_controls) is fetched for the target context fixed at init() time. A per-call target override that disagrees with the session target would drive local-first evaluation against the wrong cached set and could return safe without contacting the server.
The V1 contract is one target per SDK session. Add a shared _resolve_session_target helper that defaults missing per-call targets from state and rejects mismatches with a clear ValueError pointing callers at re-init. Apply at all three entry points (check_evaluation, check_evaluation_with_local, evaluate_controls) so the contract is uniform across the public API.
Also update EvaluationRequest's target_type/target_id descriptions: the server now merges target bindings into the agent + policy effective set rather than resolving from bindings alone. The TypeScript SDK regenerates the new wording.
A session initialized via init() without target context still has a control cache fetched for that no-target context. The previous mismatch check only fired when state.target_type was set, so a caller could pass target_type/target_id per call on a no-target session and have those values accepted - then evaluate against the wrong cache, potentially returning safe without contacting the server. Treat (None, None) as a valid session target. Use state.current_agent as the active-session sentinel so the rule applies inside an init()'d session (any per-call target must equal the session target) but is skipped outside one (lower-level direct-client flows still work). Add a regression test covering the no-target session path and update the existing test_per_call_target_must_match_session_target to model an active session by patching state.current_agent.
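The session-target rule from these two commits can be summarized in a short sketch. `SessionState`, `validate_target_pair`, and `resolve_session_target` are hypothetical names modeled on the helpers the commit messages mention; the real SDK state object and error wording may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SessionState:
    """Hypothetical stand-in for the SDK's module-level session state."""
    current_agent: Optional[str] = None  # set only inside an init()'d session
    target_type: Optional[str] = None
    target_id: Optional[str] = None


def validate_target_pair(target_type, target_id) -> None:
    """Both-or-neither rule for target_type/target_id."""
    if (target_type is None) != (target_id is None):
        raise ValueError("target_type and target_id must be supplied together")


def resolve_session_target(state: SessionState,
                           target_type: Optional[str] = None,
                           target_id: Optional[str] = None
                           ) -> Tuple[Optional[str], Optional[str]]:
    """Default a per-call target from the session and reject mismatches."""
    validate_target_pair(target_type, target_id)
    if state.current_agent is None:
        # No active session: lower-level direct-client flows pass through.
        return target_type, target_id
    session_target = (state.target_type, state.target_id)
    if target_type is None:
        # Per-call target omitted: use the session target, possibly (None, None).
        return session_target
    if (target_type, target_id) != session_target:
        raise ValueError(
            "per-call target does not match the session target; "
            "call init() again to switch targets"
        )
    return session_target
```

Treating `(None, None)` as a real session target is what closes the no-target gap: inside an active session, any explicit per-call target must equal whatever the session was initialized with.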
Runnable example showing the V1 contract end-to-end:
- init(target_type='env', target_id='prod') returns the merged effective set (agent's direct attachments + bindings for the supplied target).
- @control() decorator runs against that merged set automatically.
- evaluate_controls(...) defaults its target context from the session.
- A per-call target that disagrees with the session target is rejected with a clear ValueError.
setup_controls.py provisions the agent, two controls, attaches one directly, and binds the other to (env, prod) via the natural-key upsert endpoint (idempotent on re-run). demo_agent.py walks through the four phases and prints the expected outcome at each step. Indexed in examples/README.md alongside the other framework demos.
…-binding evaluator gate
Three review issues against the merged-resolver contract:
1. Cross-namespace agent lookup: GET /agents/{name}/controls now passes
namespace_key into _get_agent_or_404. An agent that exists only in
another namespace surfaces as 404 instead of returning a 200 with the
wrong/empty effective set. The lookup is opt-in at the helper so other
call sites that don't yet thread namespace through stay unchanged.
2. List-binding cursor type: server emits next_cursor as a string, so the
GET /control-bindings cursor parameter accepts a string and parses it
to int internally. Round-trip with PaginationInfo.next_cursor now works
end-to-end through the generated TypeScript SDK; previously the int
typing on cursor failed client-side validation when fed back from
pagination.next_cursor.
3. Agent-scoped evaluators on target bindings: ControlBindingsService now
rejects controls whose condition tree references agent-scoped
evaluators (agent_name:evaluator) at binding creation time. Target
bindings have no specific agent to validate against, so a binding can
apply a control to any agent that later evaluates against the target;
accepting agent-scoped references would surface as a runtime
evaluation failure instead of a clear 400 at attach time. New
ErrorCode.CONTROL_BINDING_INCOMPATIBLE.
Also:
- SDK check_evaluation / check_evaluation_with_local are no longer
session-bound. They take their own client (and controls); session
target enforcement lives only on evaluate_controls. The shared
validator is split: _validate_target_pair (both-or-neither) for the
caller-owned helpers, _resolve_session_target (default + reject
mismatch) for the session-bound entry point. Tests for the
session-target rules move to evaluate_controls.
- TypeScript SDK regenerated to match the new cursor type.
- One regression test in test_target_merged_contract pins the
cross-namespace 404.
…ace, expose controlBindings
Three review issues:
1. Savepoint scoping for the upsert race: ``upsert_by_natural_key`` now
wraps the conflicting insert in ``begin_nested()`` so a unique-
constraint collision rolls back the SAVEPOINT only. The previous
``session.rollback()`` would discard every pending change in the
surrounding transaction once anything composed this service after a
prior flush.
2. Namespace-scope agent endpoints end-to-end. ``_get_agent_or_404``
now requires ``namespace_key`` and is non-disclosing across
namespaces. The 11 callers thread ``namespace_key=Depends(get_namespace_key)``
through every signature; agent_policies / agent_controls reads,
inserts, and deletes filter by namespace_key. Policy lookups in the
association routes also filter by namespace_key, and
``ControlService.get_active_control_or_404 / list_controls_for_policy
/ add_control_to_agent / remove_control_from_agent`` accept
``namespace_key`` so the service layer is no longer namespace-blind.
A regression test pins that cross-namespace agent association calls
surface 404, mirroring the pattern from the GET /agents/{name}/
controls case.
3. TypeScript client wrapper exposes the new ``controlBindings`` API
alongside ``agents``, ``controls``, etc., so consumers using the
public ``AgentControlClient`` no longer have to reach into the
generated internals.
…g race
Two review issues:
1. ``create_binding`` now wraps the conflicting insert in a SAVEPOINT
via ``begin_nested()`` so a duplicate-natural-key collision rolls
back only that insert. Mirrors the upsert path so neither service
method discards unrelated flushed work in a caller's transaction.
2. Plain agent metadata reads — ``GET /agents/{name}``,
``GET /agents``, ``GET /agents/{name}/evaluators``,
``GET /agents/{name}/evaluators/{evaluator_name}`` — now scope by
``namespace_key`` so duplicate names across namespaces (allowed by
this migration) cannot leak rows from another namespace. The list
endpoint additionally namespace-scopes the count, the page query,
and the cursor-row lookup so pagination cannot redirect through a
foreign-namespace agent.
TypeScript SDK regenerated to pick up the new docstrings.
…#204)

## Summary

Pluggable request-auth framework that handles both auth flows the system needs:

- **Management.** Online check on every request. The default authorizer authenticates the credential and authorizes the operation; in production this is `HttpUpstreamAuthProvider` forwarding to a configurable upstream service.
- **Runtime.** Two-phase exchange-then-verify. A target-bearing call presents a long-lived credential plus `(target_type, target_id)` to a token exchange endpoint; the server mints a short-lived HS256 JWT bound to that target. Subsequent runtime calls verify the JWT locally, with no upstream round-trip on the hot path.

Both flows route through the same primitives (`Operation` vocabulary on endpoints, `Principal` returned, `RequestAuthorizer` Protocol installed); a per-operation registry lets a deployment point management ops at one provider and runtime ops at another.

Migrates the `/control-bindings` endpoint family onto the framework and ships the runtime token exchange endpoint. The runtime resolution path itself (`/evaluation` etc.) is wired in a follow-up; its provider override (`LocalJwtVerifyProvider`) is already in place when the runtime secret is configured.

## Module layout

```
server/src/agent_control_server/auth_framework/
  __init__.py            # public API
  core.py                # Operation, Principal, RequestAuthorizer, require_operation, registry
  config.py              # configure_auth_from_env, RuntimeAuthConfig, set_runtime_auth_config
  runtime_token.py       # HS256 mint / verify helpers, UpstreamGrantExpiredError
  providers/
    __init__.py
    header.py            # HeaderAuthProvider + DEFAULT_OPERATION_ACCESS
    http_upstream.py     # HttpUpstreamAuthProvider (forward + parse grant)
    local_jwt.py         # LocalJwtVerifyProvider (hot-path JWT verify)
server/src/agent_control_server/endpoints/
  auth.py                # POST /api/v1/auth/runtime-token-exchange
```

`auth.py` (legacy local credential check) is unchanged; `HeaderAuthProvider` re-uses `_validate_api_key` from it.
Non-binding routes still go through the legacy router-level gate; their migration happens in follow-up PRs.

## Operation vocabulary

```python
class Operation(StrEnum):
    # Wired on endpoints in this PR.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_TOKEN_EXCHANGE = "runtime.token_exchange"

    # Reserved; not yet wired on endpoints.
    CONTROLS_READ = "controls.read"
    CONTROLS_CREATE = "controls.create"
    CONTROLS_UPDATE = "controls.update"
    CONTROLS_DELETE = "controls.delete"
    RUNTIME_USE = "runtime.use"
```

## Per-operation authorizer registry

`set_authorizer(authorizer, operation=...)` overrides the default for one operation. Without `operation=`, it becomes the default for every operation that does not have a specific binding. Used to route management ops through one provider and `Operation.RUNTIME_USE` through `LocalJwtVerifyProvider`:

```python
set_authorizer(HttpUpstreamAuthProvider(...))        # default
set_authorizer(LocalJwtVerifyProvider(secret=...),   # override
               operation=Operation.RUNTIME_USE)
```

`require_operation(op)` consults the override first, falls back to the default. The local-credential path (no override installed) routes everything to `HeaderAuthProvider`; the no-auth flow (`api_key_enabled=False`) is preserved end-to-end.

`require_operation` accepts an optional `context_builder` so the endpoint can surface request-shaped context (path / query / body fields) to the authorizer. The body-bearing binding endpoints, the target-filtered list endpoint, and the runtime token exchange endpoint all forward `(target_type, target_id)` so an upstream that resolves the target's owning project has the identifiers it needs to make a project-level decision.

## Providers (three ship in-tree)

**`HeaderAuthProvider`**: local-credential path, single namespace.

- Maps each `Operation` to one of three access levels (`PUBLIC`, `AUTHENTICATED`, `ADMIN`); single source of truth in `DEFAULT_OPERATION_ACCESS`.
- Reuses the existing local API-key + session-cookie credential check from `auth.py`, so behavior matches the previous `require_admin_key` path verbatim.
- Returns a normalized `runtime.use` scope only for `Operation.RUNTIME_TOKEN_EXCHANGE`, so the exchange endpoint can uniformly require `runtime.use` in `principal.scopes` across every provider; there is no implicit fallback that could escalate an upstream-supplied empty scope grant.
- The no-auth flow (`api_key_enabled=False`) is preserved: every operation succeeds with a non-admin `Principal`. Pinned by a regression test.
- Always returns `DEFAULT_NAMESPACE_KEY`. The namespace header lookup branch is preserved but inert until non-binding write endpoints are threaded.

**`HttpUpstreamAuthProvider`**: generic upstream-delegating provider.

- Forwards caller credentials (`X-API-Key`, `Authorization`, `Cookie`) on a POST to a configurable URL with `{operation, context?}`.
- Optional service-to-service token header for upstream trust.
- Parses the upstream response into a `Principal`: `namespace_key`, `is_admin`, `caller_id`, plus optional grant fields (`target_type`, `target_id`, `scopes`, `expires_at`) so the runtime token exchange can mint from the same response.
- Maps `200` to `Principal`; `401` / `403` / `404` to matching error; `5xx`, network errors, malformed payloads, naive (`tzinfo`-less) `expires_at`, and partial target grants (only one of `target_type` / `target_id`) all fail closed (502/503).

**`LocalJwtVerifyProvider`**: hot-path runtime verifier.

- Reads a Bearer token from `Authorization`, verifies signature against the runtime secret, checks `domain == "runtime"`, the issuer, expiry, and that the token's scope covers the requested `Operation`.
- Returns a `Principal` with the bound `(namespace_key, target_type, target_id)` so runtime endpoints inherit the namespace and target binding without re-deriving them.
- When the dependency surfaces `target_type` / `target_id` via `context_builder`, the provider also enforces that they match the token's binding; runtime endpoints get the request-target check for free.

## Runtime token shape

HS256, dedicated secret (`AGENT_CONTROL_RUNTIME_TOKEN_SECRET`), issuer `agent-control/server`. Claims:

| Claim | Purpose |
|---|---|
| `domain` | Pinned to `runtime`; tokens minted here MUST not be accepted on management endpoints. |
| `namespace_key` | The namespace the token authorizes within. Required for mint and verify; preserved end-to-end so a token minted for one namespace cannot be used to resolve controls in another. |
| `actor_id` | Caller identity surfaced from the upstream grant. |
| `scopes` | Granted runtime capabilities (e.g., `["runtime.use"]`). The exchange endpoint refuses to mint when `principal.scopes` does not contain `runtime.use`, including the case where the upstream's grant explicitly lists an empty scope set. |
| `target_type` / `target_id` | Bind the token to one target. |
| `iat` / `exp` | Bounded lifetime. The local TTL is capped by the upstream grant's `expires_at` so the local token can never outlive its grant. |
| `jti` | Random identifier; reserved for future revocation. |

`mint_runtime_token` rejects an `upstream_expires_at` whose `tzinfo is None` or whose `utcoffset()` is `None` with `RuntimeTokenError` so a custom authorizer that supplies a naive datetime surfaces as a typed auth error rather than a raw `TypeError` deeper in the comparison.

## Runtime token exchange endpoint

```
POST /api/v1/auth/runtime-token-exchange
{ "target_type": "...", "target_id": "..." }
```

- Authenticated and authorized via `Operation.RUNTIME_TOKEN_EXCHANGE` through the default authorizer (typically `HttpUpstreamAuthProvider` in production). The authorizer's `context_builder` forwards the requested target to the upstream so it can authorize against the right resource.
- Refuses with 503 when `AGENT_CONTROL_RUNTIME_TOKEN_SECRET` is not configured.
- Mints a local token from `Principal.scopes` / `Principal.grant_expires_at`, capped by the configured TTL (default 300s).
- When the provider's `Principal` carries a target binding, the endpoint verifies it matches the requested target before minting.
- An upstream grant whose `expires_at` is already in the past surfaces as 502 (`UpstreamGrantExpiredError`), distinct from the 503 misconfigured-server path so the public status reflects which side the operator should investigate.

Response: `{ token, expires_at, target_type, target_id, scopes }`.

## Storage namespace under the framework

The migrated binding endpoints take the storage `namespace_key` from `get_namespace_key` (the same resolver the rest of the server uses), not from `principal.namespace_key`. The auth chain still runs through `require_operation` for authentication and authorization, but the row's namespace is sourced from the resolver so binding writes and runtime reads stay in lockstep until auth-derived namespace resolution lands across `/controls`, `/policies`, `/agents`, and `/evaluation` together. The principal's namespace is observed (and used by `LocalJwtVerifyProvider` for its own contract) but is not used to pick the row's storage namespace at this stage.
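The runtime token rules described earlier in this description (HS256, `domain` pinned to `runtime`, TTL capped by the upstream grant, naive datetimes rejected) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `runtime_token.py`: `mint_token` and `verify_token` are hypothetical names, the signatures are invented, `actor_id` is omitted for brevity, and the real helpers may rely on a JWT library.

```python
import base64
import hashlib
import hmac
import json
import secrets
from datetime import datetime, timedelta, timezone

ISSUER = "agent-control/server"


class RuntimeTokenError(Exception):
    """Typed error for mint/verify failures in this sketch."""


def _b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    mac = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256)
    return _b64url(mac.digest())


def mint_token(secret, *, namespace_key, target_type, target_id, scopes,
               ttl_seconds=300, upstream_expires_at=None, now=None):
    """Mint an HS256 token; the lifetime never exceeds the upstream grant."""
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=ttl_seconds)
    if upstream_expires_at is not None:
        # Reject naive datetimes before comparing them with aware ones.
        if upstream_expires_at.tzinfo is None or upstream_expires_at.utcoffset() is None:
            raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
        expires_at = min(expires_at, upstream_expires_at)
    if expires_at <= now:
        raise RuntimeTokenError("upstream grant has already expired")
    claims = {
        "domain": "runtime", "iss": ISSUER, "namespace_key": namespace_key,
        "target_type": target_type, "target_id": target_id,
        "scopes": list(scopes), "iat": int(now.timestamp()),
        "exp": int(expires_at.timestamp()), "jti": secrets.token_hex(16),
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}", expires_at


def verify_token(secret, token, *, now=None):
    """Check signature, domain, issuer, and expiry; return the claims."""
    now = now or datetime.now(timezone.utc)
    try:
        header_b64, claims_b64, signature = token.split(".")
    except ValueError as exc:
        raise RuntimeTokenError("malformed token") from exc
    expected = _sign(secret, f"{header_b64}.{claims_b64}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise RuntimeTokenError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("domain") != "runtime" or claims.get("iss") != ISSUER:
        raise RuntimeTokenError("wrong domain or issuer")
    if claims.get("exp", 0) <= int(now.timestamp()):
        raise RuntimeTokenError("token expired")
    return claims
```

With a configured TTL of 300 seconds and an upstream grant that expires in 60 seconds, the minted token expires after 60 seconds, so the local token cannot outlive its grant.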
## Migrated endpoints

All seven `/api/v1/control-bindings*` endpoints now use `Depends(require_operation(...))`:

| Method | Path | Operation | Context forwarded |
|---|---|---|---|
| PUT | `/control-bindings` | `control_bindings.write` | body: `target_type`, `target_id` |
| GET | `/control-bindings` | `control_bindings.read` | query: `target_type`, `target_id` (when present) |
| GET | `/control-bindings/{binding_id}` | `control_bindings.read` | N/A (namespace-wide) |
| PATCH | `/control-bindings/{binding_id}` | `control_bindings.write` | N/A (namespace-wide) |
| DELETE | `/control-bindings/{binding_id}` | `control_bindings.write` | N/A (namespace-wide) |
| PUT | `/control-bindings/by-key` | `control_bindings.write` | body: `target_type`, `target_id` |
| POST | `/control-bindings/by-key:delete` | `control_bindings.write` | body: `target_type`, `target_id` |

The three binding-id-based routes are documented as namespace-wide: their target identifiers are not available before the binding row is loaded, and `require_operation` is single-pass. Clients whose authorization model requires per-target permissions are steered to the natural-key endpoints and the target-filtered list, all of which forward the target to the authorizer. Two-phase auth on the by-id routes is a follow-up.

New: `POST /api/v1/auth/runtime-token-exchange` (operation `runtime.token_exchange`).

The framework-protected routers (`/control-bindings`, `/auth`) are mounted with the existing non-validating `get_api_key_from_header` Security extractor as a router-level dependency. `require_operation` still owns runtime authentication and authorization; the Security dependency exists purely so the generated OpenAPI spec advertises `X-API-Key` on these routes for downstream SDK generation.
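The authorizer lookup behind `require_operation` is described only through its tested behavior (a default authorizer, per-operation overrides that take precedence, fallback when an override is cleared, and an error when nothing is installed). A minimal sketch of that lookup follows. `AuthorizerRegistry` and its method names are hypothetical, and the string-returning authorizer is a placeholder rather than the real provider interface.

```python
from typing import Callable, Dict, Optional

# Placeholder authorizer type: takes an operation name, returns a label.
Authorizer = Callable[[str], str]


class AuthorizerRegistry:
    """Hypothetical registry: one default plus per-operation overrides."""

    def __init__(self) -> None:
        self._default: Optional[Authorizer] = None
        self._overrides: Dict[str, Authorizer] = {}

    def set_default(self, authorizer: Authorizer) -> None:
        self._default = authorizer

    def set_override(self, operation: str, authorizer: Authorizer) -> None:
        self._overrides[operation] = authorizer

    def clear_override(self, operation: str) -> None:
        # Clearing an override falls back to the default authorizer.
        self._overrides.pop(operation, None)

    def get_authorizer(self, operation: str) -> Authorizer:
        authorizer = self._overrides.get(operation, self._default)
        if authorizer is None:
            raise RuntimeError("no authorizer configured")
        return authorizer


if __name__ == "__main__":
    registry = AuthorizerRegistry()
    registry.set_default(lambda op: "header")
    registry.set_override("runtime.use", lambda op: "local-jwt")
    print(registry.get_authorizer("runtime.use")("runtime.use"))
    print(registry.get_authorizer("control_bindings.read")("control_bindings.read"))
```

A route dependency built on such a registry would resolve the authorizer per request, so installing or dropping an override takes effect without re-registering routes.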
## Generated client

The TypeScript wrapper exposes both `auth` and `controlBindings` getters alongside the existing surface, so consumers using the public client can call `runtimeTokenExchange` and the binding API without reaching into the generated internals.

## Env vars

| Var | Default | Purpose |
|---|---|---|
| `AGENT_CONTROL_AUTH_MODE` | `header` | Default authorizer: `header` or `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_URL` | none | Required when mode is `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS` | `5.0` | Per-request timeout. |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN` | none | Optional upstream service token. |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER` | `X-Agent-Control-Service-Token` | Header name for the service token. |
| `AGENT_CONTROL_RUNTIME_TOKEN_SECRET` | none | Required to enable runtime auth + the exchange endpoint. Validated at startup; rejected if shorter than 32 bytes. |
| `AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS` | `300` | Local token TTL ceiling (capped further by the upstream grant). Validated at startup. |

`configure_auth_from_env` parses both runtime fields once at startup into a frozen `RuntimeAuthConfig`. The exchange endpoint and `LocalJwtVerifyProvider` read the same object, so the mint and verify sides cannot drift apart within a process.

When the runtime secret is absent, `RUNTIME_USE` falls through to the default authorizer; this is logged at WARNING so an operator can immediately see what trust model is in effect. `RUNTIME_USE` is reserved and not wired to `/evaluation` in this PR, so this fallback does not affect the runtime hot path yet. The follow-up that wires runtime endpoints should explicitly choose legacy fallback or fail-closed JWT-only behavior.

## Out of scope (follow-ups)

- Migrate `/controls` CRUD onto `require_operation` using the reserved `CONTROLS_*` operations.
- Wire `Operation.RUNTIME_USE` on the runtime resolution path (`/evaluation`, etc.)
  and the SDK side of the runtime exchange. The provider override is already in place when the runtime secret is configured.
- Migrate `/agents/initAgent` onto `require_operation`. The `HttpUpstreamAuthProvider`'s `context_builder` should forward the request's `target_type` / `target_id` to the upstream so the upstream can authorize against the requested resource.
- Auth-derived `get_namespace_key` so the binding endpoints can use the principal's namespace for storage along with the rest of the server.
- Two-phase auth for the three binding-id-based routes (GET/PATCH/DELETE `/control-bindings/{binding_id}`) so they can forward target context to the upstream.
- Drop `auth.py`'s `require_admin_key` once every management endpoint is migrated.

## Stacking

Stacked on **PR #203** (`abhi/data-model-v1`); rebased onto its current head `8adc328` so the merged effective-controls contract, namespace-threaded agent endpoints, and savepoint-protected binding writes are the base this PR builds on. Will rebase onto `main` once #203 merges.

## Test plan

- [x] 55 framework + endpoint tests covering:
  - Default coverage: every `Operation` member has a default access mapping (regression guard).
  - `HeaderAuthProvider`: PUBLIC bypass, AUTHENTICATED + ADMIN paths route to the legacy validator with the right `require_admin` flag, no-auth mode passes admin operations, namespace-header lookup currently inert, unknown operation raises, normalized `runtime.use` scope returned for `RUNTIME_TOKEN_EXCHANGE`.
  - `HttpUpstreamAuthProvider`: 200 happy path with realistic JSON wire shapes (ISO datetime + JSON array scopes round-trip), service token forwarding, 401/403/404 mapping, 5xx fail-closed, network-error fail-closed, strict-grant rejection on wrong-typed `is_admin` / malformed `scopes` / bad `expires_at` / non-string target fields, partial target grant rejected, naive `expires_at` rejected.
  - `require_operation` factory: routes through the installed authorizer, per-operation overrides take precedence, clearing an override falls back to the default, `get_authorizer` raises when nothing is set.
  - Lifecycle: reconfiguring without the runtime secret drops the previous `LocalJwtVerifyProvider` override; teardown clears every authorizer; secret shorter than 32 bytes raises at startup; invalid TTL raises at startup.
  - Runtime token mint / verify: round-trip, wrong-secret rejection, expiry rejection, TTL capped by upstream grant, management-domain token refused on runtime verify, missing-namespace rejection, already-expired upstream grant raises `UpstreamGrantExpiredError`, naive `upstream_expires_at` raises `RuntimeTokenError`.
  - `LocalJwtVerifyProvider`: target-bound `Principal`, namespace carried from token, missing token returns 401, wrong scope returns 403, non-Bearer header returns 401, target-context match enforcement (mismatch on type or id returns 403).
  - Exchange endpoint: 503 without secret, mint when configured, target mismatch rejected (400), missing target rejected (422), grant-without-runtime-use rejected (no privilege escalation), explicit empty-scope grant rejected (no fallback escalation), target context forwarded to authorizer, non-default namespace propagates into the token, full exchange-then-verify round trip, already-expired upstream grant surfaces as 502 distinct from the 503 misconfigured-server path.
- [x] Full server suite: 676 passed.
- [x] `make lint` clean.
- [x] `make typecheck` clean.
- [x] `make sdk-ts-generate-check` clean.
- [x] TypeScript SDK regenerated alongside the new endpoint (`auth-runtime-token-exchange`, request/response models, `Auth` and `ControlBindings` groups exposed via the public client).
## Summary

Adds the namespace-scoping data model and a single merged effective-controls contract that `initAgent`, `GET /agents/{name}/controls`, and `POST /evaluation` all share.

- `namespace_key VARCHAR(255) NOT NULL DEFAULT 'default'` on `agents`, `controls`, `policies`, `agent_controls`, `agent_policies`, `policy_controls`, `control_bindings`.
- Namespace-scoped composite uniqueness; association-table foreign keys are composite same-namespace foreign keys (Postgres-enforced).
- `control_bindings` table for attaching controls to opaque external targets. One binding shape: each row attaches one control to one target inside a namespace, uniqueness on `(namespace_key, target_type, target_id, control_id)`. The `enabled` flag is a soft toggle: disabled bindings are preserved but excluded from the effective set.
- `ControlService.list_controls_for_agent` (and its runtime cousin) returns the de-duplicated union of the agent's direct controls, policy-derived controls, and (when target context is supplied) controls attached to that target via enabled bindings in the same namespace.
- `initAgent`, `GET /agents/{name}/controls?target_type=...&target_id=...`, and `POST /evaluation` all call into this resolver and return the same set for the same inputs.
- `/control-bindings`: full CRUD plus idempotent natural-key upsert/delete (`PUT /control-bindings/by-key`, `POST /control-bindings/by-key:delete`). Cursor-based pagination on list with an opaque string cursor that round-trips cleanly to clients. Natural key is `(target_type, target_id, control_id)`.
- `initAgent` accepts optional top-level `target_type` / `target_id`. Bindings can pre-exist the agent row, so a newly created agent registering with target context picks up pre-existing bindings on its first response (no second round-trip).
- `init(target_type=..., target_id=...)` stores the session target on `state` and forwards it on the registration call and on every subsequent `/agents/{name}/controls` poll. The single existing policy refresh loop carries the merged set; there is no separate target-controls cache or refresh worker.
- Per-call targets are checked against the session target at the session-bound entry point (`evaluate_controls`). `check_evaluation` and `check_evaluation_with_local` accept their own client and controls and run only the both-or-neither validation, so callers using those helpers are not implicitly bound to a previous `init()`.

## Namespace scoping
Every effective-controls query filters every joined table on `namespace_key` explicitly. Composite FKs prevent cross-namespace writes; explicit query scoping prevents reads from spanning namespaces in the presence of namespace-collision attacks or compromised callers. Both layers are required.

- `_get_agent_or_404` requires `namespace_key`; an agent that exists only in another namespace surfaces as 404 (non-disclosing). Every agent endpoint that resolves an agent threads the namespace through: `initAgent`, `GET /agents/{name}`, `GET /agents`, `GET /agents/{name}/evaluators`, `GET /agents/{name}/evaluators/{evaluator_name}`, `GET /agents/{name}/controls`, `PATCH /agents/{name}`.
- `/agents/{name}/policies*` routes (add / set / list / get / remove / remove-all / delete) and the corresponding `agent_policies` reads, writes, and deletes.
- `/agents/{name}/controls/{control_id}` routes (add / remove) and the corresponding `agent_controls` reads, writes, and deletes.
- redirect through a foreign-namespace agent of the same name.
- `ControlService.get_active_control_or_404`, `list_controls_for_policy`, `add_control_to_agent`, and `remove_control_from_agent` accept `namespace_key` so the service layer is no longer namespace-blind on the migrated paths. Policy lookups in the agent association routes also filter by namespace.
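The merged effective-controls set and the explicit namespace filter can be illustrated with plain in-memory data. This is a sketch under stated assumptions rather than the service code: `Control`, `Binding`, and `effective_control_ids` are hypothetical names, the real resolver runs as SQL joins, and raising `ValueError` stands in for the HTTP-level rejection of a partial target pair.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Control:
    id: int
    namespace_key: str
    deleted: bool = False  # stands in for deleted_at IS NOT NULL


@dataclass(frozen=True)
class Binding:
    namespace_key: str
    target_type: str
    target_id: str
    control_id: int
    enabled: bool = True


def effective_control_ids(
    namespace_key: str,
    direct_ids: Iterable[int],
    policy_ids: Iterable[int],
    controls: Iterable[Control],
    bindings: Iterable[Binding],
    target_type: Optional[str] = None,
    target_id: Optional[str] = None,
) -> List[int]:
    """De-duplicated union of direct, policy-derived, and target-bound controls."""
    # Both-or-neither: a partial target pair is a caller error.
    if (target_type is None) != (target_id is None):
        raise ValueError("target_type and target_id must be supplied together")

    candidates = set(direct_ids) | set(policy_ids)
    if target_type is not None:
        candidates |= {
            b.control_id
            for b in bindings
            if b.namespace_key == namespace_key
            and b.target_type == target_type
            and b.target_id == target_id
            and b.enabled  # disabled bindings are kept but excluded
        }

    # Explicit namespace filter on controls too; soft-deleted rows drop out.
    live = {c.id for c in controls if c.namespace_key == namespace_key and not c.deleted}
    return sorted(candidates & live)
```

Because every entry point would call the same function with the same inputs, agreement between the registration, polling, and evaluation paths follows from sharing the resolver rather than from keeping three implementations in sync.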
The initial release ships namespace plumbing at the schema level. Endpoints route through a single `get_namespace_key` dependency that always returns the default namespace; overriding it is not supported yet because `/controls` and `/policies` write endpoints still write under the default namespace, and an override here would create rows the existing endpoints cannot find. The initial release honors `get_namespace_key` on the effective-control paths and the migrated agent association paths; full `/controls` and `/policies` namespace-aware writes are follow-up work.

Single-namespace deployments are preserved by the `'default'` server default. Plain `ix_agents_name`, `ix_policies_name`, and `ix_controls_name` (partial on `deleted_at IS NULL`) indexes preserve name-only lookup performance during the rollout window.

The migration is reversible. `downgrade()` aborts with a clear error if cross-namespace duplicate names exist on `agents`, `policies`, or live `controls`, since restoring global single-column uniqueness would conflict. Soft-deleted control duplicates do not block downgrade.
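The downgrade pre-check can be pictured as a duplicate-name query per table. The sketch below uses the standard-library `sqlite3` module instead of the project's Postgres migration, so it is an analogy rather than the migration itself. The helper names are hypothetical, and the column names `namespace_key`, `name`, and `deleted_at` are taken from the description above.

```python
import sqlite3
from typing import Dict, List

# (table, live_only): soft-deleted controls are ignored, per the description.
_CHECKED_TABLES = (("agents", False), ("policies", False), ("controls", True))


def cross_namespace_duplicates(
    conn: sqlite3.Connection, table: str, live_only: bool
) -> List[str]:
    """Names that appear in more than one namespace in the given table."""
    where = "WHERE deleted_at IS NULL" if live_only else ""
    # The table name comes from the fixed tuple above, never from user input.
    sql = (
        f"SELECT name FROM {table} {where} "
        "GROUP BY name HAVING COUNT(DISTINCT namespace_key) > 1 ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql)]


def assert_downgrade_safe(conn: sqlite3.Connection) -> None:
    """Abort before restoring global single-column uniqueness would fail."""
    problems: Dict[str, List[str]] = {}
    for table, live_only in _CHECKED_TABLES:
        dupes = cross_namespace_duplicates(conn, table, live_only)
        if dupes:
            problems[table] = dupes
    if problems:
        raise RuntimeError(
            f"cannot downgrade; cross-namespace duplicate names: {problems}"
        )
```

Running the check before any constraint is rebuilt means a failed downgrade leaves the schema untouched and names the offending rows, instead of failing partway through on a unique-index violation.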
## Initial-release contract

The sessionful SDK path supports one active `init()` context. Callers that need multiple agents or targets in one process should use the lower-level helpers, or separate sessions once multi-session support exists.
## Notes on `control_bindings`

- `ON DELETE CASCADE` on the parent control fires only on hard deletes. Soft-deleted controls (`deleted_at IS NOT NULL`) keep their bindings; the resolver excludes soft-deleted controls.
- `delete_control` rejects with 409 when the control has active policy associations, direct agent associations, or active target bindings unless `force=true`, in which case all three classes of attachment are detached as part of the soft-delete lifecycle.
- `updated_at` refreshes on every UPDATE via SQLAlchemy `onupdate`.
- The `(namespace_key, control_id)` index covers the cascade path and `list_bindings(control_id=...)` filtering.
- `idx_controls_namespace_name_active` is recognized as a name-conflict constraint, so concurrent duplicate-name races surface as 409, not 500.
- `ControlBindingsService.create_binding` and `upsert_by_natural_key` wrap their inserts in `begin_nested()` so a unique-constraint collision rolls back the SAVEPOINT only: the surrounding transaction is intact, and a caller that composed the service after another flush does not lose its prior writes.
- Controls that reference agent-scoped evaluators (`agent_name:evaluator_name`) are rejected at attach time. Bindings have no specific agent to validate the reference against, so accepting them would surface as a runtime evaluation failure on the first call rather than a clear 400 at attach time. New error code `CONTROL_BINDING_INCOMPATIBLE`.
- Per-agent overrides inside a target are intentionally out of scope at this stage. Two forward paths are documented in code (migration comment plus `ControlBinding` docstring):
  - An `agent_name` column with a partial-index pair and an `enabled`-aware most-specific-wins resolver; supports both per-agent additions and per-agent exemptions.
  - The `agent_controls` table at runtime; supports per-agent additions only, since `agent_controls` has no `enabled` flag.

## Generated client
The TypeScript wrapper exposes the new `controlBindings` getter alongside the existing `agents`, `controls`, `evaluation`, `evaluators`, `observability`, `policies`, and `system` getters, so consumers using the public client can manage bindings without reaching into the generated internals.
## Out of scope (follow-up PRs)

- Thread `get_namespace_key` through `/controls` and `/policies` write endpoints / services.
- Auth-derived `get_namespace_key` resolution.
- `control_versions` and `control_execution_events`.

## Test plan
- `control_bindings` table created; downgrade restores originals; upgrade/downgrade round-trip; downgrade rejects cross-namespace duplicates on agents, policies, and live controls; allows soft-deleted duplicates.
- within a namespace rejected; soft-deleted control names reusable within a namespace; cross-namespace foreign keys rejected on every association table.
- `control_bindings` table: same control bindable to different targets; duplicate `(namespace, target_type, target_id, control_id)` rejected; cross-namespace `control_id` rejected; `ON DELETE CASCADE` on hard delete; bindings survive parent soft delete.
- controls; de-duplication when a control is attached through both paths; disabled binding excluded; soft-deleted controls excluded; namespace isolation; absence of target context omits bindings.
- `initAgent`: target params merge into the returned controls; newly created agent with target context picks up pre-existing bindings; partial target pair rejected (422).
- `GET /agents/{name}/controls?target_type=...&target_id=...` returns the same merged set as `initAgent`; partial target pair rejected (400); cross-namespace agent surfaces as 404.
- surface as 404, mirroring the effective-controls path.
- `/evaluation`: target context flows through the same merged resolver (target + agent + policy); 404 when the agent is not registered; partial target pair rejected (422).
- string cursor round-trips) / patch / delete; non-admin write rejection; natural-key upsert (idempotent, updates `enabled` and `updated_at`, handles concurrent insert race via SAVEPOINT); natural-key delete (idempotent); duplicate create surfaces as 409 with the surrounding transaction intact.
- evaluators with 400 `CONTROL_BINDING_INCOMPATIBLE`.
- `delete_control` lifecycle: refuses with 409 when policy / agent / binding attachments exist and `force=false`; detaches all three on `force=true`.
- partial-unique index names trigger `CONTROL_NAME_CONFLICT` (409) instead of 500.
- `init(target_type=..., target_id=...)` stores session target and forwards it on registration and on every refresh; partial target pair rejected; per-call target overrides default from state and reject mismatches with the session target on the session-bound `evaluate_controls`; standalone `check_evaluation` / `check_evaluation_with_local` are not session-bound and run only the both-or-neither check.
- `make lint` clean.
- `make typecheck` clean.
- `make sdk-ts-generate-check` clean.