feat(server): namespace scoping and control bindings by abhinav-galileo · Pull Request #203 · agentcontrol/agent-control

abhinav-galileo · 2026-04-27T20:49:03Z

Summary

Adds the namespace-scoping data model and a single merged
effective-controls contract that initAgent,
GET /agents/{name}/controls, and POST /evaluation all share.

- namespace_key VARCHAR(255) NOT NULL DEFAULT 'default' on agents,
  controls, policies, agent_controls, agent_policies,
  policy_controls, control_bindings.
  Single-column uniqueness replaced by namespace-scoped composite
  uniqueness; association-table foreign keys are composite
  same-namespace foreign keys (Postgres-enforced).
- New control_bindings table for attaching controls to opaque
  external targets. One binding shape: each row attaches one
  control to one target inside a namespace, uniqueness on
  (namespace_key, target_type, target_id, control_id). The enabled
  flag is a soft toggle: disabled bindings are preserved but excluded
  from the effective set.
- Single merged resolver. ControlService.list_controls_for_agent
  (and its runtime cousin) returns the de-duplicated union of the
  agent's direct controls, policy-derived controls, and (when target
  context is supplied) controls attached to that target via enabled
  bindings in the same namespace. initAgent,
  GET /agents/{name}/controls?target_type=...&target_id=..., and
  POST /evaluation all call into this resolver and return the same
  set for the same inputs.
- Management API (/control-bindings): full CRUD plus idempotent
  natural-key upsert/delete (PUT /control-bindings/by-key,
  POST /control-bindings/by-key:delete). Cursor-based pagination on
  list with an opaque string cursor that round-trips cleanly to
  clients. Natural key is (target_type, target_id, control_id).
- initAgent accepts optional top-level target_type / target_id.
  Bindings can pre-exist the agent row, so a newly created agent
  registering with target context picks up pre-existing bindings on
  its first response (no second round-trip).
- Python SDK target context is fixed per session:
  init(target_type=..., target_id=...) stores it on state and
  forwards it on the registration call and on every subsequent
  /agents/{name}/controls poll. The single existing policy refresh
  loop carries the merged set; there is no separate target-controls
  cache or refresh worker.
- Session-target enforcement lives only on the session-bound entry
  point (evaluate_controls). check_evaluation and
  check_evaluation_with_local accept their own client and controls
  and run only the both-or-neither validation, so callers using those
  helpers are not implicitly bound to a previous init().
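A minimal in-memory sketch of the merged resolution rule described above. The Control/Binding dataclasses and resolve_effective_control_ids are hypothetical stand-ins, not the server's ControlService API; direct and policy-derived IDs are assumed to be pre-filtered to the namespace, and soft deletion is modeled as a boolean flag.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Control:
    id: int
    namespace_key: str
    deleted: bool = False  # stands in for deleted_at IS NOT NULL


@dataclass(frozen=True)
class Binding:
    namespace_key: str
    target_type: str
    target_id: str
    control_id: int
    enabled: bool = True


def resolve_effective_control_ids(
    namespace_key: str,
    controls: Iterable[Control],
    direct_ids: Iterable[int],
    policy_ids: Iterable[int],
    bindings: Iterable[Binding],
    target_type: Optional[str] = None,
    target_id: Optional[str] = None,
) -> List[int]:
    """De-duplicated union of direct, policy-derived, and target-bound controls."""
    if (target_type is None) != (target_id is None):
        raise ValueError("target_type and target_id must be supplied together")
    candidates = set(direct_ids) | set(policy_ids)
    if target_type is not None:
        # Only enabled bindings for this exact target in this namespace count.
        candidates |= {
            b.control_id
            for b in bindings
            if b.enabled
            and b.namespace_key == namespace_key
            and (b.target_type, b.target_id) == (target_type, target_id)
        }
    # Soft-deleted and foreign-namespace controls never make it into the result.
    live = {c.id for c in controls if c.namespace_key == namespace_key and not c.deleted}
    return sorted(candidates & live)
```

Sorting by ID is only there to make the result deterministic in this sketch; the set union is what gives "same inputs, same set" regardless of which path (initAgent, controls poll, evaluation) calls it.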

Namespace scoping

Every effective-controls query filters every joined table on
namespace_key explicitly. Composite FKs prevent cross-namespace
writes; explicit query scoping prevents reads from spanning namespaces
in the presence of namespace-collision attacks or compromised callers.
Both layers are required.
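The two layers can be illustrated with SQLite, which also enforces composite foreign keys once PRAGMA foreign_keys is on. This is a sketch with trimmed-down, assumed table shapes (the PR targets Postgres, and the real tables carry more columns): the composite foreign key rejects a cross-namespace write, and the read query repeats the namespace predicate on every joined table.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.executescript("""
        CREATE TABLE agents (
            namespace_key TEXT NOT NULL DEFAULT 'default',
            name TEXT NOT NULL,
            PRIMARY KEY (namespace_key, name)
        );
        CREATE TABLE controls (
            id INTEGER PRIMARY KEY,
            namespace_key TEXT NOT NULL DEFAULT 'default',
            UNIQUE (namespace_key, id)
        );
        -- Composite same-namespace FKs: a row can only point at parents in its own namespace.
        CREATE TABLE agent_controls (
            namespace_key TEXT NOT NULL DEFAULT 'default',
            agent_name TEXT NOT NULL,
            control_id INTEGER NOT NULL,
            PRIMARY KEY (namespace_key, agent_name, control_id),
            FOREIGN KEY (namespace_key, agent_name) REFERENCES agents (namespace_key, name),
            FOREIGN KEY (namespace_key, control_id) REFERENCES controls (namespace_key, id)
        );
    """)
    return db


def direct_control_ids(db: sqlite3.Connection, namespace_key: str, agent_name: str) -> List[int]:
    # Every joined table carries its own namespace predicate, even though the
    # join conditions already imply it, mirroring the "filter every table" rule.
    rows = db.execute(
        """
        SELECT c.id
        FROM agent_controls AS ac
        JOIN agents AS a ON a.namespace_key = ac.namespace_key AND a.name = ac.agent_name
        JOIN controls AS c ON c.namespace_key = ac.namespace_key AND c.id = ac.control_id
        WHERE ac.namespace_key = :ns AND a.namespace_key = :ns AND c.namespace_key = :ns
          AND ac.agent_name = :agent
        ORDER BY c.id
        """,
        {"ns": namespace_key, "agent": agent_name},
    )
    return [row[0] for row in rows]
```

With an agent named bot in both "default" and "tenant-b", linking default/bot to a control that lives in tenant-b fails with sqlite3.IntegrityError, and each namespace's query only ever returns its own links.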

_get_agent_or_404 requires namespace_key; an agent that exists
only in another namespace surfaces as 404 (non-disclosing). Every
agent endpoint that resolves an agent threads the namespace through:

- initAgent, GET /agents/{name}, GET /agents,
  GET /agents/{name}/evaluators,
  GET /agents/{name}/evaluators/{evaluator_name},
  GET /agents/{name}/controls, PATCH /agents/{name}.
- All /agents/{name}/policies* routes (add / set / list / get /
  remove / remove-all / delete) and the corresponding agent_policies
  reads, writes, and deletes.
- All /agents/{name}/controls/{control_id} routes (add / remove) and
  the corresponding agent_controls reads, writes, and deletes.
- The list-cursor lookup is namespace-scoped so pagination cannot
  redirect through a foreign-namespace agent of the same name.
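A small sketch of the non-disclosing lookup and the namespace-scoped cursor check, using a dict keyed by (namespace_key, name). NotFound, get_agent_or_404, and list_agents_page are hypothetical stand-ins for the server's helpers, and ordering the list by name is an assumption made for the example.

```python
from typing import Dict, List, Optional, Tuple

AgentStore = Dict[Tuple[str, str], dict]


class NotFound(Exception):
    """Stand-in for an HTTP 404."""


def get_agent_or_404(agents: AgentStore, name: str, namespace_key: str) -> dict:
    agent = agents.get((namespace_key, name))
    if agent is None:
        # Same message whether the name is absent everywhere or only lives in
        # another namespace, so the response discloses nothing about other tenants.
        raise NotFound(f"Agent '{name}' not found")
    return agent


def list_agents_page(
    agents: AgentStore, namespace_key: str, cursor: Optional[str] = None, limit: int = 2
) -> Tuple[List[str], Optional[str]]:
    names = sorted(name for (ns, name) in agents if ns == namespace_key)
    if cursor is not None:
        # The cursor row is resolved inside the caller's namespace only.
        get_agent_or_404(agents, cursor, namespace_key)
        names = [n for n in names if n > cursor]
    page = names[:limit]
    next_cursor = page[-1] if len(names) > limit else None
    return page, next_cursor
```

A cursor that names an agent from another namespace is rejected exactly like a missing one, so it cannot be used to start a page from a foreign row.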

ControlService.get_active_control_or_404, list_controls_for_policy,
add_control_to_agent, and remove_control_from_agent accept
namespace_key so the service layer is no longer namespace-blind on
the migrated paths. Policy lookups in the agent association routes
also filter by namespace.

The initial release ships namespace plumbing at the schema level.
Endpoints route through a single get_namespace_key dependency that
always returns the default namespace; overriding it is not supported
yet because /controls and /policies write endpoints still write
under the default namespace, and an override here would create rows
the existing endpoints cannot find. The initial release honors
get_namespace_key on the effective-control paths and the migrated
agent association paths; full /controls and /policies
namespace-aware writes are follow-up work.

Single-namespace deployments are preserved by the 'default' server
default. Plain ix_agents_name, ix_policies_name, and
ix_controls_name (partial on deleted_at IS NULL) indexes preserve
name-only lookup performance during the rollout window.

The migration is reversible. downgrade() aborts with a clear error
if cross-namespace duplicate names exist on agents, policies, or
live controls, since restoring global single-column uniqueness
would conflict. Soft-deleted control duplicates do not block
downgrade.
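A sketch of the kind of pre-flight check downgrade() performs, written against SQLite for runnability. The query shapes and assert_downgrade_safe are assumptions, not the migration's actual code. Because names are already unique within a namespace, any name that appears more than once must span namespaces; controls are only checked among live rows.

```python
import sqlite3
from typing import Dict, List

# Names are unique per namespace after the upgrade, so a repeated name
# necessarily spans namespaces.
DUPLICATE_NAME_QUERIES = {
    "agents": "SELECT name FROM agents GROUP BY name HAVING COUNT(*) > 1",
    "policies": "SELECT name FROM policies GROUP BY name HAVING COUNT(*) > 1",
    # Soft-deleted controls do not block downgrade.
    "controls": (
        "SELECT name FROM controls WHERE deleted_at IS NULL "
        "GROUP BY name HAVING COUNT(*) > 1"
    ),
}


def assert_downgrade_safe(db: sqlite3.Connection) -> None:
    conflicts: Dict[str, List[str]] = {}
    for table, query in DUPLICATE_NAME_QUERIES.items():
        names = sorted(row[0] for row in db.execute(query))
        if names:
            conflicts[table] = names
    if conflicts:
        raise RuntimeError(
            f"Cannot restore global name uniqueness; cross-namespace duplicates: {conflicts}"
        )
```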

Initial-release contract

SDK init target is fixed per session.
initAgent and GET /agents/{name}/controls return the effective controls for that session context.
Runtime /evaluation uses the same merged resolution.
Dynamic per-request target switching, inheritance, DAGs, and target-agent overrides remain out of scope.

The sessionful SDK path supports one active init() context. Callers
that need multiple agents or targets in one process should use the
lower-level helpers, or separate sessions once multi-session support
exists.

Notes on `control_bindings`

- ON DELETE CASCADE on the parent control fires only on hard
  deletes. Soft-deleted controls (deleted_at IS NOT NULL) keep
  their bindings; the resolver excludes soft-deleted controls.
- delete_control rejects with 409 when the control has active
  policy associations, direct agent associations, or active target
  bindings unless force=true, in which case all three classes of
  attachment are detached as part of the soft-delete lifecycle.
- updated_at refreshes on every UPDATE via SQLAlchemy onupdate.
- A (namespace_key, control_id) index covers the cascade path and
  list_bindings(control_id=...) filtering.
- idx_controls_namespace_name_active is recognized as a
  name-conflict constraint, so concurrent duplicate-name races
  surface as 409, not 500.
- Concurrent natural-key writes are safe. Both
  ControlBindingsService.create_binding and upsert_by_natural_key
  wrap their inserts in begin_nested() so a unique-constraint
  collision rolls back the SAVEPOINT only: the surrounding
  transaction is intact, and a caller that composed the service
  after another flush does not lose its prior writes.
- Target bindings reject controls whose condition tree references
  agent-scoped evaluators (agent_name:evaluator_name). Bindings
  have no specific agent to validate the reference against, so
  accepting them would surface as a runtime evaluation failure on
  the first call rather than a clear 400 at attach time. New error
  code CONTROL_BINDING_INCOMPATIBLE.
- Per-agent overrides and exemptions within a target are
  intentionally out of scope at this stage. Two forward paths are
  documented in code (migration comment plus ControlBinding
  docstring):
  - re-introduce an agent_name column with a partial-index pair
    and an enabled-aware most-specific-wins resolver; supports
    both per-agent additions and per-agent exemptions.
  - or merge target-bearing resolution with the existing
    agent_controls table at runtime; supports per-agent additions
    only, since agent_controls has no enabled flag.
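A sketch of the attach-time evaluator gate, assuming a condition tree made of nested dicts and lists whose leaves name their evaluator under an "evaluator" key. That tree shape, ControlBindingIncompatible, and both helper names are hypothetical; only the agent_name:evaluator_name convention and the CONTROL_BINDING_INCOMPATIBLE code come from the PR.

```python
from typing import Any, Iterator


class ControlBindingIncompatible(ValueError):
    """Stand-in for a 400 response carrying CONTROL_BINDING_INCOMPATIBLE."""


def agent_scoped_refs(node: Any) -> Iterator[str]:
    """Yield evaluator references of the form 'agent_name:evaluator_name'."""
    if isinstance(node, dict):
        evaluator = node.get("evaluator")
        if isinstance(evaluator, str) and ":" in evaluator:
            yield evaluator
        for child in node.values():
            yield from agent_scoped_refs(child)
    elif isinstance(node, list):
        for child in node:
            yield from agent_scoped_refs(child)


def validate_bindable(condition: Any) -> None:
    refs = sorted(set(agent_scoped_refs(condition)))
    if refs:
        raise ControlBindingIncompatible(
            f"CONTROL_BINDING_INCOMPATIBLE: agent-scoped evaluators {refs} "
            "have no agent to resolve against on a target binding"
        )
```

Walking the whole tree matters: an agent-scoped reference nested under a negation or deep inside an and/or branch would otherwise slip through and only fail at evaluation time.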

Generated client

The TypeScript wrapper exposes the new controlBindings getter
alongside the existing agents, controls, evaluation,
evaluators, observability, policies, and system getters, so
consumers using the public client can manage bindings without
reaching into the generated internals.

Out of scope (follow-up PRs)

- Threading get_namespace_key through /controls and /policies
  write endpoints / services.
- Auth-derived get_namespace_key resolution.
- Namespace scoping for control_versions and
  control_execution_events.
- Per-agent overrides and exemptions within a target.

Test plan

Adds a namespace_key column to agents, controls, policies, and the three association tables. Replaces single-column uniqueness with namespace-scoped composite uniqueness, and converts association-table foreign keys to composite same-namespace foreign keys. Adds a control_bindings table for attaching controls to opaque external targets, with an optional agent_name selector for narrower overrides inside a target. Two binding shapes are supported via partial unique indexes: target-default (agent_name IS NULL) and target-agent. OSS and single-namespace deployments are preserved by the 'default' server default on every namespace_key column. Existing endpoint and service code is unchanged; default-namespace behavior is fully backward compatible.

codecov · 2026-04-27T20:52:31Z

Codecov Report

❌ Patch coverage is 94.14634% with 24 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../agent_control_server/services/control_bindings.py	93.38%	9 Missing ⚠️
sdks/python/src/agent_control/agents.py	50.00%	7 Missing ⚠️
models/src/agent_control_models/evaluation.py	70.00%	3 Missing ⚠️
models/src/agent_control_models/server.py	94.33%	3 Missing ⚠️
...agent_control_server/endpoints/control_bindings.py	96.87%	2 Missing ⚠️


… indexes
- control_bindings.id: change migration column type from BigInteger to Integer to match the ORM model and the convention of every other id column in the schema.
- control_bindings.agent_name: change column type from Text to String(255) and add a check constraint requiring NULL or the same format/length as agents.name. Bindings may still predate agent registration; callers must normalize before insert.
- Add plain natural-key indexes ix_agents_name and ix_policies_name to preserve name-only lookup performance while service code is still namespace-blind. The new composite primary keys and unique constraints lead with namespace_key, so name-only queries no longer have a leading-column index without these.
- Document that soft deletes on a control do not cascade to bindings (the runtime resolver excludes soft-deleted controls).
- Add tests for the soft-delete survival path and malformed-agent_name rejection.

Adds ControlBindingsService.resolve_effective_controls, which returns the active control set for a target-bearing request. Two binding shapes are considered: target-default (agent_name IS NULL) and target-agent. For each control_id, the most-specific binding wins (target-agent beats target-default); a winning binding with enabled=False excludes the control. Soft-deleted controls are filtered out.
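The most-specific-wins rule in this commit can be sketched as a single pass that ranks target-agent bindings above target-default ones per control_id. The dict shape and resolve_most_specific are hypothetical, and the soft-deleted-control filter is left out for brevity.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def resolve_most_specific(
    bindings: Iterable[dict], target: Tuple[str, str], agent_name: str
) -> List[int]:
    """Return enabled control IDs after per-control most-specific-wins."""
    winners: Dict[int, Tuple[int, bool]] = {}  # control_id -> (rank, enabled)
    for b in bindings:
        if b["target"] != target:
            continue
        bound_agent: Optional[str] = b["agent_name"]
        if bound_agent is None:
            rank = 0  # target-default
        elif bound_agent == agent_name:
            rank = 1  # target-agent: more specific, so it wins
        else:
            continue  # bound to a different agent
        current = winners.get(b["control_id"])
        if current is None or rank > current[0]:
            winners[b["control_id"]] = (rank, b["enabled"])
    # A disabled winner excludes the control even if a less specific binding enables it.
    return sorted(cid for cid, (_, enabled) in winners.items() if enabled)
```

The result does not depend on row order: a target-default binding seen after a target-agent binding for the same control never displaces it.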

Adds CRUD endpoints under /control-bindings backed by ControlBindingsService.
- PUT /control-bindings: create binding (admin)
- GET /control-bindings: list with optional target/agent/control filters
- GET /control-bindings/{id}: single-binding detail
- PATCH /control-bindings/{id}: toggle enabled (admin)
- DELETE /control-bindings/{id}: delete binding (admin)
Adds CONTROL_BINDING_NOT_FOUND and CONTROL_BINDING_CONFLICT error codes, the matching request/response Pydantic types, and a get_namespace_key dependency that returns the default namespace and is overridable for deployment-specific namespace resolution. Service create/update/delete enforce same-namespace integrity by checking the parent control belongs to the request's namespace; uniqueness violations are translated into 409 conflicts.

EvaluationRequest gains optional target_type/target_id fields. When both are supplied the evaluation endpoint resolves the effective control set from control_bindings (no agents row required); otherwise it uses the existing agent-attached path. The two paths do not silently merge. Adds ControlBindingsService.resolve_runtime_controls and a shared parse_runtime_controls helper to avoid duplicating the Control to RuntimeControl conversion across services.

Adds idempotent attach/detach endpoints addressed by the natural key (target_type, target_id, agent_name?, control_id):
- PUT /control-bindings/by-key: upsert (creates or updates enabled)
- POST /control-bindings/by-key:delete: delete (returns deleted=False if missing)
Useful for callers that want to attach a control without first checking whether a binding already exists. Backed by ControlBindingsService.upsert_by_natural_key and delete_by_natural_key.

Adds optional target_type/target_id parameters to evaluate_controls, check_evaluation, and check_evaluation_with_local. When supplied, both fields are included on the EvaluationRequest sent to the server, which routes the request through the target-bearing resolution path. Both fields must be supplied together; the server enforces this via the EvaluationRequest model validator.
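The both-or-neither rule can be sketched without Pydantic as a dataclass that validates in __post_init__. EvaluationRequestSketch and its agent_name field are hypothetical; the real EvaluationRequest is a Pydantic model with a model validator, and its other fields are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvaluationRequestSketch:
    agent_name: str
    target_type: Optional[str] = None
    target_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject half-specified target context instead of guessing.
        if (self.target_type is None) != (self.target_id is None):
            raise ValueError("target_type and target_id must be supplied together")

    @property
    def is_target_bearing(self) -> bool:
        return self.target_type is not None
```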

- SDK target-bearing requests now bypass cached agent-attached controls and call the server unconditionally. The cached controls (from initAgent) are agent-attachment data; target-bearing requests must resolve from control_bindings only, which the server enforces, but the SDK was previously short-circuiting against the cache when no applicable server controls were present.
- agent_name on control-binding requests is normalized and validated at the API boundary using the same rules as agents.name. Mixed-case or whitespace-padded values are accepted and normalized; values that fail the format/length rules are rejected with 422 instead of leaking to the database check constraint as 500/conflict.
- ControlBinding.updated_at is refreshed on UPDATE via SQLAlchemy onupdate. PATCH /control-bindings/{id} and idempotent natural-key upserts now reflect the updated timestamp on subsequent reads.

Each binding row now attaches one control to one target inside a namespace. Per-agent overrides and exemptions within a target are out of scope at this stage; both the migration and the ControlBinding model docstring document the two forward paths if and when those become a product requirement (re-add agent_name with a partial-index pair, or merge target-bearing resolution with agent_controls). Net simplifications:
- One unique constraint instead of a partial-index pair on (agent_name IS NULL / IS NOT NULL).
- No agent-name CHECK constraint or normalize_optional_agent_name validator on binding requests.
- Resolver returns the target-level control set directly; no most-specific-wins logic, no winners dict.
- Pydantic request/response models lose the agent_name field; the list endpoint loses the agent_name query parameter.
- SDK target-bearing path is unchanged (it never carried agent_name on bindings).
Schema/code/test/doc all stay aligned. Migration round-trip verified locally; full server suite passes (601 tests), lint clean, typecheck clean.

…Request The class docstring still described the upsert natural key as (target_type, target_id, agent_name, control_id). Update to match the V1 shape: (target_type, target_id, control_id).

…oints Adds the auto-generated bindings for the new /control-bindings surface (create, list, get, patch, delete, upsert-by-key, delete-by-key) and refreshes the evaluation models/sdk to include the optional target_type / target_id fields. Also adds the method-name overrides under sdks/typescript/overlays/method-names.overlay.yaml.

lan17

Thanks for the work here. The overall direction makes sense, but I found a couple issues that should be fixed before merge.

One P1 could not be placed inline because it points at an unchanged file:

[P1] Control deletion ignores target bindings (server/src/agent_control_server/endpoints/controls.py:921-925)

delete_control only checks policy and agent associations before soft-deleting a control. A control that is actively attached through control_bindings can be deleted with force=false, which disables protection for that target while leaving the binding row behind. The lifecycle/in-use checks should include target bindings, and force=true should have explicit binding semantics.

The rest of the findings are inline.

Adds GET /api/v1/control-bindings/effective, which returns the effective control set for a target in the same shape as InitAgentResponse.controls. The Python SDK now fetches this list when a request is target-bearing and runs it through the existing local-vs-server execution split, so controls with execution='sdk' bound to a target run locally instead of being silently dropped on the server side. Adds the EffectiveTargetControlsResponse model, regenerates the TypeScript client to include the new endpoint, and adds endpoint-level tests covering the bound-controls, disabled-binding, and empty-result cases plus an SDK test that exercises the local-eval path end-to-end. The previous _evaluate_target_bearing short-circuit is gone; target-bearing requests now go through check_evaluation_with_local with the target-bound controls in place of the cached agent set.

delete_control previously only inspected agent and policy associations, so a control attached via control_bindings could be soft-deleted with force=false. The binding row would remain pointing at a deleted control, silently disabling protection on the target. The lifecycle check now also lists target bindings: force=false rejects deletion with CONTROL_IN_USE listing the binding IDs, and force=true removes the bindings before soft-deleting the control. The detached binding IDs are surfaced in the response under detached_target_bindings.
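A sketch of the lifecycle rule. ControlUsage, ControlInUse, and delete_control_sketch are hypothetical; only the CONTROL_IN_USE code and the detached_target_bindings response field come from the PR, and the other detached_* keys are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControlUsage:
    policy_ids: List[int] = field(default_factory=list)
    agent_names: List[str] = field(default_factory=list)
    binding_ids: List[int] = field(default_factory=list)

    def in_use(self) -> bool:
        return bool(self.policy_ids or self.agent_names or self.binding_ids)


class ControlInUse(Exception):
    """Stand-in for a 409 response with CONTROL_IN_USE."""

    def __init__(self, usage: ControlUsage) -> None:
        super().__init__(f"CONTROL_IN_USE: binding ids {usage.binding_ids}")
        self.usage = usage


def delete_control_sketch(usage: ControlUsage, force: bool = False) -> Dict[str, object]:
    if usage.in_use() and not force:
        raise ControlInUse(usage)  # nothing is detached on rejection
    result: Dict[str, object] = {
        "soft_deleted": True,
        "detached_policies": list(usage.policy_ids),
        "detached_agents": list(usage.agent_names),
        "detached_target_bindings": list(usage.binding_ids),
    }
    # force=True: drop every attachment class before the soft delete.
    usage.policy_ids.clear()
    usage.agent_names.clear()
    usage.binding_ids.clear()
    return result
```

The key point is that bindings count toward "in use" just like policy and agent links, so a bound control can no longer disappear silently with force=false.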

upsert_by_natural_key previously did SELECT then INSERT, so two concurrent calls for the same (namespace_key, target_type, target_id, control_id) could both miss the existing row, then one would hit the unique-constraint violation and surface as an unhandled IntegrityError (500) even though the endpoint is documented as idempotent. The loser of the race now catches IntegrityError, rolls back its insert, re-reads the winning row, and applies its requested enabled value as an update. Both calls return successfully; the create flag is true only for the caller whose insert actually wrote the row.
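The loser's recovery path can be sketched with SQLite, whose SAVEPOINT / ROLLBACK TO is the same primitive SQLAlchemy's Session.begin_nested() emits. The race is simulated sequentially (the winning row is committed first), and the table shape and insert_after_select_miss are assumptions. Rolling back to the savepoint rather than the whole transaction keeps any earlier writes the caller made in the same transaction.

```python
import sqlite3
from typing import Tuple

BindingKey = Tuple[str, str, str, int]  # (namespace_key, target_type, target_id, control_id)


def make_db() -> sqlite3.Connection:
    # isolation_level=None: transactions are opened and closed explicitly by the caller.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.executescript("""
        CREATE TABLE control_bindings (
            id INTEGER PRIMARY KEY,
            namespace_key TEXT NOT NULL DEFAULT 'default',
            target_type TEXT NOT NULL,
            target_id TEXT NOT NULL,
            control_id INTEGER NOT NULL,
            enabled INTEGER NOT NULL DEFAULT 1,
            UNIQUE (namespace_key, target_type, target_id, control_id)
        );
        CREATE TABLE audit_log (note TEXT);
    """)
    return db


def insert_after_select_miss(
    db: sqlite3.Connection, key: BindingKey, enabled: bool
) -> Tuple[int, bool]:
    """Insert a binding whose SELECT missed; on collision, update the winner instead.

    Returns (binding_id, created). Call inside an open transaction.
    """
    db.execute("SAVEPOINT upsert_binding")
    try:
        cur = db.execute(
            "INSERT INTO control_bindings"
            " (namespace_key, target_type, target_id, control_id, enabled)"
            " VALUES (?, ?, ?, ?, ?)",
            (*key, int(enabled)),
        )
    except sqlite3.IntegrityError:
        # Lost the race: undo only the failed INSERT, keep earlier work.
        db.execute("ROLLBACK TO SAVEPOINT upsert_binding")
        db.execute("RELEASE SAVEPOINT upsert_binding")
        (binding_id,) = db.execute(
            "SELECT id FROM control_bindings WHERE namespace_key = ?"
            " AND target_type = ? AND target_id = ? AND control_id = ?",
            key,
        ).fetchone()
        db.execute(
            "UPDATE control_bindings SET enabled = ? WHERE id = ?",
            (int(enabled), binding_id),
        )
        return binding_id, False
    db.execute("RELEASE SAVEPOINT upsert_binding")
    return cur.lastrowid, True
```

Only the caller whose INSERT actually wrote the row reports created=True; the loser still applies its requested enabled value as an update.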

list_bindings previously returned every binding in the namespace, which grows linearly with attached targets. Switched to cursor-based pagination matching the existing list_controls_page idiom: cursor + limit query params, results ordered by ID descending (newest first), pagination metadata returned alongside the binding list. ListControlBindingsResponse now carries a PaginationInfo block (limit, total, next_cursor, has_more). Default limit is 20, max 100, mirroring the controls list endpoint. The TypeScript client is regenerated to include the new query params.
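A sketch of newest-first keyset pagination with an opaque string cursor, over an in-memory list of binding IDs. list_bindings_page is hypothetical; the limit bounds and the PaginationInfo field names (limit, total, next_cursor, has_more) follow the description above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

MAX_LIMIT = 100


def list_bindings_page(
    binding_ids: Iterable[int], cursor: Optional[str] = None, limit: int = 20
) -> Tuple[List[int], Dict[str, object]]:
    """Return one newest-first page of binding IDs plus pagination metadata."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    ids = sorted(binding_ids, reverse=True)  # ID descending = newest first
    total = len(ids)
    if cursor is not None:
        last_seen = int(cursor)  # opaque string to clients, an ID to the server
        ids = [i for i in ids if i < last_seen]
    page = ids[:limit]
    has_more = len(ids) > limit
    info = {
        "limit": limit,
        "total": total,
        "next_cursor": str(page[-1]) if has_more else None,
        "has_more": has_more,
    }
    return page, info
```

Feeding next_cursor back unchanged walks the list without gaps or repeats, which is the round-trip property the string-typed cursor is meant to guarantee.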

The * marker before target_type/target_id also made the previously positional-or-keyword arguments trace_id, span_id, and event_agent_name keyword-only. External callers passing them positionally would break with TypeError. Move * to immediately before target_type so the new fields are keyword-only while existing arguments retain their positional contract.
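The difference the marker position makes, shown on a hypothetical signature: the leading step parameter and the defaults are assumptions, not the SDK's actual evaluate_controls signature. trace_id, span_id, and event_agent_name stay positional-or-keyword while the new target fields are keyword-only.

```python
from typing import Any, Dict, Optional


def evaluate_controls_sketch(
    step: str,
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
    event_agent_name: Optional[str] = None,
    *,  # everything after this marker must be passed by keyword
    target_type: Optional[str] = None,
    target_id: Optional[str] = None,
) -> Dict[str, Any]:
    return {
        "step": step,
        "trace_id": trace_id,
        "span_id": span_id,
        "event_agent_name": event_agent_name,
        "target_type": target_type,
        "target_id": target_id,
    }
```

Existing positional calls such as evaluate_controls_sketch("llm_call", "t-1", "s-1", "bot") keep working, while a fifth positional argument raises TypeError instead of silently landing in target_type.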

…gle query Add (namespace_key, control_id) index used by list_bindings(control_id=...) and the cascade path from controls. Collapse resolve_effective_controls into a single JOIN against control_bindings instead of two round trips.

Match the existing String(255) convention used for agent_name and control/policy names. Bounds index key size and prevents pathological long values across all namespace-scoped tables.

The downgrade refuses to run when cross-namespace duplicates exist on agents, policies, or live controls; only the agents path was tested. Add policies and live-controls cases plus a soft-deleted positive case that verifies the deleted_at filter still allows downgrade.

Spell out that integrators can declare any FastAPI-resolvable dependency in their override signature, and include a JWT-claim example so the extension point is concrete instead of abstract.

Two related gaps left by the namespace migration:
- Add idx_controls_namespace_name_active to the duplicate-name conflict set so concurrent name collisions surface as 409 instead of 500. Parametrize the IntegrityError tests across both index names.
- Restore a non-unique partial index ix_controls_name on controls(name) WHERE deleted_at IS NULL, mirroring the natural-key indexes added for agents and policies. Existing service code still does name-only Control lookups; without this index those go to a sequential scan post-migration.

Endpoints used a Depends(get_namespace_key) seam, but only the binding and evaluation endpoints honored it; controls/agents/policies still wrote and queried under the DB default. Overriding the seam in a deployment broke binding creation with CONTROL_NOT_FOUND. Drop the seam for V1: endpoints resolve to DEFAULT_NAMESPACE_KEY directly. Schema and services remain namespace-scoped so a future change can thread a single resolver through every write path together.

f90f8f0 dropped the seam, but a single resolver function on the read side is still useful: every namespace-scoped endpoint funnels through one call site, so a future change can switch every reader to a real per-request resolver in one place. V1 returns DEFAULT_NAMESPACE_KEY unconditionally and documents that overriding is unsupported until controls/agents/policies endpoints are threaded.

Mirrors the agent-bound flow: a per-target LRU cache populated lazily on first evaluation, kept fresh by a daemon thread that refetches each cached entry on a fixed interval. Public API parallels the agent-bound side: refresh_target_controls / refresh_target_controls_async for explicit refresh, invalidate_target_controls_cache for explicit drop. init() takes target_controls_refresh_interval_seconds (default 60s, 0 disables); shutdown stops the loop alongside the policy refresh loop.

A process-wide LRU cache keyed only by (target_type, target_id) cannot distinguish entries written under different SDK sessions. Re-initing against a different server or API key would have served controls from the previous identity until the entry was evicted or overwritten, including races where an in-flight refresh lands after init() proceeds. Tie cache lifetime to the session boundary: init() and shutdown call cache.reset(), which both clears every entry and advances an internal epoch token. Writers capture the epoch before fetching and pass it to put(); writes whose epoch no longer matches are rejected silently. Both write paths (refresh worker and lazy fetch) are covered.
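A minimal sketch of an epoch-guarded LRU cache along these lines. EpochLRUCache and its method names are hypothetical; the point is that put() takes the epoch captured before the fetch, and reset() both clears entries and advances the epoch so late writes from a previous session are dropped.

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable, Optional


class EpochLRUCache:
    def __init__(self, maxsize: int = 128) -> None:
        self._lock = threading.Lock()
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._maxsize = maxsize
        self._epoch = 0

    def epoch(self) -> int:
        """Capture this before fetching; pass it back to put()."""
        with self._lock:
            return self._epoch

    def reset(self) -> None:
        """Session boundary: drop every entry and invalidate in-flight writers."""
        with self._lock:
            self._data.clear()
            self._epoch += 1

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key: Hashable, value: Any, epoch: int) -> bool:
        with self._lock:
            if epoch != self._epoch:
                return False  # fetched under a previous session; silently dropped
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used
            return True
```

Because the epoch check and the write happen under the same lock, a refresh that started before reset() can never repopulate the cache afterwards.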

check_evaluation_with_local accepts an arbitrary AgentControlClient. Reusing the global cache for a client whose base_url or api_key does not match the active SDK session would let controls fetched against one server serve evaluations against another. Skip the cache entirely in that case and fetch live; only the init()-managed session populates or reads the shared cache.

Patch coverage on the previous push fell below the repo target. The gap was concentrated in two places, both genuinely useful to test:
- SDK target-controls polling and refresh APIs:
  - invalidate_target_controls_cache (single-key, both args, no args)
  - refresh_target_controls_async (empty cache, multi-key fetch, per-target failure isolation, stale-write rejection after reset)
  - refresh_target_controls (sync wrapper from sync and async contexts)
  - _start_target_controls_refresh_loop / _stop_target_controls_refresh_loop (round trip, zero-interval no-op)
  - _target_controls_refresh_worker (cache-empty short-circuit, refresh-known-keys integration, session-unset skip, exception isolation)
- ControlBindingsService.upsert_by_natural_key IntegrityError branch: the existing test exercised the SELECT-then-UPDATE fast path; the new test simulates the race where both transactions miss on SELECT, one INSERT trips the unique index, and the loser must roll back, re-fetch the winner, and apply the requested enabled value.

lan17 · 2026-04-29T00:35:50Z

I think there is still a broader contract issue here: both initAgent and GET /agents/{agent}/controls should return all controls assigned to the agent by any means, not just direct agent links and policy-derived controls. With the new target-binding mechanism, those two agent control surfaces should include controls that are effectively assigned through target bindings as well, otherwise the SDK/API can report an incomplete control set for the agent while target-bearing evaluation sees a different set.

initAgent, GET /agents/{name}/controls, and POST /evaluation now resolve the same de-duplicated effective set: direct attachments, policy-derived controls, and (when target context is supplied) controls bound to that target via enabled bindings in the same namespace.
Server
- Add optional target_type/target_id to InitAgentRequest with paired-target validation. initAgent merges target bindings into its returned controls and no longer short-circuits to an empty list on agent creation, so a newly registered agent picks up pre-existing bindings.
- list_agent_controls accepts target_type/target_id; the GET endpoint enforces the paired-target rule and threads namespace_key through.
- ControlService.list_controls_for_agent / list_runtime_controls_for_agent / _list_db_controls_for_agent now require namespace_key and accept optional target params; every joined table (agent_controls, agent_policies, policy_controls, ControlBinding, Control) is filtered on namespace_key.
- /evaluation collapses to one resolver: ControlService.list_runtime_controls_for_agent with namespace + target. The agent row is required on every request.
- Drop GET /control-bindings/effective and the ControlBindingsService.resolve_effective_controls / resolve_runtime_controls helpers; the merged ControlService path is authoritative.
SDK (Python)
- init() takes target_type/target_id; both must be supplied together. The values flow into state and ride on the registration call and on every subsequent /agents/{name}/controls poll.
- Drop the per-target controls cache, refresh worker, refresh API, and invalidate API; one polling loop and one publish path remain.
- check_evaluation, check_evaluation_with_local, and evaluate_controls default target_type/target_id from state when omitted.
- _reset_state and shutdown clear target context.
- agents.register_agent / list_agent_controls forward target params.
Drops EffectiveTargetControlsResponse from the model exports and regenerates the TypeScript SDK against the new spec.

…rget The session control cache (state.server_controls) is fetched for the target context fixed at init() time. A per-call target override that disagrees with the session target would drive local-first evaluation against the wrong cached set and could return safe without contacting the server. The V1 contract is one target per SDK session. Add a shared _resolve_session_target helper that defaults missing per-call targets from state and rejects mismatches with a clear ValueError pointing callers at re-init. Apply at all three entry points (check_evaluation, check_evaluation_with_local, evaluate_controls) so the contract is uniform across the public API. Also update EvaluationRequest's target_type/target_id descriptions: the server now merges target bindings into the agent + policy effective set rather than resolving from bindings alone. The TypeScript SDK regenerates the new wording.

A session initialized via init() without target context still has a control cache fetched for that no-target context. The previous mismatch check only fired when state.target_type was set, so a caller could pass target_type/target_id per call on a no-target session and have those values accepted; the SDK would then evaluate against the wrong cache, potentially returning safe without contacting the server. Treat (None, None) as a valid session target. Use state.current_agent as the active-session sentinel so the rule applies inside an init()'d session (any per-call target must equal the session target) but is skipped outside one (lower-level direct-client flows still work). Add a regression test covering the no-target session path and update the existing test_per_call_target_must_match_session_target to model an active session by patching state.current_agent.
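A sketch of the rule as described, with (None, None) treated as a real session target and current_agent as the active-session sentinel. SessionState, validate_target_pair, and resolve_session_target are hypothetical stand-ins loosely modeled on the SDK's private helpers, not their actual code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Target = Tuple[Optional[str], Optional[str]]


@dataclass
class SessionState:
    current_agent: Optional[str] = None  # set by init(); None outside a session
    target_type: Optional[str] = None
    target_id: Optional[str] = None


def validate_target_pair(target_type: Optional[str], target_id: Optional[str]) -> None:
    if (target_type is None) != (target_id is None):
        raise ValueError("target_type and target_id must be supplied together")


def resolve_session_target(
    state: SessionState, target_type: Optional[str] = None, target_id: Optional[str] = None
) -> Target:
    validate_target_pair(target_type, target_id)
    if state.current_agent is None:
        return target_type, target_id  # no init() session: nothing to enforce
    session_target: Target = (state.target_type, state.target_id)
    if target_type is None:
        return session_target  # default from the session, including (None, None)
    if (target_type, target_id) != session_target:
        raise ValueError(
            f"per-call target {(target_type, target_id)} does not match the session "
            f"target {session_target}; call init() again to change targets"
        )
    return session_target
```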

Runnable example showing the V1 contract end-to-end:
- init(target_type='env', target_id='prod') returns the merged effective set (agent's direct attachments + bindings for the supplied target).
- @control() decorator runs against that merged set automatically.
- evaluate_controls(...) defaults its target context from the session.
- A per-call target that disagrees with the session target is rejected with a clear ValueError.
setup_controls.py provisions the agent and two controls, attaches one directly, and binds the other to (env, prod) via the natural-key upsert endpoint (idempotent on re-run). demo_agent.py walks through the four phases and prints the expected outcome at each step. Indexed in examples/README.md alongside the other framework demos.

…-binding evaluator gate
Three review issues against the merged-resolver contract:
1. Cross-namespace agent lookup: GET /agents/{name}/controls now passes namespace_key into _get_agent_or_404. An agent that exists only in another namespace surfaces as 404 instead of returning a 200 with the wrong/empty effective set. The lookup is opt-in at the helper so other call sites that don't yet thread namespace through stay unchanged.
2. List-binding cursor type: the server emits next_cursor as a string, so the GET /control-bindings cursor parameter accepts a string and parses it to int internally. Round-trip with PaginationInfo.next_cursor now works end-to-end through the generated TypeScript SDK; previously the int typing on cursor failed client-side validation when fed back from pagination.next_cursor.
3. Agent-scoped evaluators on target bindings: ControlBindingsService now rejects controls whose condition tree references agent-scoped evaluators (agent_name:evaluator) at binding creation time. Target bindings have no specific agent to validate against, so a binding can apply a control to any agent that later evaluates against the target; accepting agent-scoped references would surface as a runtime evaluation failure instead of a clear 400 at attach time. New ErrorCode.CONTROL_BINDING_INCOMPATIBLE.
Also:
- SDK check_evaluation / check_evaluation_with_local are no longer session-bound. They take their own client (and controls); session target enforcement lives only on evaluate_controls. The shared validator is split: _validate_target_pair (both-or-neither) for the caller-owned helpers, _resolve_session_target (default + reject mismatch) for the session-bound entry point. Tests for the session-target rules move to evaluate_controls.
- TypeScript SDK regenerated to match the new cursor type.
- One regression test in test_target_merged_contract pins the cross-namespace 404.

…ace, expose controlBindings
Three review issues:
1. Savepoint scoping for the upsert race: ``upsert_by_natural_key`` now wraps the conflicting insert in ``begin_nested()`` so a unique-constraint collision rolls back the SAVEPOINT only. The previous ``session.rollback()`` would discard every pending change in the surrounding transaction once anything composed this service after a prior flush.
2. Namespace-scope agent endpoints end-to-end. ``_get_agent_or_404`` now requires ``namespace_key`` and is non-disclosing across namespaces. The 11 callers thread ``namespace_key=Depends(get_namespace_key)`` through every signature; agent_policies / agent_controls reads, inserts, and deletes filter by namespace_key. Policy lookups in the association routes also filter by namespace_key, and ``ControlService.get_active_control_or_404 / list_controls_for_policy / add_control_to_agent / remove_control_from_agent`` accept ``namespace_key`` so the service layer is no longer namespace-blind. A regression test pins that cross-namespace agent association calls surface 404, mirroring the pattern from the GET /agents/{name}/controls case.
3. TypeScript client wrapper exposes the new ``controlBindings`` API alongside ``agents``, ``controls``, etc., so consumers using the public ``AgentControlClient`` no longer have to reach into the generated internals.

…g race

Two review issues:

1. ``create_binding`` now wraps the conflicting insert in a SAVEPOINT via ``begin_nested()`` so a duplicate-natural-key collision rolls back only that insert. This mirrors the upsert path, so neither service method discards unrelated flushed work in a caller's transaction.
2. Plain agent metadata reads (``GET /agents/{name}``, ``GET /agents``, ``GET /agents/{name}/evaluators``, ``GET /agents/{name}/evaluators/{evaluator_name}``) now scope by ``namespace_key``, so duplicate names across namespaces (allowed by this migration) cannot leak rows from another namespace. The list endpoint additionally namespace-scopes the count, the page query, and the cursor-row lookup so pagination cannot redirect through a foreign-namespace agent.

TypeScript SDK regenerated to pick up the new docstrings.
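The list-endpoint rule in item 2 (count, page query, and cursor-row lookup all filtered by namespace) can be sketched with `sqlite3` and keyset pagination over an integer primary key, with the cursor passed as an opaque string as in the earlier cursor-type fix. The `agents` schema, the `list_agents` signature, and `InvalidCursor` are hypothetical; the real endpoint's cursor encoding may differ.

```python
import sqlite3
from typing import List, Optional, Tuple


class InvalidCursor(ValueError):
    """Raised when a cursor does not name an agent in the caller's namespace."""


def make_agents_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agents ("
        " id INTEGER PRIMARY KEY,"
        " namespace_key TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " UNIQUE (namespace_key, name))"  # same name allowed in different namespaces
    )
    return conn


def list_agents(
    conn: sqlite3.Connection, namespace_key: str, cursor: Optional[str], limit: int
) -> Tuple[List[str], int, Optional[str]]:
    """Return (names, total, next_cursor); every query is namespace-scoped."""
    total = conn.execute(
        "SELECT COUNT(*) FROM agents WHERE namespace_key = ?", (namespace_key,)
    ).fetchone()[0]

    after_id = 0
    if cursor is not None:
        try:
            after_id = int(cursor)  # opaque string on the wire, int internally
        except ValueError:
            raise InvalidCursor(cursor) from None
        # Cursor-row lookup: an id from a foreign namespace is rejected.
        row = conn.execute(
            "SELECT id FROM agents WHERE id = ? AND namespace_key = ?",
            (after_id, namespace_key),
        ).fetchone()
        if row is None:
            raise InvalidCursor(cursor)

    rows = conn.execute(
        "SELECT id, name FROM agents WHERE namespace_key = ? AND id > ? "
        "ORDER BY id LIMIT ?",
        (namespace_key, after_id, limit + 1),  # one extra row signals another page
    ).fetchall()
    page = rows[:limit]
    next_cursor = str(page[-1][0]) if len(rows) > limit else None
    return [name for _, name in page], total, next_cursor
```

Without the namespace filter on the cursor-row lookup, a client could pass the id of an agent in another namespace and use it as a pivot for the page query.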

All three flagged items are addressed and replied to inline: target-bound SDK controls (f6deaed), atomic natural-key upsert (31e6491), paginated binding list (dc980df). Dismissing the stale review so the later approval can take effect; happy to re-open the discussion if anything still looks wrong.

…#204)

## Summary

Pluggable request-auth framework that handles both auth flows the system needs:

- **Management.** Online check on every request. The default authorizer authenticates the credential and authorizes the operation; in production this is `HttpUpstreamAuthProvider` forwarding to a configurable upstream service.
- **Runtime.** Two-phase exchange-then-verify. A target-bearing call presents a long-lived credential plus `(target_type, target_id)` to a token exchange endpoint; the server mints a short-lived HS256 JWT bound to that target. Subsequent runtime calls verify the JWT locally, with no upstream round-trip on the hot path.

Both flows route through the same primitives (`Operation` vocabulary on endpoints, `Principal` returned, `RequestAuthorizer` Protocol installed); a per-operation registry lets a deployment point management ops at one provider and runtime ops at another.

Migrates the `/control-bindings` endpoint family onto the framework and ships the runtime token exchange endpoint. The runtime resolution path itself (`/evaluation` etc.) is wired in a follow-up; its provider override (`LocalJwtVerifyProvider`) is already in place when the runtime secret is configured.

## Module layout

```
server/src/agent_control_server/auth_framework/
    __init__.py            # public API
    core.py                # Operation, Principal, RequestAuthorizer, require_operation, registry
    config.py              # configure_auth_from_env, RuntimeAuthConfig, set_runtime_auth_config
    runtime_token.py       # HS256 mint / verify helpers, UpstreamGrantExpiredError
    providers/
        __init__.py
        header.py          # HeaderAuthProvider + DEFAULT_OPERATION_ACCESS
        http_upstream.py   # HttpUpstreamAuthProvider (forward + parse grant)
        local_jwt.py       # LocalJwtVerifyProvider (hot-path JWT verify)
server/src/agent_control_server/endpoints/
    auth.py                # POST /api/v1/auth/runtime-token-exchange
```

`auth.py` (legacy local credential check) is unchanged; `HeaderAuthProvider` re-uses `_validate_api_key` from it.
Non-binding routes still go through the legacy router-level gate; their migration happens in follow-up PRs.

## Operation vocabulary

```python
from enum import StrEnum  # Python 3.11+


class Operation(StrEnum):
    # Wired on endpoints in this PR.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_TOKEN_EXCHANGE = "runtime.token_exchange"

    # Reserved; not yet wired on endpoints.
    CONTROLS_READ = "controls.read"
    CONTROLS_CREATE = "controls.create"
    CONTROLS_UPDATE = "controls.update"
    CONTROLS_DELETE = "controls.delete"
    RUNTIME_USE = "runtime.use"
```

## Per-operation authorizer registry

`set_authorizer(authorizer, operation=...)` overrides the default for one operation. Without `operation=`, it becomes the default for every operation that does not have a specific binding. This is used to route management ops through one provider and `Operation.RUNTIME_USE` through `LocalJwtVerifyProvider`:

```python
set_authorizer(HttpUpstreamAuthProvider(...))          # default
set_authorizer(LocalJwtVerifyProvider(secret=...),     # override
               operation=Operation.RUNTIME_USE)
```

`require_operation(op)` consults the override first and falls back to the default. The local-credential path (no override installed) routes everything to `HeaderAuthProvider`; the no-auth flow (`api_key_enabled=False`) is preserved end-to-end.

`require_operation` accepts an optional `context_builder` so the endpoint can surface request-shaped context (path / query / body fields) to the authorizer. The body-bearing binding endpoints, the target-filtered list endpoint, and the runtime token exchange endpoint all forward `(target_type, target_id)`, so an upstream that resolves the target's owning project has the identifiers it needs to make a project-level decision.

## Providers (three ship in-tree)

**`HeaderAuthProvider`**: local-credential path, single namespace.

- Maps each `Operation` to one of three access levels (`PUBLIC`, `AUTHENTICATED`, `ADMIN`); single source of truth in `DEFAULT_OPERATION_ACCESS`.
- Reuses the existing local API-key + session-cookie credential check from `auth.py`, so behavior matches the previous `require_admin_key` path verbatim.
- Returns a normalized `runtime.use` scope only for `Operation.RUNTIME_TOKEN_EXCHANGE`, so the exchange endpoint can uniformly require `runtime.use` in `principal.scopes` across every provider; there is no implicit fallback that could escalate an upstream-supplied empty scope grant.
- The no-auth flow (`api_key_enabled=False`) is preserved: every operation succeeds with a non-admin `Principal`. Pinned by a regression test.
- Always returns `DEFAULT_NAMESPACE_KEY`. The namespace header lookup branch is preserved but inert until non-binding write endpoints are threaded.

**`HttpUpstreamAuthProvider`**: generic upstream-delegating provider.

- Forwards caller credentials (`X-API-Key`, `Authorization`, `Cookie`) in a POST to a configurable URL with `{operation, context?}`.
- Optional service-to-service token header for upstream trust.
- Parses the upstream response into a `Principal`: `namespace_key`, `is_admin`, `caller_id`, plus optional grant fields (`target_type`, `target_id`, `scopes`, `expires_at`) so the runtime token exchange can mint from the same response.
- Maps `200` to a `Principal` and `401` / `403` / `404` to the matching error; `5xx`, network errors, malformed payloads, naive (`tzinfo`-less) `expires_at`, and partial target grants (only one of `target_type` / `target_id`) all fail closed (502/503).

**`LocalJwtVerifyProvider`**: hot-path runtime verifier.

- Reads a Bearer token from `Authorization`, verifies the signature against the runtime secret, and checks `domain == "runtime"`, the issuer, expiry, and that the token's scope covers the requested `Operation`.
- Returns a `Principal` with the bound `(namespace_key, target_type, target_id)` so runtime endpoints inherit the namespace and target binding without re-deriving them.
- When the dependency surfaces `target_type` / `target_id` via `context_builder`, the provider also enforces that they match the token's binding; runtime endpoints get the request-target check for free.

## Runtime token shape

HS256, dedicated secret (`AGENT_CONTROL_RUNTIME_TOKEN_SECRET`), issuer `agent-control/server`. Claims:

| Claim | Purpose |
|---|---|
| `domain` | Pinned to `runtime`; tokens minted here MUST NOT be accepted on management endpoints. |
| `namespace_key` | The namespace the token authorizes within. Required for mint and verify; preserved end-to-end so a token minted for one namespace cannot be used to resolve controls in another. |
| `actor_id` | Caller identity surfaced from the upstream grant. |
| `scopes` | Granted runtime capabilities (e.g., `["runtime.use"]`). The exchange endpoint refuses to mint when `principal.scopes` does not contain `runtime.use`, including the case where the upstream's grant explicitly lists an empty scope set. |
| `target_type` / `target_id` | Bind the token to one target. |
| `iat` / `exp` | Bounded lifetime. The local TTL is capped by the upstream grant's `expires_at` so the local token can never outlive its grant. |
| `jti` | Random identifier; reserved for future revocation. |

`mint_runtime_token` rejects an `upstream_expires_at` whose `tzinfo is None` or whose `utcoffset()` is `None` with `RuntimeTokenError`, so a custom authorizer that supplies a naive datetime surfaces as a typed auth error rather than a raw `TypeError` deeper in the comparison.

## Runtime token exchange endpoint

```
POST /api/v1/auth/runtime-token-exchange
{ "target_type": "...", "target_id": "..." }
```

- Authenticated and authorized via `Operation.RUNTIME_TOKEN_EXCHANGE` through the default authorizer (typically `HttpUpstreamAuthProvider` in production). The endpoint's `context_builder` forwards the requested target to the authorizer, and from there to the upstream, so it can authorize against the right resource.
- Refuses with 503 when `AGENT_CONTROL_RUNTIME_TOKEN_SECRET` is not configured.
- Mints a local token from `Principal.scopes` / `Principal.grant_expires_at`, capped by the configured TTL (default 300s).
- When the provider's `Principal` carries a target binding, the endpoint verifies that it matches the requested target before minting.
- An upstream grant whose `expires_at` is already in the past surfaces as 502 (`UpstreamGrantExpiredError`), distinct from the 503 misconfigured-server path, so the public status code tells the operator which side to investigate.

Response: `{ token, expires_at, target_type, target_id, scopes }`.

## Storage namespace under the framework

The migrated binding endpoints take the storage `namespace_key` from `get_namespace_key` (the same resolver the rest of the server uses), not from `principal.namespace_key`. The auth chain still runs through `require_operation` for authentication and authorization, but the row's namespace is sourced from the resolver, so binding writes and runtime reads stay in lockstep until auth-derived namespace resolution lands across `/controls`, `/policies`, `/agents`, and `/evaluation` together. The principal's namespace is observed (and used by `LocalJwtVerifyProvider` for its own contract) but is not used to pick the row's storage namespace at this stage.
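The token shape and exchange rules above can be sketched with the standard library alone. This is a hypothetical HS256 implementation for illustration; the real `runtime_token.py` helpers may use a JWT library, and the signatures and the exception hierarchy here are assumptions. It covers the rules the description pins down: a naive `upstream_expires_at` is rejected before any comparison, an already-expired grant is refused, `exp` is the earlier of the local TTL and the upstream expiry, and verification checks signature, `domain`, issuer, expiry, and scope.

```python
import base64
import hashlib
import hmac
import json
import secrets
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

ISSUER = "agent-control/server"


class RuntimeTokenError(Exception):
    """Typed auth error (stand-in for the PR's class)."""


class UpstreamGrantExpiredError(RuntimeTokenError):
    """The upstream grant had already expired at mint time (assumed subclass)."""


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_runtime_token(
    secret: bytes, *, namespace_key: str, actor_id: str, scopes: List[str],
    target_type: str, target_id: str, ttl_seconds: int,
    upstream_expires_at: Optional[datetime] = None, now: Optional[datetime] = None,
) -> str:
    now = now or datetime.now(timezone.utc)
    exp = now + timedelta(seconds=ttl_seconds)
    if upstream_expires_at is not None:
        # Check awareness first: comparing naive and aware datetimes raises TypeError.
        if upstream_expires_at.tzinfo is None or upstream_expires_at.utcoffset() is None:
            raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
        if upstream_expires_at <= now:
            raise UpstreamGrantExpiredError("upstream grant already expired")
        exp = min(exp, upstream_expires_at)  # never outlive the upstream grant
    claims = {
        "iss": ISSUER, "domain": "runtime", "namespace_key": namespace_key,
        "actor_id": actor_id, "scopes": scopes,
        "target_type": target_type, "target_id": target_id,
        "iat": int(now.timestamp()), "exp": int(exp.timestamp()),
        "jti": secrets.token_hex(16),
    }
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    return f"{header}.{payload}.{_b64url(_sign(secret, f'{header}.{payload}'))}"


def verify_runtime_token(
    secret: bytes, token: str, required_scope: str, now: Optional[datetime] = None
) -> Dict[str, Any]:
    parts = token.split(".")
    if len(parts) != 3:
        raise RuntimeTokenError("malformed token")
    header, payload, signature = parts
    # Constant-time comparison of the recomputed HMAC-SHA256 signature.
    if not hmac.compare_digest(_sign(secret, f"{header}.{payload}"), _b64url_decode(signature)):
        raise RuntimeTokenError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    now = now or datetime.now(timezone.utc)
    if claims.get("domain") != "runtime" or claims.get("iss") != ISSUER:
        raise RuntimeTokenError("not a runtime token from this issuer")
    if claims["exp"] <= now.timestamp():
        raise RuntimeTokenError("token expired")
    if required_scope not in claims.get("scopes", []):
        raise RuntimeTokenError(f"scope {required_scope!r} not granted")
    return claims
```

Passing `now` explicitly keeps both functions deterministic under test; callers in a server would normally omit it.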
## Migrated endpoints

All seven `/api/v1/control-bindings*` endpoints now use `Depends(require_operation(...))`:

| Method | Path | Operation | Context forwarded |
|---|---|---|---|
| PUT | `/control-bindings` | `control_bindings.write` | body: `target_type`, `target_id` |
| GET | `/control-bindings` | `control_bindings.read` | query: `target_type`, `target_id` (when present) |
| GET | `/control-bindings/{binding_id}` | `control_bindings.read` | N/A (namespace-wide) |
| PATCH | `/control-bindings/{binding_id}` | `control_bindings.write` | N/A (namespace-wide) |
| DELETE | `/control-bindings/{binding_id}` | `control_bindings.write` | N/A (namespace-wide) |
| PUT | `/control-bindings/by-key` | `control_bindings.write` | body: `target_type`, `target_id` |
| POST | `/control-bindings/by-key:delete` | `control_bindings.write` | body: `target_type`, `target_id` |

The three binding-id-based routes are documented as namespace-wide: their target identifiers are not available until the binding row is loaded, and `require_operation` is single-pass. Clients whose authorization model requires per-target permissions are steered to the natural-key endpoints and the target-filtered list, all of which forward the target to the authorizer. Two-phase auth on the by-id routes is a follow-up.

New: `POST /api/v1/auth/runtime-token-exchange` (operation `runtime.token_exchange`).

The framework-protected routers (`/control-bindings`, `/auth`) are mounted with the existing non-validating `get_api_key_from_header` Security extractor as a router-level dependency. `require_operation` still owns request-time authentication and authorization; the Security dependency exists purely so the generated OpenAPI spec advertises `X-API-Key` on these routes for downstream SDK generation.
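The per-operation registry described earlier, combined with the context forwarding in the table above, reduces to a small lookup. The following framework-free sketch reuses the names `set_authorizer`, `get_authorizer`, and `require_operation`, but their bodies, the dict-shaped request and principal, the `authorize(operation, request, context)` method, and the `target_from_body` helper are assumptions; the real `require_operation` returns a FastAPI dependency.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional, Protocol

Principal = Dict[str, Any]  # simplified stand-in for the PR's Principal model
Request = Dict[str, Any]    # simplified stand-in for an incoming request


class Operation(str, Enum):
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_USE = "runtime.use"


class RequestAuthorizer(Protocol):
    def authorize(
        self, operation: Operation, request: Request, context: Dict[str, Any]
    ) -> Principal: ...


_default: Optional[RequestAuthorizer] = None
_overrides: Dict[Operation, RequestAuthorizer] = {}


def set_authorizer(
    authorizer: Optional[RequestAuthorizer], operation: Optional[Operation] = None
) -> None:
    """Set the default authorizer, or an override for one operation; None clears it."""
    global _default
    if operation is None:
        _default = authorizer
    elif authorizer is None:
        _overrides.pop(operation, None)
    else:
        _overrides[operation] = authorizer


def get_authorizer(operation: Operation) -> RequestAuthorizer:
    """Per-operation override first, then the default; fail if neither is set."""
    authorizer = _overrides.get(operation, _default)
    if authorizer is None:
        raise RuntimeError(f"no authorizer installed for {operation.value}")
    return authorizer


ContextBuilder = Callable[[Request], Dict[str, Any]]


def require_operation(
    operation: Operation, context_builder: Optional[ContextBuilder] = None
) -> Callable[[Request], Principal]:
    def dependency(request: Request) -> Principal:
        # Resolved per call, so reconfiguring the registry takes effect immediately.
        context = context_builder(request) if context_builder is not None else {}
        return get_authorizer(operation).authorize(operation, request, context)

    return dependency


def target_from_body(request: Request) -> Dict[str, Any]:
    """Context builder for body-bearing binding routes: forward the target pair."""
    body = request.get("body") or {}
    return {"target_type": body.get("target_type"), "target_id": body.get("target_id")}
```

Looking the authorizer up inside `dependency` rather than when `require_operation` is called is what lets clearing an override fall back to the default without re-registering routes.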
## Generated client

The TypeScript wrapper exposes both `auth` and `controlBindings` getters alongside the existing surface, so consumers of the public client can call `runtimeTokenExchange` and the binding API without reaching into the generated internals.

## Env vars

| Var | Default | Purpose |
|---|---|---|
| `AGENT_CONTROL_AUTH_MODE` | `header` | Default authorizer: `header` or `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_URL` | none | Required when mode is `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS` | `5.0` | Per-request timeout for upstream calls, in seconds. |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN` | none | Optional upstream service token. |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER` | `X-Agent-Control-Service-Token` | Header name for the service token. |
| `AGENT_CONTROL_RUNTIME_TOKEN_SECRET` | none | Required to enable runtime auth and the exchange endpoint. Validated at startup; rejected if shorter than 32 bytes. |
| `AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS` | `300` | Local token TTL ceiling (capped further by the upstream grant). Validated at startup. |

`configure_auth_from_env` parses both runtime fields once at startup into a frozen `RuntimeAuthConfig`. The exchange endpoint and `LocalJwtVerifyProvider` read the same object, so the mint and verify sides cannot drift apart within a process.

When the runtime secret is absent, `RUNTIME_USE` falls through to the default authorizer; this is logged at WARNING so an operator can immediately see which trust model is in effect. Because `RUNTIME_USE` is reserved and not wired to `/evaluation` in this PR, this fallback does not affect the runtime hot path yet. The follow-up that wires runtime endpoints should explicitly choose between the legacy fallback and fail-closed, JWT-only behavior.

## Out of scope (follow-ups)

- Migrate `/controls` CRUD onto `require_operation` using the reserved `CONTROLS_*` operations.
- Wire `Operation.RUNTIME_USE` on the runtime resolution path (`/evaluation`, etc.) and the SDK side of the runtime exchange. The provider override is already in place when the runtime secret is configured.
- Migrate `/agents/initAgent` onto `require_operation`. Its `context_builder` should forward the request's `target_type` / `target_id` so `HttpUpstreamAuthProvider` can authorize against the requested resource upstream.
- Auth-derived `get_namespace_key`, so the binding endpoints can take the storage namespace from the principal along with the rest of the server.
- Two-phase auth for the three binding-id-based routes (GET/PATCH/DELETE `/control-bindings/{binding_id}`) so they can forward target context to the upstream.
- Drop `auth.py`'s `require_admin_key` once every management endpoint is migrated.

## Stacking

Stacked on **PR #203** (`abhi/data-model-v1`); rebased onto its current head `8adc328` so the merged effective-controls contract, namespace-threaded agent endpoints, and savepoint-protected binding writes are the base this PR builds on. Will rebase onto `main` once #203 merges.

## Test plan

- [x] 55 framework + endpoint tests covering:
  - Default coverage: every `Operation` member has a default access mapping (regression guard).
  - `HeaderAuthProvider`: PUBLIC bypass, AUTHENTICATED + ADMIN paths route to the legacy validator with the right `require_admin` flag, no-auth mode passes admin operations, namespace-header lookup currently inert, unknown operation raises, normalized `runtime.use` scope returned for `RUNTIME_TOKEN_EXCHANGE`.
  - `HttpUpstreamAuthProvider`: 200 happy path with realistic JSON wire shapes (ISO datetime + JSON array scopes round-trip), service token forwarding, 401/403/404 mapping, 5xx fail-closed, network-error fail-closed, strict-grant rejection on wrong-typed `is_admin` / malformed `scopes` / bad `expires_at` / non-string target fields, partial target grant rejected, naive `expires_at` rejected.
  - `require_operation` factory: routes through the installed authorizer, per-operation overrides take precedence, clearing an override falls back to the default, `get_authorizer` raises when nothing is set.
  - Lifecycle: reconfiguring without the runtime secret drops the previous `LocalJwtVerifyProvider` override; teardown clears every authorizer; secret shorter than 32 bytes raises at startup; invalid TTL raises at startup.
  - Runtime token mint / verify: round-trip, wrong-secret rejection, expiry rejection, TTL capped by upstream grant, management-domain token refused on runtime verify, missing-namespace rejection, already-expired upstream grant raises `UpstreamGrantExpiredError`, naive `upstream_expires_at` raises `RuntimeTokenError`.
  - `LocalJwtVerifyProvider`: target-bound `Principal`, namespace carried from token, missing token returns 401, wrong scope returns 403, non-Bearer header returns 401, target-context match enforcement (mismatch on type or id returns 403).
  - Exchange endpoint: 503 without secret, mint when configured, target mismatch rejected (400), missing target rejected (422), grant-without-runtime-use rejected (no privilege escalation), explicit empty-scope grant rejected (no fallback escalation), target context forwarded to authorizer, non-default namespace propagates into the token, full exchange-then-verify round trip, already-expired upstream grant surfaces as 502 distinct from the 503 misconfigured-server path.
- [x] Full server suite: 676 passed.
- [x] `make lint` clean.
- [x] `make typecheck` clean.
- [x] `make sdk-ts-generate-check` clean.
- [x] TypeScript SDK regenerated alongside the new endpoint (`auth-runtime-token-exchange`, request/response models, `Auth` and `ControlBindings` groups exposed via the public client).
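The startup validation exercised by the lifecycle tests (a secret shorter than 32 bytes or an invalid TTL raises at startup) can be sketched as a frozen dataclass built once from the environment. The two env var names and the 300-second default come from the env-var table; `load_runtime_auth_config`, treating an empty secret as absent, requiring a positive integer TTL, and returning `None` when the secret is unset are assumptions, not the PR's `configure_auth_from_env`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SECRET_VAR = "AGENT_CONTROL_RUNTIME_TOKEN_SECRET"
TTL_VAR = "AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS"
MIN_SECRET_BYTES = 32
DEFAULT_TTL_SECONDS = 300


@dataclass(frozen=True)
class RuntimeAuthConfig:
    secret: bytes
    ttl_seconds: int


def load_runtime_auth_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[RuntimeAuthConfig]:
    """Parse runtime auth settings once; return None when the secret is unset."""
    raw_secret = env.get(SECRET_VAR)
    if not raw_secret:
        # Runtime auth disabled. The PR logs a WARNING in this case; omitted here.
        return None
    secret = raw_secret.encode("utf-8")
    if len(secret) < MIN_SECRET_BYTES:
        raise ValueError(f"{SECRET_VAR} must be at least {MIN_SECRET_BYTES} bytes")
    raw_ttl = env.get(TTL_VAR, str(DEFAULT_TTL_SECONDS))
    try:
        ttl = int(raw_ttl)
    except ValueError:
        raise ValueError(f"{TTL_VAR} must be an integer, got {raw_ttl!r}") from None
    if ttl <= 0:
        raise ValueError(f"{TTL_VAR} must be positive, got {ttl}")
    return RuntimeAuthConfig(secret=secret, ttl_seconds=ttl)
```

Because the dataclass is frozen, the mint and verify sides can share one instance without either side mutating it after startup.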

abhinav-galileo added 9 commits April 27, 2026 17:29

docs(models): fix stale natural-key reference on UpsertControlBindingRequest

d69ed77

The class docstring still described the upsert natural key as (target_type, target_id, agent_name, control_id). Update to match the V1 shape: (target_type, target_id, control_id).

abhinav-galileo marked this pull request as ready for review April 28, 2026 16:00

abhinav-galileo requested review from lan17 and namrataghadi-galileo April 28, 2026 16:00

lan17 previously requested changes Apr 28, 2026


Comment thread server/src/agent_control_server/endpoints/evaluation.py Outdated

Comment thread server/src/agent_control_server/services/control_bindings.py

Comment thread server/src/agent_control_server/services/control_bindings.py

abhinav-galileo added 14 commits April 28, 2026 14:14

feat(server): cap namespace_key length at 255 chars

beee400

Match the existing String(255) convention used for agent_name and control/policy names. Bounds index key size and prevents pathological long values across all namespace-scoped tables.

docs(server): show override shape on get_namespace_key

d1f5bf7

Spell out that integrators can declare any FastAPI-resolvable dependency in their override signature, and include a JWT-claim example so the extension point is concrete instead of abstract.

abhinav-galileo requested a review from lan17 April 28, 2026 19:51

abhinav-galileo mentioned this pull request Apr 28, 2026

feat(server): pluggable request-auth framework (management + runtime) #204

Merged


abhinav-galileo added 4 commits April 29, 2026 13:51