34 changes: 34 additions & 0 deletions docs/security/audit-logging.md
@@ -46,6 +46,40 @@ The `source` field distinguishes in-process enforcer events from subprocess prox

When the inbound A2A request carries the orchestrator's correlation headers (`X-Workflow-ID`, `X-Workflow-Stage-ID`, `X-Workflow-Step-ID`, `X-Invocation-Caller`), every audit event emitted during that invocation is tagged with the matching `workflow_id` / `stage_id` / `step_id` / `invocation_caller` fields. Header names are vendor-neutral so any A2A-compatible orchestrator can populate them. Direct A2A invocations (no orchestrator) omit the fields entirely — emitted JSON is byte-identical to the pre-correlation shape. See [Workflow correlation IDs](workflow-correlation.md) for the full reference, including outbound propagation for agent-to-agent flows.

### Tenancy stamping

For deployments where one or more agents serve multiple orgs or workspaces, every audit event can be stamped with `org_id` and `workspace_id` top-level fields so downstream consumers can filter by tenancy without joining against `auth_verify`. Two layers, highest precedence first:

| Layer | Source | When it wins |
|-------|--------|--------------|
| Per-request override | `X-Forge-Org-ID` / `X-Forge-Workspace-ID` request headers | Whenever a header is present; it overrides the static stamp for that field |
| Deployment-time stamp | `FORGE_ORG_ID` / `FORGE_WORKSPACE_ID` env vars | When the request carries no override headers |

The deployment-time stamp is read once at agent startup and applied via `AuditLogger.WithTenancy(...)`. It covers every emitted event — startup banners (`agent_card_published`, `policy_loaded`, `audit_export_status`) AND per-invocation events (`session_start`, `llm_call`, `guardrail_check`, `invocation_complete`, etc.). The per-request override only kicks in inside the request scope; startup banners always reflect the env stamp.

```yaml
# Initializ platform deployment manifest — static-tenancy case
env:
  - name: FORGE_ORG_ID
    value: "org_abc123"
  - name: FORGE_WORKSPACE_ID
    value: "ws_xyz789"
```

```sh
# Multi-tenant routing case — the orchestrator picks per request
curl -X POST https://agent.example.com/ \
-H 'X-Forge-Org-ID: org_def456' \
-H 'X-Forge-Workspace-ID: ws_pqr012' \
...
```

Both fields use `omitempty`. Deployments that set neither env nor header keep emitting the pre-tenancy JSON shape verbatim — no schema version bump.

The top-level `org_id` is distinct from `auth_verify.fields.org_id`, which carries whatever the inbound auth token claimed (provider-derived). The top-level value is the operator's declared tenancy, trusted because the deployment / orchestrator set it. Both can be present on the same `auth_verify` event when they're different identifiers (e.g., the token came from a federated identity but the agent is deployed into a specific workspace).

See [Tenancy stamping reference](tenancy.md) for the precedence rules and the agent-to-agent propagation helper.

### Token usage and execution duration

Every `llm_call` audit event carries the normalized token counts the provider returned in its response metadata, plus the wall-clock time spent in the provider call. Field naming aligns with [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) (`gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`) so audit consumers can correlate Forge audit events with OTel traces without a translation table.
84 changes: 84 additions & 0 deletions docs/security/tenancy.md
@@ -0,0 +1,84 @@
---
title: "Tenancy Stamping"
description: "Stamping org_id and workspace_id on every audit event from env + headers."
order: 9
---

## Tenancy Stamping

For multi-tenant deployments, every Forge audit event can carry top-level `org_id` and `workspace_id` keys so SIEM / audit-warehouse consumers filter by tenancy without joining against `auth_verify` rows. See issue #157.

## Three layers

The same agent process supports both the static-deployment case (one
agent serves one workspace) and the multi-tenant routing case (one
agent serves many workspaces and the orchestrator picks the tenancy per request).

| Layer | Source | Wins when |
|-------|--------|-----------|
| 1 — Explicit on event | `AuditEvent.OrgID` / `AuditEvent.WorkspaceID` set before emit | Always — caller-owned event takes precedence over every fallback |
| 2 — Per-request override | `X-Forge-Org-ID` / `X-Forge-Workspace-ID` request headers | Inside the request scope when present; ctx falls through to layer 3 otherwise |
| 3 — Deployment-time stamp | `FORGE_ORG_ID` / `FORGE_WORKSPACE_ID` env vars | Whenever the higher layers carry no value |

Each field is resolved independently. A request that overrides only `X-Forge-Org-ID` still lets the env stamp fill in `workspace_id`.
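The per-field rule can be sketched as a "first non-empty layer wins" lookup applied separately to each key. This is a minimal illustration, not the Forge implementation; `tenancy`, `firstNonEmpty`, and `resolveTenancy` are hypothetical names.

```go
package main

// tenancy is a hypothetical org/workspace pair used for every layer.
type tenancy struct {
	OrgID       string
	WorkspaceID string
}

// firstNonEmpty returns the highest-precedence non-empty value, or ""
// when no layer sets the field (so omitempty drops the key).
func firstNonEmpty(layers ...string) string {
	for _, v := range layers {
		if v != "" {
			return v
		}
	}
	return ""
}

// resolveTenancy applies the table above field by field: explicit
// event value, then request header override, then env stamp.
func resolveTenancy(explicit, header, static tenancy) tenancy {
	return tenancy{
		OrgID:       firstNonEmpty(explicit.OrgID, header.OrgID, static.OrgID),
		WorkspaceID: firstNonEmpty(explicit.WorkspaceID, header.WorkspaceID, static.WorkspaceID),
	}
}
```

A request that sends only `X-Forge-Org-ID: org_def456` against an env stamp of `org_abc123` / `ws_xyz789` resolves to `org_def456` / `ws_xyz789`.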

## Static tenancy (one agent per workspace)

The simplest case: deploy one Forge agent into one workspace, declare the tenancy via env, never set headers. Every emitted event — including startup banners — carries the stamp.

```yaml
# Kubernetes deployment fragment
env:
  - name: FORGE_ORG_ID
    value: "org_abc123"
  - name: FORGE_WORKSPACE_ID
    value: "ws_xyz789"
```

The audit stream then looks like:

```json
{"ts":"2026-06-14T10:00:00Z","event":"agent_card_published","schema_version":"1.0","org_id":"org_abc123","workspace_id":"ws_xyz789","fields":{...}}
{"ts":"2026-06-14T10:00:05Z","event":"session_start","schema_version":"1.0","seq":1,"correlation_id":"...","task_id":"...","org_id":"org_abc123","workspace_id":"ws_xyz789"}
{"ts":"2026-06-14T10:00:08Z","event":"llm_call","schema_version":"1.0","seq":2,"correlation_id":"...","task_id":"...","model":"...","provider":"...","org_id":"org_abc123","workspace_id":"ws_xyz789"}
```

SIEM filter: `org_id = "org_abc123" AND workspace_id = "ws_xyz789"`.
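The filter is a plain equality match on two top-level keys, so no join against `auth_verify` rows is needed. As a minimal sketch, assuming the audit stream has been collected as JSON lines, a consumer written in Go could apply the same filter; `filterByTenancy` and `tenancyKeys` are hypothetical names.

```go
package main

import "encoding/json"

// tenancyKeys decodes only the two top-level tenancy keys; encoding/json
// ignores every other key on the line.
type tenancyKeys struct {
	OrgID       string `json:"org_id"`
	WorkspaceID string `json:"workspace_id"`
}

// filterByTenancy keeps the raw JSON lines whose top-level org_id and
// workspace_id both match. Lines that are not valid JSON are skipped.
func filterByTenancy(lines []string, org, ws string) []string {
	var out []string
	for _, line := range lines {
		var k tenancyKeys
		if err := json.Unmarshal([]byte(line), &k); err != nil {
			continue
		}
		if k.OrgID == org && k.WorkspaceID == ws {
			out = append(out, line)
		}
	}
	return out
}
```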

## Per-request routing (one agent serves many workspaces)

For deployments where one Forge agent fronts many workspaces and the orchestrator routes per request, set the env vars to a default tenancy (or leave them empty) and have the orchestrator send the override headers on every request:

```sh
curl -X POST https://agent.example.com/ \
-H 'X-Forge-Org-ID: org_def456' \
-H 'X-Forge-Workspace-ID: ws_pqr012' \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":"1","method":"tasks/send","params":{...}}'
```

Every audit event emitted during that request carries `"org_id":"org_def456","workspace_id":"ws_pqr012"`. The next request from a different workspace gets its own stamp.

Startup banners (`agent_card_published`, `policy_loaded`, `audit_export_status`) still reflect the env stamp because they have no request context.

## Outbound propagation (agent-to-agent flows)

When one Forge agent calls another (via the egress proxy with explicit propagation), the helper `coreruntime.TenancyContextFromContext(ctx).ApplyToHTTPHeaders(req.Header)` writes both headers onto the outbound request. The downstream agent picks them up at its A2A boundary the same way.

Auto-propagation is NOT built into the egress proxy. The agent only propagates tenancy when it knows the target is a tenancy-aware Forge peer. This mirrors the workflow-header behavior: explicit only, to avoid leaking tenancy to unrelated third-party APIs.

## Backwards compatibility

Both `org_id` and `workspace_id` use `omitempty`. Deployments that set neither env nor header keep emitting the pre-tenancy JSON shape verbatim. Consumers that ignore unknown keys continue to work unchanged. The audit schema version is **not** bumped — additive optional fields are schema-compatible per the documented policy.

## Distinct from auth_verify.fields.org_id

The auth provider chain resolves an `Identity.OrgID` from the inbound bearer token (whatever the issuer claims) and stamps it on `auth_verify.fields.org_id` for back-compat. That value reflects the *user's* org from their identity token.

The top-level `org_id` documented here is the **deployment's** declared tenancy — the operator's explicit assertion of where this agent runs. The two can differ legitimately (federated identity, cross-tenant invocation) and downstream consumers should treat them as independent signals. Both can be present on the same `auth_verify` event.
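To make the distinction concrete, a minimal sketch of a trimmed `auth_verify` event carrying both values. `authVerify`, `encode`, and `tenancyMismatch` are hypothetical names, and the sample IDs are illustrative. Since a mismatch is legitimate, the helper only reports it; it does not treat either value as authoritative.

```go
package main

import "encoding/json"

// authVerify is a hypothetical trimmed auth_verify event: the top-level
// org_id is the deployment's declared tenancy, while fields.org_id is
// whatever the inbound token claimed.
type authVerify struct {
	Event       string         `json:"event"`
	OrgID       string         `json:"org_id,omitempty"`
	WorkspaceID string         `json:"workspace_id,omitempty"`
	Fields      map[string]any `json:"fields,omitempty"`
}

// encode marshals the event as one JSON line.
func encode(e authVerify) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

// tenancyMismatch reports whether both orgs are set and differ.
func tenancyMismatch(e authVerify) bool {
	claimed, _ := e.Fields["org_id"].(string)
	return claimed != "" && e.OrgID != "" && claimed != e.OrgID
}
```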

## See also

- [Audit Logging](audit-logging.md) — full event catalog
- [Workflow correlation IDs](workflow-correlation.md) — the sibling FWS-2 header system (`X-Workflow-*`)
- [Authentication](authentication.md) — where `Identity.OrgID` comes from
23 changes: 23 additions & 0 deletions forge-cli/runtime/runner.go
@@ -308,6 +308,14 @@ func (r *Runner) Run(ctx context.Context) error {
// pre-FWS-7 compatible.
auditLogger := coreruntime.NewAuditLoggerFromConfig(r.cfg.AuditExport)
auditLogger.SetOpsLogger(r.logger)
// Deployment-time tenancy stamp (#157). FORGE_ORG_ID /
// FORGE_WORKSPACE_ID are read once here and stamped on every
// emitted event — startup banners (agent_card_published,
// policy_loaded) AND per-invocation events all get the stamp.
// Per-request X-Forge-Org-ID / X-Forge-Workspace-ID headers
// (picked up in the A2A handlers) override the static stamp.
// Empty env → empty stamp → fields omitted (backward compatible).
auditLogger.WithTenancy(os.Getenv("FORGE_ORG_ID"), os.Getenv("FORGE_WORKSPACE_ID"))

// 4a. Build guardrail checker (DB mode → file mode → defaults) and
// wire the audit logger so every mask/block/warn decision lands on
@@ -1579,6 +1587,9 @@ func (r *Runner) registerRESTHandlers(srv *server.Server, executor coreruntime.A
// WorkflowContext → fields omitted (backward compat).
ctx := coreruntime.WithWorkflowContext(req.Context(),
coreruntime.WorkflowContextFromHTTPHeaders(req.Header))
// Same for tenancy override headers (#157).
ctx = coreruntime.WithTenancyContext(ctx,
coreruntime.TenancyContextFromHTTPHeaders(req.Header))
task, snap, err := r.executeTask(ctx, params, store, executor, guardrails, egressClient, auditLogger)
if err != nil {
writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()})
@@ -1640,6 +1651,9 @@ func (r *Runner) registerRESTHandlers(srv *server.Server, executor coreruntime.A
// tagging via EmitFromContext.
ctx = coreruntime.WithWorkflowContext(ctx,
coreruntime.WorkflowContextFromHTTPHeaders(req.Header))
// Same for tenancy override headers (#157).
ctx = coreruntime.WithTenancyContext(ctx,
coreruntime.TenancyContextFromHTTPHeaders(req.Header))
// Per-invocation usage accumulator + invocation_complete on exit.
// See issue #87 / FWS-3.
restSSEAcc := coreruntime.NewLLMUsageAccumulator()
@@ -2417,6 +2431,11 @@ func makeAuthAuditCallback(auditLogger *coreruntime.AuditLogger) func(*http.Requ
// workflow tags. Empty when the orchestrator didn't send them
// — fields then omit (backward compat).
wc := coreruntime.WorkflowContextFromHTTPHeaders(req.Header)
// Same for the per-request tenancy override (#157). When
// absent, the AuditLogger's static deployment-time stamp still
// kicks in via plain Emit so auth events match the rest of
// the stream's org_id / workspace_id columns.
tc := coreruntime.TenancyContextFromHTTPHeaders(req.Header)

if err == nil && id != nil {
// Success → auth_verify.
@@ -2437,6 +2456,8 @@ func makeAuthAuditCallback(auditLogger *coreruntime.AuditLogger) func(*http.Requ
StageID: wc.StageID,
StepID: wc.StepID,
InvocationCaller: wc.InvocationCaller,
OrgID: tc.OrgID,
WorkspaceID: tc.WorkspaceID,
Fields: fields,
})
return
@@ -2450,6 +2471,8 @@ func makeAuthAuditCallback(auditLogger *coreruntime.AuditLogger) func(*http.Requ
StageID: wc.StageID,
StepID: wc.StepID,
InvocationCaller: wc.InvocationCaller,
OrgID: tc.OrgID,
WorkspaceID: tc.WorkspaceID,
Fields: map[string]any{
"reason": authFailReason(err),
"token_kind": tokenKind,
8 changes: 8 additions & 0 deletions forge-cli/server/a2a_server.go
@@ -309,6 +309,14 @@ func (s *Server) handleJSONRPC(w http.ResponseWriter, r *http.Request) {
ctx = coreruntime.WithWorkflowContext(ctx,
coreruntime.WorkflowContextFromHTTPHeaders(r.Header))

// Extract per-request tenancy override headers (#157) at the same
// boundary so EmitFromContext can prefer them over the static
// deployment-time stamp installed via AuditLogger.WithTenancy.
// Absent headers produce an IsZero TenancyContext — the static
// stamp wins, or fields omit when no stamp is installed either.
ctx = coreruntime.WithTenancyContext(ctx,
coreruntime.TenancyContextFromHTTPHeaders(r.Header))

// Phase 3 (#104) — open the inbound dispatch span. Span name
// mirrors the JSON-RPC method ("a2a.tasks/send", "a2a.tasks/get",
// "a2a.tasks/cancel") so backend dashboards key by the same
103 changes: 103 additions & 0 deletions forge-core/runtime/audit.go
@@ -185,6 +185,29 @@ type AuditEvent struct {
// or upstream agent in an agent-to-agent flow).
InvocationCaller string `json:"invocation_caller,omitempty"`

// OrgID + WorkspaceID stamp the tenancy this agent run belongs
// to. Sourced from one of three layers (highest precedence first):
//
// 1. Explicit value set on the event before emit.
// 2. Per-request override headers parsed at the A2A boundary
// (X-Forge-Org-ID / X-Forge-Workspace-ID) and stashed on the
// context via WithTenancyContext.
// 3. Deployment-time stamp installed on the AuditLogger via
// WithTenancy(orgID, workspaceID) — typically populated from
// FORGE_ORG_ID / FORGE_WORKSPACE_ID at agent startup.
//
// Both keys use omitempty so deployments that don't set tenancy
// keep emitting the pre-tenancy JSON shape verbatim. The
// AuditSchemaVersion is NOT bumped — additive optional fields are
// schema-compatible per the documented policy. See issue #157.
//
// Distinct from the auth-derived `auth_verify.fields.org_id`,
// which continues to carry whatever the inbound token claimed.
// The top-level OrgID here is the operator's declared tenancy,
// trusted because the deployment / orchestrator set it.
OrgID string `json:"org_id,omitempty"`
WorkspaceID string `json:"workspace_id,omitempty"`

// LLM call attribution (llm_call, llm_call_cancelled, invocation_complete).
Model string `json:"model,omitempty"`
Provider string `json:"provider,omitempty"`
@@ -252,6 +275,46 @@ type AuditLogger struct {
sinks []Sink
logOnce map[string]bool // sink_name → first-error-already-logged for that sink
opsLog Logger // optional structured logger for sink-error reporting; nil disables

// Static tenancy stamp, installed once at agent startup via
// WithTenancy(). Populated from FORGE_ORG_ID / FORGE_WORKSPACE_ID
// in the CLI runner. EmitFromContext falls back to these whenever
// the request context carries no TenancyContext override. See
// issue #157.
tenantOrgID string
tenantWorkspaceID string
}

// WithTenancy installs the deployment-time tenancy stamp on the
// AuditLogger. Both arguments are optional — passing "" disables
// the stamp for that field. Called once at runner startup after
// resolving FORGE_ORG_ID / FORGE_WORKSPACE_ID. Returns the receiver
// for fluent construction.
//
// Precedence at emit time (highest first):
//
// 1. Explicit OrgID/WorkspaceID set on the AuditEvent.
// 2. TenancyContext from the request context (per-request override
// header X-Forge-Org-ID / X-Forge-Workspace-ID).
// 3. The static stamp installed here.
//
// Calling it again on a running AuditLogger is allowed (a hot-reload
// path is the expected caller) but is not the common case.
func (a *AuditLogger) WithTenancy(orgID, workspaceID string) *AuditLogger {
a.mu.Lock()
a.tenantOrgID = orgID
a.tenantWorkspaceID = workspaceID
a.mu.Unlock()
return a
}

// tenancyStamp returns the static tenancy under lock so concurrent
// emit callers don't race against a hot-reload that re-runs
// WithTenancy. Internal — emit paths use this.
func (a *AuditLogger) tenancyStamp() (orgID, workspaceID string) {
a.mu.Lock()
defer a.mu.Unlock()
return a.tenantOrgID, a.tenantWorkspaceID
}

// NewAuditLogger creates a single-sink AuditLogger wrapping the given
@@ -346,6 +409,21 @@ func (a *AuditLogger) Emit(event AuditEvent) {
if event.SchemaVersion == "" {
event.SchemaVersion = AuditSchemaVersion
}
// Deployment-time tenancy stamp (#157). Plain Emit has no request
// context, so the per-request header override path can't fire
// here — but startup banners (agent_card_published, policy_loaded,
// audit_export_status) are exactly the events that MUST carry the
// deployment tenancy so SIEM filters work on every row, not just
// per-invocation events.
if event.OrgID == "" || event.WorkspaceID == "" {
staticOrg, staticWS := a.tenancyStamp()
if event.OrgID == "" {
event.OrgID = staticOrg
}
if event.WorkspaceID == "" {
event.WorkspaceID = staticWS
}
}
data, err := json.Marshal(event)
if err != nil {
return
@@ -446,6 +524,31 @@ func (a *AuditLogger) EmitFromContext(ctx context.Context, event AuditEvent) {
}
}
}
// Tenancy stamp (#157) — per-request header override beats the
// deployment-time stamp, which beats the omitempty default. Same
// "context is fallback, not override" rule as the workflow keys
// above, but we ALSO consult the AuditLogger's static stamp when
// the ctx carries no override. Both fields are independent: the
// caller can override one (e.g. WorkspaceID via header) and let
// the other fall back to the env stamp.
if event.OrgID == "" || event.WorkspaceID == "" {
tc := TenancyContextFromContext(ctx)
staticOrg, staticWS := a.tenancyStamp()
if event.OrgID == "" {
if tc.OrgID != "" {
event.OrgID = tc.OrgID
} else {
event.OrgID = staticOrg
}
}
if event.WorkspaceID == "" {
if tc.WorkspaceID != "" {
event.WorkspaceID = tc.WorkspaceID
} else {
event.WorkspaceID = staticWS
}
}
}
a.Emit(event)
}
