Merged
23 commits:

- `b544cf2` feat: add Anthropic SDK backend + per-model backend selection (anticomputer, Jun 11, 2026)
- `9ad4c4f` fix: address PR review feedback (anticomputer, Jun 11, 2026)
- `ed4412f` fix: handle None tool descriptions in Anthropic tool conversion (anticomputer, Jun 11, 2026)
- `a8bf3b8` fix: CI failures, add unit tests, update docs (anticomputer, Jun 11, 2026)
- `b6f0057` fix: lint errors in test file (unused imports, N803 camelCase) (anticomputer, Jun 11, 2026)
- `266c54b` test: add backend extraction coverage to _resolve_task_model tests (anticomputer, Jun 11, 2026)
- `4417279` fix: pass real token as api_key instead of placeholder (anticomputer, Jun 11, 2026)
- `e16b20a` fix: implement blocked_tools filtering in anthropic backend (anticomputer, Jun 11, 2026)
- `ccbf7c5` fix: address PR review feedback (round 2) (anticomputer, Jun 11, 2026)
- `0c521e0` fix: lint errors + URL substring sanitization (CodeQL) (anticomputer, Jun 11, 2026)
- `b4da0a6` fix: address PR review feedback (round 3) (anticomputer, Jun 11, 2026)
- `03ffdb9` fix: correct relative import for capi in anthropic_sdk backend (anticomputer, Jun 12, 2026)
- `2cb49b9` refactor: use provider registry bearer_auth for anthropic backend auth (anticomputer, Jun 12, 2026)
- `ed19781` style: use ternary for token resolution (ruff SIM108) (anticomputer, Jun 12, 2026)
- `ffb017e` refactor: move unfiltered MCP tool listing into MCPNamespaceWrap (anticomputer, Jun 12, 2026)
- `00978f2` style: fix hatch fmt lint errors in test_mcp_utils (anticomputer, Jun 12, 2026)
- `b1b139b` fix(capi): add gpt-5 to OpenAI _CHAT_PREFIXES allowlist (anticomputer, Jun 12, 2026)
- `4eea127` doc + refactor: address remaining PR review threads (anticomputer, Jun 12, 2026)
- `c97ab6a` feat(anthropic_sdk): default-on automatic prompt caching (anticomputer, Jun 12, 2026)
- `f42c06b` fix(anthropic_sdk): match blocked_tools against raw + namespaced names (anticomputer, Jun 15, 2026)
- `c6ef3ae` Revert default_model bump back to gpt-4.1 (anticomputer, Jun 15, 2026)
- `5e0a38c` Address PR feedback + proactive cleanup pass (anticomputer, Jun 15, 2026)
- `4f1f440` fix(anthropic_sdk): preserve empty tool output + harden token test (anticomputer, Jun 15, 2026)
56 changes: 36 additions & 20 deletions README.md
@@ -83,37 +83,53 @@ Per-model `model_settings` can include:

### Backends

The runner can drive two SDKs behind a common interface:
The runner can drive three SDKs behind a common interface:

- **`openai_agents`** (default) — the OpenAI Agents Python SDK. Supports
multi-personality handoffs, both `chat_completions` and `responses`
`api_type`, `temperature`, `parallel_tool_calls`,
`exclude_from_context`, and MCP over stdio, SSE, and streamable HTTP.
- **`copilot_sdk`** (optional, `pip install seclab-taskflow-agent[copilot]`)
— the GitHub Copilot Python SDK. Supports streaming, `reasoning_effort`,
MCP over stdio/SSE/HTTP, and per-tool permission gating. The SDK
selects its own wire protocol per model, so the YAML `api_type` field
is not honoured; multi-personality handoffs, `temperature`, and
`parallel_tool_calls` are likewise not available. Taskflows that use
unsupported fields fail at load time with a `BackendCapabilityError`
naming the offending field.

Selection precedence:

1. `backend:` field in the model config document.
2. `SECLAB_TASKFLOW_BACKEND` environment variable.
3. Endpoint auto-default (`api.githubcopilot.com` prefers `copilot_sdk`
when the optional dependency is installed).
4. `openai_agents`.
- **`copilot_sdk`** — the GitHub Copilot Python SDK. Supports streaming,
`reasoning_effort`, MCP over stdio/SSE/HTTP, and per-tool permission
gating. The SDK selects its own wire protocol per model, so the YAML
`api_type` field is not honoured; multi-personality handoffs,
`temperature`, and `parallel_tool_calls` are likewise not available.
Taskflows that use unsupported fields fail at load time with a
`BackendCapabilityError` naming the offending field.
- **`anthropic_sdk`** — the Anthropic Python SDK, driving the native
Messages API (`/v1/messages`). Supports streaming, tool calling via
MCP, and adaptive thinking with configurable `reasoning.effort`
(`low`, `medium`, `high`, `max`). Handoffs are not supported.
Designed for use with CAPI's Anthropic endpoint; auth uses
`Authorization: Bearer` (not `x-api-key`).

Selection precedence (highest to lowest):

1. Per-task `backend:` in the task's own `model_settings` block (overrides
the model-level value for that one task; see `_resolve_task_model()`).
2. Per-model `backend:` in the model config's `model_settings` (allows
mixed backends in a single taskflow).
3. `backend:` field at the top level of the model config document
(global default).
4. `SECLAB_TASKFLOW_BACKEND` environment variable.
5. `openai_agents`.

```yaml
seclab-taskflow-agent:
  version: "1.0"
  filetype: model_config
backend: copilot_sdk
models:
  fast: gpt-5-mini
  slow: claude-opus-4.6
  code_analysis: claude-opus-4.7
  general_tasks: gpt-5.4-mini
model_settings:
  code_analysis:
    api_type: messages
    backend: anthropic_sdk
    reasoning:
      effort: high
  general_tasks:
    api_type: responses
    backend: openai_agents
```

### Session Recovery
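The README's selection-precedence list amounts to a first-non-empty lookup. The sketch below illustrates it under stated assumptions: `pick_backend` and its parameters are hypothetical (this is not the project's `resolve_backend_name()` or `_resolve_task_model()`), each level arrives as an optional string, and rejecting an unknown name up front is an assumption added for the example.

```python
import os
from typing import Mapping, Optional

# Backend names listed in the README diff.
KNOWN_BACKENDS = ("openai_agents", "copilot_sdk", "anthropic_sdk")
ENV_VAR = "SECLAB_TASKFLOW_BACKEND"


def pick_backend(
    task_backend: Optional[str] = None,
    model_backend: Optional[str] = None,
    global_backend: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first backend that is set, from highest to lowest precedence.

    Hypothetical helper: task > per-model > top-level > env var > default.
    """
    env = os.environ if env is None else env
    for candidate in (task_backend, model_backend, global_backend, env.get(ENV_VAR)):
        if candidate:
            # Assumption for this sketch: unknown names fail fast.
            if candidate not in KNOWN_BACKENDS:
                raise ValueError(f"unknown backend {candidate!r}")
            return candidate
    return "openai_agents"
```

Passing `env` explicitly keeps the lookup testable without touching the real process environment.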
12 changes: 9 additions & 3 deletions doc/GRAMMAR.md
@@ -524,6 +524,7 @@ api_type: chat_completions # default for all models
models:
  gpt_default: gpt-4.1
  gpt_responses: gpt-5.1
  claude_native: claude-opus-4.7
model_settings:
  gpt_default:
    temperature: 0.7
@@ -532,16 +533,21 @@ model_settings:
    endpoint: https://api.githubcopilot.com
    token: CAPI_TOKEN # env var name containing the API key
    temperature: 0.5
  claude_native:
    api_type: messages # use the Anthropic Messages API
    backend: anthropic_sdk
    reasoning:
      effort: high
```

The following keys in `model_settings` are handled by the engine and are not
passed to the underlying model provider:

| Key | Description | Default |
|-----|-------------|---------|
| `api_type` | `"chat_completions"` or `"responses"` | Inherited from top-level `api_type`, or `"chat_completions"` |
| `api_type` | `"chat_completions"`, `"responses"`, or `"messages"` | Inherited from top-level `api_type`, or `"chat_completions"` |
| `backend` | SDK adapter: `"openai_agents"`, `"copilot_sdk"`, or `"anthropic_sdk"` | Inherited from top-level `backend`, or `"openai_agents"` |
| `endpoint` | API base URL for this model | The global `AI_API_ENDPOINT` env var |
| `token` | Name of an environment variable containing the API key | Uses `AI_API_TOKEN` / `COPILOT_TOKEN` |

All other keys (e.g. `temperature`, `top_p`) are passed through as model
parameters to the OpenAI SDK.
All other keys (e.g. `temperature`, `top_p`, `reasoning`) are forwarded to the selected SDK backend. Each backend decides what to do with each key: `openai_agents` accepts the standard OpenAI parameter set; `anthropic_sdk` forwards a curated subset (currently `temperature`, `top_p`, `reasoning`, `max_tokens`, `stream_thinking`, `prompt_caching`) and silently ignores keys outside that set; `copilot_sdk` consumes the keys its SDK exposes (e.g. `reasoning_effort`) and **rejects** unsupported keys at validate time with `BackendCapabilityError` (currently `temperature` and `parallel_tool_calls`) rather than silently dropping them. Consult the backend-specific docs if in doubt.
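The key table above splits `model_settings` into engine-handled keys and everything else, and the paragraph after it says `anthropic_sdk` forwards only a curated subset of the rest. A minimal sketch of that two-stage filter, assuming plain dicts; `split_settings` and `anthropic_params` are hypothetical helpers, and the allowlist is copied from the text, which describes it as the current set.

```python
from typing import Any

# Keys the engine consumes itself (per the table); never sent to a provider.
ENGINE_KEYS = frozenset({"api_type", "backend", "endpoint", "token"})

# Subset the docs say anthropic_sdk currently forwards.
ANTHROPIC_FORWARDED = frozenset(
    {"temperature", "top_p", "reasoning", "max_tokens", "stream_thinking", "prompt_caching"}
)


def split_settings(settings: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split model_settings into (engine-handled, passthrough) dicts."""
    engine = {k: v for k, v in settings.items() if k in ENGINE_KEYS}
    passthrough = {k: v for k, v in settings.items() if k not in ENGINE_KEYS}
    return engine, passthrough


def anthropic_params(passthrough: dict[str, Any]) -> dict[str, Any]:
    """Keep only keys anthropic_sdk forwards; anything else is dropped silently."""
    return {k: v for k, v in passthrough.items() if k in ANTHROPIC_FORWARDED}
```

With this shape, a stray `parallel_tool_calls` on an `anthropic_sdk` model simply disappears, which matches the "silently ignores" behaviour described above; `copilot_sdk` instead raises at validate time.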
11 changes: 2 additions & 9 deletions pyproject.toml
@@ -38,6 +38,7 @@ classifiers = [
dependencies = [
"aiofiles==24.1.0",
"annotated-types==0.7.0",
"anthropic>=0.50,<1",
"anyio==4.9.0",
"attrs==25.3.0",
"Authlib==1.6.12",
@@ -55,6 +56,7 @@ dependencies = [
"email-validator==2.3.0",
"exceptiongroup==1.3.0",
"fastmcp==3.2.0",
"github-copilot-sdk>=0.2.2,<0.3",
"griffe==1.7.3",
"h11==0.16.0",
"httpcore==1.0.9",
@@ -124,15 +126,6 @@ dependencies = [
[project.scripts]
seclab-taskflow-agent = "seclab_taskflow_agent.cli:app"

[project.optional-dependencies]
# Pulls in the GitHub Copilot SDK (public preview) so the copilot_sdk
# backend can be selected. Requires Python >= 3.11. Pinned to the
# 0.2.x line because the SDK may ship breaking changes between minor
# versions while still in preview.
copilot = [
"github-copilot-sdk>=0.2.2,<0.3",
]

[project.urls]
Source = "https://github.com/GitHubSecurityLab/seclab-taskflow-agent"
Issues = "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/issues"
8 changes: 5 additions & 3 deletions src/seclab_taskflow_agent/capi.py
@@ -50,6 +50,7 @@ class APIProvider:
models_catalog: str = "/models"
default_model: str = "gpt-4.1"
extra_headers: Mapping[str, str] = field(default_factory=dict)
bearer_auth: bool = True # Use Authorization: Bearer (not x-api-key)

def __post_init__(self) -> None:
# Ensure base_url ends with / so httpx URL.join() preserves the path
@@ -110,7 +111,7 @@ class _OpenAIProvider(APIProvider):
we maintain a prefix allowlist of known chat-completion model families.
"""

_CHAT_PREFIXES = ("gpt-3.5", "gpt-4", "o1", "o3", "o4", "chatgpt-")
_CHAT_PREFIXES = ("gpt-3.5", "gpt-4", "gpt-5", "o1", "o3", "o4", "chatgpt-")

def check_tool_calls(self, _model: str, model_info: dict) -> bool:
model_id = model_info.get("id", "").lower()
@@ -172,8 +173,9 @@ def get_provider(endpoint: str | None = None) -> APIProvider:
if upstream:
return dataclasses.replace(upstream, base_url=url)

# Unknown endpoint — return a generic provider with the given base URL
return APIProvider(name="custom", base_url=url, default_model="please-set-default-model-via-env")
# Unknown endpoint — return a generic provider using native SDK auth.
return APIProvider(name="custom", base_url=url, bearer_auth=False,
default_model="please-set-default-model-via-env")


# ---------------------------------------------------------------------------
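The new `bearer_auth` flag on `APIProvider` chooses between `Authorization: Bearer` (the default for registered providers such as CAPI) and the SDK's native scheme, which `get_provider()` now selects for unknown endpoints. The sketch below is a hypothetical `auth_headers` helper, assuming the native scheme for the Anthropic backend is the `x-api-key` header named in the README; the official `anthropic` client exposes the two options as its `api_key` and `auth_token` constructor arguments, but confirm that mapping against the SDK reference before relying on it.

```python
def auth_headers(token: str, bearer_auth: bool) -> dict[str, str]:
    """Build the auth header for one request (hypothetical helper).

    bearer_auth=True  -> Authorization: Bearer <token>  (registered providers, e.g. CAPI)
    bearer_auth=False -> x-api-key: <token>             (assumed native Anthropic scheme)
    """
    if not token:
        # Fail early rather than send an empty credential.
        raise ValueError("empty token")
    if bearer_auth:
        return {"Authorization": f"Bearer {token}"}
    return {"x-api-key": token}
```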
35 changes: 35 additions & 0 deletions src/seclab_taskflow_agent/mcp_utils.py
@@ -97,6 +97,41 @@ async def list_tools(self, *args: Any, **kwargs: Any) -> list[Any]:
            namespaced_tools.append(tool_copy)
        return namespaced_tools

    async def list_tools_unfiltered(self) -> list[Any]:
        """List tools directly from the MCP session, namespace-prefixed.

        Bypasses any tool_filter configured on the wrapped openai-agents
        server (which would require ``run_context`` and ``agent`` arguments
        that aren't available when listing tools outside the openai-agents
        run loop -- e.g. when handing tools to a different SDK at build
        time).

        Prefixing is idempotent: if a tool's name already starts with this
        wrapper's namespace (e.g. because the underlying session returned a
        previously-namespaced object), the existing prefix is stripped
        before re-applying so calling this method multiple times never
        yields ``<ns><ns>name``.

        Raises ``RuntimeError`` if the underlying server has no active
        MCP session yet (caller should ensure the server is connected
        before calling this).
        """
        session = getattr(self._obj, "session", None)
        if session is None:
            raise RuntimeError(
                f"MCPNamespaceWrap({self._obj!r}): underlying server has no "
                "active MCP session; cannot list tools unfiltered"
            )
        result = await session.list_tools()
        namespaced_tools: list[Any] = []
        for tool in result.tools:
            tool_copy = tool.copy() if hasattr(tool, "copy") else tool
            # Idempotent: strip existing prefix before re-applying
            base_name = tool_copy.name.removeprefix(self.namespace)
            tool_copy.name = f"{self.namespace}{base_name}"
            namespaced_tools.append(tool_copy)
        return namespaced_tools

    def confirm_tool(self, tool_name: str, args: list[Any]) -> bool:
        """Interactively prompt the user for tool-call confirmation.

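The idempotency promise in the `list_tools_unfiltered` docstring can be checked in isolation. A minimal sketch, assuming a frozen dataclass stands in for an MCP tool object and `namespace_tools` is a hypothetical helper; it copies with `dataclasses.replace` instead of calling `tool.copy()`, so the input list is never mutated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tool:
    # Stand-in for an MCP tool object; only the name matters here.
    name: str


def namespace_tools(tools: list[Tool], namespace: str) -> list[Tool]:
    """Prefix each tool name with namespace without ever doubling it."""
    out: list[Tool] = []
    for tool in tools:
        # str.removeprefix (Python 3.9+) strips at most one leading copy.
        base_name = tool.name.removeprefix(namespace)
        out.append(replace(tool, name=f"{namespace}{base_name}"))
    return out
```

One trade-off of strip-then-apply: a tool whose real name happens to begin with the namespace string is treated as already prefixed.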
4 changes: 2 additions & 2 deletions src/seclab_taskflow_agent/models.py
@@ -31,10 +31,10 @@
from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

# Valid API type values for model configuration.
ApiType = Literal["chat_completions", "responses"]
ApiType = Literal["chat_completions", "responses", "messages"]

# Valid backend names. Must stay in sync with ``sdk._KNOWN``.
BackendSdk = Literal["openai_agents", "copilot_sdk"]
BackendSdk = Literal["openai_agents", "copilot_sdk", "anthropic_sdk"]


# ---------------------------------------------------------------------------
17 changes: 10 additions & 7 deletions src/seclab_taskflow_agent/runner.py
@@ -126,12 +126,12 @@ def _resolve_task_model(
model_dict: dict[str, str],
models_params: dict[str, dict[str, Any]],
default_api_type: str = "chat_completions",
) -> tuple[str, dict[str, Any], str, str | None, str | None]:
) -> tuple[str, dict[str, Any], str, str | None, str | None, str | None]:
"""Resolve the final model name, settings, and per-model overrides.

Returns:
A tuple of ``(model_id, model_settings, api_type, endpoint, token)``
where *endpoint* and *token* are ``None`` when not overridden.
A tuple of ``(model_id, model_settings, api_type, endpoint, token, backend)``
where *endpoint*, *token*, and *backend* are ``None`` when not overridden.

Raises:
ValueError: If task-level model_settings is not a dictionary.
@@ -141,6 +141,7 @@ def _resolve_task_model(
api_type: str = default_api_type
endpoint: str | None = None
token: str | None = None
backend: str | None = None

if logical_name in model_keys:
if logical_name in models_params:
@@ -151,6 +152,7 @@ def _resolve_task_model(
api_type = model_settings.pop("api_type", api_type)
endpoint = model_settings.pop("endpoint", None)
token = model_settings.pop("token", None)
backend = model_settings.pop("backend", None)

task_model_settings: dict[str, Any] | Any = task.model_settings or {}
if not isinstance(task_model_settings, dict):
@@ -161,9 +163,10 @@ def _resolve_task_model(
api_type = task_settings.pop("api_type", api_type)
endpoint = task_settings.pop("endpoint", endpoint)
token = task_settings.pop("token", token)
backend = task_settings.pop("backend", backend)

model_settings.update(task_settings)
return logical_name, model_settings, api_type, endpoint, token
return logical_name, model_settings, api_type, endpoint, token, backend


async def _build_prompts_to_run(
@@ -600,8 +603,8 @@ async def on_handoff_hook(context: RunContextWrapper[TContext], agent: Agent[TCo
if task.uses:
task = _merge_reusable_task(available_tools, task)

# Resolve model (name, settings, api_type, optional endpoint/token)
model, model_settings, task_api_type, task_endpoint, task_token = _resolve_task_model(
# Resolve model (name, settings, api_type, optional endpoint/token/backend)
model, model_settings, task_api_type, task_endpoint, task_token, task_backend = _resolve_task_model(
task, model_keys, model_dict, models_params, default_api_type=api_type,
)

@@ -697,7 +700,7 @@ async def _deploy(ra: dict, pp: str) -> bool:
api_type=task_api_type,
endpoint=task_endpoint,
token=task_token,
backend=backend,
backend=task_backend or backend,
agent_hooks=TaskAgentHooks(on_handoff=on_handoff_hook),
)

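`_resolve_task_model()` now pops `backend` alongside `api_type`, `endpoint`, and `token`, first from the model-level settings and then from the task-level settings, so these engine-only keys never reach the provider. A minimal sketch of that pop-and-override order, assuming plain dicts for both levels; `resolve_overrides` is a hypothetical stand-in, not the runner function, and it returns a 5-tuple without the logical model name.

```python
from typing import Any, Optional

_ENGINE_ONLY_KEYS = ("api_type", "endpoint", "token", "backend")


def resolve_overrides(
    model_settings: dict[str, Any],
    task_settings: dict[str, Any],
    default_api_type: str = "chat_completions",
) -> tuple[dict[str, Any], str, Optional[str], Optional[str], Optional[str]]:
    """Merge model-level then task-level settings, popping engine-only keys.

    Returns (passthrough_settings, api_type, endpoint, token, backend);
    endpoint, token and backend are None when neither level sets them.
    """
    merged = dict(model_settings)  # copy so the caller's dicts are untouched
    task = dict(task_settings)
    resolved: dict[str, Any] = {"api_type": default_api_type}
    for key in _ENGINE_ONLY_KEYS:
        # Model level overrides the default; task level overrides the model.
        model_value = merged.pop(key, resolved.get(key))
        resolved[key] = task.pop(key, model_value)
    merged.update(task)  # remaining task keys win over model keys
    return (
        merged,
        resolved["api_type"],
        resolved["endpoint"],
        resolved["token"],
        resolved["backend"],
    )
```

At the call site the diff then passes `backend=task_backend or backend`, so a `None` backend here falls through to the top-level or environment-variable choice.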
21 changes: 14 additions & 7 deletions src/seclab_taskflow_agent/sdk/__init__.py
@@ -3,9 +3,9 @@

"""Backend factory for the agent runner.

Two backends are supported: ``openai_agents`` (default) and
``copilot_sdk`` (optional, requires ``pip install
seclab-taskflow-agent[copilot]``).
Three backends are supported: ``openai_agents`` (default), ``copilot_sdk``,
and ``anthropic_sdk``. All three are always available because per-task
backend selection means any SDK may be needed at runtime.
"""

from __future__ import annotations
@@ -33,7 +33,7 @@
)

_ENV_VAR = "SECLAB_TASKFLOW_BACKEND"
_KNOWN = ("openai_agents", "copilot_sdk")
_KNOWN = ("openai_agents", "copilot_sdk", "anthropic_sdk")
_BACKENDS: dict[str, AgentBackend] = {}


@@ -46,10 +46,16 @@ def get_backend(name: str) -> AgentBackend:
from .openai_agents.backend import OpenAIAgentsBackend

_BACKENDS[name] = OpenAIAgentsBackend()
else:
elif name == "copilot_sdk":
from .copilot_sdk.backend import CopilotSDKBackend

_BACKENDS[name] = CopilotSDKBackend()
elif name == "anthropic_sdk":
from .anthropic_sdk.backend import AnthropicSDKBackend

_BACKENDS[name] = AnthropicSDKBackend()
else:
raise ValueError(f"No backend implementation for {name!r}")
return _BACKENDS[name]


@@ -64,8 +70,9 @@ def resolve_backend_name(
``SECLAB_TASKFLOW_BACKEND`` env var > ``openai_agents``.

Backend selection is always deterministic — there is no auto-detection
based on endpoint URL. Use ``backend: copilot_sdk`` in model config
or set ``SECLAB_TASKFLOW_BACKEND=copilot_sdk`` to opt in.
based on endpoint URL. Use ``backend: copilot_sdk`` or ``backend:
anthropic_sdk`` in model config (or set
``SECLAB_TASKFLOW_BACKEND=<name>``) to opt in.

The *endpoint* parameter is accepted for forward compatibility but
is not used for backend selection.
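`get_backend()` imports each backend module only when that name is first requested and caches the instance in `_BACKENDS`. The sketch below reproduces that lazy-factory-plus-cache shape with a hypothetical `FakeBackend` placeholder instead of the real SDK classes, so it runs without any SDK installed; it swaps the diff's `if`/`elif` chain for a name-to-factory table.

```python
from typing import Callable


class FakeBackend:
    # Placeholder for OpenAIAgentsBackend / CopilotSDKBackend / AnthropicSDKBackend.
    def __init__(self, name: str) -> None:
        self.name = name


# In the real module each branch performs its SDK import at call time,
# so importing the factory never pulls in an SDK that is not used.
_FACTORIES: dict[str, Callable[[], FakeBackend]] = {
    "openai_agents": lambda: FakeBackend("openai_agents"),
    "copilot_sdk": lambda: FakeBackend("copilot_sdk"),
    "anthropic_sdk": lambda: FakeBackend("anthropic_sdk"),
}
_CACHE: dict[str, FakeBackend] = {}


def get_backend(name: str) -> FakeBackend:
    """Return a cached backend instance, creating it on first use."""
    if name not in _CACHE:
        factory = _FACTORIES.get(name)
        if factory is None:
            raise ValueError(f"No backend implementation for {name!r}")
        _CACHE[name] = factory()
    return _CACHE[name]
```

A table keeps the set of names in one place, which makes a sync check against the `BackendSdk` literal straightforward.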
4 changes: 4 additions & 0 deletions src/seclab_taskflow_agent/sdk/anthropic_sdk/__init__.py
@@ -0,0 +1,4 @@
# SPDX-FileCopyrightText: GitHub, Inc.
# SPDX-License-Identifier: MIT

"""Anthropic SDK backend adapter."""