Skip to content

fix(harness): complete credential flow for non-Bedrock model providers#926

Open
aidandaly24 wants to merge 1 commit intopreviewfrom
feat/harness-credential-flow-fixes
Open

fix(harness): complete credential flow for non-Bedrock model providers#926
aidandaly24 wants to merge 1 commit intopreviewfrom
feat/harness-credential-flow-fixes

Conversation

@aidandaly24
Copy link
Copy Markdown
Contributor

Description

The harness API key credential flow had multiple gaps blocking end-to-end deploy+invoke for OpenAI and Gemini harnesses. This PR fixes the credential lifecycle end-to-end: TUI/CLI collection, schema validation, deploy-time resolution, and the vended CDK's IAM policy wiring.

What was broken

  • Invokes 401'd with AccessDeniedException — the vended CDK passed apiKeyArn to AgentCoreHarnessRole, but our new flow stores the credential by name (apiKeyCredential). The role never got the bedrock-agentcore:GetResourceApiKey policy.
  • Cross-provider credential contamination — two harnesses (OpenAI + Gemini) with coincidentally matching keys would reuse the same credential, wiring the wrong provider's ARN into the other provider's model config.
  • Silent failures — skip-on-API-key for non-Bedrock providers produced a broken harness; dangling credential references deferred their failure to a confusing deploy-time error; missing .env.local entries silently fell through to CDK synth; Secrets Manager ARNs in apiKeyArn were rejected only at Zod time with no case/whitespace tolerance.
  • Multi-write without rollback — primitive wrote project spec, harness spec, and .env.local in an order that could leave a harness referencing a non-existent credential on partial failure.
  • CLI flag confusion — agent path silently accepted and ignored --api-key-arn; both --api-key and --api-key-arn could be set at once with no error.

What changed

  • Vended CDK (src/assets/cdk/bin/cdk.ts) resolves model.apiKeyCredential to the token-vault provider ARN from deployed state per-target before passing to AgentCoreStack.
  • CredentialPrimitive.resolveCredentialStrategy filters reuse candidates by authorizerType === 'ApiKeyCredentialProvider' AND cred.name.endsWith(modelProvider), preventing cross-provider contamination.
  • New preflight validator (preflight.ts#validateHarnessCredentialReferences) rejects dangling apiKeyCredential references and OAuth credentials used as API key credentials, before pre-deploy-identity runs.
  • Pre-deploy identity escalates skipped credentials that harnesses reference (clear error naming the missing env var), and tracks newly-created token-vault providers to roll them back when the subsequent CDK deploy fails.
  • HarnessPrimitive.add() rejects non-Bedrock provider without a credential source, rejects both --api-key and --api-key-arn set together, and writes in rollback-safe order: .env.localharness.jsonagentcore.json (the commit point).
  • Schema trims apiKeyArn, enforces .min(1) on apiKeyCredential, rejects Secrets Manager ARNs case-insensitively, and fails non-Bedrock providers that ship with neither credential field.
  • TUI removes the skip option from ApiKeySecretInput when rendered in the harness wizard; CLI guards surface --api-key-arn on the agent path and both-flags combinations at validate time.
  • Refactor — extracted computeCredentialName from schema-mapper.ts into credential-utils.ts so the agent and harness flows share the same naming.

Related Issue

Closes #

Type of Change

  • Bug fix

Testing

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint

Automated coverage

  • 85 schema tests — empty-string mutex bypass, whitespace ARN, mixed-case Secrets Manager rejection, missing credential on non-Bedrock provider.
  • 6 preflight validator tests — no harnesses, no apiKeyCredential, dangling reference, wrong authorizerType, multi-harness error collection, valid reference.
  • 4 cross-provider isolation tests in resolve-credential-strategy.test.ts.
  • 5 HarnessPrimitive tests covering the primary apiKey path, reuse path, BYO apiKeyCredentialArn, and dual-flag rejection.
  • 6 end-to-end integration tests using real ConfigIO against tmpdir.
  • Source-level wiring guard (useCreateFlow.wiring.test.ts) preventing regression of field drops between TUI and primitive layers.

End-to-end AWS verification (account 603141041947)

  • OpenAI harness (us-east-1): create → deploy → invoke "Say hello in one word" → returned "Hello". CDK IAM fix verified.
  • Gemini harness: confirmed end-to-end in real deploys.
  • M4 (dangling credential): fails at preflight with exact remediation text.
  • M5 (missing env var): fails before CDK synth with exact error text.
  • M3 (Secrets Manager ARN) and M7 (both-flags mutex): agentcore create produces exact error.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The harness API key flow had multiple gaps blocking end-to-end deploy+invoke
for OpenAI and Gemini harnesses. This change fixes the credential lifecycle
end-to-end: TUI/CLI collection, schema validation, deploy-time resolution,
and the vended CDK's IAM policy wiring.

Highlights
- Vended CDK (src/assets/cdk/bin/cdk.ts) now resolves model.apiKeyCredential
  to the token-vault provider ARN from deployed state, so AgentCoreHarnessRole
  receives the required bedrock-agentcore:GetResourceApiKey policy. Without
  this, invokes always failed with AccessDeniedException.
- Cross-provider credential dedup filters candidates by authorizerType and
  provider name so OpenAI and Gemini credentials never contaminate each
  other at deploy time.
- Preflight validator rejects dangling apiKeyCredential references and
  OAuth credentials used as API key credentials, surfacing the error before
  CDK synth.
- Pre-deploy identity escalates skipped credentials that harnesses
  reference and rolls back newly-created token-vault providers when the
  subsequent CDK deploy fails.
- Harness primitive writes in rollback-safe order (.env.local then
  harness.json then agentcore.json as commit point) and rejects skip-on-
  API-key for non-Bedrock providers, both in the TUI wizard and primitive
  add().
- Schema now trims apiKeyArn, requires min length on apiKeyCredential,
  rejects Secrets Manager ARNs case-insensitively, and fails non-Bedrock
  providers that ship with neither credential field set.
- CLI guards: agent-path rejects --api-key-arn; harness path rejects both
  --api-key and --api-key-arn; Secrets Manager ARN rejected at validate
  time with a clear remediation message.
- Extracted computeCredentialName to credential-utils so schema-mapper and
  CredentialPrimitive share one implementation.

Testing
- 85 schema tests, 6 preflight validator tests, 4 cross-provider isolation
  tests, 5 HarnessPrimitive tests including dedup paths, 6 end-to-end
  integration tests against tmpdir with real ConfigIO, and a source-level
  wiring guard for useCreateFlow.
- Verified end-to-end in AWS: OpenAI (us-east-1) and Gemini harnesses
  deploy, invoke, and return responses. M3/M4/M5/M7 error paths produce
  the expected messages.
Comment thread src/assets/cdk/bin/cdk.ts
// Harness paths in agentcore.json are relative to the project root (parent of agentcore/).
const projectRoot = path.resolve(configRoot, '..');
const harnessConfigs: {
interface HarnessRawConfig {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why per-target credential resolution in cdk.ts

AgentCoreHarnessRole scopes its bedrock-agentcore:GetResourceApiKey policy to the specific ARN passed in via harness.apiKeyArn. Since our flow stores the credential by name (apiKeyCredential), we have to resolve that name → ARN from deployed state before CDK synth so the IAM statement is concrete.

That's why the harness config is built in two steps:

  • harnessRawConfigs — read once from each harness.json on disk (pre-loop)
  • harnessConfigs — resolved per-target inside the for (const target of targets) loop, using that target's credentials map from deployed state

Alternative we could have taken

AgentCoreRuntime skips per-credential resolution entirely and grants a wildcard policy via grantCredentialAccess():

actions: ['bedrock-agentcore:GetResourceApiKey', ...],
resources: [
  `arn:aws:bedrock-agentcore:*:${account}:token-vault/*`,
  `arn:aws:bedrock-agentcore:*:${account}:apikeycredentialprovider/*`,
],

We could change AgentCoreHarnessRole to do the same. That would eliminate the resolution block in cdk.ts entirely — but it relaxes the harness role's IAM scoping from "this one credential" to "any credential in this account." Current approach preserves the tighter scoping the construct already had.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant