diff --git a/.changeset/credential-aware-default-provider.md b/.changeset/credential-aware-default-provider.md
deleted file mode 100644
index 4e8b99e..0000000
--- a/.changeset/credential-aware-default-provider.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-'@tangle-network/browser-agent-driver': minor
----
-
-The default provider is now credential-aware instead of a hard `openai`. A bare run (no `--provider`/`--model`, no config-file provider) uses OpenAI when `OPENAI_API_KEY` is set — unchanged for existing users and CI — and otherwise falls back to an available provider (claude-code, which needs no key) rather than failing on a missing OpenAI key. An explicit provider in CLI flags or a config file is always honored, and the default model maps per-provider as before (e.g. gpt-5.4 → sonnet for claude-code). This removes the last place the no-flag path assumed OpenAI; the engine already supported openai/anthropic/google/claude-code/zai for both text and vision.
diff --git a/.changeset/design-audit-content-fidelity.md b/.changeset/design-audit-content-fidelity.md
deleted file mode 100644
index 45ae38c..0000000
--- a/.changeset/design-audit-content-fidelity.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-'@tangle-network/browser-agent-driver': patch
----
-
-design-audit (reference-grounded): enforce content fidelity so a redesign never fabricates content the page lacks. On a content-sparse page grounded against a dense exemplar, the generator would invent factual content to fill the layout (e.g. a placeholder page gaining a fake "Recent Activity" feed with timestamps, invented status/RFC/registry data), and the pairwise direction-ranker rewarded that invented density as "richer" — so applied to a real app the audit could inject fabricated data into the UI. Now the generator may restyle/regroup/re-rank only the page's real content (the exemplar governs how it looks, never what content it has; a sparse page stays proportionally restrained), the ranker penalises invented content as unfaithful instead of rewarding it, and the apply prompt carries a defense-in-depth "do not invent content" guardrail. No provider coupling.
diff --git a/.changeset/job-first-redesign-engine.md b/.changeset/job-first-redesign-engine.md
deleted file mode 100644
index 22c69ba..0000000
--- a/.changeset/job-first-redesign-engine.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-'@tangle-network/browser-agent-driver': minor
----
-
-design-audit (reference-grounded): make the redesign engine job-first instead of aesthetic-first. The old engine grounded every page in a world-class exemplar's visual DNA and judged on visual craft, so it regressed functional pages into generic brochures — a docs page lost its table-of-contents and dense reference content for two marketing cards and a hero; an aggregator dropped from 30 items to 9; a status dashboard shed services into spacious cards. The fix:
-
-- **Generator** (`reference/generate/prompt.ts`): persona reframed from art director to product designer. New hard rules in priority order — task-first (design for the page's users and the job in its intent) → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is a source of visual craft only, never a structural template.
-- **Functional contract**: a per-page preservation block derived from the page's own measured DNA (navigation-affordance count, layout density, archetype) so "keep what works" is concrete and data-driven, not exhortation — and density is required only when the page is actually measured dense, so a genuinely sparse page is never forced to stay dense.
-- **Ranker/judge** (`reference/judge/prompt.ts`): scores task fitness and functional preservation BEFORE visual craft; a polished direction that removes navigation or reduces density loses. "Fit to the reference" counts only as visual craft.
-
-Validated by re-running the regressed pages: docs now keeps its ToC + prev/next nav + dense code examples; HN keeps all 30 stories + nav; the status dashboard stays a dense service grid with real values. No provider coupling; flag-gated reference engine only.
diff --git a/.changeset/refgen-reasoning-token-headroom.md b/.changeset/refgen-reasoning-token-headroom.md
deleted file mode 100644
index 2ad95dc..0000000
--- a/.changeset/refgen-reasoning-token-headroom.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-'@tangle-network/browser-agent-driver': patch
----
-
-design-audit (reference-grounded): make redesign generation work with reasoning models. The generator capped output at 2200 tokens, which a reasoning model (e.g. GLM-5.2, o-series) spends on its thinking before the answer — so the JSON direction came back empty or truncated and the audit fell back with a misleading "no JSON object found". Raise the per-direction budget to 8000 (non-reasoning models stop at the closing brace and never use the extra, so it's free for them), and report empty vs truncated vs non-JSON output distinctly so a budget/limit issue is diagnosable. No coupling to any one provider — the engine already runs on openai/anthropic/google/claude-code/zai.
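The refgen changeset says empty, truncated, and non-JSON generator output are now reported distinctly instead of as a generic "no JSON object found". Below is a minimal sketch of that classification. It assumes the caller receives the raw completion text plus a finish reason that reads `'length'` when the token cap was hit. `classifyDirectionOutput`, `DirectionOutcome`, `FinishReason`, and `DIRECTION_MAX_OUTPUT_TOKENS` are hypothetical names chosen for illustration, not the package's actual API.

```typescript
// Hypothetical sketch: classify a redesign-direction response so a token
// budget problem is distinguishable from genuinely malformed output.
type FinishReason = 'stop' | 'length' | 'other';

type DirectionOutcome =
  | { kind: 'ok'; direction: unknown }
  | { kind: 'empty' }
  | { kind: 'truncated'; chars: number }
  | { kind: 'non-json'; preview: string };

// Per-direction output budget named in the changeset (raised from 2200).
const DIRECTION_MAX_OUTPUT_TOKENS = 8000;

function classifyDirectionOutput(text: string, finishReason: FinishReason): DirectionOutcome {
  const trimmed = text.trim();
  // A reasoning model can spend the whole budget thinking and emit nothing visible.
  if (trimmed.length === 0) return { kind: 'empty' };

  // Try the span from the first '{' to the last '}' as the JSON direction.
  const start = trimmed.indexOf('{');
  const end = trimmed.lastIndexOf('}');
  if (start !== -1 && end > start) {
    try {
      return { kind: 'ok', direction: JSON.parse(trimmed.slice(start, end + 1)) };
    } catch {
      // Braces present but the span is not valid JSON; classify below.
    }
  }

  // Hitting the token cap means the answer was cut off before it finished.
  if (finishReason === 'length') return { kind: 'truncated', chars: trimmed.length };
  // The model stopped on its own but did not produce a parseable object.
  return { kind: 'non-json', preview: trimmed.slice(0, 80) };
}
```

Checking the finish reason before declaring the output malformed is what separates a budget problem (raise the cap) from a prompt or model problem (the model answered, just not in JSON).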
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c37e929..49f26da 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,25 @@
 # @tangle-network/browser-agent-driver
 
+## 0.35.0
+
+### Minor Changes
+
+- [#122](https://github.com/tangle-network/browser-agent-driver/pull/122) [`b0f74a4`](https://github.com/tangle-network/browser-agent-driver/commit/b0f74a4f91a04517d988e95aa95ed0509bd2a26e) Thanks [@drewstone](https://github.com/drewstone)! - The default provider is now credential-aware instead of a hard `openai`. A bare run (no `--provider`/`--model`, no config-file provider) uses OpenAI when `OPENAI_API_KEY` is set — unchanged for existing users and CI — and otherwise falls back to an available provider (claude-code, which needs no key) rather than failing on a missing OpenAI key. An explicit provider in CLI flags or a config file is always honored, and the default model maps per-provider as before (e.g. gpt-5.4 → sonnet for claude-code). This removes the last place the no-flag path assumed OpenAI; the engine already supported openai/anthropic/google/claude-code/zai for both text and vision.
+
+- [#124](https://github.com/tangle-network/browser-agent-driver/pull/124) [`a2055b2`](https://github.com/tangle-network/browser-agent-driver/commit/a2055b2e7a7e68726ceb8b8c5cdaf92ca3215b06) Thanks [@drewstone](https://github.com/drewstone)! - design-audit (reference-grounded): make the redesign engine job-first instead of aesthetic-first. The old engine grounded every page in a world-class exemplar's visual DNA and judged on visual craft, so it regressed functional pages into generic brochures — a docs page lost its table-of-contents and dense reference content for two marketing cards and a hero; an aggregator dropped from 30 items to 9; a status dashboard shed services into spacious cards. The fix:
+
+  - **Generator** (`reference/generate/prompt.ts`): persona reframed from art director to product designer. New hard rules in priority order — task-first (design for the page's users and the job in its intent) → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is a source of visual craft only, never a structural template.
+  - **Functional contract**: a per-page preservation block derived from the page's own measured DNA (navigation-affordance count, layout density, archetype) so "keep what works" is concrete and data-driven, not exhortation — and density is required only when the page is actually measured dense, so a genuinely sparse page is never forced to stay dense.
+  - **Ranker/judge** (`reference/judge/prompt.ts`): scores task fitness and functional preservation BEFORE visual craft; a polished direction that removes navigation or reduces density loses. "Fit to the reference" counts only as visual craft.
+
+  Validated by re-running the regressed pages: docs now keeps its ToC + prev/next nav + dense code examples; HN keeps all 30 stories + nav; the status dashboard stays a dense service grid with real values. No provider coupling; flag-gated reference engine only.
+
+### Patch Changes
+
+- [#123](https://github.com/tangle-network/browser-agent-driver/pull/123) [`20942c2`](https://github.com/tangle-network/browser-agent-driver/commit/20942c2a4160d876537cbde3ec72f5f4559cb703) Thanks [@drewstone](https://github.com/drewstone)! - design-audit (reference-grounded): enforce content fidelity so a redesign never fabricates content the page lacks. On a content-sparse page grounded against a dense exemplar, the generator would invent factual content to fill the layout (e.g. a placeholder page gaining a fake "Recent Activity" feed with timestamps, invented status/RFC/registry data), and the pairwise direction-ranker rewarded that invented density as "richer" — so applied to a real app the audit could inject fabricated data into the UI. Now the generator may restyle/regroup/re-rank only the page's real content (the exemplar governs how it looks, never what content it has; a sparse page stays proportionally restrained), the ranker penalises invented content as unfaithful instead of rewarding it, and the apply prompt carries a defense-in-depth "do not invent content" guardrail. No provider coupling.
+
+- [#120](https://github.com/tangle-network/browser-agent-driver/pull/120) [`f11b899`](https://github.com/tangle-network/browser-agent-driver/commit/f11b89971cccfcc4c083c0fe958d918caf030568) Thanks [@drewstone](https://github.com/drewstone)! - design-audit (reference-grounded): make redesign generation work with reasoning models. The generator capped output at 2200 tokens, which a reasoning model (e.g. GLM-5.2, o-series) spends on its thinking before the answer — so the JSON direction came back empty or truncated and the audit fell back with a misleading "no JSON object found". Raise the per-direction budget to 8000 (non-reasoning models stop at the closing brace and never use the extra, so it's free for them), and report empty vs truncated vs non-JSON output distinctly so a budget/limit issue is diagnosable. No coupling to any one provider — the engine already runs on openai/anthropic/google/claude-code/zai.
+
 ## 0.34.0
 
 ### Minor Changes
diff --git a/package.json b/package.json
index cf4f62f..77f8921 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@tangle-network/browser-agent-driver",
-  "version": "0.34.0",
+  "version": "0.35.0",
   "description": "LLM-driven browser agent and bad CLI for UI automation, testing, and evaluation",
   "publishConfig": {
     "access": "public"
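The credential-aware default in #122 amounts to a short precedence chain. The sketch below assumes a CLI flag takes precedence over a config-file provider (the entry only says either is always honored) and always falls back to `claude-code`, the example the entry gives. `resolveDefaultProvider` and `ProviderSources` are hypothetical names, and the per-provider default-model mapping is deliberately not modeled.

```typescript
// Hypothetical sketch: choose a provider for a run, honoring explicit
// choices first and only then consulting credentials.
type Provider = 'openai' | 'anthropic' | 'google' | 'claude-code' | 'zai';

interface ProviderSources {
  cliProvider?: Provider; // value of --provider, if passed
  configProvider?: Provider; // provider from the config file, if set
  env: Record<string, string | undefined>; // environment variables
}

function resolveDefaultProvider(src: ProviderSources): Provider {
  // An explicit provider is always honored (CLI over config is an assumption).
  if (src.cliProvider) return src.cliProvider;
  if (src.configProvider) return src.configProvider;
  // Bare run: keep the historical OpenAI default when a non-empty key is set.
  if (src.env.OPENAI_API_KEY) return 'openai';
  // No OpenAI key: fall back to a provider that needs no API key.
  return 'claude-code';
}
```

For a bare run in Node this would be called as `resolveDefaultProvider({ env: process.env })`. Because `OPENAI_API_KEY` is checked before the fallback, existing users and CI jobs that set the key keep getting `openai`.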