
# cli_eval bypasses App.plugins, breaking observability plugins (e.g., BigQueryAgentAnalyticsPlugin) during eval runs #5503

@saifer82


## Summary

When an agent is wrapped in an `App(root_agent=..., plugins=[...])` and you run evals via `adk eval` (or `agents-cli eval run`), the registered plugins do not fire. The eval CLI accesses `agent_module.agent.root_agent` directly and runs eval sessions against the bare agent, bypassing the `App` object and its plugin chain.

This means observability plugins like `BigQueryAgentAnalyticsPlugin` capture interactive (`adk web` / `adk run`) sessions but produce no rows for eval runs — leaving cost, latency, and trajectory telemetry blind during the development loop where evals run most often.

## Reproduction

ADK version: 1.31.1

```python
# app/agent.py
from google.adk.agents import LlmAgent
from google.adk.apps import App
from google.adk.plugins.bigquery_agent_analytics_plugin import (
    BigQueryAgentAnalyticsPlugin,
    BigQueryLoggerConfig,
)

root_agent = LlmAgent(name="root_agent", model="gemini-2.5-flash", ...)

app = App(
    root_agent=root_agent,
    name="app",
    plugins=[
        BigQueryAgentAnalyticsPlugin(
            project_id="your-project",
            dataset_id="telemetry",
            table_id="agent_events",
            config=BigQueryLoggerConfig(log_session_metadata=True),
        ),
    ],
)
```

1. `adk web` → run a few turns interactively → events appear in `telemetry.agent_events`
2. `adk eval ./app path/to/case.evalset.json` → eval runs complete successfully, but no rows appear in `telemetry.agent_events` for the eval session ❌

## Root cause

In `google/adk/cli/cli_eval.py`:

```python
def _get_agent_module(agent_module_file_path: str):
    file_path = os.path.join(agent_module_file_path, "__init__.py")
    module_name = "agent"
    return _import_from_path(module_name, file_path)


def get_root_agent(agent_module_file_path: str) -> Agent:
    """Returns root agent given the agent module."""
    agent_module = _get_agent_module(agent_module_file_path)
    root_agent = agent_module.agent.root_agent
    return root_agent
```

The eval flow imports the agent module and reaches into `agent_module.agent.root_agent`. The `App` instance (and its `plugins=[...]` list) is never resolved or used, so plugin lifecycle hooks (`before_agent_callback`, `on_event_callback`, etc.) are never wired to the eval runner.

By contrast, `adk web` / `adk run` go through `AdkWebServer`, which constructs sessions via the `App`, so plugins fire correctly.

## Why this matters

- **Cost monitoring during development is blind.** Eval runs are typically the largest chunk of LLM cost during prompt iteration (we measured ~€6 per full 29-case run, mostly Pro tokens), but this cost is invisible to dashboards built on `BigQueryAgentAnalyticsPlugin` or any other observability plugin.
- **Eval-specific observability is the most useful kind.** Knowing per-case latency, token breakdown, and tool-trajectory drift across eval runs is exactly what you want when iterating on prompts. Today users have to fall back to Cloud Billing exports, which are much coarser.
- **The `BigQueryAgentAnalyticsPlugin` doc actively advertises eval analytics as a use case** ("LLM-as-judge evals — structured data for evaluation pipelines"). The current behavior contradicts that.

## Proposed fix

Make `get_root_agent` (and the surrounding eval flow) prefer the `App` object when it exists in the agent module, and run eval sessions through the `App`'s runner (or equivalent) so plugins fire.

Sketch:

```python
def get_app_or_root_agent(agent_module_file_path: str):
    """Returns (app, root_agent); falls back to the bare root_agent if no App is exported."""
    agent_module = _get_agent_module(agent_module_file_path)
    app = getattr(agent_module.agent, "app", None)
    if app is not None:
        return app, app.root_agent
    return None, agent_module.agent.root_agent
```
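For illustration, the duck-typed fallback can be exercised against stub modules (using `types.SimpleNamespace` as a stand-in for the imported agent package — the shapes below are hypothetical, not ADK internals):

```python
from types import SimpleNamespace


def get_app_or_root_agent_from(agent_module):
    """Same fallback logic as the sketch above, taking the module object directly.

    Prefers a module-level `app` export; otherwise returns the bare root_agent.
    """
    app = getattr(agent_module.agent, "app", None)
    if app is not None:
        return app, app.root_agent
    return None, agent_module.agent.root_agent


# Stubs standing in for imported agent packages (hypothetical shapes).
root = SimpleNamespace(name="root_agent")
with_app = SimpleNamespace(
    agent=SimpleNamespace(
        root_agent=root,
        app=SimpleNamespace(root_agent=root, plugins=["bq_plugin"]),
    )
)
bare = SimpleNamespace(agent=SimpleNamespace(root_agent=root))

# Module exporting an App: the App (and its plugin list) is preferred.
app, agent = get_app_or_root_agent_from(with_app)
assert app is not None and app.plugins == ["bq_plugin"] and agent is root

# Module without an App: falls back to the bare root_agent.
app, agent = get_app_or_root_agent_from(bare)
assert app is None and agent is root
```

Because `getattr(..., "app", None)` is used rather than a hard attribute access, projects that never define an `App` keep working unchanged.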

Then update the eval runner (`evaluation_generator.py` etc.) to use the `App` for session creation when present, so plugin callbacks are invoked the same way `adk web` invokes them. The bare-root-agent path remains available for projects that don't define an `App`.

Happy to draft a PR if there's interest and the maintainers agree on the approach. I'd also want feedback on whether the current bypass is intentional (e.g., to avoid plugin side effects polluting eval runs); if so, an opt-in flag like `adk eval --use-app-plugins` would also close the gap without changing the default.

## Workaround for now

For users hitting this:

- Estimate eval cost from Cloud Billing → Vertex AI line items in the eval window, minus the interactive-session cost reported by your dashboard.
- Or: write a custom eval runner that uses the `App`'s runner directly instead of going through `adk eval`.
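The second workaround amounts to driving each eval case through the app's runner rather than invoking the bare agent. A framework-agnostic toy model of why that restores telemetry (all class and method names below are stand-ins, not ADK APIs):

```python
class RecordingPlugin:
    """Toy observability plugin: records one event per agent turn."""

    def __init__(self):
        self.events = []

    def on_event(self, event):
        self.events.append(event)


class ToyApp:
    """Stand-in for App: a root agent plus a plugin chain."""

    def __init__(self, root_agent, plugins):
        self.root_agent = root_agent
        self.plugins = plugins

    def run(self, prompt):
        result = self.root_agent(prompt)  # the agent turn itself
        for plugin in self.plugins:       # plugin hooks fire around it
            plugin.on_event({"prompt": prompt, "result": result})
        return result


agent = lambda prompt: prompt.upper()
plugin = RecordingPlugin()
app = ToyApp(agent, [plugin])

# Calling the bare agent (what cli_eval does today): no telemetry recorded.
app.root_agent("eval case 1")
assert plugin.events == []

# Routing the same case through the app: plugins fire.
app.run("eval case 1")
assert len(plugin.events) == 1
```

The same eval results come back either way; only the plugin side channel differs, which is exactly the gap this issue describes.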

## Environment

- `google-adk` 1.31.1
- Python 3.13
- `ContextCacheConfig` and `BigQueryAgentAnalyticsPlugin` both registered on the `App`

Labels: eval [Component] (This issue is related to evaluation)