Summary
When an agent is wrapped in an App(root_agent=..., plugins=[...]) and you run evals via adk eval (or agents-cli eval run), the registered plugins do not fire. The eval CLI accesses agent_module.agent.root_agent directly and runs eval sessions against the bare agent, bypassing the App object and its plugin chain.
This means observability plugins like BigQueryAgentAnalyticsPlugin capture interactive (adk web / adk run) sessions but produce no rows for eval runs. As a result, there is no cost, latency, or trajectory telemetry for the part of the development loop where evals run most often.
Reproduction
ADK version: 1.31.1
```python
# app/agent.py
from google.adk.agents import LlmAgent
from google.adk.apps import App
from google.adk.plugins.bigquery_agent_analytics_plugin import (
    BigQueryAgentAnalyticsPlugin,
    BigQueryLoggerConfig,
)

root_agent = LlmAgent(
    name="root_agent",
    model="gemini-2.5-flash",
    # ... (remaining arguments elided)
)

app = App(
    root_agent=root_agent,
    name="app",
    plugins=[
        BigQueryAgentAnalyticsPlugin(
            project_id="your-project",
            dataset_id="telemetry",
            table_id="agent_events",
            config=BigQueryLoggerConfig(log_session_metadata=True),
        ),
    ],
)
```
- adk web → run a few turns interactively → events appear in telemetry.agent_events ✅
- adk eval ./app path/to/case.evalset.json → eval runs complete successfully, but no rows for the eval session appear in telemetry.agent_events ❌
Root cause
In google/adk/cli/cli_eval.py:
```python
def _get_agent_module(agent_module_file_path: str):
    file_path = os.path.join(agent_module_file_path, "__init__.py")
    module_name = "agent"
    return _import_from_path(module_name, file_path)


def get_root_agent(agent_module_file_path: str) -> Agent:
    """Returns root agent given the agent module."""
    agent_module = _get_agent_module(agent_module_file_path)
    root_agent = agent_module.agent.root_agent
    return root_agent
```
The eval flow imports the agent module and reaches into agent_module.agent.root_agent. The App instance (and its plugins=[...] list) is never resolved or used, so plugin lifecycle hooks (before_agent_callback, on_event_callback, etc.) are never wired to the eval runner.
By contrast, adk web / adk run go through AdkWebServer, which constructs sessions via the App, so plugins fire correctly.
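The mechanism can be modeled without ADK at all. The stdlib-only sketch below uses hypothetical stand-ins (StubAgent, StubApp, StubRunner and RecordingPlugin are not ADK types) to contrast the two paths: a runner built from the bare root agent has no plugins to dispatch events to, while a runner built from the App inherits its plugin list.

```python
"""Stdlib-only model of why plugins are skipped when evals use the bare root agent.

StubAgent, StubApp, StubRunner and RecordingPlugin are hypothetical stand-ins,
not ADK classes; they only mimic the shape of the real objects.
"""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubAgent:
    name: str


@dataclass
class RecordingPlugin:
    """Stands in for an observability plugin such as a BigQuery logger."""
    rows: list = field(default_factory=list)

    def on_event_callback(self, event: str) -> None:
        self.rows.append(event)


@dataclass
class StubApp:
    root_agent: StubAgent
    plugins: list = field(default_factory=list)


class StubRunner:
    """Dispatches every event to whatever plugins it was constructed with."""

    def __init__(self, agent: StubAgent, plugins: Optional[List[RecordingPlugin]] = None):
        self.agent = agent
        self.plugins = list(plugins or [])

    @classmethod
    def from_app(cls, app: StubApp) -> "StubRunner":
        # Mirrors the adk web path: agent and plugins both come from the App.
        return cls(app.root_agent, app.plugins)

    def run(self, turns: int) -> None:
        for i in range(turns):
            event = f"{self.agent.name}:event{i}"
            for plugin in self.plugins:
                plugin.on_event_callback(event)


plugin = RecordingPlugin()
root_agent = StubAgent(name="root_agent")
app = StubApp(root_agent=root_agent, plugins=[plugin])

# Eval path today: only root_agent is looked up, so the runner has no plugins.
StubRunner(root_agent).run(turns=3)
print(len(plugin.rows))  # 0

# adk web path: the runner is built from the App and inherits its plugins.
StubRunner.from_app(app).run(turns=3)
print(len(plugin.rows))  # 3
```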
Why this matters
- Cost monitoring during development is blind. Eval runs are typically the largest chunk of LLM cost during prompt iteration (we measured ~€6 per full 29-case run, mostly Pro tokens), but this cost is invisible to dashboards built on BigQueryAgentAnalyticsPlugin or any other observability plugin.
- Eval-specific observability is the most useful kind. Per-case latency, token breakdown, and tool-trajectory drift across eval runs are exactly what you want when iterating on prompts. Today users have to fall back to Cloud Billing exports, which are much coarser.
- The BigQueryAgentAnalyticsPlugin doc actively advertises eval analytics as a use case ("LLM-as-judge evals — structured data for evaluation pipelines"). The current behavior contradicts that.
Proposed fix
Make get_root_agent (and the surrounding eval flow) prefer the App object when it exists in the agent module, and run eval sessions through the App.runner (or equivalent) so plugins fire.
Sketch:
```python
def get_app_or_root_agent(agent_module_file_path: str):
    """Returns (app, root_agent). Falls back to bare root_agent if no App is exported."""
    agent_module = _get_agent_module(agent_module_file_path)
    app = getattr(agent_module.agent, 'app', None)
    if app is not None:
        return app, app.root_agent
    return None, agent_module.agent.root_agent
```
Then update the eval runner (evaluation_generator.py etc.) to use app for session creation when present, so plugin callbacks are invoked the same way adk web invokes them. The bare-root-agent path remains available for projects that don't define an App.
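To check the fallback logic in isolation, here is the same lookup as get_app_or_root_agent, refactored into a hypothetical helper, resolve_eval_target, that takes an already-imported agent package instead of a path. That lets it run against stub objects rather than a real agent directory. It assumes, as the sketch above does, that the package exposes an `agent` submodule with `root_agent` and optionally `app`.

```python
"""Fallback resolution from the proposed fix, decoupled from file loading.

resolve_eval_target is a hypothetical helper; it assumes the agent package
exposes an `agent` submodule with `root_agent` and, optionally, `app`.
"""
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def resolve_eval_target(agent_pkg: Any) -> Tuple[Optional[Any], Any]:
    """Return (app, root_agent); app is None when the module exports no App."""
    agent_mod = agent_pkg.agent
    app = getattr(agent_mod, "app", None)
    if app is not None:
        # Take the agent from the App so the agent and its plugins stay paired.
        return app, app.root_agent
    return None, agent_mod.root_agent


# Stub packages standing in for imported agent directories.
root = SimpleNamespace(name="root_agent")
with_app = SimpleNamespace(
    agent=SimpleNamespace(
        root_agent=root,
        app=SimpleNamespace(root_agent=root, plugins=["bq"]),
    )
)
bare = SimpleNamespace(agent=SimpleNamespace(root_agent=root))

found_app, agent = resolve_eval_target(with_app)
print(found_app.plugins, agent.name)  # ['bq'] root_agent
found_app, agent = resolve_eval_target(bare)
print(found_app, agent.name)  # None root_agent
```

Returning app.root_agent rather than the module-level root_agent avoids a mismatch if a module ever exports an App that wraps a different agent.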
Happy to draft a PR if there's interest and the maintainers agree on the approach. I'd also like feedback on whether the current bypass is intentional (e.g., to avoid plugin side effects polluting eval runs). If so, an opt-in flag like adk eval --use-app-plugins would also close the gap without changing the default.
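If the opt-in route is preferred, the gating itself is small. This sketch is hypothetical throughout: --use-app-plugins is only the flag proposed here, argparse stands in for whatever framework the real CLI uses, the positional argument names are illustrative rather than adk eval's actual ones, and plugins_for_eval is an invented helper. The default returns no plugins, matching today's behavior.

```python
"""Opt-in gating for the hypothetical `adk eval --use-app-plugins` flag.

argparse keeps this self-contained; argument names are illustrative only.
The default keeps today's behavior (bare root agent, no plugins).
"""
import argparse
from types import SimpleNamespace
from typing import Any, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="adk eval")
    parser.add_argument("agent_module_file_path")
    parser.add_argument("eval_set_file_paths", nargs="+")
    parser.add_argument(
        "--use-app-plugins",
        action="store_true",
        help="Run eval sessions through the exported App so its plugins fire.",
    )
    return parser


def plugins_for_eval(agent_mod: Any, use_app_plugins: bool) -> List[Any]:
    """Return the plugins the eval runner should register."""
    app = getattr(agent_mod, "app", None)
    if use_app_plugins and app is not None:
        return list(app.plugins)
    return []  # current default: plugins are not wired in


agent_mod = SimpleNamespace(
    root_agent=SimpleNamespace(name="root_agent"),
    app=SimpleNamespace(plugins=["bq_plugin"]),
)
parser = build_parser()
default_args = parser.parse_args(["./app", "case.evalset.json"])
opt_in_args = parser.parse_args(["./app", "case.evalset.json", "--use-app-plugins"])
print(plugins_for_eval(agent_mod, default_args.use_app_plugins))  # []
print(plugins_for_eval(agent_mod, opt_in_args.use_app_plugins))   # ['bq_plugin']
```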
Workaround for now
For users hitting this:
- Estimate eval cost from Cloud Billing → Vertex AI line items in the eval window, less the interactive-session cost reported by your dashboard.
- Or: write a custom eval runner that uses App.runner directly instead of going through adk eval.
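The first workaround is a single subtraction, written out below. estimate_eval_cost is a hypothetical helper, and both inputs are assumed to cover the same time window: the Vertex AI total from Cloud Billing and the interactive-session cost your plugin-backed dashboard reports.

```python
"""Estimate eval spend as (billed Vertex AI cost) - (cost the dashboard already saw).

estimate_eval_cost is a hypothetical helper; inputs must cover the same window.
Decimal avoids binary floating-point rounding on currency amounts.
"""
from decimal import Decimal


def estimate_eval_cost(billing_vertex_total: Decimal, dashboard_interactive_total: Decimal) -> Decimal:
    """Attribute whatever billing shows beyond the plugin-logged cost to evals."""
    if dashboard_interactive_total > billing_vertex_total:
        raise ValueError("dashboard cost exceeds billed cost; check the time window")
    return billing_vertex_total - dashboard_interactive_total


# Illustrative numbers only, not measurements.
print(estimate_eval_cost(Decimal("10.00"), Decimal("4.25")))  # 5.75
```

Any other unlogged Vertex AI usage in the same window is also attributed to evals, so treat the result as an upper bound.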
Environment
- google-adk 1.31.1
- Python 3.13
- ADK ContextCacheConfig and BigQueryAgentAnalyticsPlugin, both registered on App