
[Live] run_live() silently terminates on max_llm_calls exhaustion: no event, no exception, no resumption path #5494

@amitanshupanigrahi2704


🔴 Required Information

Describe the Bug:

When run_live() exhausts the max_llm_calls limit (e.g., 25), the async generator completes silently: no exception is raised, no session_ended event is yielded, and the application gets no indication of why the session stopped producing events. The session becomes a dead end: the model stops responding, but the WebSocket to Gemini may still be open, and the LiveRequestQueue continues accepting audio.

This is fundamentally different from a WebSocket timeout (code 1000/1011), where session resumption handles can reconnect the session. With max_llm_calls exhaustion:

  1. No event is emitted: the async for event in runner.run_live(...) loop simply ends.
  2. No exception is raised: unlike connection drops, there is no APIError to catch.
  3. Session resumption does not apply: the connection didn't drop; the ADK's internal call counter simply reached the limit.
  4. The session state is intact but the session itself cannot be continued.

The only workaround we've found is to detect that run_live() has ended, notify the frontend via a custom session_ended WebSocket message, and have the frontend create an entirely new session (new session_id, re-initialise state).

Steps to Reproduce:

  1. Configure a live agent with tools that trigger multiple LLM calls per user turn (e.g., a data retrieval tool that requires schema lookup → query → retry → response).
  2. Set RunConfig(max_llm_calls=25, streaming_mode=StreamingMode.BIDI, ...).
  3. Start a run_live() session and interact normally.
  4. After several multi-tool turns, the 25-call limit is reached.
  5. The run_live() generator completes and no more events are yielded.
  6. The user's next audio input gets no response. The session is dead.
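A rough back-of-the-envelope sketch of how quickly the limit is reached. It assumes, purely for illustration, that every user turn costs exactly four LLM calls (schema lookup, query, retry, response) and that the counter is shared across the whole run_live() invocation as described in this report. The helper name is hypothetical.

```python
def full_turns_within_budget(max_llm_calls: int, calls_per_turn: int) -> tuple[int, int]:
    """Return (turns that fit entirely in the budget, calls left over)."""
    if calls_per_turn <= 0:
        raise ValueError("calls_per_turn must be positive")
    return divmod(max_llm_calls, calls_per_turn)


# Assumed cost of four calls per turn against the limit of 25 from step 2.
turns, leftover = full_turns_within_budget(max_llm_calls=25, calls_per_turn=4)
print(turns, leftover)  # 6 1
```

Under these assumptions only six turns complete, and the seventh is cut off partway through its tool chain.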

Expected Behavior:

At minimum, one of the following:

  1. Yield a terminal event when max_llm_calls is exhausted (e.g., an Event with a session_ended or max_llm_calls_exhausted field), so the application can distinguish this from a normal generator completion.
  2. Allow the session to be continued by calling run_live() again with the same session, resetting the call counter (similar to how session resumption works for WebSocket timeouts).
  3. Raise a specific exception (e.g., MaxLlmCallsExhaustedError) so the application can handle it distinctly from connection errors.
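To illustrate what option 3 could look like from the application's side, here is a minimal sketch. MaxLlmCallsExhaustedError is the hypothetical name proposed above, and fake_run_live is a stand-in generator written for this example; neither is taken from the ADK.

```python
import asyncio


class MaxLlmCallsExhaustedError(Exception):
    """Hypothetical exception from the proposal above (not an ADK class)."""


async def fake_run_live(limit: int):
    """Stand-in for runner.run_live(): yields events until the budget is used up."""
    calls = 0
    while True:
        if calls >= limit:
            raise MaxLlmCallsExhaustedError(f"max_llm_calls={limit} reached")
        calls += 1
        yield f"event-{calls}"


async def consume(limit: int):
    """Consume the stream and report why it stopped."""
    seen = []
    reason = "completed"
    try:
        async for event in fake_run_live(limit):
            seen.append(event)
    except MaxLlmCallsExhaustedError:
        # A distinct exception type lets this branch be handled separately
        # from connection errors.
        reason = "max_llm_calls_exhausted"
    return seen, reason


print(asyncio.run(consume(3)))
```

With a dedicated exception type, the application could send the frontend a specific reason instead of a generic "session ended" message.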

Observed Behavior:

The async for event in runner.run_live(...) loop silently ends. No terminal event, no exception. The application has no way to distinguish "max_llm_calls exhausted" from "model chose not to respond" or "generator completed normally after a clean session."
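A partial application-side mitigation, sketched under assumptions: wrap the generator so that its end becomes an explicit sentinel, and record whether the application itself requested shutdown (for example, because the client disconnected). This does not reveal the real cause, but it separates "we closed it" from "it ended on its own". StreamEnded, CloseTracker, with_end_marker and fake_run_live are all hypothetical names for this example, not ADK APIs.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class StreamEnded:
    """Hypothetical sentinel (not an ADK type) emitted when the stream ends."""
    expected: bool  # True if the application itself asked the stream to stop


class CloseTracker:
    """Records whether the application requested shutdown."""

    def __init__(self) -> None:
        self.requested = False

    def mark_requested(self) -> None:
        self.requested = True


async def with_end_marker(events: AsyncIterator[Any], tracker: CloseTracker):
    """Re-yield every event, then yield one StreamEnded sentinel."""
    async for event in events:
        yield event
    yield StreamEnded(expected=tracker.requested)


async def fake_run_live(n: int):
    """Stand-in for runner.run_live(): yields n events, then ends silently."""
    for i in range(n):
        yield f"event-{i}"


async def collect(close_first: bool):
    tracker = CloseTracker()
    if close_first:
        tracker.mark_requested()
    return [e async for e in with_end_marker(fake_run_live(2), tracker)]


print(asyncio.run(collect(close_first=False))[-1])
```

An unexpected end (expected=False) is then the only case in which the frontend needs to be told to start a new session.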

Our Current Workaround:

We detect that the generator completed and send a custom WebSocket message to the frontend:

async for event in runner.run_live(
    user_id=user_id,
    session_id=session_id,
    live_request_queue=live_request_queue,
    run_config=run_config,
):
    # ... handle events ...

# Generator completed, but there is no way to know WHY.
# We assume max_llm_calls exhaustion and tell the frontend
# to create a new session.
await websocket.send_text(json.dumps({
    "type": "session_ended",
    "message": "Session ended. Please start a new session.",
}))

The frontend then creates a brand-new session with a new session_id and reconnects. This works, but:

  • All conversation context is lost (unless we manually copy session.state to the new session).
  • The user experiences an interruption: the voice session drops and restarts.
  • We can't distinguish max_llm_calls exhaustion from other generator-completion scenarios.
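The manual state copy mentioned in the first bullet can be sketched as follows. Session, InMemorySessions and replace_session are hypothetical stand-ins written for this example, not the ADK session service; the point is only that the new session gets a fresh id and an independent copy of the old state. Note that this carries over state only, not the conversation history.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Session:
    """Hypothetical minimal session record."""
    id: str
    state: Dict[str, Any] = field(default_factory=dict)


class InMemorySessions:
    """Hypothetical stand-in for a session store (not the ADK session service)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def create(self, state: Optional[Dict[str, Any]] = None) -> Session:
        # Deep-copy so the new session never shares mutable state with the old one.
        session = Session(id=uuid.uuid4().hex, state=copy.deepcopy(state or {}))
        self._sessions[session.id] = session
        return session

    def get(self, session_id: str) -> Session:
        return self._sessions[session_id]


def replace_session(store: InMemorySessions, old_session_id: str) -> Session:
    """Create a new session seeded with a copy of the old session's state."""
    old = store.get(old_session_id)
    return store.create(state=old.state)


store = InMemorySessions()
old = store.create({"cart": ["a"]})
new = replace_session(store, old.id)
print(new.id != old.id, new.state)
```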

Questions:

  1. Is creating a new session the only workaround when max_llm_calls is exhausted?
  2. Can run_live() be called again on the same session (same session_id) after max_llm_calls exhaustion to reset the counter and continue the conversation?
  3. Would the team consider yielding a terminal event or raising a specific exception when the call limit is hit?

Environment Details:

  • ADK Library Version: 1.29.0
  • Desktop OS: Linux (Cloud Run) / Windows (local dev)
  • Python Version: 3.11

Model Information:

  • Are you using LiteLLM: No
  • Which model: gemini-live-2.5-flash-native-audio (Vertex AI)

Related Issues:

How often has this issue occurred?:

Always (100%): deterministic once the call counter reaches the limit.


Labels

live: [Component] This issue is related to live, voice and video chat
request clarification: [Status] The maintainer needs clarification or more information from the author
