feat(asr): replace WhisperLiveKit with coro (OpenAI SDK + SSE) #85

Open
jedzill4 wants to merge 14 commits into release/v2.0.0 from feat/coroasr-integration

Conversation

@jedzill4 jedzill4 commented Jun 17, 2026

Contributor

Summary

Replaces the WhisperLiveKit WebSocket ASR integration with coro, an OpenAI-compatible ASR + speaker-diarization server. aymurai now transcribes via the official openai SDK over SSE (stream=True), and coro runs as an isolated host process via uv tool/uvx (no dependency conflicts). The public API, DB schema, ASRParagraph/ASRDocument, and the anonymizer consumer are unchanged.

Changes

  • Client: aymurai/audio/asr_client.py rewritten (186→62 lines): transcribe_audio_bytes(payload, filename, content_type) streams to coro's /v1/audio/transcriptions and parses the transcript.text.done frame into list[CoroSegment]. Maps openai APIError/APIConnectionError → RuntimeError.
  • Schemas: deleted aymurai/api/meta/asr/websocket.py (all WLKMessage*); added aymurai/api/meta/asr/coro.py with the CoroSegment boundary model + _parse_hhmmss. Folded TranscriptionItem into ASRParagraph.
  • Router/settings: get_transcribe_base_url + CoroSegment → ASRParagraph mapping; TRANSCRIBE_WS_URI → TRANSCRIBE_BASE_URL (+ TRANSCRIBE_API_KEY).
  • Deps: added openai>=1.65.2 (capped <2 by marker-pdf); removed librosa (only the old client used it).
  • Infra/docs: removed the whisperlivekit docker-compose service; added scripts/run-coro.sh (cpu/gpu, pins uvx --python 3.12) and scripts/smoke_coro.py; README section.
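The client behaviour listed under Changes can be sketched without the SDK as a small consumer of the event stream. This is a minimal sketch under stated assumptions, not the PR's code: events are assumed to expose `type` and `text` attributes, the done frame's `text` is assumed to be a JSON array of segment objects, and `Segment`, `collect_segments`, `_fake_stream`, and the field set are hypothetical stand-ins for `CoroSegment` and `transcribe_audio_bytes`.

```python
import asyncio
import json
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, AsyncIterator

DONE_EVENT_TYPE = "transcript.text.done"


@dataclass
class Segment:
    """Hypothetical stand-in for CoroSegment; the field set is assumed."""

    speaker: str
    start: str
    end: str
    text: str


async def collect_segments(events: AsyncIterator[Any]) -> list[Segment]:
    """Consume an SSE event stream and parse the final done frame."""
    keys = ("speaker", "start", "end", "text")
    async for event in events:
        # Delta frames carry partial text; only the done frame is parsed.
        if getattr(event, "type", None) != DONE_EVENT_TYPE:
            continue
        raw = json.loads(event.text)
        # Keep the known keys and ignore any extra fields.
        return [Segment(**{k: item[k] for k in keys}) for item in raw]
    raise RuntimeError("transcription stream ended without a done frame")


async def _fake_stream():
    # Stand-in for the stream an SDK request with stream=True would return.
    yield SimpleNamespace(type="transcript.text.delta", delta="hola")
    payload = [
        {"speaker": "0", "start": "00:00:00", "end": "00:00:02",
         "text": "hola", "lang": "es"}
    ]
    yield SimpleNamespace(type=DONE_EVENT_TYPE, text=json.dumps(payload))


segments = asyncio.run(collect_segments(_fake_stream()))
```

Raising `RuntimeError` when the stream ends without a done frame mirrors the behaviour the reviewer's guide describes for the real client.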

Test Plan

  • ruff format --check + ruff check clean on changed files
  • pyrefly check — 0 errors
  • pytest test/audio/test_asr_client.py test/api/endpoints/routers/asr — 11 passed (client error-mapping + endpoint + validation)
  • anonymizer audio consumer test passes (ASRDocument/ASRParagraph unchanged)
  • End-to-end smoke test (scripts/smoke_coro.py cpu): started coro (parakeet + NeMo diarization), transcribed a Spanish sample over SSE via both the raw OpenAI SDK and aymurai's client — identical 5-segment, diarized output, HTTP 200
  • Integration test test/integration/test_asr_pipeline.py requires a live coro server (set TRANSCRIBE_BASE_URL); excluded from normal runs

Notes

  • coro runs separately: ./scripts/run-coro.sh [cpu|gpu], then set TRANSCRIBE_BASE_URL=http://localhost:8000/v1. Requires host ffmpeg.
  • docker-compose.yml also includes incidental YAML formatting normalization already present on the branch.
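The connection settings named in the Notes could be resolved roughly as follows. This is a sketch only: the project reads these values through its own settings module, whereas this example reads a plain mapping (defaulting to `os.environ`), and `resolve_transcribe_config` and the placeholder API key are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_transcribe_config(
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, str]:
    """Return (base_url, api_key), raising if the base URL is unset."""
    env = os.environ if env is None else env
    base_url = env.get("TRANSCRIBE_BASE_URL", "").strip().rstrip("/")
    if not base_url:
        raise RuntimeError("TRANSCRIBE_BASE_URL is not configured")
    # Assumption: a placeholder key is fine when the server does not check it.
    api_key = env.get("TRANSCRIBE_API_KEY") or "not-needed"
    return base_url, api_key


config = resolve_transcribe_config(
    {"TRANSCRIBE_BASE_URL": "http://localhost:8000/v1/"}
)
```

Failing fast on a missing base URL keeps a misconfigured deployment from surfacing later as an opaque connection error.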

Summary by Sourcery

Replace the WebSocket-based WhisperLiveKit ASR integration with a coro-based SSE/OpenAI SDK transcription flow while preserving the public ASR API and document schema.

New Features:

  • Add coro-based ASR client that streams audio to an OpenAI-compatible /v1/audio/transcriptions endpoint over SSE and returns speaker-attributed segments.
  • Introduce CoroSegment metadata model and helper time parsing to represent coro transcription segments.
  • Add scripts to run the coro ASR server in CPU or GPU mode and a smoke test script to validate end-to-end SSE transcription via both the OpenAI SDK and the internal client.

Enhancements:

  • Update ASR transcription endpoint to consume coro segments, map them into ASRParagraphs, and use a base URL configuration instead of WebSocket URIs.
  • Inline the ASRParagraph schema instead of inheriting from TranscriptionItem and add robust HH:MM:SS/ISO8601 time parsing for start/end fields.
  • Simplify error handling by mapping transcription RuntimeErrors directly to UpstreamServiceError at the API layer.
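The time parsing mentioned in the Enhancements list accepts HH:MM:SS strings, ISO 8601 durations, and numeric seconds. One way this might look is sketched below; `parse_duration` is a hypothetical name, and the PR's `_parse_hhmmss` may handle edge cases differently (this version accepts only three-part clock strings and the `PT#H#M#S` subset of ISO 8601).

```python
import re
from datetime import timedelta
from typing import Union

# Matches the PT#H#M#S subset of ISO 8601 durations; every part is optional.
_ISO_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(value: Union[str, int, float, timedelta]) -> timedelta:
    """Parse HH:MM:SS, PT#H#M#S, or numeric seconds into a timedelta."""
    if isinstance(value, timedelta):
        return value
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    text = value.strip()
    match = _ISO_DURATION.match(text)
    if match and any(match.groupdict().values()):
        parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
        return timedelta(hours=parts["h"], minutes=parts["m"], seconds=parts["s"])
    if ":" in text:
        # Only the three-part HH:MM:SS form is accepted in this sketch.
        hours, minutes, seconds = text.split(":")
        return timedelta(
            hours=int(hours), minutes=int(minutes), seconds=float(seconds)
        )
    # Fall back to a numeric string of seconds; raises ValueError otherwise.
    return timedelta(seconds=float(text))
```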

Build:

  • Add openai as a runtime dependency and remove unused librosa from the project dependencies.

Deployment:

  • Document and configure the external coro ASR server, including environment variables for TRANSCRIBE_BASE_URL and spill directory, and remove the legacy whisperlivekit service from docker-compose.

Documentation:

  • Extend the README with instructions for running the coro ASR server, configuring the API connection, and running the end-to-end smoke test.

Tests:

  • Add unit tests for the coro-based ASR client behavior and error handling, and update ASR endpoint and integration tests to use TRANSCRIBE_BASE_URL instead of WebSocket URIs.
  • Add a smoke test script to verify the full coro ASR pipeline via SSE and the internal client.
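The smoke test is described as starting coro and waiting for its /health endpoint before transcribing. A readiness wait of that kind could be sketched as below; `http_ok` and `wait_for_health` are hypothetical helpers, and the full health URL, timeout, and polling interval are assumptions rather than values taken from scripts/smoke_coro.py.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True when a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP errors all mean "not ready".
        return False


def wait_for_health(
    url: str,
    timeout: float = 60.0,
    interval: float = 0.5,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False
```

A caller might run `wait_for_health("http://localhost:8000/health")` after launching the server and abort the smoke test if it returns `False`.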

Chores:

  • Normalize docker-compose YAML formatting for commands and healthchecks.

sourcery-ai Bot commented Jun 17, 2026

Contributor

Reviewer's Guide

Replace the WhisperLiveKit WebSocket-based ASR integration with a coro-based OpenAI SSE transcription flow, introduce a CoroSegment schema and ASRParagraph refactor, wire the HTTP/SSE client into the ASR router and settings, adjust tests and dependencies, and add scripts/docs for running and smoke-testing coro.

Sequence diagram for coro-based SSE transcription flow

sequenceDiagram
    participant Client
    participant ASRRouter as ASR_transcribe_endpoint
    participant ASRService as _transcribe_audio_bytes_with_error_handling
    participant ASRClient as transcribe_audio_bytes
    participant OpenAIClient as AsyncOpenAI
    participant Coro as coro_server

    Client->>ASRRouter: POST /asr/transcribe (UploadFile)
    ASRRouter->>ASRService: _transcribe_audio_bytes_with_error_handling(data, filename, content_type)
    ASRService->>ASRClient: transcribe_audio_bytes(payload, filename, content_type)
    ASRClient->>OpenAIClient: AsyncOpenAI(base_url=TRANSCRIBE_BASE_URL)
    OpenAIClient->>Coro: audio.transcriptions.create(stream=True)
    loop SSE stream
        Coro-->>OpenAIClient: transcript.text.delta
    end
    Coro-->>OpenAIClient: transcript.text.done
    OpenAIClient-->>ASRClient: done event
    ASRClient->>ASRClient: json.loads(event.text) -> list[CoroSegment]
    ASRClient-->>ASRService: list[CoroSegment]
    ASRService->>ASRService: map CoroSegment -> ASRParagraph
    ASRService-->>ASRRouter: list[ASRParagraph]
    ASRRouter-->>Client: ASRDocument(document=list[ASRParagraph])
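
The "map CoroSegment -> ASRParagraph" step in the diagram converts the speaker string to an integer and passes start, end, and text through. A minimal sketch of that mapping follows, using dataclasses as hypothetical stand-ins for the project's Pydantic models; it assumes the speaker label is a plain numeric string such as "0" and leaves out the computed paragraph_id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical stand-in for CoroSegment."""

    speaker: str
    start: str
    end: str
    text: str


@dataclass
class Paragraph:
    """Hypothetical stand-in for ASRParagraph (paragraph_id omitted)."""

    speaker_no: int
    start: str
    end: str
    text: str
    speaker_name: Optional[str] = None


def to_paragraphs(segments: list[Segment]) -> list[Paragraph]:
    """Map boundary segments onto the API-facing paragraph model."""
    return [
        # int() assumes a numeric label; other label formats would need parsing.
        Paragraph(speaker_no=int(s.speaker), start=s.start, end=s.end, text=s.text)
        for s in segments
    ]


paragraphs = to_paragraphs([Segment("1", "00:00:00", "00:00:02", "hola")])
```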

File-Level Changes

Change: Replace WebSocket WhisperLiveKit client with coro/OpenAI SSE-based async transcription client.
Details:
  • Remove librosa/websockets-based streaming logic and WebSocket message parsing/status handling.
  • Introduce AsyncOpenAI-based client configured via TRANSCRIBE_BASE_URL/TRANSCRIBE_API_KEY and DONE_EVENT_TYPE constant.
  • Implement streaming transcription to /v1/audio/transcriptions with stream=True and parse transcript.text.done into CoroSegment instances.
  • Map OpenAI APIError/APIConnectionError to RuntimeError and raise when no done frame is received.
Files:
aymurai/audio/asr_client.py

Change: Introduce coro-specific ASR schema and shared time parsing, and simplify ASRParagraph.
Details:
  • Add CoroSegment Pydantic model representing speaker-attributed segments from coro done frames, ignoring extra fields.
  • Implement _parse_hhmmss to support HH:MM:SS, ISO 8601 PT#H#M#S, and numeric seconds as timedeltas.
  • Refactor ASRParagraph into a standalone Pydantic model with speaker_no, optional speaker_name, start/end timedeltas parsed via _parse_hhmmss, and computed paragraph_id.
  • Remove obsolete WhisperLiveKit WebSocket message schema module.
Files:
aymurai/api/meta/asr/coro.py
aymurai/meta/api_interfaces.py
aymurai/api/meta/asr/websocket.py

Change: Wire coro-based transcription into the ASR router and settings, replacing WebSocket configuration.
Details:
  • Replace get_transcribe_ws_uri with get_transcribe_base_url and depend on TRANSCRIBE_BASE_URL in the transcribe endpoint.
  • Change _transcribe_audio_bytes_with_error_handling to call the new transcribe_audio_bytes(payload, filename, content_type) and map RuntimeError directly to UpstreamServiceError.
  • Map CoroSegment list into ASRParagraph list, converting speaker string to int and passing through start/end/text.
  • Update tests and fixtures to use TRANSCRIBE_BASE_URL and CoroSegment-based mocking instead of WLKMessageStatus/lines.
  • Update integration test skip condition to check TRANSCRIBE_BASE_URL instead of TRANSCRIBE_WS_URI.
  • Add TRANSCRIBE_BASE_URL and TRANSCRIBE_API_KEY settings and remove TRANSCRIBE_WS_URI.
Files:
aymurai/api/endpoints/routers/asr/transcribe.py
test/api/endpoints/routers/asr/test_transcribe.py
test/api/endpoints/routers/asr/conftest.py
test/integration/test_asr_pipeline.py
aymurai/settings.py

Change: Add smoke-testing and runtime tooling for coro and update documentation.
Details:
  • Add run-coro.sh script to start coro via uvx with configurable cpu/gpu mode, transcript spill dir, and port.
  • Add smoke_coro.py script that starts coro, waits for /health, and transcribes a sample via both the OpenAI SDK and the aymurai client.
  • Extend README with instructions for running coro, configuring TRANSCRIBE_BASE_URL, and running the smoke test.
Files:
scripts/run-coro.sh
scripts/smoke_coro.py
README.md

Change: Add unit tests for the new ASR client behavior and adjust dependencies/infra.
Details:
  • Introduce test/audio/test_asr_client.py to cover happy-path done-frame parsing, missing done frame, missing base URL, and APIConnectionError mapping to RuntimeError.
  • Adjust pyproject.toml dependencies by removing librosa and adding openai>=1.65.2 (capped by marker-pdf).
  • Apply minor docker-compose.yml YAML normalization changes (array formatting and multiline command wrapping).
Files:
test/audio/test_asr_client.py
pyproject.toml
docker-compose.yml
uv.lock

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've found 1 issue, and left some high level feedback:

  • In aymurai/audio/asr_client.py, AsyncOpenAI is instantiated on every transcribe_audio_bytes call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
  • The transcribe router depends on get_transcribe_base_url but then ignores the base_url argument and has transcribe_audio_bytes re-read and re-validate settings.TRANSCRIBE_BASE_URL; it would be cleaner either to pass the resolved base_url (and potentially api_key) into the client, or to drop the unused dependency to avoid duplicated configuration checks.
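
Both points of feedback (reuse one client, and pass the resolved base_url into it) could be addressed with a small cache keyed by connection settings. The following is an illustrative sketch, not a proposed patch: `ClientPool` is a hypothetical helper, and the factory is injected so the example runs without the SDK installed. In real code the factory could be `openai.AsyncOpenAI`, which accepts `base_url` and `api_key` keyword arguments.

```python
from typing import Any, Callable


class ClientPool:
    """Caches one client per (base_url, api_key) pair."""

    def __init__(self, factory: Callable[..., Any]) -> None:
        # `factory` is called as factory(base_url=..., api_key=...).
        self._factory = factory
        self._clients: dict[tuple[str, str], Any] = {}

    def get(self, base_url: str, api_key: str) -> Any:
        """Return the cached client for these settings, creating it once."""
        key = (base_url, api_key)
        if key not in self._clients:
            self._clients[key] = self._factory(base_url=base_url, api_key=api_key)
        return self._clients[key]


created = []


def fake_factory(base_url: str, api_key: str) -> dict:
    # Stand-in for an SDK client constructor; records each construction.
    client = {"base_url": base_url, "api_key": api_key}
    created.append(client)
    return client


pool = ClientPool(fake_factory)
first = pool.get("http://localhost:8000/v1", "not-needed")
second = pool.get("http://localhost:8000/v1", "not-needed")
```

With this shape the router's `base_url` dependency is actually consumed: the endpoint passes it to `pool.get(...)` and the client function no longer re-reads settings.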
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `aymurai/audio/asr_client.py`, `AsyncOpenAI` is instantiated on every `transcribe_audio_bytes` call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
- The `transcribe` router depends on `get_transcribe_base_url` but then ignores the `base_url` argument and has `transcribe_audio_bytes` re-read and re-validate `settings.TRANSCRIBE_BASE_URL`; it would be cleaner either to pass the resolved `base_url` (and potentially `api_key`) into the client, or to drop the unused dependency to avoid duplicated configuration checks.

## Individual Comments

### Comment 1
<location path="aymurai/api/endpoints/routers/asr/transcribe.py" line_range="91" />
<code_context>
     file: UploadFile,
     use_cache: bool = True,
-    ws_uri: str = Depends(get_transcribe_ws_uri),
+    base_url: str = Depends(get_transcribe_base_url),
     session: Session = Depends(get_session),
 ) -> ASRDocument:
</code_context>
<issue_to_address>
**suggestion:** The injected `base_url` dependency isn’t used, so configuration can’t be overridden per-request.

The endpoint now depends on `get_transcribe_base_url`, but `base_url` isn’t used and `transcribe_audio_bytes` still reads `settings.TRANSCRIBE_BASE_URL` directly. This prevents per-request overrides and leaves `base_url` as a dead parameter. Either pass `base_url` into `transcribe_audio_bytes` (and stop reading from settings there) or remove this dependency to avoid confusion.
</issue_to_address>


