feat(asr): replace WhisperLiveKit with coro (OpenAI SDK + SSE) #85
jedzill4 wants to merge 14 commits into
ASR now runs as a host process via scripts/run-coro.sh; .env points at TRANSCRIBE_BASE_URL. Includes pre-existing compose YAML formatting normalization already present in the working tree on this branch.
Reviewer's Guide
Replace the WhisperLiveKit WebSocket-based ASR integration with a coro-based OpenAI SSE transcription flow, introduce a `CoroSegment` schema and `ASRParagraph` refactor, wire the HTTP/SSE client into the ASR router and settings, adjust tests and dependencies, and add scripts/docs for running and smoke-testing coro.
Sequence diagram for coro-based SSE transcription flow
sequenceDiagram
participant Client
participant ASRRouter as ASR_transcribe_endpoint
participant ASRService as _transcribe_audio_bytes_with_error_handling
participant ASRClient as transcribe_audio_bytes
participant OpenAIClient as AsyncOpenAI
participant Coro as coro_server
Client->>ASRRouter: POST /asr/transcribe (UploadFile)
ASRRouter->>ASRService: _transcribe_audio_bytes_with_error_handling(data, filename, content_type)
ASRService->>ASRClient: transcribe_audio_bytes(payload, filename, content_type)
ASRClient->>OpenAIClient: AsyncOpenAI(base_url=TRANSCRIBE_BASE_URL)
OpenAIClient->>Coro: audio.transcriptions.create(stream=True)
loop SSE stream
Coro-->>OpenAIClient: transcript.text.delta
end
Coro-->>OpenAIClient: transcript.text.done
OpenAIClient-->>ASRClient: done event
ASRClient->>ASRClient: json.loads(event.text) -> list[CoroSegment]
ASRClient-->>ASRService: list[CoroSegment]
ASRService->>ASRService: map CoroSegment -> ASRParagraph
ASRService-->>ASRRouter: list[ASRParagraph]
ASRRouter-->>Client: ASRDocument(document=list[ASRParagraph])
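The step `json.loads(event.text) -> list[CoroSegment]` in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `CoroSegment` fields (`speaker`, `start`, `end`, `text`), the `DeltaEvent`/`DoneEvent` stand-ins, and the helper name `segments_from_events` are all assumptions, and a plain synchronous iterable replaces the SDK's async stream so the example runs without the `openai` package.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CoroSegment:
    # Hypothetical fields; the real model lives in aymurai/api/meta/asr/coro.py.
    speaker: str
    start: str
    end: str
    text: str


@dataclass
class DeltaEvent:
    # Stand-in for a `transcript.text.delta` SSE frame.
    delta: str
    type: str = "transcript.text.delta"


@dataclass
class DoneEvent:
    # Stand-in for the final `transcript.text.done` SSE frame.
    text: str
    type: str = "transcript.text.done"


def segments_from_events(events: Iterable[object]) -> list[CoroSegment]:
    """Skip delta frames and decode the JSON list carried by the done frame."""
    for event in events:
        if getattr(event, "type", None) == "transcript.text.done":
            return [CoroSegment(**item) for item in json.loads(event.text)]
    # No done frame arrived, so there is nothing to parse.
    raise RuntimeError("stream ended without a transcript.text.done event")
```

In the actual client the same loop would presumably run as `async for event in stream` over the object returned by the SDK call; only the final frame is parsed, so delta frames can be ignored or used for progress reporting.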
Hey - I've found 1 issue, and left some high level feedback:
- In `aymurai/audio/asr_client.py`, `AsyncOpenAI` is instantiated on every `transcribe_audio_bytes` call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
- The `transcribe` router depends on `get_transcribe_base_url` but then ignores the `base_url` argument and has `transcribe_audio_bytes` re-read and re-validate `settings.TRANSCRIBE_BASE_URL`; it would be cleaner either to pass the resolved `base_url` (and potentially `api_key`) into the client, or to drop the unused dependency to avoid duplicated configuration checks.
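One possible way to act on the first comment is a small cached factory, so that a client is built once per configuration and reused afterwards. The sketch below is an assumption-laden illustration: `get_asr_client` is a hypothetical name, and `FakeClient` stands in for `openai.AsyncOpenAI` (which accepts `base_url` and `api_key` keyword arguments) so the example runs without the SDK installed.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeClient:
    """Stand-in for openai.AsyncOpenAI; holds only the connection settings."""

    base_url: str
    api_key: str


@lru_cache(maxsize=8)
def get_asr_client(base_url: str, api_key: str = "not-needed") -> FakeClient:
    """Build one client per (base_url, api_key) pair and reuse it on later calls."""
    # In the real module this line would construct AsyncOpenAI(base_url=..., api_key=...).
    return FakeClient(base_url=base_url, api_key=api_key)
```

Because the cache is keyed on the arguments, a different `base_url` still gets its own client, which keeps per-request overrides possible while avoiding repeated setup for the common case.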
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `aymurai/audio/asr_client.py`, `AsyncOpenAI` is instantiated on every `transcribe_audio_bytes` call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
- The `transcribe` router depends on `get_transcribe_base_url` but then ignores the `base_url` argument and has `transcribe_audio_bytes` re-read and re-validate `settings.TRANSCRIBE_BASE_URL`; it would be cleaner either to pass the resolved `base_url` (and potentially `api_key`) into the client, or to drop the unused dependency to avoid duplicated configuration checks.
## Individual Comments
### Comment 1
<location path="aymurai/api/endpoints/routers/asr/transcribe.py" line_range="91" />
<code_context>
file: UploadFile,
use_cache: bool = True,
- ws_uri: str = Depends(get_transcribe_ws_uri),
+ base_url: str = Depends(get_transcribe_base_url),
session: Session = Depends(get_session),
) -> ASRDocument:
</code_context>
<issue_to_address>
**suggestion:** The injected `base_url` dependency isn’t used, so configuration can’t be overridden per-request.
The endpoint now depends on `get_transcribe_base_url`, but `base_url` isn’t used and `transcribe_audio_bytes` still reads `settings.TRANSCRIBE_BASE_URL` directly. This prevents per-request overrides and leaves `base_url` as a dead parameter. Either pass `base_url` into `transcribe_audio_bytes` (and stop reading from settings there) or remove this dependency to avoid confusion.
</issue_to_address>
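The first option in the suggestion (pass the resolved `base_url` into the client and stop reading settings there) might look roughly like this. It is a sketch under stated assumptions: the `Settings` class, the signature of `get_transcribe_base_url`, the keyword-only `base_url` parameter, and the `transcribe` wrapper are hypothetical, FastAPI's `Depends` wiring is left out, and the client body is a placeholder that returns the URL it would call instead of doing any I/O.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Hypothetical settings object with only the field discussed in the review.
    TRANSCRIBE_BASE_URL: Optional[str] = None


def get_transcribe_base_url(settings: Settings) -> str:
    """Validate the configuration once, at the dependency boundary."""
    if not settings.TRANSCRIBE_BASE_URL:
        raise RuntimeError("TRANSCRIBE_BASE_URL is not configured")
    return settings.TRANSCRIBE_BASE_URL.rstrip("/")


async def transcribe_audio_bytes(
    payload: bytes, filename: str, content_type: str, *, base_url: str
) -> str:
    """Placeholder client: uses the injected base_url and never touches settings."""
    return f"{base_url}/audio/transcriptions"


async def transcribe(payload: bytes, base_url: str) -> str:
    """Endpoint-like wrapper that forwards the resolved dependency to the client."""
    return await transcribe_audio_bytes(
        payload, "sample.wav", "audio/wav", base_url=base_url
    )
```

With this shape the validation lives in one place, and a test or a per-request override only needs to supply a different `base_url` value.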
Summary
Replaces the WhisperLiveKit WebSocket ASR integration with coro, an OpenAI-compatible ASR + speaker-diarization server. aymurai now transcribes via the official `openai` SDK over SSE (`stream=True`), and coro runs as an isolated host process via `uv tool`/`uvx` (no dependency conflicts). The public API, DB schema, `ASRParagraph`/`ASRDocument`, and the anonymizer consumer are unchanged.
Changes
- `aymurai/audio/asr_client.py` rewritten (186→62 lines): `transcribe_audio_bytes(payload, filename, content_type)` streams to coro's `/v1/audio/transcriptions` and parses the `transcript.text.done` frame into `list[CoroSegment]`. Maps `openai` `APIError`/`APIConnectionError` → `RuntimeError`.
- Removed `aymurai/api/meta/asr/websocket.py` (all `WLKMessage*`); added `aymurai/api/meta/asr/coro.py` with the `CoroSegment` boundary model + `_parse_hhmmss`. Folded `TranscriptionItem` into `ASRParagraph`.
- `get_transcribe_base_url` + `CoroSegment → ASRParagraph` mapping; `TRANSCRIBE_WS_URI` → `TRANSCRIBE_BASE_URL` (+ `TRANSCRIBE_API_KEY`).
- `openai>=1.65.2` (capped `<2` by `marker-pdf`); removed `librosa` (only the old client used it).
- Removed the `whisperlivekit` docker-compose service; added `scripts/run-coro.sh` (cpu/gpu, pins `uvx --python 3.12`) and `scripts/smoke_coro.py`; README section.
Test Plan
- `ruff format --check` + `ruff check` clean on changed files
- `pyrefly check`: 0 errors
- `pytest test/audio/test_asr_client.py test/api/endpoints/routers/asr`: 11 passed (client error-mapping + endpoint + validation)
- `ASRDocument`/`ASRParagraph` unchanged
- `scripts/smoke_coro.py cpu`: started coro (parakeet + NeMo diarization), transcribed a Spanish sample over SSE via both the raw OpenAI SDK and aymurai's client, with identical 5-segment, diarized output and HTTP 200
- `test/integration/test_asr_pipeline.py` requires a live coro server (set `TRANSCRIBE_BASE_URL`); excluded from normal runs
Notes
- Run `./scripts/run-coro.sh [cpu|gpu]`, then set `TRANSCRIBE_BASE_URL=http://localhost:8000/v1`. Requires host `ffmpeg`.
- `docker-compose.yml` also includes incidental YAML formatting normalization already present on the branch.
Summary by Sourcery
Replace the WebSocket-based WhisperLiveKit ASR integration with a coro-based SSE/OpenAI SDK transcription flow while preserving the public ASR API and document schema.
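The PR description mentions a `_parse_hhmmss` helper next to the `CoroSegment` model but does not show it. A helper of that kind could plausibly look like the sketch below; the accepted timestamp format (`H:MM:SS` with optional fractional seconds), the return type, and the public name `parse_hhmmss` are assumptions made for illustration, not the PR's implementation.

```python
def parse_hhmmss(value: str) -> float:
    """Convert an 'H:MM:SS' or 'H:MM:SS.fff' timestamp into seconds."""
    # Unpacking raises ValueError when the string does not have three fields.
    hours, minutes, seconds = value.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

For example, `parse_hhmmss("00:01:30")` yields 90 seconds, while a malformed value such as `"90"` raises `ValueError` instead of silently producing a wrong offset.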