Completely rewrite add_messages_streaming #277

Open: gvanrossum wants to merge 26 commits into microsoft:main from gvanrossum:pipeline

@gvanrossum
Collaborator

The throughput is now much higher -- e.g. with concurrency 10 and batch size 10, the Adrian podcast ingests in 40 seconds, compared to 90 seconds on main (with the previous pipelining implementation).

A consequence of the new design is that the message index is now populated at the time messages are added -- the secondary index building no longer needs to do this.

gvanrossum-ms and others added 22 commits May 9, 2026 13:02
Split embedding strategy (uncached chunk, cached related terms).

Add precomputed-embedding write paths for message and related-term
indexes, introducing explicit *_with_embeddings methods in interfaces
and both memory/SQLite implementations. Refactor existing add methods
to compute embeddings once and delegate, enabling pipeline commit paths
to reuse worker-generated embeddings without recomputation.
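
A minimal sketch of the "compute once and delegate" shape this commit message describes. The class name `InMemoryMessageIndex`, the `embed` callable, and the method signatures are hypothetical illustrations, not the actual typeagent-py interfaces; only the `*_with_embeddings` naming pattern comes from the commit message.

```python
from collections.abc import Callable, Sequence

Embedding = list[float]


class InMemoryMessageIndex:
    """Hypothetical index used only to illustrate the delegation pattern."""

    def __init__(self, embed: Callable[[Sequence[str]], list[Embedding]]) -> None:
        self._embed = embed
        self.entries: list[tuple[str, Embedding]] = []

    def add_messages(self, texts: Sequence[str]) -> None:
        # Compute embeddings exactly once, then reuse the precomputed path.
        self.add_messages_with_embeddings(texts, self._embed(texts))

    def add_messages_with_embeddings(
        self, texts: Sequence[str], embeddings: Sequence[Embedding]
    ) -> None:
        # Callers that already hold embeddings (e.g. a pipeline worker)
        # come straight here, so nothing is embedded twice.
        if len(texts) != len(embeddings):
            raise ValueError("texts and embeddings must have the same length")
        self.entries.extend(zip(texts, embeddings))
```

With this shape, a commit path that already has worker-generated embeddings can call `add_messages_with_embeddings` directly, while existing callers of `add_messages` keep working unchanged.
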
Previously typechat.Failure from the extractor was a soft error: the
message was still committed (with no knowledge) and the failure recorded.
Since LLM responses are non-deterministic, a Failure is just as
unreliable as a raised exception, so both now stop the pipeline at the
failing message and propagate the error.

- Remove extraction_failure_msg from ChunkProcessingResult and
  _ChunkCommitResult; simplify _commit_batch_from_chunk_results
- Keep stop_state.exception in sync with stop_at_message_id so it
  always reflects the lowest-ordinal failing message
- Update tests accordingly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace nested try/except* handling with ExceptionGroup handling.
- Preserve producer_state and stop_state exceptions and raise a combined
  ExceptionGroup when multiple distinct failures occur.
- Complete ChunkProcessingResult docstring with all class fields and
  clarify success semantics.
gvanrossum requested a review from bmerkle May 13, 2026 20:26
@gvanrossum
Collaborator Author

@KRRT7 If you still care about typeagent-py I'd appreciate your review!

@KRRT7
Contributor

KRRT7 commented May 13, 2026

I'll review it in the morning

3 participants