Completely rewrite add_messages_streaming #277
Open
gvanrossum wants to merge 26 commits into
Conversation
Split embedding strategy (uncached chunk, cached related terms).
Add precomputed-embedding write paths for message and related-term indexes, introducing explicit *_with_embeddings methods in interfaces and both memory/SQLite implementations. Refactor existing add methods to compute embeddings once and delegate, enabling pipeline commit paths to reuse worker-generated embeddings without recomputation.
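A minimal sketch of the pattern this commit describes: the plain add method computes embeddings once and delegates to a `*_with_embeddings` method, so a commit path that already has worker-generated embeddings can call the precomputed path directly. The class and method names below are illustrative, not the PR's actual API.

```python
from typing import Protocol, Sequence

Embedding = list[float]


class MessageIndex(Protocol):
    """Hypothetical interface showing the two write paths."""

    def add_messages(self, texts: Sequence[str]) -> None: ...
    def add_messages_with_embeddings(
        self, texts: Sequence[str], embeddings: Sequence[Embedding]
    ) -> None: ...


class MemoryMessageIndex:
    def __init__(self, embed_fn):
        # embed_fn: Sequence[str] -> list[Embedding] (batch embedding call)
        self._embed_fn = embed_fn
        self._rows: list[tuple[str, Embedding]] = []

    def add_messages(self, texts):
        # Compute embeddings once, then delegate to the precomputed path,
        # so pipeline commit code can reuse worker-generated embeddings
        # instead of recomputing them here.
        self.add_messages_with_embeddings(texts, self._embed_fn(texts))

    def add_messages_with_embeddings(self, texts, embeddings):
        assert len(texts) == len(embeddings)
        self._rows.extend(zip(texts, embeddings))
```

A SQLite implementation would follow the same shape, with `add_messages_with_embeddings` doing the actual INSERTs.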
Previously typechat.Failure from the extractor was a soft error: the message was still committed (with no knowledge) and the failure recorded. Since LLM responses are non-deterministic, a Failure is just as unreliable as a raised exception, so both now stop the pipeline at the failing message and propagate the error.

- Remove extraction_failure_msg from ChunkProcessingResult and _ChunkCommitResult; simplify _commit_batch_from_chunk_results
- Keep stop_state.exception in sync with stop_at_message_id so it always reflects the lowest-ordinal failing message
- Update tests accordingly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
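The control flow described above can be sketched as follows. `Failure` here is a stand-in for `typechat.Failure` (which carries an error message rather than an exception), and `process_messages`/`ExtractionError` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Stand-in for typechat.Failure: an error message, no exception."""
    message: str


class ExtractionError(Exception):
    pass


def process_messages(messages, extract):
    """Commit messages in order; stop at the first extractor Failure.

    Both a raised exception and a returned Failure abort the pipeline
    at the failing message, so nothing after it is committed.
    """
    committed = []
    for ordinal, msg in enumerate(messages):
        result = extract(msg)
        if isinstance(result, Failure):
            # Treat the soft error like a hard one: report the failing
            # ordinal and propagate instead of committing blindly.
            raise ExtractionError(f"message {ordinal}: {result.message}")
        committed.append((msg, result))
    return committed
```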
- Replace nested try/except* handling with ExceptionGroup handling.
- Preserve producer_state and stop_state exceptions and raise a combined ExceptionGroup when multiple distinct failures occur.
- Complete the ChunkProcessingResult docstring with all class fields and clarify success semantics.
…er ConversationSettings)
…ssages are added.
Collaborator (Author)
@KRRT7 If you still care about typeagent-py I'd appreciate your review!

Contributor
I'll review it in the morning
The throughput is now much higher: for example, with concurrency 10 and batch size 10, the Adrian podcast ingests in 40 seconds, compared to 90 on main (with the previous pipelining implementation).
A consequence of the new design is that the message index is now populated at the time messages are added -- the secondary index building no longer needs to do this.
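The batched, bounded-concurrency shape behind those numbers can be sketched with asyncio. Everything here is illustrative (function names, parameters, and the batch-then-commit structure are assumptions, not the PR's actual API):

```python
import asyncio


async def ingest(messages, extract, commit, *, concurrency=10, batch_size=10):
    """Extract batches concurrently, then commit each batch in order."""
    sem = asyncio.Semaphore(concurrency)

    async def extract_one(msg):
        async with sem:  # at most `concurrency` extractions in flight
            return await extract(msg)

    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        # Fan out extraction for the whole batch, preserving order.
        results = await asyncio.gather(*(extract_one(m) for m in batch))
        # Committing per batch is where precomputed embeddings would be
        # handed to the *_with_embeddings write paths.
        await commit(list(zip(batch, results)))
```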