feat: add stream_validate() hook to Requirement (#900)#925
feat: add stream_validate() hook to Requirement (#900)#925planetf1 wants to merge 8 commits intogenerative-computing:mainfrom
Conversation
96f1919 to
0c030fb
Compare
…eam_validate - Remove "In Phase 1" temporal qualifier from docstring — reworded to timeless statement about orchestrator responsibility - Add type annotations (str, Backend, Context) to test subclass overrides - Add idempotency test: multiple calls on the same Requirement instance leave state unchanged Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…eam_validate - Remove "In Phase 1" temporal qualifier from docstring — reworded to timeless statement about orchestrator responsibility - Add type annotations (str, Backend, Context) to test subclass overrides - Add idempotency test: multiple calls on the same Requirement instance leave state unchanged Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
d922a2c to
58128a7
Compare
…#900) Add an async `stream_validate(chunk, backend, ctx)` method to the base `Requirement` class. The default implementation returns `PartialValidationResult("unknown")`; subclasses override to inspect the accumulated chunk and return `"pass"` or `"fail"` early. Per the Phase 1 design: `"pass"` is informational and does not short-circuit the final `validate()` call. The method must not mutate `self` — state isolation is the orchestrator's responsibility. Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…eam_validate - Remove "In Phase 1" temporal qualifier from docstring — reworded to timeless statement about orchestrator responsibility - Add type annotations (str, Backend, Context) to test subclass overrides - Add idempotency test: multiple calls on the same Requirement instance leave state unchanged Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Prevents positional confusion and makes future parameter additions to the signature non-breaking for existing subclass overrides. Assisted-by: Claude Code
58128a7 to
358e4d1
Compare
The docstring incorrectly stated that implementations must not mutate self. Issue generative-computing#900 spec explicitly allows stateful accumulation and requires the shallow-copy caveat to be documented. Fix the docstring to match the spec. Add two tests required by the issue acceptance criteria: - test_stateful_subclass_accumulates_state: verifies a subclass can accumulate state (bullet counter) across stream_validate calls - test_stateful_subclass_clone_isolation: verifies copy() gives an independent clone, confirming the orchestrator clone pattern Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The previous implementation overwrote _bullet_count from the full accumulated chunk on each call — equivalent to a pure function with no real dependency on prior state. Use _seen_len to extract only the new portion of each accumulated chunk, accumulating the count additively. This genuinely requires prior-call state to know where to slice, making the test name "accumulates_state" accurate. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
In multi-line calls, # type: ignore only suppresses errors on its own line. The backend=None argument was uncovered; add the ignore there too. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Use the public API for imports: Backend and Context both appear in mellea.core.__all__, so import from mellea.core rather than the internal submodules. Rewrite test_stateful_subclass_clone_isolation to simulate the correct orchestrator pattern: the original requirement is never called directly; each attempt clones from the fresh original, giving _calls == 0 at the start of every attempt. The previous test cloned mid-stream, which tested shallow-copy isolation but demonstrated the wrong usage pattern. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…eam_validate - Remove "In Phase 1" temporal qualifier from docstring — reworded to timeless statement about orchestrator responsibility - Add type annotations (str, Backend, Context) to test subclass overrides - Add idempotency test: multiple calls on the same Requirement instance leave state unchanged Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
@jakelorocco how does this look for you - I have another stacked PR behind this one too. (I'll just go one level deep) |
jakelorocco
left a comment
There was a problem hiding this comment.
@planetf1, I might've missed this in the original proposal, but passing the accumulated chunks to the requirements is different from what I thought the proposal was. Could you please elaborate more on the design choice?
| isolation. | ||
|
|
||
| Args: | ||
| chunk: The accumulated model output so far (not just the latest token). |
There was a problem hiding this comment.
This worries me. @nrfulton, your initial proposal was to have the requirement only see new chunks. This would show all accumulated chunks so far.
This forces all streaming requirements to be stateful; all requirements must now keep track of what chunks they have processed. The alternative would be to only provide new chunks to requirements; then streaming validation would be stateless except when needed. Requirements can choose whether they need to store and process multiple chunks, or just check each chunk independently.
I guess the checking each chunk independently is unlikely to be helpful, so I can see why accumulating and forcing requirements to track their own progress doesn't actually add much complexity.
If so, I think we should actually pre-define functions to help with this (either as a new class of requirements or functions that implementors can draw on).
Also, if we are passing the accumulated chunk through, I almost think we should just pass in some point-in-time copy of the model output thunk. Ie one that doesn't get streamed the new chunks but has all the data fields from the point-in-time it was copied at.
There was a problem hiding this comment.
I would suggest reverting to the simplicity of the original proposal. The change was made incorrectly when reviewing the initial PR. The initial approach is simple - and requirements can maintain state if the need to work on accumulated chunks. If we need to support accumulated content generally we can consider it in a later phase.
Which works best will vary by use case. Per-chunk like works better for
- checkinf for forbidden words/phrases
- ensuring a paragraph or sentence is coherent
- structural checks - code fencing
- format validation
especially when the MoT manages the semantic chunking (later)
There will be cases where accumulation is better -- these are probably more complex checks - does a story line flow, are we taking the response in an unexpected direction, do we have a complete enough response
Importantly if we stick to per-chunk we could still implement this second approach - albeit not as cleanly.
So in summary - I'll revert to original -- but if you now think that's wrong we can adjust?
Restores the chunk-at-a-time semantics set out in the generative-computing#891 epic and generative-computing#900 spec: stream_validate is called once per complete chunk produced by the chunking strategy, and receives that single chunk. Requirements that need history accumulate it on self. Commit 315a98c inadvertently flipped this: the BulletCounter test was rewritten to recover deltas from accumulated text via self._seen_len, and the docstring was updated to match ("The accumulated model output so far"). Neither change reflected a design decision — it was drift during a test fix, and buries a confusing workaround in what should be a straightforward stateful override. Changes: - requirement.py: rewrite chunk Args description to name the chunking-strategy-produced delta, clarify that ctx does not contain the generated output during streaming, and note the MOT single-consumer constraint - test_stream_validate.py: rewrite BulletCounter to accumulate its own running count (no self._seen_len); calls pass delta chunks ("\n- one\n- two") rather than re-sending accumulated text The corresponding orchestrator fix in stream_with_chunking() -- pass the chunk, iterate per chunk -- is in the stacked Wave 3 branch. Assisted-by: Claude Code
Summary
Adds
async stream_validate(chunk, *, backend, ctx) -> PartialValidationResultto the baseRequirementclass as a per-chunk streaming validation hook. The default implementation returnsPartialValidationResult("unknown")— subclasses override to inspect accumulated chunks and signal"pass"or"fail"early. Stateful implementations may accumulate state onself; the orchestrator clones the requirement before each attempt so state does not bleed across retries.Design decisions
Per the agreed Phase 1 spec:
stream_validate(notavalidate)PartialValidationResult(tri-state:"pass","fail","unknown") — introduced in feat(core): add PartialValidationResult with tri-state semantics #898 / feat(core): add PartialValidationResult with tri-state semantics #924backendandctxare keyword-only: prevents positional confusion and makes future parameter additions non-breaking"pass"is informational: does not short-circuit the finalvalidate()call in Phase 1reset()method: state isolation is handled by the orchestrator cloning requirements (copy(req)) before each attemptNo LLM-as-a-Judge logic is added here. This is a pure hook for custom validation overrides.
Test plan
test_default_returns_unknown— base class always returns"unknown"test_default_returns_partial_validation_result_instance— correct return typetest_stream_validate_is_coroutine— method is asynctest_subclass_can_return_pass— subclass override returning"pass"workstest_subclass_can_return_fail— subclass override returning"fail"with reason workstest_does_not_mutate_requirement— calling the method on base class leavesselfunchangedtest_stream_validate_idempotent— multiple calls on base class leaveselfunchangedtest_stateful_subclass_accumulates_state— stateful subclass (bullet counter) correctly updates internal state across callstest_stateful_subclass_clone_isolation—copy()of a stateful requirement produces an independent clone, confirming the orchestrator clone patternRelated
Closes #900. Part of streaming epic #891. Builds on #924 (merged) and #923 (merged).