
feat: add stream_validate() hook to Requirement (#900)#925

Open
planetf1 wants to merge 8 commits into generative-computing:main from planetf1:feat/900-stream-validate

Conversation

Contributor

@planetf1 planetf1 commented Apr 24, 2026

Summary

Adds async stream_validate(chunk, *, backend, ctx) -> PartialValidationResult to the base Requirement class as a per-chunk streaming validation hook. The default implementation returns PartialValidationResult("unknown") — subclasses override to inspect accumulated chunks and signal "pass" or "fail" early. Stateful implementations may accumulate state on self; the orchestrator clones the requirement before each attempt so state does not bleed across retries.

Design decisions

Per the agreed Phase 1 spec:

No LLM-as-a-Judge logic is added here. This is a pure hook for custom validation overrides.

Test plan

  • test_default_returns_unknown — base class always returns "unknown"
  • test_default_returns_partial_validation_result_instance — correct return type
  • test_stream_validate_is_coroutine — method is async
  • test_subclass_can_return_pass — subclass override returning "pass" works
  • test_subclass_can_return_fail — subclass override returning "fail" with reason works
  • test_does_not_mutate_requirement — calling the method on base class leaves self unchanged
  • test_stream_validate_idempotent — multiple calls on base class leave self unchanged
  • test_stateful_subclass_accumulates_state — stateful subclass (bullet counter) correctly updates internal state across calls
  • test_stateful_subclass_clone_isolation — copy() of a stateful requirement produces an independent clone, confirming the orchestrator clone pattern
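The last two tests exercise the stateful-subclass and clone-isolation pattern. A rough sketch of that pattern (the class names and the `run_attempt` helper here are hypothetical, not the orchestrator's real code):

```python
import asyncio
import copy


class Requirement:
    async def stream_validate(self, chunk, *, backend=None, ctx=None):
        return "unknown"   # base class: no opinion


class BulletCounter(Requirement):
    """Stateful subclass: counts bullets seen across per-chunk calls."""

    def __init__(self, limit=3):
        self.limit = limit
        self._count = 0

    async def stream_validate(self, chunk, *, backend=None, ctx=None):
        self._count += chunk.count("\n- ")
        return "fail" if self._count > self.limit else "unknown"


async def run_attempt(original, chunks):
    req = copy.copy(original)   # orchestrator clones before each attempt
    verdict = "unknown"
    for chunk in chunks:
        verdict = await req.stream_validate(chunk)
    return verdict, req


original = BulletCounter()
verdict, clone = asyncio.run(
    run_attempt(original, ["\n- one\n- two", "\n- three\n- four"])
)
```

Because each attempt clones from the fresh original, `original._count` stays 0 no matter how many attempts run, which is exactly what the clone-isolation test checks.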

Related

Closes #900. Part of streaming epic #891. Builds on #924 (merged) and #923 (merged).

@github-actions github-actions Bot added the enhancement New feature or request label Apr 24, 2026
@planetf1 planetf1 force-pushed the feat/900-stream-validate branch from 96f1919 to 0c030fb Compare April 24, 2026 12:18
planetf1 added a commit to planetf1/mellea that referenced this pull request Apr 24, 2026
…eam_validate

- Remove "In Phase 1" temporal qualifier from docstring — reworded to
  timeless statement about orchestrator responsibility
- Add type annotations (str, Backend, Context) to test subclass overrides
- Add idempotency test: multiple calls on the same Requirement instance
  leave state unchanged

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the feat/900-stream-validate branch from d922a2c to 58128a7 Compare April 24, 2026 14:16
…#900)

Add an async `stream_validate(chunk, backend, ctx)` method to the base
`Requirement` class. The default implementation returns
`PartialValidationResult("unknown")`; subclasses override to inspect the
accumulated chunk and return `"pass"` or `"fail"` early.

Per the Phase 1 design: `"pass"` is informational and does not
short-circuit the final `validate()` call. The method must not mutate
`self` — state isolation is the orchestrator's responsibility.

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Prevents positional confusion and makes future parameter additions
to the signature non-breaking for existing subclass overrides.

Assisted-by: Claude Code
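The keyword-only signature this commit refers to can be sketched as follows; the bare `*` forces `backend` and `ctx` to be passed by name, so a later parameter addition cannot silently shift positional arguments in existing overrides (names illustrative):

```python
import asyncio
import inspect


class Requirement:
    # Everything after the bare * is keyword-only.
    async def stream_validate(self, chunk, *, backend=None, ctx=None):
        return "unknown"


sig = inspect.signature(Requirement.stream_validate)
kinds = {name: p.kind for name, p in sig.parameters.items()}

# Passing backend/ctx positionally is a TypeError, by design.
try:
    asyncio.run(Requirement().stream_validate("chunk", None, None))
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```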
@planetf1 planetf1 force-pushed the feat/900-stream-validate branch from 58128a7 to 358e4d1 Compare April 27, 2026 13:12
The docstring incorrectly stated that implementations must not mutate
self. Issue generative-computing#900 spec explicitly allows stateful accumulation and
requires the shallow-copy caveat to be documented. Fix the docstring
to match the spec.

Add two tests required by the issue acceptance criteria:
- test_stateful_subclass_accumulates_state: verifies a subclass can
  accumulate state (bullet counter) across stream_validate calls
- test_stateful_subclass_clone_isolation: verifies copy() gives an
  independent clone, confirming the orchestrator clone pattern

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 marked this pull request as ready for review April 27, 2026 13:19
@planetf1 planetf1 requested review from a team, jakelorocco and nrfulton as code owners April 27, 2026 13:19
The previous implementation overwrote _bullet_count from the full
accumulated chunk on each call — equivalent to a pure function with
no real dependency on prior state.

Use _seen_len to extract only the new portion of each accumulated
chunk, accumulating the count additively. This genuinely requires
prior-call state to know where to slice, making the test name
"accumulates_state" accurate.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
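The `_seen_len` slicing this commit describes can be sketched like this (a simplified stand-in, not the actual test code): when the hook receives the full accumulated text each call, the counter remembers how much it already processed and counts only the new tail.

```python
class AccumulatedBulletCounter:
    """Toy version of the _seen_len trick: genuinely depends on prior-call state."""

    def __init__(self):
        self._seen_len = 0   # how much of the accumulated text was already processed
        self._count = 0

    def feed(self, accumulated: str) -> int:
        new = accumulated[self._seen_len:]   # slice off only the unseen portion
        self._seen_len = len(accumulated)
        self._count += new.count("\n- ")
        return self._count


counter = AccumulatedBulletCounter()
first = counter.feed("\n- one")             # sees "\n- one"
second = counter.feed("\n- one\n- two")     # only "\n- two" is new
```

Without `_seen_len`, re-counting the full accumulated text each call would double-count bullets; the slice is what makes the state genuinely load-bearing.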
In multi-line calls, # type: ignore only suppresses errors on its own
line. The backend=None argument was uncovered; add the ignore there too.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Use the public API for imports: Backend and Context both appear in
mellea.core.__all__, so import from mellea.core rather than the
internal submodules.

Rewrite test_stateful_subclass_clone_isolation to simulate the correct
orchestrator pattern: the original requirement is never called directly;
each attempt clones from the fresh original, giving _calls == 0 at the
start of every attempt. The previous test cloned mid-stream, which
tested shallow-copy isolation but demonstrated the wrong usage pattern.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
planetf1 added a commit to planetf1/mellea that referenced this pull request Apr 28, 2026
…eam_validate

@planetf1
Contributor Author

@jakelorocco how does this look for you - I have another stacked PR behind this one too. (I'll just go one level deep)

Contributor

@jakelorocco jakelorocco left a comment


@planetf1, I might've missed this in the original proposal, but passing the accumulated chunks to the requirements is different from what I thought the proposal was. Could you please elaborate more on the design choice?

Comment thread on mellea/core/requirement.py (Outdated)
isolation.

Args:
chunk: The accumulated model output so far (not just the latest token).
Contributor


This worries me. @nrfulton, your initial proposal was to have the requirement only see new chunks. This would show all accumulated chunks so far.

This forces all streaming requirements to be stateful; all requirements must now keep track of what chunks they have processed. The alternative would be to only provide new chunks to requirements; then streaming validation would be stateless except when needed. Requirements can choose whether they need to store and process multiple chunks, or just check each chunk independently.

I guess checking each chunk independently is unlikely to be helpful, so I can see why accumulating and forcing requirements to track their own progress doesn't actually add much complexity.

If so, I think we should actually pre-define functions to help with this (either as a new class of requirements or functions that implementors can draw on).

Also, if we are passing the accumulated chunk through, I almost think we should just pass in some point-in-time copy of the model output thunk. Ie one that doesn't get streamed the new chunks but has all the data fields from the point-in-time it was copied at.

Contributor Author


I would suggest reverting to the simplicity of the original proposal. The change was made incorrectly while reviewing the initial PR. The initial approach is simple - and requirements can maintain state if they need to work on accumulated chunks. If we need to support accumulated content generally we can consider it in a later phase.

Which works best will vary by use case. Per-chunk checks work better for

  • checking for forbidden words/phrases
  • ensuring a paragraph or sentence is coherent
  • structural checks - code fencing
  • format validation

especially when the MoT manages the semantic chunking (later)

There will be cases where accumulation is better -- these are probably more complex checks: does a story line flow, are we taking the response in an unexpected direction, do we have a complete enough response?

Importantly if we stick to per-chunk we could still implement this second approach - albeit not as cleanly.

So in summary - I'll revert to original -- but if you now think that's wrong we can adjust?
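A sketch of the fallback mentioned above: under per-chunk semantics, a requirement that wants accumulated-text checks can buffer the deltas itself. The `CompletenessCheck` class and its synchronous `feed` method (standing in for `stream_validate`) are hypothetical illustrations:

```python
class CompletenessCheck:
    """Buffers per-chunk deltas to emulate an accumulated-text check."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk: str) -> str:
        self._buffer += chunk   # rebuild the accumulated view locally
        text = self._buffer.rstrip()
        # Toy completeness heuristic: the response ends a sentence.
        return "pass" if text.endswith((".", "!", "?")) else "unknown"


check = CompletenessCheck()
mid = check.feed("The answer is")   # incomplete so far
end = check.feed(" forty-two.")     # buffer now ends a sentence
```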

Restores the chunk-at-a-time semantics set out in the generative-computing#891 epic and
generative-computing#900 spec: stream_validate is called once per complete chunk produced
by the chunking strategy, and receives that single chunk. Requirements
that need history accumulate it on self.

Commit 315a98c inadvertently flipped this: the BulletCounter test was
rewritten to recover deltas from accumulated text via self._seen_len,
and the docstring was updated to match ("The accumulated model output
so far"). Neither change reflected a design decision — it was drift
during a test fix, and it buried a confusing workaround in what should be
a straightforward stateful override.

Changes:
- requirement.py: rewrite chunk Args description to name the
  chunking-strategy-produced delta, clarify that ctx does not contain
  the generated output during streaming, and note the MOT
  single-consumer constraint
- test_stream_validate.py: rewrite BulletCounter to accumulate its own
  running count (no self._seen_len); calls pass delta chunks
  ("\n- one\n- two") rather than re-sending accumulated text

The corresponding orchestrator fix in stream_with_chunking() -- pass
the chunk, iterate per chunk -- is in the stacked Wave 3 branch.

Assisted-by: Claude Code
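The per-chunk orchestrator loop this commit alludes to might look roughly like the following. This is a hypothetical sketch, not the real `stream_with_chunking()` in the Wave 3 branch; the `MaxLength` requirement is a toy stand-in:

```python
import asyncio
import copy


class MaxLength:
    """Toy requirement: fail once total streamed length exceeds a limit."""

    def __init__(self, limit):
        self.limit = limit
        self._seen = 0

    async def stream_validate(self, chunk, *, backend=None, ctx=None):
        self._seen += len(chunk)
        return "fail" if self._seen > self.limit else "unknown"


async def stream_with_chunking(chunks, requirements):
    reqs = [copy.copy(r) for r in requirements]   # clone per attempt
    for chunk in chunks:                          # one call per complete chunk
        for req in reqs:
            if (await req.stream_validate(chunk)) == "fail":
                return "fail"                     # abort this attempt early
    return "done"


ok = asyncio.run(stream_with_chunking(["ab", "cd"], [MaxLength(10)]))
bad = asyncio.run(stream_with_chunking(["ab", "cd"], [MaxLength(3)]))
```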

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(core): add stream_validate() to Requirement

2 participants