WIP:chore: bump morph + geth to derivation-batch-verify; align hoodi snap…#92
Open
curryxbo wants to merge 2 commits into
…shot

Bump morph submodule to feat/derivation-batch-verify tip (d27d088c) and go-ethereum submodule to morph-v2.2.2 (045be0fd) so the new self-verifying node implementation (SPEC-005) builds. The new morphnode no longer accepts the --validator flag; every node starts the derivation pipeline by default.

Also align morph-node/.env_hoodi to snapshot-20260509-1 (the freshest hoodi MPT snapshot listed in README.md) so 'make run-hoodi-node-binary' boots against fresh state.

This is a hand-off branch for QA testing of the hoodi binary path. Docker, mainnet env, README, and the legacy validator/zk Make targets are intentionally untouched in this round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d97ec2e to
23cbed0
Compare
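Test the derivation and tag management features of the morph feat/derivation-batch-verify branch.

Main findings:
1. Path A requires the beacon chain API, which the current L1 RPC does not support.
2. Path B was configured successfully, but hit a mismatch between the snapshot data and the configuration.
3. Path B lacks a wait mechanism: when a block is missing it stops silently instead of waiting for sync.
4. Tag management is implemented correctly, but it can only advance once batch verification has completed.

Configuration changes:
- Add NODE_EXTRA_FLAGS="--derivation.verify-mode=pathB" to enable Path B mode.

Detailed test report: docs/test-derivation-batch-verify-20260514.md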
curryxbo
pushed a commit
to morph-l2/morph
that referenced
this pull request
May 14, 2026
QA hit a "Path B silently stops" symptom on hoodi against an old snapshot: local L2 latest 4,470,254 lagged the L1-committed batch's required range 5,279,569-5,279,890 by ~800k blocks, but the only visible log was `path B fetched batch metadata`; no error surfaced for tens of minutes.

Root cause: retryableError() classified every error except a literal "discontinuous block number" string as retryable, so RetryableClient kept exponentially backing off ethereum.NotFound for the full 30-minute MaxElapsedTime budget. SPEC-005 Path B is the first caller to read L2 blocks the local node may not yet have sealed; older callers (AssembleL2Block / NewSafeL2Block / sequencer paths) only ever read known-existing blocks, masking the issue.

Treat ethereum.NotFound as permanent so it escapes the backoff loop on the first attempt:
- retryableError() short-circuits on errors.Is(err, ethereum.NotFound) (handles fmt.Errorf wrapping too).
- HeaderByNumber / BlockByNumber log Info on retryable failures (still transient chatter) and Error on the non-retryable escape path so the signal is visible even when the caller layer's logging is filtered.

Net effect for the QA scenario: BlockByNumber returns NotFound to verify_path_b on the first attempt; verify_path_b returns "path B: read local block N failed: not found"; derivation.go logs "path B content verification failed" Error and the next pollInterval re-evaluates. The operator immediately sees the local-height gap instead of staring at a silent log.

Adds node/types/retryable_client_test.go covering NotFound (direct + wrapped), DiscontinuousBlockError, and generic transient errors.

go build ./node/types/ ./node/derivation/ -- clean.
go test ./node/types/... ./node/derivation/... -count=1 -- PASS (3 new in types, 22 in derivation).

Refs: morph-l2/run-morph-node#92 testing report (2026-05-14).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
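The classification change described above can be sketched roughly as follows. This is a minimal illustration, not the code in node/types: `errNotFound` stands in for `ethereum.NotFound` so the example builds with the standard library alone, and `errDiscontinuous`, `errTransient`, `wrapNotFound`, and `retryable` are hypothetical names. Returning false for a nil error is also an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a stand-in for go-ethereum's ethereum.NotFound sentinel
// (assumption: used here only so the sketch needs no external module).
var errNotFound = errors.New("not found")

// errDiscontinuous is a hypothetical sentinel for the
// "discontinuous block number" case mentioned in the commit message.
var errDiscontinuous = errors.New("discontinuous block number")

// errTransient represents any other failure, e.g. a dropped connection.
var errTransient = errors.New("connection reset by peer")

// wrapNotFound mimics a caller wrapping the sentinel with fmt.Errorf and %w.
func wrapNotFound(n uint64) error {
	return fmt.Errorf("path B: read local block %d failed: %w", n, errNotFound)
}

// retryable reports whether a failed call should stay in the backoff loop.
func retryable(err error) bool {
	if err == nil {
		return false // nothing to retry (assumption of this sketch)
	}
	// Permanent: the block is not there. errors.Is walks the %w chain,
	// so wrapped NotFound errors are treated the same way.
	if errors.Is(err, errNotFound) {
		return false
	}
	// Permanent: retrying cannot fix a discontinuous block number.
	if errors.Is(err, errDiscontinuous) {
		return false
	}
	// Everything else is assumed to be transient.
	return true
}

func main() {
	for _, err := range []error{errNotFound, wrapNotFound(7), errDiscontinuous, errTransient} {
		fmt.Printf("%v -> retryable=%v\n", err, retryable(err))
	}
}
```

With this shape, a missing local block leaves the retry loop on the first attempt, and the caller decides whether to log it and wait for the next poll.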
curryxbo
pushed a commit
to morph-l2/morph
that referenced
this pull request
May 14, 2026
When SPEC-005 Path B verification fails — particularly with a versioned
hash mismatch — operators previously had to grep across logs and hand-
fetch L1 calldata to reconstruct the batch shape that produced the
mismatch. Add a single structured Error log at every Path B fail point
so the relevant fields are present without re-running anything:
    kind:     invalid_block_range / empty_blob_hashes / local_block_missing
              / local_block_read_error / parsing_txs_error / compress_error
              / sidecar_build_error / blob_count_mismatch / versioned_hash_mismatch
    always:   batchIndex, version, firstBlock, lastBlock,
              parentTotalL1Popped, expectedBlobs
    per-site: blockNumber (block-level errors); encoding, payloadLen,
              compressedLen, rebuiltBlobs, rebuiltHashes,
              expectedHashes, mismatchIndex (encoding / hash errors)
The pathBFail helper centralises the log + metric increment + error
wrap so call sites stay one-liners, and the existing
"path B fetched batch metadata" entry log is enriched with batchIndex,
version, parentTotalL1Popped, expectedBlobs so an operator can spot
abnormal entry conditions without waiting for a failure.
New metric:
derivation_path_b_failed_by_kind_total{kind="..."}
incremented alongside the unlabelled path_b_failed_total via
IncPathBFailedKind so dashboards can split failures by category.
Cost: zero on the success path; the diagnostic computation (slice
lengths, hex CSV of <= 6 hashes) only runs at fail points.
go test ./node/derivation/... -- 22 cases PASS.
Refs: morph-l2/run-morph-node#92 (operator request: don't make us
hand-roll a one-shot script every time Path B fails).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
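A helper of the kind described above (one call that logs, counts, and wraps) might look like the sketch below. It is an illustration under assumptions: `kindCounter` stands in for the labelled metric `derivation_path_b_failed_by_kind_total`, the standard `log` package stands in for the node's logger, and `batchFields`, `failedByKind`, `errDemoCause`, and this `pathBFail` signature are hypothetical. The numbers in `main` are made-up demo values.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// kindCounter is a stand-in for a counter metric labelled by failure kind.
type kindCounter struct {
	mu     sync.Mutex
	byKind map[string]int
}

func (c *kindCounter) inc(kind string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.byKind == nil {
		c.byKind = make(map[string]int)
	}
	c.byKind[kind]++
}

func (c *kindCounter) get(kind string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.byKind[kind]
}

// batchFields carries the fields the commit message lists as "always" logged.
type batchFields struct {
	BatchIndex          uint64
	Version             uint8
	FirstBlock          uint64
	LastBlock           uint64
	ParentTotalL1Popped uint64
	ExpectedBlobs       int
}

var failedByKind kindCounter

// errDemoCause is a made-up underlying error for the demo.
var errDemoCause = errors.New("rebuilt 1 blobs, expected 2")

// pathBFail increments the per-kind counter, emits one structured log line,
// and returns the cause wrapped with the kind so call sites stay one-liners.
func pathBFail(b batchFields, kind string, cause error) error {
	failedByKind.inc(kind)
	log.Printf("path B verification failed kind=%s batchIndex=%d version=%d "+
		"firstBlock=%d lastBlock=%d parentTotalL1Popped=%d expectedBlobs=%d err=%v",
		kind, b.BatchIndex, b.Version, b.FirstBlock, b.LastBlock,
		b.ParentTotalL1Popped, b.ExpectedBlobs, cause)
	return fmt.Errorf("path B: %s: %w", kind, cause)
}

func main() {
	b := batchFields{BatchIndex: 42, Version: 1, FirstBlock: 100, LastBlock: 120, ExpectedBlobs: 2}
	err := pathBFail(b, "blob_count_mismatch", errDemoCause)
	// The wrapped error still matches the original cause via errors.Is.
	fmt.Println(err, errors.Is(err, errDemoCause), failedByKind.get("blob_count_mismatch"))
}
```

Because all work happens inside the helper, the success path pays nothing for it, which matches the "zero on the success path" cost note.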
curryxbo
pushed a commit
to morph-l2/morph
that referenced
this pull request
May 15, 2026
…ant, not Version byte
QA hit a versioned hash mismatch on hoodi (commit txHash 0x763f5f76...,
batchIndex 17367) with rebuilt=0x015a6d... vs expected=0x018577... (both
valid EIP-4844 hashes, just different bytes). Tracing the sequencer
side end to end shows Path B's V1/V2 dispatch is keyed on the wrong
field:
Sequencer's actual encoding decision (chain of code):
1. tx-submitter passes isBatchUpgraded=nil to NewBatchCache
(tx-submitter/services/rollup.go:127-138).
2. NewBatchCache defaults nil to `func(uint64) bool { return true }`
(common/batch/batch_cache.go:102-104), so isBatchUpgraded is
effectively always true for any live sequencer.
3. handleBatchSealing always enters the V2 branch first and uses
TxsPayloadV2 whenever the compressed result fits in sealBlobCap
(batch_cache.go:787-829). The V1 fallback only triggers when V2
overflows AND isBatchV2Upgraded(ts) is still false, which is rare
for normal-sized batches.
4. createBatchHeader stamps the version byte from isBatchV2Upgraded
alone (batch_cache.go:918-934): before that governance flag flips,
version=1 even when the payload is V2-encoded.
5. The new commitBatch ABI (rollup.go:1128-1136) does not carry
BlockContexts in calldata, so the blob payload MUST be V2-encoded
for the chain history to be reconstructable.
Path A already keys off `batch.BlockContexts != nil` from calldata
(batch_info.go::ParseBatch), which is the correct discriminator. Path B
keyed off `batchInfo.version >= 2`, treating every version=1 batch as
V1-encoded — exactly the failure surfaced by QA on hoodi during the
V1->V2 transition window.
Fix:
- BatchInfo gains hasCalldataBlockContexts, set in
ParseBatchMetadataOnly to len(batch.BlockContexts) > 0. (Field doc
spells out why version byte is wrong here.)
- verifyPathBContent dispatches on hasCalldataBlockContexts:
    true  -> TxsPayload   (legacy ABI: blob = txs only)
    false -> TxsPayloadV2 (new ABI: blob = blockContexts || txs)
  The previous `version >= 2` branch is gone.
- pathBFail structured log adds hasCalldataBlockContexts so future
diagnoses see the dispatch input directly.
Tests:
- Renamed RoundTripOK_V1/V2 to RoundTripOK_LegacyABI/NewABI and
switched the oracle's parameter from `version` to `useV2Encoding`.
- Added TestPathB_VersionByte1_NewABI_UsesV2Encoding as a direct
regression for the QA case (version=1 + new ABI -> blob=V2). This
test fails on the prior dispatch and passes on the fix.
go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS (was 22; +1 regression).
Refs: morph-l2/run-morph-node#92 (hoodi hash mismatch report,
2026-05-15 09:38).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
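The dispatch change can be illustrated with a small sketch that compares the old and new discriminators on the QA case (version=1, no BlockContexts in calldata). All names here (`batchInfo`, `newBatchInfo`, `payloadEncoding`, `payloadEncodingByVersion`, and the `encoding` constants) are hypothetical and only mirror the behaviour described in the commit message; the real encoders are not reproduced.

```go
package main

import "fmt"

// encoding names which payload encoder Path B should use to rebuild the blob.
type encoding string

const (
	encLegacy encoding = "TxsPayload"   // legacy ABI: blob = txs only
	encV2     encoding = "TxsPayloadV2" // new ABI: blob = blockContexts || txs
)

// batchInfo keeps only the two fields relevant to the dispatch.
type batchInfo struct {
	version                  uint8
	hasCalldataBlockContexts bool
}

// newBatchInfo derives the discriminator from calldata, as the fix describes:
// the flag is true only when BlockContexts were present in calldata.
func newBatchInfo(version uint8, calldataBlockContexts []byte) batchInfo {
	return batchInfo{
		version:                  version,
		hasCalldataBlockContexts: len(calldataBlockContexts) > 0,
	}
}

// payloadEncoding is the fixed dispatch: keyed on calldata shape.
func payloadEncoding(b batchInfo) encoding {
	if b.hasCalldataBlockContexts {
		return encLegacy
	}
	return encV2
}

// payloadEncodingByVersion is the previous dispatch, kept for comparison.
func payloadEncodingByVersion(b batchInfo) encoding {
	if b.version >= 2 {
		return encV2
	}
	return encLegacy
}

func main() {
	// QA case: header says version=1, but the new ABI carried no BlockContexts.
	qa := newBatchInfo(1, nil)
	fmt.Println("old dispatch:", payloadEncodingByVersion(qa))
	fmt.Println("new dispatch:", payloadEncoding(qa))
}
```

For the QA case the old dispatch selects the legacy encoder while the new one selects V2, which is the difference that produced the versioned hash mismatch.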
curryxbo
pushed a commit
to morph-l2/morph
that referenced
this pull request
May 15, 2026
…ap, not Rollup.BatchDataStore

QA observed finalized stuck on hoodi while safe kept advancing. Cast queries against the rollup contract showed:

    lastCommittedBatchIndex@L1Finalized  = 17797
    batchDataStore(17797).blockNumber    = 5418036
    batchDataStore(17796)                = 0,0,0,0
    batchDataStore(17389)                = 0,0,0,0

The contract intentionally clears BatchDataStore storage for older batches as part of its on-chain GC; only the latest committed batch's record stays populated. The finalizer's `Rollup.BatchDataStore(candidate)` lookup therefore returns zero for any candidate that isn't the very newest, the existing zero-guard skips advancement, and finalized never moves while node logs `finalizer: batch has zero lastL2Block; skipping`.

The discriminator-source fix: tagAdvancer is the right place to hold the (batchIndex -> header) mapping because advanceSafe is already called once per verified batch with the header in hand. Move the lookup off-chain entirely:
- tagAdvancer gains verifiedBatches map[uint64]*eth.Header, populated inside advanceSafe alongside safeMaxBatchIndex.
- New LookupVerifiedBatchHeader(batchIndex) replaces finalizer.lookupBatchLastL2Block + the contract call.
- advanceFinalized evicts map entries <= the new finalized index, keeping the map bounded by the steady-state safe-vs-finalized lag.
- reset (L1 reorg) clears the map: pre-reorg entries aren't authoritative against the new L1 view; derivation refills naturally as it walks the rewound cursor.
- finalizer.tick: 4 RPC calls -> 2 (drop BatchDataStore + HeaderByNumber); the L2 client / lookupBatchLastL2Block helper / zero-BlockNumber defensive guard are gone since none are reachable anymore. newFinalizer no longer takes l2Client.

Restart behavior is unchanged: map starts empty; first finalizer ticks log

    finalizer: verified batch header not found in local map; will retry

until derivation has re-verified up to a candidate that intersects the new map. Same outcome, clearer signal -- and it doesn't depend on contract state retention.

Tests:
- TestTagAdvance_VerifiedBatchLookup: roundtrip after advanceSafe.
- TestTagAdvance_VerifiedBatchEvictedOnFinalize: entries <= finalized are dropped, entries > finalized retained.
- TestTagAdvance_VerifiedBatchClearedOnReset: L1 reorg wipes the map.

Spec impact: tech-design.md §4.7.4's finalizer description still names "Rollup.BatchDataStore" as the lookup source. That sentence needs an update in morph-specs to "tagAdvancer's local verifiedBatches map"; not blocking the implementation PR.

go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 26 cases PASS (was 23; +3 lookup/eviction/reset).

Refs: morph-l2/run-morph-node#92 (hoodi: finalized stuck while safe advances; finalizer: batch has zero lastL2Block; skipping batchIndex=17394).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
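The map lifecycle described in this commit (populate on safe, evict on finalize, clear on reset) can be sketched as below. This is a simplified illustration: `header` stands in for `*eth.Header`, the method names are lower-cased hypothetical versions of the ones in the message, and all other tag state (safe hash, safeMaxBatchIndex, RPC calls) is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// header is a stand-in for the L2 header type used by the node.
type header struct {
	Number uint64
	Hash   string
}

// tagAdvancer holds a batchIndex -> header map for verified batches.
type tagAdvancer struct {
	mu              sync.Mutex
	verifiedBatches map[uint64]*header
}

func newTagAdvancer() *tagAdvancer {
	return &tagAdvancer{verifiedBatches: make(map[uint64]*header)}
}

// advanceSafe records the header of a batch that has just been verified.
func (t *tagAdvancer) advanceSafe(batchIndex uint64, h *header) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.verifiedBatches[batchIndex] = h
}

// lookupVerifiedBatchHeader replaces the on-chain lookup with a local one.
func (t *tagAdvancer) lookupVerifiedBatchHeader(batchIndex uint64) (*header, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	h, ok := t.verifiedBatches[batchIndex]
	return h, ok
}

// advanceFinalized evicts every entry at or below the finalized index, so the
// map stays bounded by the gap between safe and finalized.
func (t *tagAdvancer) advanceFinalized(finalizedIndex uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for idx := range t.verifiedBatches {
		if idx <= finalizedIndex {
			delete(t.verifiedBatches, idx) // deleting during range is safe in Go
		}
	}
}

// reset drops all entries, e.g. after an L1 reorg invalidates them.
func (t *tagAdvancer) reset() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.verifiedBatches = make(map[uint64]*header)
}

func main() {
	t := newTagAdvancer()
	for i := uint64(1); i <= 3; i++ {
		t.advanceSafe(i, &header{Number: i * 100, Hash: fmt.Sprintf("h%d", i)})
	}
	t.advanceFinalized(2)
	_, ok2 := t.lookupVerifiedBatchHeader(2)
	_, ok3 := t.lookupVerifiedBatchHeader(3)
	fmt.Println("batch 2 present:", ok2, "batch 3 present:", ok3)
}
```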
curryxbo
pushed a commit
to morph-l2/morph
that referenced
this pull request
May 15, 2026
…k@finalizedBlock), not BatchDataStore reverse-lookup
QA's finalized-stuck observation on hoodi traced to two issues:
1. Rollup.batchDataStore is on-chain GC'd (Rollup.sol:665 deletes
batchDataStore[_batchIndex - 1] inside finalizeBatch), so the
previous design's `min(committedAtFin, safeMaxBatchIndex)` candidate
would frequently land on an older batchIndex whose storage slot
had been cleared, returning zero and triggering the defensive skip.
2. The previous fix (verifiedBatches map in tagAdvancer) compensated
by holding a local batchIndex -> header map. That worked but was
more machinery than needed: the contract retains
batchDataStore[committedAtFin] at any given L1 block, because
committedAtFin >= lastFinalizedBatchIndex@thatBlock, which is
above the GC threshold at that block's state.
The cleaner fix is to pin both rollup queries to the L1 finalized block
and operate on L2 block NUMBERS, not batchIndex round-trips:
L1FinalizedLastBlock = batchDataStore[committedAtFin]@finalizedBlock.blockNumber
finalized.blockNumber = min(localSafe.number, L1FinalizedLastBlock)
For the common case (L1FinalizedLastBlock >= localSafe.number, default
Confirmations=finalized derivation steady-state) we anchor finalized to
local safe directly: hash + number are already in tagAdvancer memory.
For the other case (operator set Confirmations < finalized so derivation
ran ahead of L1 finalized) we anchor to L1FinalizedLastBlock and pull
the L2 hash via l2Client.HeaderByNumber (the block is local because
L1FinalizedLastBlock < localSafe.number and we verified up to localSafe).
Plus a defensive canonicality check: before advancing finalized, re-read
HeaderByNumber(safeNumber) against the L2 client and require the hash
to still equal tagAdvancer.safeL2Hash. On mismatch (L2 client state
divergence; or, future, an L1 reorg whose detection hasn't yet
re-synced the tag advancer) we skip the advance and reset tagAdvancer
to force re-verification rather than finalizing a stale safe.
Reverts the verifiedBatches map / LookupVerifiedBatchHeader / per-finalize
eviction added in c20983d -- not needed once we stop reverse-looking
batchIndex against the contract.
Changes:
- tagAdvancer.advanceFinalized signature: (ctx, batchIndex, *eth.Header)
-> (ctx, batchIndex, hash, number). The "anchor to local safe" branch
has hash + number directly without fabricating a synthetic header.
- tagAdvancer.Safe() new getter returns (safeL2Hash, safeL2Number) under
mutex for atomic read by finalizer.
- finalizer.tick rewritten: 1 L1 RPC + 2 L1 contract calls (both pinned
to L1 finalized) + 1 L2 RPC for the canonicality check, plus a second
L2 RPC only for the rare safeNum > L1FinalizedLastBlock branch.
- finalizer struct keeps l2Client (needed for canonicality check + the
rare-branch header fetch); newFinalizer signature unchanged from
pre-c20983d4 era (l2Client back in).
- BatchDataStore zero-blockNumber defensive guard remains as a sanity
fallback even though it should never fire under the pinned-query
design (committed at finalized always > GC threshold at that block).
- Drops the 3 verifiedBatches lookup/eviction/reset tests; replaces
with a single TestTagAdvance_SafeGetter covering the new snapshot.
Spec impact: tech-design.md §4.7.4's lookup phrasing changes from "look
up batch lastL2Block via Rollup.BatchDataStore" to "compare local safe
number against L1FinalizedLastBlock derived from the latest committed
batch at L1 finalized; anchor finalized to whichever is smaller". I'll
update morph-specs in a follow-up doc PR (c20983d's commit message
already promised this update; the new phrasing replaces it).
go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS (was 26 with
verifiedBatches tests; -3 dropped, +0 net since we replaced all 3
with TestTagAdvance_SafeGetter and the old finalizer/lookup tests
covered the same code paths).
Refs: morph-l2/run-morph-node#92 (hoodi: finalized stuck while safe
advances; node.log "finalizer: batch has zero lastL2Block; skipping
batchIndex=17394"; cast batchDataStore(17389) = 0,0,0,0,
batchDataStore(17797).blockNumber = 5418036).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
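The anchoring rule `finalized.blockNumber = min(localSafe.number, L1FinalizedLastBlock)` together with the canonicality check can be sketched as a pure function. This is an illustration under assumptions: `blockRef`, `headerByNumber`, `pickFinalized`, `errSafeNotCanonical`, and `demoLookup` are hypothetical names, the L1 and contract queries are replaced by a plain `l1FinalizedLastBlock` argument, and resetting tagAdvancer on a hash mismatch is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// blockRef is a (number, hash) pair for an L2 block.
type blockRef struct {
	Number uint64
	Hash   string
}

// errSafeNotCanonical signals that the L2 client no longer agrees with the
// safe hash held in memory; the caller should skip the advance.
var errSafeNotCanonical = errors.New("local safe hash is no longer canonical")

// headerByNumber abstracts the L2 client's header lookup.
type headerByNumber func(number uint64) (blockRef, error)

// pickFinalized returns the block that finalized should be anchored to.
func pickFinalized(safe blockRef, l1FinalizedLastBlock uint64, lookup headerByNumber) (blockRef, error) {
	// Canonicality check: re-read the safe block and compare hashes.
	current, err := lookup(safe.Number)
	if err != nil {
		return blockRef{}, err
	}
	if current.Hash != safe.Hash {
		return blockRef{}, errSafeNotCanonical
	}
	// Common case: L1 finalized covers everything verified locally, so the
	// in-memory safe (hash + number) is the anchor.
	if l1FinalizedLastBlock >= safe.Number {
		return safe, nil
	}
	// Rare case: derivation ran ahead of L1 finalized; anchor to the lower
	// L1-derived block number and fetch its hash from the L2 client.
	return lookup(l1FinalizedLastBlock)
}

// demoLookup is a fake L2 client whose block N always has hash "hN".
func demoLookup(number uint64) (blockRef, error) {
	return blockRef{Number: number, Hash: fmt.Sprintf("h%d", number)}, nil
}

func main() {
	safe := blockRef{Number: 100, Hash: "h100"}
	fmt.Println(pickFinalized(safe, 150, demoLookup)) // anchors to local safe
	fmt.Println(pickFinalized(safe, 80, demoLookup))  // anchors to block 80
	fmt.Println(pickFinalized(blockRef{Number: 100, Hash: "stale"}, 150, demoLookup))
}
```

Working on block numbers rather than batch indexes means the function never needs a per-batch lookup against contract storage that may already have been cleared.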