Skip to content

Refactor: umbp metadata store#394

Open
TianDi101 wants to merge 8 commits into
mainfrom
refactor/umbp-metadata-store
Open

Refactor: umbp metadata store#394
TianDi101 wants to merge 8 commits into
mainfrom
refactor/umbp-metadata-store

Conversation

@TianDi101

Copy link
Copy Markdown
Collaborator

No description provided.

TianDi101 and others added 8 commits June 15, 2026 09:42
Preparatory type changes ahead of the IMasterMetadataStore refactor. No
behavior change; the rest of the refactor becomes purely structural.

0a. Hoist NodeTierKey (from GlobalBlockIndex) and NodeMatch (from
    ExternalKvBlockIndex, with MatchedHashCount) to types.h. Temporary
    `using` aliases left in both classes so existing callers compile;
    removed in Phase 5.

0b. Add ClientRegistration and HeartbeatResult (with nested
    enum Status { APPLIED, SEQ_GAP, UNKNOWN }) to types.h. Additive;
    these are the input/result vocabulary the Phase 1 interface uses.

0c. Migrate boundary-crossing timestamps from steady_clock to
    system_clock (hazard #7): ClientRecord {last_heartbeat,registered_at},
    BlockMetrics {created_at,last_accessed_at}, EvictionCandidate
    last_accessed_at, BlockEntry lease/access atomics + GrantLease/
    BatchLookupForRouteGet durations, Router lease_duration_, and
    master_server NowNs(). Process-local duration/timeout clocks (rpc
    latency timer, ssd read-lease, peer allocator deadlines) intentionally
    stay steady_clock. Fixes the one audit fallout in
    test_route_put_strategy.cpp that constructed the migrated fields.

Full UMBP C++ unit suite (local + distributed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Land the abstract IMasterMetadataStore interface header consolidating
the four master-side state holders (GlobalBlockIndex, ClientRegistry,
ExternalKvBlockIndex, ExternalKvHitIndex) behind one contract. No
consumers wired yet.

- master_metadata_store.h: lifted from the draft, depends only on
  types.h. Adds the two hit-count methods the draft dropped
  (GetExternalKvHitCounts, GarbageCollectHits) so the live
  GetExternalKvHitCounts RPC path is preserved, and adds a `now`
  parameter to MatchExternalKv so count_as_hit=true can stamp last_seen.
- Hoist EvictionCandidate from global_block_index.h into types.h (part
  of the store contract; mirrors the phase 0 NodeTierKey/NodeMatch
  hoist) so the interface depends only on types.h.
- Compile/instantiation gates: self-compile TU, GMock
  MockMasterMetadataStore (reused in phase 3), and a
  signature-completeness test exercising every method through
  IMasterMetadataStore&.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implement IMasterMetadataStore in-process by folding the four former state
holders (GlobalBlockIndex, ClientRegistry, ExternalKvBlockIndex,
ExternalKvHitIndex) behind a single std::shared_mutex.

- Block locations + LRU/lease (per-entry atomics mutated under a shared lock,
  keeping the RouteGet hot path concurrent); lease/access timestamps are now
  caller-supplied system_clock (hazard #7).
- ApplyHeartbeat single-seq CAS; SEQ_GAP keeps liveness but not caps/seq
  (hazard #1). ExpireStaleClients flips ALIVE->EXPIRED and keeps the row,
  cascading block + external-KV cleanup atomically (hazard #3).
- RegisterExternalKvIfAlive fuses the alive-check with the write (TOCTOU fix).
- MatchExternalKv(count_as_hit=true) is the one formerly-shared path that
  takes the unique lock; hit last_seen is system_clock and feeds
  GarbageCollectHits.
- EnumerateLruForEviction = Option A (full scan + sort + greedy byte budget,
  no maintained index) so tie-timestamps never drop candidates.

Adds the §6a behavioral suite (31 cases) written against IMasterMetadataStore&
so it is reused for the Redis backend; all green, concurrency cases clean
under ThreadSanitizer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the direct GlobalBlockIndex / ClientRegistry / ExternalKvBlockIndex /
ExternalKvHitIndex references in every master-side consumer with a single
IMasterMetadataStore&. The four old classes are still present, so this isolates
rewire regressions from the Phase 4 deletion.

- Router: holds one store_ instead of index_ + registry_. RoutePut /
  BatchRoutePut read ListAliveClients() and BatchExistsBlock(); BatchRouteGet
  now passes an explicit system_clock::now() into BatchLookupBlockForRouteGet
  (the old GlobalBlockIndex read the clock internally — the timestamp now
  crosses the store boundary, hazard #7).
- EvictionManager: constructor takes the store. RunOnce passes its existing
  per-(node,tier) byte budget (already computed down to the low watermark)
  straight into EnumerateLruForEviction and drops its own std::sort + greedy
  budget walk, since the store returns candidates LRU-ordered and
  budget-trimmed. The standalone overloaded-set is gone — the budget map's
  keys already identify the overloaded buckets.
- MasterServer: the four state members collapse to one
  unique_ptr<IMasterMetadataStore> (InMemoryMasterMetadataStore), declared
  before router_/service_ so it outlives their references. gRPC handlers
  rewired to store_ calls. The Heartbeat handler flattens EventBundle[] into
  per-bundle single-seq ApplyHeartbeat calls and short-circuits on SEQ_GAP
  (§3e). The client-expiry reaper moves out of ClientRegistry into MasterServer
  (per-tick store_->ExpireStaleClients(cutoff)); the hit-index GC tick becomes
  store_->GarbageCollectHits(now - max_age) — both cutoffs on the system_clock
  basis so they compare against last_heartbeat / last_seen.
- Add UnregisterExternalKvByNode to the interface, in-memory impl, and mock:
  it backs the live RevokeAllExternalKvBlocksForNode RPC (whole-node external-KV
  wipe that leaves the client record + block locations intact), which the §1b
  draft had dropped.

Tests:
- test_router_dedup migrated to construct InMemoryMasterMetadataStore instead
  of GlobalBlockIndex + ClientRegistry.
- Store suite extended with the previously-uncovered methods:
  UnregisterExternalKvByNode (distinct from UnregisterClient), GetPeerAddress
  (ALIVE + EXPIRED + unknown), GetClientTags, and ListAliveClients content;
  the interface "every method callable" gate now names UnregisterExternalKvByNode.
- InMemoryMasterMetadataStore behavioral suite green (36 cases) via the
  standalone g++ build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
UMBP C++ tests were split across two trees with two CMake wirings and a
duplicate GoogleTest FetchContent: the standard tests/cpp/umbp and a second
suite under src/umbp/tests (built unconditionally, with its own gtest fetch).

Move the src/umbp/tests suite into tests/cpp/umbp/distributed/ (master/peer/
index/store/router logic = distributed layer), merge its target definitions
into distributed/CMakeLists.txt (keeping gtest_discover_tests registration),
and drop the src/umbp/tests subdirectory and its duplicate googletest fetch.
The whole UMBP suite now builds under the single BUILD_TESTS + BUILD_UMBP gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Now that all consumers route through IMasterMetadataStore (Phase 3),
remove the four superseded master-side state holders and their
class-specific test suites:

  - GlobalBlockIndex, ClientRegistry, ExternalKvBlockIndex,
    ExternalKvHitIndex (header + cpp each)
  - their Phase 0a `using NodeTierKey` / `using NodeMatch` aliases
    (lived inside the deleted headers)
  - GlobalBlockIndex::GetMetrics (zero callers outside the class)

Behavioral coverage moved to the store suite
(test_in_memory_master_metadata_store), so the five class-specific
tests are dropped. Two suites that still constructed the old classes
as fixtures are migrated to InMemoryMasterMetadataStore instead:

  - test_ssd_reliability: ApplyEvents/Lookup/BatchLookupForRouteGet ->
    RegisterClient + ApplyHeartbeat (ascending seq) + LookupBlock /
    BatchLookupBlockForRouteGet
  - test_umbp_tags Suite 1: ClientRegistry -> store
    RegisterClient/ApplyHeartbeat/GetClientTags/UnregisterClient

BlockMetrics and EvictionCandidate stay in types.h (still used by the
store's BlockEntry, eviction_manager, and test_types).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n docker

The UMBP C++ tests were consolidated under tests/cpp/umbp/distributed
(commit 00652fa); remove the now-duplicate originals left behind under
src/umbp/tests (their CMakeLists was already deleted in that commit, so
these files were orphaned and unbuilt).

Also add .rocprofv3/ to .dockerignore so profiler output stays out of the
docker build context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant