Context
Compass stores entities from multiple sources as separate records. A Kafka topic ingested from two different systems creates two unrelated entities with different URNs. The graph is fragmented: context assembly, impact analysis, and search all operate on disconnected duplicates.
Entity resolution is the mechanism that matches incoming observations against existing entities, merges properties, and maintains unified identity. This is the prerequisite for a coherent knowledge graph.
Scope
Tier 1: Exact URN Match
When an observation arrives with a URN that already exists, merge properties into the existing entity
Track provenance: which source contributed which properties
Idempotent — re-sending the same observation must not create duplicates or mutate state unexpectedly
This is what Upsert partially does today, but without provenance tracking or merge strategy
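To make the Tier 1 behaviour concrete, here is a minimal in-memory sketch of exact-URN resolution with per-property provenance. The names `Entity`, `EntityStore`, and `resolve_exact` are hypothetical and not part of Compass. The sketch also assumes that re-sending an unchanged value keeps the original provenance, which is one possible reading of the idempotency requirement.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Entity:
    urn: str
    properties: dict[str, Any] = field(default_factory=dict)
    # Hypothetical provenance shape: property name -> source that last changed it.
    provenance: dict[str, str] = field(default_factory=dict)


class EntityStore:
    """In-memory stand-in for entity storage (illustrative only)."""

    def __init__(self) -> None:
        self._by_urn: dict[str, Entity] = {}

    def get(self, urn: str) -> Optional[Entity]:
        return self._by_urn.get(urn)

    def resolve_exact(self, urn: str, source: str, properties: dict[str, Any]) -> bool:
        """Merge an observation into the entity with this URN, creating it if absent.

        Returns True if stored state changed, False if the observation was a no-op.
        """
        entity = self._by_urn.get(urn)
        changed = entity is None
        if entity is None:
            entity = Entity(urn=urn)
            self._by_urn[urn] = entity
        for key, value in properties.items():
            if key in entity.properties and entity.properties[key] == value:
                continue  # same value already stored: keep the original provenance
            entity.properties[key] = value
            entity.provenance[key] = source
            changed = True
        return changed
```

Sending the same observation twice leaves the entity untouched on the second call, while a second source that contributes a new property is recorded as the provenance of that property only.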
Tier 2: Heuristic Matching
Match observations where URN differs but type + name + source pattern suggests the same logical entity
Configurable matching rules (e.g., "bigquery table names map to dbt model names via this pattern")
Candidate scoring with a confidence threshold
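The Tier 2 idea could look roughly like the sketch below: a configurable rule rewrites source-specific names into a canonical form, candidates are scored, and only a candidate at or above the threshold is returned. Everything here is an assumption made for illustration: the `Observation` and `NameRule` types, the 0.4/0.6 weights, the 0.9 default threshold, and the use of `difflib.SequenceMatcher` for name similarity.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:
    urn: str
    type: str
    name: str
    source: str


@dataclass(frozen=True)
class NameRule:
    """Hypothetical rule: rewrite names from one source before comparison."""
    source: str
    pattern: str      # regular expression applied to the observed name
    replacement: str  # replacement string passed to re.sub


def canonical_name(obs: Observation, rules: Sequence[NameRule]) -> str:
    name = obs.name
    for rule in rules:
        if rule.source == obs.source:
            name = re.sub(rule.pattern, rule.replacement, name)
    return name.lower()


def score(incoming: Observation, candidate: Observation, rules: Sequence[NameRule]) -> float:
    """Return a confidence in [0, 1]; the weights are illustrative."""
    if incoming.type != candidate.type:
        return 0.0  # a type mismatch rules the candidate out entirely
    similarity = SequenceMatcher(
        None, canonical_name(incoming, rules), canonical_name(candidate, rules)
    ).ratio()
    return 0.4 + 0.6 * similarity


def best_match(
    incoming: Observation,
    candidates: Sequence[Observation],
    rules: Sequence[NameRule],
    threshold: float = 0.9,
) -> Optional[tuple[Observation, float]]:
    """Return the highest-scoring candidate at or above the threshold, else None."""
    scored = [(c, score(incoming, c, rules)) for c in candidates]
    scored = [pair for pair in scored if pair[1] >= threshold]
    return max(scored, key=lambda pair: pair[1]) if scored else None
```

For example, a rule such as `NameRule("bigquery", r"^[^.]+\.", "")` strips the dataset prefix, so a BigQuery table named `analytics.stg_orders` lines up with a dbt entry named `stg_orders` of the same type.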
Tier 3: Semantic Similarity (follow-up)
Use embedding similarity to catch non-obvious matches
Only viable after the embedding pipeline has indexed sufficient entities
Should be a signal fed into Tier 2 scoring, not a standalone matcher
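One way to treat semantic similarity as a signal rather than a standalone matcher is to blend it into the Tier 2 score and fall back to the heuristic score when an embedding is missing. The sketch below assumes embeddings arrive as plain float sequences. The function names and the 0.3 weight are hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(
    heuristic_score: float,
    incoming_vec: Optional[Sequence[float]],
    candidate_vec: Optional[Sequence[float]],
    semantic_weight: float = 0.3,  # illustrative weight, not a recommendation
) -> float:
    """Blend the heuristic score with embedding similarity.

    If either entity has not been embedded yet, the heuristic score is
    returned unchanged, so Tier 2 keeps working before the index is populated.
    """
    if incoming_vec is None or candidate_vec is None:
        return heuristic_score
    # Negative cosine values are clamped so they cannot push the score below
    # the weighted heuristic part.
    semantic = max(0.0, cosine_similarity(incoming_vec, candidate_vec))
    return (1.0 - semantic_weight) * heuristic_score + semantic_weight * semantic
```

Because the blended value stays in the same range as the heuristic score, the existing confidence threshold can be applied to it without a separate decision path.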
Merge Strategy
When a match is found, merge properties from the new observation into the existing entity
Default: last-write-wins per field
Track which source contributed which properties (provenance)
Resolution audit log: record what was matched, merged, and why
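The merge strategy above can be sketched as follows. This version assumes "last write" is decided by an observation timestamp per field, so an out-of-order observation cannot overwrite newer data. That rule, and the `FieldValue`, `AuditEntry`, and `merge_last_write_wins` names, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FieldValue:
    value: Any
    source: str         # provenance: which source contributed this value
    observed_at: float  # timestamp of the observation that wrote it


@dataclass
class AuditEntry:
    entity_urn: str
    source: str
    reason: str         # why the observation matched, e.g. "exact_urn"
    applied: list[str]  # fields written by this observation
    ignored: list[str]  # fields left alone (older or identical observation)


def merge_last_write_wins(
    entity_urn: str,
    existing: dict[str, FieldValue],
    incoming: dict[str, Any],
    source: str,
    observed_at: float,
    reason: str,
    audit_log: list[AuditEntry],
) -> None:
    """Merge incoming properties field by field and record what happened."""
    applied: list[str] = []
    ignored: list[str] = []
    for key, value in incoming.items():
        new = FieldValue(value, source, observed_at)
        current = existing.get(key)
        if current is not None and (current == new or current.observed_at > observed_at):
            ignored.append(key)  # a replayed or stale write leaves state untouched
            continue
        existing[key] = new
        applied.append(key)
    audit_log.append(AuditEntry(entity_urn, source, reason, applied, ignored))
```

Replaying an observation leaves the stored fields unchanged and yields an audit entry with an empty `applied` list, which keeps the merge idempotent while still recording that resolution ran.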
Design Considerations
Resolution must be idempotent
Meteor sends raw observations and Compass resolves them; keep the interface between the two simple
Start with Tier 1 (exact URN match with provenance). Ship it. Tier 2 and 3 are follow-ups.