Add entity resolution to deduplicate and merge cross-source observations #250

## Context

Compass stores entities from multiple sources as separate records. For example, a Kafka topic ingested from two different systems becomes two unrelated entities with different URNs. The resulting graph is fragmented: context assembly, impact analysis, and search all operate on disconnected duplicates.

Entity resolution is the mechanism that matches incoming observations against existing entities, merges properties, and maintains unified identity. This is the prerequisite for a coherent knowledge graph.

## Scope

### Tier 1: Exact URN Match
- When an observation arrives with a URN that already exists, merge properties into the existing entity
- Track provenance: which source contributed which properties
- Idempotent: re-sending the same observation must not create duplicates or change stored state
- `Upsert` already does part of this today, but without provenance tracking or a defined merge strategy
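
A minimal sketch of the Tier 1 behaviour described above, written in Python for illustration only. `Entity`, `EntityStore`, and `resolve` are hypothetical names, not Compass APIs, and the in-memory dict stands in for whatever storage `Upsert` uses today. It assumes provenance is recorded per property and only updated when a value actually changes.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    urn: str
    properties: dict = field(default_factory=dict)
    # Property name -> source that last changed it (assumed provenance shape).
    provenance: dict = field(default_factory=dict)


class EntityStore:
    """Hypothetical in-memory store keyed by URN."""

    def __init__(self):
        self.entities = {}

    def resolve(self, urn, source, properties):
        """Merge an observation into the entity with the same URN.

        Creates the entity if the URN is new. Returns the set of property
        names whose stored value changed, so an empty set means a no-op.
        """
        entity = self.entities.setdefault(urn, Entity(urn=urn))
        changed = set()
        for key, value in properties.items():
            if key not in entity.properties or entity.properties[key] != value:
                entity.properties[key] = value
                entity.provenance[key] = source
                changed.add(key)
        return changed
```

Because unchanged values are skipped, sending the same observation twice leaves both the properties and the provenance map untouched, which is one way to satisfy the idempotency requirement.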

### Tier 2: Heuristic Matching
- Match observations whose URNs differ but whose combination of type, name, and source pattern suggests the same logical entity
- Configurable matching rules (e.g., "bigquery table names map to dbt model names via this pattern")
- Candidate scoring with a confidence threshold
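
One possible shape for rule-based matching with a confidence threshold, sketched in Python. Everything here is an assumption for illustration: the `Observation` and `NameRule` types, the scores 0.9 and 0.8, the default threshold of 0.75, and the simplification that both sides must report the same type.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    urn: str
    type: str
    name: str
    source: str


@dataclass(frozen=True)
class NameRule:
    """Hypothetical rule: rewrite a name from one source's convention to another's."""
    from_source: str
    to_source: str
    pattern: str       # regex applied to the incoming observation's name
    replacement: str


def normalize(name):
    # Lowercase and collapse every run of non-alphanumerics to "_".
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def score(obs, candidate, rules):
    """Return a confidence in [0, 1] that obs and candidate are the same entity."""
    if obs.type != candidate.type:
        return 0.0
    if normalize(obs.name) == normalize(candidate.name):
        return 0.9  # assumed weight for a normalized-name match
    for rule in rules:
        if rule.from_source == obs.source and rule.to_source == candidate.source:
            mapped = re.sub(rule.pattern, rule.replacement, obs.name)
            if normalize(mapped) == normalize(candidate.name):
                return 0.8  # assumed weight for a rule-based match
    return 0.0


def best_match(obs, candidates, rules, threshold=0.75):
    """Return the highest-scoring candidate at or above the threshold, else None."""
    scored = [(score(obs, c, rules), c) for c in candidates if c.urn != obs.urn]
    if not scored:
        return None
    top_score, top = max(scored, key=lambda pair: pair[0])
    return top if top_score >= threshold else None
```

For instance, a rule `NameRule("bigquery", "dbt", r"^[^.]+\.", "")` strips a dataset prefix, so a BigQuery observation named `analytics.fct_orders` would match a dbt candidate named `fct_orders`, provided both are reported with the same type.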

### Tier 3: Semantic Similarity (follow-up)
- Use embedding similarity to catch non-obvious matches
- Only viable after the embedding pipeline has indexed sufficient entities
- Should be a signal fed into Tier 2 scoring, not a standalone matcher
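
As a rough illustration of "a signal fed into Tier 2 scoring", the sketch below blends a heuristic score with cosine similarity between two embeddings. The function names and the 0.3 weight are assumptions, and embeddings are plain lists of floats here rather than output from any particular pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(heuristic, embedding_a=None, embedding_b=None, semantic_weight=0.3):
    """Blend a Tier 2 heuristic score with embedding similarity.

    If either entity has no embedding yet, the heuristic score is returned
    unchanged, so matching still works before the index is populated.
    """
    if embedding_a is None or embedding_b is None:
        return heuristic
    # Negative similarity is clamped to 0 so it cannot push the score below
    # the weighted heuristic contribution.
    semantic = max(0.0, cosine_similarity(embedding_a, embedding_b))
    return (1 - semantic_weight) * heuristic + semantic_weight * semantic
```

Keeping the semantic term as a weighted input means the same confidence threshold from Tier 2 still decides whether a match is accepted.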

### Merge Strategy
- When a match is found, merge properties from the new observation into the existing entity
- Default: last-write-wins per field
- Track which source contributed which properties (provenance)
- Resolution audit log: record what was matched, merged, and why
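
The merge rules above could look roughly like this in Python. This is a sketch under stated assumptions: "last write" is decided by an observation timestamp (`observed_at`) rather than arrival order, and `FieldValue`, `AuditEntry`, `ResolvedEntity`, and `merge` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FieldValue:
    value: object
    source: str         # provenance: which source contributed this value
    observed_at: float  # assumed: observation timestamp, seconds since epoch


@dataclass
class AuditEntry:
    entity_urn: str
    observation_urn: str
    source: str
    reason: str          # why the match was made, e.g. "exact_urn"
    merged_fields: list  # names of the fields this observation changed


@dataclass
class ResolvedEntity:
    urn: str
    fields: dict = field(default_factory=dict)  # field name -> FieldValue


def merge(entity, observation_urn, source, observed_at, properties, reason, audit_log):
    """Apply last-write-wins per field and append one audit entry per call."""
    merged = []
    for name, value in sorted(properties.items()):
        current = entity.fields.get(name)
        if current is not None and current.observed_at > observed_at:
            continue  # the stored value is newer, keep it
        if current is not None and current.value == value and current.source == source:
            continue  # same value from the same source, nothing to change
        entity.fields[name] = FieldValue(value, source, observed_at)
        merged.append(name)
    audit_log.append(AuditEntry(entity.urn, observation_urn, source, reason, merged))
    return merged
```

In this sketch a re-sent observation leaves the entity unchanged but still appends an audit entry with an empty `merged_fields` list. Whether no-op matches belong in the audit log is an open design choice.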

## Design Considerations

- Resolution must be idempotent
- Meteor sends raw observations and Compass resolves them; keep the interface between the two simple
- Start with Tier 1 (exact URN match with provenance). Ship it. Tier 2 and 3 are follow-ups.
- Graph-aware ranking (#237) depends on a coherent, deduplicated graph, so entity resolution should ship first

## References

- [Compass Roadmap — Entity Resolution](https://github.com/raystack/compass/blob/main/docs/_drafts/06-roadmap.md)
