
feat(combine-to-osv): support intelligent range merging #5459

Open
jess-lowe wants to merge 13 commits into google:master from jess-lowe:feat/combine-2-osv

Contributor

jess-lowe commented on May 27, 2026

This PR refactors the range merging and NVD version extraction logic in vulnfeeds to produce more consolidated OSV records. It unifies repository ranges under a single Affected block and implements a priority hierarchy for range selection and merging (pickBestRange).

  • All repository-based Git ranges are now consolidated under a single Affected package object in the output OSV record, which simplifies the record structure.

Intelligent Range Selection & Merging (pickBestRange)

  • Fixed Mismatch Priority: Bounded ranges with explicit fixed events are prioritized over open-ended last_affected ranges.
  • Constrained Range Priority: Bounded ranges with a specific non-zero introduced commit or version are preferred over generic introduced: "0" ranges.
  • CPE_RANGE Source Priority: Ranges derived from explicit CPE ranges are preferred over other sources.
  • References-Only Merging: Git commit ranges whose source is strictly "REFERENCES" are now merged (appended) directly into the CVE range rather than discarded.
  • Database Specific Merging: When merging ranges, their database_specific metadata maps (including source values and extracted_events) are combined and deduplicated.
  • Last-Affected Cleanup: Automatically removes legacy last_affected entries from the chosen or merged range if a fixed event or commit exists.
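
The three selection priorities listed above can be sketched in Go as a simple scoring function. This is only an illustration under stated assumptions: the `Event` and `Range` types, the `pickBest` and `score` names, the `"DESCRIPTION"` source label, and the tie-breaking rule are all hypothetical and are not taken from the PR's actual `pickBestRange` implementation.

```go
package main

import "fmt"

// Event and Range are simplified, hypothetical stand-ins for the OSV
// schema types; the real vulnfeeds types may look different.
type Event struct {
	Introduced   string
	Fixed        string
	LastAffected string
}

type Range struct {
	Events []Event
	Source string // assumed flattening of database_specific["source"]
}

// hasFixed reports whether the range carries an explicit fixed event.
func hasFixed(r Range) bool {
	for _, e := range r.Events {
		if e.Fixed != "" {
			return true
		}
	}
	return false
}

// hasNonZeroIntroduced reports whether the range starts somewhere other than "0".
func hasNonZeroIntroduced(r Range) bool {
	for _, e := range r.Events {
		if e.Introduced != "" && e.Introduced != "0" {
			return true
		}
	}
	return false
}

// score weights the criteria so that an earlier criterion always
// outweighs all later ones combined (4 > 2 + 1).
func score(r Range) int {
	s := 0
	if hasFixed(r) {
		s += 4
	}
	if hasNonZeroIntroduced(r) {
		s += 2
	}
	if r.Source == "CPE_RANGE" {
		s++
	}
	return s
}

// pickBest returns the higher-scoring range. On a tie it keeps a,
// which is an assumption of this sketch.
func pickBest(a, b Range) Range {
	if score(b) > score(a) {
		return b
	}
	return a
}

func main() {
	cve := Range{Source: "CPE_RANGE", Events: []Event{{Introduced: "0"}, {LastAffected: "1.2.3"}}}
	nvd := Range{Source: "DESCRIPTION", Events: []Event{{Introduced: "0"}, {Fixed: "1.2.4"}}}
	// The fixed bound outranks the CPE_RANGE source, so nvd is chosen.
	fmt.Println(pickBest(cve, nvd).Source) // prints "DESCRIPTION"
}
```

Reordering the priorities in such a sketch only requires changing the weights, which makes it easy to compare alternative orderings against the same test inputs.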

Also included is a DESIGN.md file that explains the flow of data.

jess-lowe requested a review from another-rex on May 28, 2026 00:29
Comment thread vulnfeeds/cmd/combine-to-osv/DESIGN.md Outdated
If ranges are not simple enough to merge boundaries, we select the best range using the following hierarchy:
1. **Fixed Priority**: A range with bounded `fixed` version or commit information is prioritized over a range with open-ended `last_affected` information.
2. **Constrained Range Priority**: We prefer ranges that define a specific non-zero `introduced` bound over those that start at `"0"`.
3. **CPE_RANGE Source Priority**: We prefer ranges whose metadata source is `"CPE_RANGE"` because they are extracted from explicit config nodes rather than inferred from text.
Contributor


Should this be below 1 & 2? Seems like CPE ranges are better than a random introduced inferred from the description, or maybe not?

* Duplicate entries inside `extracted_events` are removed.
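
As a rough illustration of the combine-and-deduplicate step, the sketch below merges two metadata maps key by key while preserving first-seen order. It assumes `database_specific` can be modeled as `map[string][]string` and that extracted events are encoded as plain strings; both are simplifications made for this example, and `mergeUnique` and `mergeDatabaseSpecific` are hypothetical names.

```go
package main

import "fmt"

// mergeUnique returns the values of a followed by the values of b,
// keeping only the first occurrence of each value.
func mergeUnique(a, b []string) []string {
	seen := make(map[string]bool, len(a)+len(b))
	out := make([]string, 0, len(a)+len(b))
	for _, list := range [][]string{a, b} {
		for _, v := range list {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out
}

// mergeDatabaseSpecific combines two metadata maps. Keys present in
// both maps have their value lists merged and deduplicated.
func mergeDatabaseSpecific(a, b map[string][]string) map[string][]string {
	out := make(map[string][]string, len(a)+len(b))
	for k, v := range a {
		out[k] = mergeUnique(nil, v)
	}
	for k, v := range b {
		out[k] = mergeUnique(out[k], v)
	}
	return out
}

func main() {
	cve := map[string][]string{
		"source":           {"CPE_RANGE"},
		"extracted_events": {"introduced:0", "fixed:1.2.4"},
	}
	nvd := map[string][]string{
		"source":           {"REFERENCES", "CPE_RANGE"},
		"extracted_events": {"fixed:1.2.4"},
	}
	merged := mergeDatabaseSpecific(cve, nvd)
	fmt.Println(merged["source"])
	fmt.Println(merged["extracted_events"])
}
```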

### Last-Affected Cleanup
At the end of the selection or merging process, if the final range contains at least one explicit `fixed` commit or version event, any `last_affected` events are automatically removed from the range to maintain clean, bounded schema compliance.
Contributor


Hmm, I'm still not sure I agree with this. These will mostly be enumerated versions, right? It feels like we should still keep those somehow, though I'm not 100% sure how. The versions: [] array, maybe?

Contributor Author

jess-lowe commented on May 29, 2026


Nah, this happens a lot more frequently than enumerated versions, and keeping both is against the schema. I'll handle enumerated versions in the future.

This is also something we've done historically with the NVD conversion.
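
The cleanup rule discussed in this thread can be sketched in Go as follows. This is a minimal illustration, not the PR's code: the `Event` type and the `dropLastAffected` name are hypothetical, and the sketch assumes each event sets exactly one of its fields.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for an OSV range event.
// Exactly one field is assumed to be set per event.
type Event struct {
	Introduced   string
	Fixed        string
	LastAffected string
}

// dropLastAffected removes every last_affected event when the range
// contains at least one fixed event; otherwise it returns the events
// unchanged.
func dropLastAffected(events []Event) []Event {
	hasFixed := false
	for _, e := range events {
		if e.Fixed != "" {
			hasFixed = true
			break
		}
	}
	if !hasFixed {
		return events
	}
	out := make([]Event, 0, len(events))
	for _, e := range events {
		if e.LastAffected == "" {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	merged := []Event{{Introduced: "0"}, {LastAffected: "1.2.3"}, {Fixed: "1.2.4"}}
	for _, e := range dropLastAffected(merged) {
		fmt.Printf("%+v\n", e)
	}
}
```

If enumerated versions are handled later, as suggested above, the dropped `last_affected` values could be collected in the same loop instead of being discarded.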

Comment thread vulnfeeds/cmd/combine-to-osv/main.go
Contributor


Are there cases where NVD or CVE have multiple ranges each? Or cases where both CVE and NVD each have a full (introduced-fixed) range? These don't seem to be covered in the test cases.

Contributor Author


Added these tests.
